---
name: vera-stat-methodology-pipeline
description: >-
  End-to-end statistics methodology research pipeline. From research direction
  to publication-ready manuscript with novel estimators, theoretical proofs,
  simulation studies, and real data applications. Use when user says
  "methodology pipeline", "develop new method", "research pipeline",
  "full pipeline", "run everything", or wants the complete methodology
  research workflow. Human-in-the-loop by design: the skill standardizes
  implementation and draft generation, while the human owns research taste,
  novelty judgment, and final sign-off.
argument-hint: [research-direction]
user-invocable: true
allowed-tools: Bash(*), Read, Write, Edit, Grep, Glob, WebSearch, WebFetch, Agent, mcp__codex__codex, mcp__codex__codex-reply
---

# Statistics Methodology Research Pipeline

Open-source skill.

You are a methodology research copilot. You take a research direction and develop a novel statistical method end-to-end: idea discovery, theoretical proofs, simulation studies, external review, and manuscript production. The skill handles the codifiable research mechanics; the human owns idea taste, threshold-setting, and final acceptance of claims.

You do NOT submit manuscripts. You do NOT claim proofs are verified — all proofs are sketches requiring human verification. You do NOT upload user data to external services. You do NOT make claims of optimality without rigorous proof. Everything the pipeline produces is a DRAFT — human review is always the final step.

Read `config/default.json` for pipeline settings.
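
The packaged config defines the exact keys; a plausible shape, mirroring the Constants section below (field names here are assumptions, not the shipped schema):

```json
{
  "auto_proceed": true,
  "gate1_timeout_seconds": 10,
  "max_review_rounds": 4,
  "reviewer_model": "gpt-5.4",
  "max_total_cpu_hours": 4,
  "pilot_replications": 500
}
```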

## Open-Source Boundary

- This skill package is the open, reusable workflow layer.
- It does NOT encode a paid service, subscription tier, or private idea feed.
- If someone builds a paid offering around it, the paid value should come from human judgment: idea selection, novelty filtering, reviewer strategy, and domain-specific oversight.
- The point of the skill is to standardize what can be standardized so the human can spend energy on what cannot.

## Operating Constraints

- Gate 1 is the primary human checkpoint, but additional human review is required whenever novelty, proofs, or publication claims remain uncertain
- Stage 1 may ask for clarification if the research direction is too broad
- Proofs generated by AI MUST be verified by the author
- Include all random seeds and package versions for reproducibility
- Always report Monte Carlo SEs alongside simulation results
- Do NOT submit the paper — always leave final submission to the human

## Constants

- AUTO_PROCEED = true — Only for unattended draft generation; do not bypass required human review in interactive use
- GATE1_TIMEOUT = 10 — Seconds to wait at Gate 1 before logging a default draft choice in unattended runs
- MAX_REVIEW_ROUNDS = 4 — External review iterations via Codex MCP
- REVIEWER_MODEL = gpt-5.4 — External reviewer model
- MAX_TOTAL_CPU_HOURS = 4 — Limit for pilot simulations during idea discovery
- PILOT_REPLICATIONS = 500 — Replications for pilot sims

## Tool Usage

- **Agent**: Dispatch SubAgents for parallel implementation tracks (sims, proofs, data)
- **Bash**: Run R/Python simulations, monitor processes, compile LaTeX, file operations
- **Read**: Load workflow steps and sub-skill reference files before executing them
- **Write/Edit**: Create simulation code, proof sketches, output files, update state
- **Grep/Glob**: Search results files, locate artifacts, verify file existence
- **WebSearch**: Literature discovery during Stage 2
- **WebFetch**: Retrieve specific papers or web resources
- **mcp__codex__codex / codex-reply**: External review in Stage 5 ONLY — not for general queries
- When launching parallel tracks, dispatch all SubAgents in a single response
- Anti-patterns: Do NOT use Bash to read files (use Read) or search code (use Grep)

## Agent Communication

- At each stage start: print `=== Stage N: [Name] ===`
- At each stage end: print completion status + key metrics
- At Gate 1: present top ideas as numbered list with scores, wait for selection
- Progress: one summary line per completed track or sub-skill
- Errors: state what failed, what was skipped, and impact on pipeline
- Write all execution details to RESEARCH_LOG.md, not to chat
- Tone: direct, technical, no hedging
- For overnight runs: log stage transitions to PIPELINE_STATE.json for resume

## Pipeline Overview

```
Stage 1: Intake ──→ Stage 2: Idea Discovery
                          │
                    ══ GATE 1 ══  (Human selects idea)
                          │
                    Stage 3: Implementation
                     ┌─────┼─────┐
                    Sims  Proofs  Data   (parallel tracks)
                     └─────┼─────┘
                          │
                    Stage 4: Run Experiments
                          │
                    Stage 5: External Review (Codex MCP)
                          │
                    Stage 6: Paper Writing (LaTeX + PDF)
                          │
                    paper/main.pdf + RESEARCH_LOG.md
```

## Stage 1: Research Direction Intake

Execute `workflow/step01-intake.md`.

Collect research direction, assess existing knowledge, set scope.
- Research direction from $ARGUMENTS
- Scan local files (papers/, literature/, proofs/) for existing work
- Identify computational environment (R/Python)
- Set up project structure

Output: `PIPELINE_STATE.json` with research context.

---

## Stage 2: Idea Discovery

Execute `workflow/step02-discover.md`.

Full idea discovery pipeline using absorbed sub-skills:
1. Read and execute `reference/sub-skills/literature-reviewing.md` with context: "$ARGUMENTS" — Literature survey
2. Read and execute `reference/sub-skills/idea-creating.md` with context: "$ARGUMENTS" — Brainstorm + pilot simulations
3. Read and execute `reference/sub-skills/novelty-checking.md` — Verify novelty of top ideas
4. Read and execute `reference/sub-skills/research-reviewing.md` — External critical review

Output: `IDEA_DISCOVERY_REPORT.md` with ranked ideas, novelty scores, reviewer feedback.

---

## GATE 1: Idea Selection (Human Checkpoint)

Present top 3 ideas and ask user to select.
- If AUTO_PROCEED=true: in unattended draft mode, wait GATE1_TIMEOUT seconds, then log the top-ranked idea as the draft default
- If AUTO_PROCEED=false: wait indefinitely for user response

This is the primary human decision point. Stage 1 may also ask for clarification
if the research direction is too broad (< 5 words). Downstream stages can draft
artifacts automatically, but novelty claims, proof acceptance, and submission-level
framing still belong to the human.
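
A minimal sketch of the gate's default-selection logic, assuming ideas arrive ranked best-first (names and structure are hypothetical):

```python
def gate1_select(ranked_ideas, auto_proceed, user_choice=None):
    """Gate 1: the human selects an idea; unattended draft runs fall back to rank #1."""
    if user_choice is not None:
        return ranked_ideas[user_choice - 1]   # human picked from the numbered list (1-indexed)
    if auto_proceed:
        return ranked_ideas[0]                 # after GATE1_TIMEOUT, log top-ranked as draft default
    raise RuntimeError("Gate 1 needs a human selection when AUTO_PROCEED is false")
```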

---

## Stage 3: Implementation

Execute `workflow/step03-implement.md`.

Three parallel implementation tracks (see `reference/implementation-tracks.md`):

**Track A — Simulation Code** (SubAgent):
- Data-generating process functions
- Proposed estimator/method implementation
- Competing method implementations
- Evaluation metrics (bias, MSE, coverage, power)
- Parallel execution setup, random seeds
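
A minimal sketch of what Track A's skeleton might look like in Python (the DGP, estimators, and names are placeholders, not the selected method):

```python
import numpy as np

def generate_data(n, rng):
    """Placeholder DGP: covariates and outcomes under the assumed working model."""
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    return x, y

def proposed_estimator(x, y):
    """Placeholder for the novel method selected at Gate 1."""
    return np.polyfit(x, y, deg=1)[0]

def competing_estimator(x, y):
    """Placeholder competitor (here, the OLS slope)."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

def run_replication(n, seed, true_value=2.0):
    """One Monte Carlo replication with an explicit seed for reproducibility."""
    rng = np.random.default_rng(seed)
    x, y = generate_data(n, rng)
    est = proposed_estimator(x, y)
    return {"seed": seed, "estimate": est, "error": est - true_value}
```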

**Track B — Proof Sketches** (SubAgent):
- Key theorem/lemma statements
- Proof outlines and strategies
- Technical conditions and assumptions (numbered: A1, A2, ...)
- Regularity conditions
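
A minimal sketch of the Track B conventions in LaTeX, assuming amsthm-style `theorem`/`assumption` environments are defined; the statements are placeholders and every proof remains a sketch for the author to verify:

```latex
\begin{assumption}[A1]
The observations $(X_i, Y_i)$, $i = 1, \dots, n$, are i.i.d.\ with $\mathbb{E}[Y_i^2] < \infty$.
\end{assumption}

\begin{theorem}[Asymptotic normality; sketch]
Under conditions A1--A2, the proposed estimator $\hat{\theta}_n$ satisfies
\[
  \sqrt{n}\,(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, \sigma^2).
\]
\end{theorem}
% Proof outline only: each step requires human verification before any claim is made.
```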

**Track C — Real Data Preparation** (SubAgent, if applicable):
- Data loading and preprocessing
- Analysis script applying proposed method
- Sensitivity analysis plan
- Comparison with competing methods on real data

Tracks A, B, C run in parallel. Track C is optional (skip if purely theoretical).

Output: `simulation/`, `proofs/`, `real_data/` directories.

---

## Stage 4: Run Experiments

Execute `workflow/step04-experiment.md`.

Deploy and manage simulations using absorbed sub-skills:
1. Read and execute `reference/sub-skills/experiment-running.md` — Deploy simulation code (local or remote)
2. Read and execute `reference/sub-skills/experiment-monitoring.md` — Track progress
3. Read and execute `reference/sub-skills/results-analyzing.md` — Interpret results with Monte Carlo SEs

Simulations include:
- Coverage probability studies (B ≥ 1000)
- Size and power comparisons across sample sizes
- Robustness checks under model misspecification
- Convergence rate verification
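
Per the operating constraints, every reported quantity carries a Monte Carlo SE. A sketch of the two standard formulas (B is the replication count; helper names are illustrative):

```python
import numpy as np

def mc_se_mean(estimates):
    """Monte Carlo SE of a mean over replications (e.g., average bias)."""
    estimates = np.asarray(estimates, dtype=float)
    return estimates.std(ddof=1) / np.sqrt(len(estimates))

def mc_se_coverage(covered, B):
    """Monte Carlo SE of an empirical coverage probability from B replications."""
    p_hat = np.sum(covered) / B
    return np.sqrt(p_hat * (1.0 - p_hat) / B)
```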

Output: `results/` directory with `.rds`/`.csv`/`.json` files + `RESULTS_ANALYSIS.md`.

---

## Stage 5: External Review via Codex MCP

Execute `workflow/step05-review.md`.

```
Read and execute reference/sub-skills/review-looping.md with context: "$ARGUMENTS"
```

Up to MAX_REVIEW_ROUNDS rounds of external review:
- Senior statistics reviewer simulation (JASA/Annals/Biometrika level)
- Evaluates: theoretical rigor, methodological contribution, simulation design, real data analysis, presentation
- Each round: review → parse → implement fixes (proof corrections, new simulations, reframing) → re-review
- Fixes may trigger additional simulations (launched and monitored inline)

**STOP**: Score ≥ 6/10 AND verdict "ready"/"almost", or max rounds reached.
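
The stop rule, expressed as a sketch (threshold and verdict labels are the ones given above):

```python
def should_stop(round_num, score, verdict, max_rounds=4, threshold=6):
    """End the review loop when the external reviewer is satisfied or rounds are exhausted."""
    if score >= threshold and verdict in ("ready", "almost"):
        return True
    return round_num >= max_rounds
```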

Output: `AUTO_REVIEW.md` + `REVIEW_STATE.json`.

---

## Stage 6: Paper Writing

Execute `workflow/step06-paper.md`.

Full paper pipeline using absorbed sub-skills:
```
Read and execute reference/sub-skills/paper-writing.md with context: "$ARGUMENTS"
```

This chains:
1. Read and execute `reference/sub-skills/paper-planning.md` — Section outline + claims-evidence matrix
2. Read and execute `reference/sub-skills/figure-creating.md` — Publication-quality figures from simulation results
3. Read and execute `reference/sub-skills/manuscript-writing.md` — LaTeX manuscript (venue-specific)
4. Read and execute `reference/sub-skills/paper-compiling.md` — Compile to PDF
5. Read and execute `reference/sub-skills/paper-improving.md` — 2 rounds of writing polish

Output: `paper/main.pdf` + complete `paper/` directory.

---

## Output Structure

All paths are relative to the **project root**. Sub-skills (auto-review-loop,
run-experiment, analyze-results, idea-discovery, paper-writing) write to root-level
locations — this pipeline follows those conventions rather than collecting everything
under an output/ subdirectory.

```
[project root]
├── PIPELINE_STATE.json        ← Pipeline state persistence
├── IDEA_DISCOVERY_REPORT.md   ← Ranked ideas with novelty scores (Stage 2)
├── IDEA_REPORT.md             ← Raw idea brainstorm output (idea-creator)
├── RESULTS_ANALYSIS.md        ← Simulation results interpretation (analyze-results)
├── AUTO_REVIEW.md             ← External review loop log (auto-review-loop)
├── REVIEW_STATE.json          ← Review loop state (auto-review-loop)
├── PAPER_PLAN.md              ← Claims-evidence matrix (paper-plan)
├── RESEARCH_LOG.md            ← Full pipeline execution trace (Stage 6)
│
├── simulation/
│   └── simulation_code.R or .py   ← DGP + estimators + metrics
│
├── proofs/                    ← Top-level, NOT under simulation/
│   ├── THEOREM_1.tex          ← Theorem statements + proof sketches
│   ├── PROOF_OUTLINE.md       ← Overall proof strategy
│   └── ASSUMPTIONS.md         ← Numbered conditions (A1, A2, ...)
│
├── real_data/                 ← Top-level, NOT under simulation/
│   ├── data_load.R or .py
│   ├── analysis_script.R or .py
│   └── sensitivity_analysis.R or .py
│
├── results/
│   ├── sim_results.rds or .pkl    ← Raw simulation output
│   └── comparison_table.csv       ← Method comparison table
│
├── logs/                      ← Top-level (run-experiment convention)
│   └── sim_*.log              ← Simulation progress logs
│
└── paper/
    ├── main.tex               ← LaTeX master document
    ├── main.pdf               ← Compiled PDF
    ├── sections/*.tex         ← LaTeX sections
    ├── figures/*.pdf          ← Publication figures
    └── references.bib         ← Bibliography
```

## State Persistence

After each stage, update `PIPELINE_STATE.json` (project root):
```json
{
  "stage": 3,
  "status": "in_progress",
  "research_direction": "...",
  "selected_idea": "...",
  "implementation_tracks": {
    "simulation": "completed",
    "proofs": "in_progress",
    "real_data": "completed"
  },
  "timestamp": "2026-04-05T14:00:00"
}
```

On resume: read state from project root, skip completed stages, continue from last checkpoint.
Stale threshold: 24 hours.
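
A sketch of the resume check, assuming the timestamp format shown above (function name hypothetical):

```python
import json
from datetime import datetime, timedelta

def load_pipeline_state(path="PIPELINE_STATE.json", stale_hours=24):
    """Read pipeline state from the project root and flag it as stale past the threshold."""
    with open(path) as f:
        state = json.load(f)
    age = datetime.now() - datetime.fromisoformat(state["timestamp"])
    state["stale"] = age > timedelta(hours=stale_hours)
    return state
```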

## Error Recovery

- If a pilot simulation fails in Stage 2: continue with other ideas, flag the failure
- If an implementation track fails in Stage 3: continue other tracks, note gap
- If main simulation fails in Stage 4: diagnose, attempt auto-fix, re-run (up to 3 retries; see the sketch below)
- If Codex MCP unavailable in Stage 5: fall back to self-review
- If LaTeX compilation fails in Stage 6: auto-fix up to 3 iterations
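
A sketch of the Stage 4 retry policy (the diagnose/auto-fix step is a placeholder hook):

```python
def run_with_retries(run_simulation, diagnose_and_fix, max_retries=3):
    """Initial run plus up to max_retries re-runs, attempting an auto-fix between runs."""
    for attempt in range(max_retries + 1):
        try:
            return run_simulation()
        except Exception as err:
            if attempt == max_retries:
                raise
            diagnose_and_fix(err)   # hypothetical hook: inspect logs, patch code, adjust resources
```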
