---
name: paper-feedback-loop
description: "Quality-driven feedback loop that pushes a paper draft toward genuinely strong A+ through two independent evaluator agents (Reviewer Simulator, Persuasion Analyst). Each round proposes concrete improvements; the main agent applies the best ones. The goal is not convergence but maximizing quality. Use when the user says 'feedback loop', 'evaluate paper', 'improve paper', 'A+ loop', or wants iterative quality improvement on a paper draft."
---

# Paper Feedback Loop

Quality-driven improvement loop. Two isolated evaluators independently grade and propose upgrades per section. The main agent selects the highest-leverage fixes, applies them, and re-evaluates. The goal is not to converge to a passing grade but to push toward writing that would genuinely convince a skeptical reviewer.

## Setup: Check PAPER_CONFIG.md

**Before doing anything else**, check whether `PAPER_CONFIG.md` exists in the project root.

**If it does NOT exist**, you MUST run interactive setup before proceeding:

```
1. Use AskUserQuestion: "What venue are you submitting to? (e.g., ICML 2026, NeurIPS 2025, ACL 2026)"
2. Use AskUserQuestion: "Where is your paper source? (path to main .tex file, e.g., paper/main.tex)"
3. Use AskUserQuestion: "Where should QA workspace files be saved? (default: .qa_workspace)"
4. Use AskUserQuestion: "Which sections should be evaluated? (comma-separated, e.g., intro,method,experiments — leave blank for full paper)"
5. Write PAPER_CONFIG.md with the collected values (see template below)
```

**Hard rule: Do NOT proceed with the feedback loop until PAPER_CONFIG.md exists.** This is not optional.

### PAPER_CONFIG.md Template

```markdown
# Paper Submission QA Config
venue: "{venue}"
main_tex: "{main_tex}"
paper_dir: "{paper_dir}"
workspace_dir: "{workspace_dir}"
sections: [{sections}]
```

---

## Core Philosophy: Aim High, Not Converge

The loop does NOT try to satisfy evaluators just enough to get an A+. It tries to make the paper as strong as possible. The difference:

- **Converge to A+** (wrong): evaluators soften after a few rounds; edits become cosmetic; the paper plateaus at "good enough"
- **Aim for strong A+** (right): each round asks "what is the single highest-leverage improvement left?"; evaluators hold a fixed bar; the paper gets genuinely better each round

**Anti-inflation rule:** Evaluators must include in every round report a "Remaining gap to strong A+" section, even when grading A+. An A+ with no remaining gap means the evaluator believes no rewrite could meaningfully improve this work. An A+ with remaining gaps means "this passes, but here is how it could be stronger."

## Execution Mode: Subagent Pipeline with Quality Gate

Each round dispatches 2 evaluators in parallel per section. The main agent synthesizes feedback, selects the highest-leverage fixes, applies them, then re-evaluates. Rounds continue until the paper is genuinely strong or no further improvement is possible.

## Evaluator Agents

| Agent | Role | Perspective | Key Question |
|-------|------|-------------|--------------|
| paper-eval-reviewer | Reviewer Simulator | A venue reviewer evaluating the paper | "Would I recommend acceptance after reading this?" |
| paper-eval-persuasion | Persuasion Analyst | A rhetoric expert | "Does this make the reader believe the contribution is important?" |

Each agent has an isolated role. They do not see each other's reports. This prevents groupthink.

## Termination Criteria

The loop ends when ANY of these conditions is met:

1. **Strong A+**: Both evaluators grade A+ AND neither has a remaining-gap proposal that would materially improve the paper
2. **Diminishing returns**: The main agent judges that remaining proposals are cosmetic or trade-off — present the trade-offs to the user
3. **Max rounds**: 5 rounds reached — present remaining opportunities to the user

The loop does NOT end just because evaluators output "A+". The main agent must verify that the grade reflects genuine quality, not grade inflation.

## Workflow Per Section

### Step 1: Dispatch Evaluators (parallel)

For section `{section_id}`, launch both evaluators simultaneously. Each evaluator prompt includes the anti-inflation directive:

```
Agent(
  prompt: "Read the agent definition at .claude/agents/paper-eval-reviewer.md.
Read the paper overview at {paper_dir}/overview.md.
Read the paper section at {paper_dir}/{section_id}.md (or the full paper at {main_tex} if sections are not split).
Read PAPER_CONFIG.md for venue and context.

IMPORTANT: Hold a fixed bar across all rounds. Do not soften your standards because this is round N. Grade the paper as if you are seeing it for the first time. Even when grading A+, you MUST include a 'Remaining gap to strong A+' section describing what could still be improved.

Save report to {workspace_dir}/eval_reviewer_{section_id}_r{N}.md.",
  subagent_type: "general-purpose",
  model: "opus",
  run_in_background: true
)

Agent(
  prompt: "Read the agent definition at .claude/agents/paper-eval-persuasion.md.
Read the paper overview at {paper_dir}/overview.md.
Read the paper section at {paper_dir}/{section_id}.md (or the full paper at {main_tex} if sections are not split).

IMPORTANT: Hold a fixed bar across all rounds. If the opening was weak in round 1 and is still weak in round 3, grade it accordingly. Even when grading A+, include a 'Remaining gap to strong A+' section with the single most impactful rhetorical upgrade still possible.

Save report to {workspace_dir}/eval_persuasion_{section_id}_r{N}.md.",
  subagent_type: "general-purpose",
  model: "opus",
  run_in_background: true
)
```

Note: Reports are saved per-round (`_r{N}`) so the main agent can track whether evaluators are holding their bar or inflating grades.

### Step 2: Synthesize and Rank

After both evaluators complete, the main agent:

1. Reads both reports
2. Extracts every revision proposal (from graded sections AND from "remaining gap" sections)
3. **Ranks proposals by leverage**: which single change would most improve acceptance probability?
4. **Checks for grade inflation**: compare this round's grades to last round's. If grades improved but the text barely changed, flag the evaluator and discount its grade.

```markdown
## Round {N} — Section {section_id}

| Evaluator | Grade | Δ from last | Inflation risk? |
|-----------|-------|-------------|-----------------|
| Reviewer Sim | {grade} | {+/-/=} | {yes/no} |
| Persuasion | {grade} | {+/-/=} | {yes/no} |

### Highest-Leverage Proposals (ranked)
1. [from {evaluator}] {specific fix} — **impact**: {why this matters most}
2. [from {evaluator}] {specific fix} — **impact**: {why this matters}
3. [from {evaluator}] {specific fix} — **impact**: {marginal}

### Trade-offs
- Proposal X would improve persuasion but requires adding an experiment

### Decision
Apply proposals: {1, 2}
Skip: {3} (marginal) / Defer to user: {4} (trade-off)
```

**Conflict resolution priority:**
1. Reviewer Simulator (answering the actual reviewer concern is the primary goal)
2. Persuasion Analyst (rhetorical polish amplifies but doesn't replace substance)

### Step 3: Apply Highest-Leverage Fixes

The main agent applies only the top-ranked proposals.

### Step 4: Quality Check

Before re-dispatching evaluators, the main agent asks:
- Did the applied fixes actually improve the text, or just shuffle words?
- Are there any proposals that require user input (e.g., running a new experiment)?

If the remaining proposals all require new experiments or user decisions, **present them to the user** rather than running another round.

## Convergence Log

Track per-round with inflation monitoring:

```markdown
## Quality Log — {section_id}
| Round | Rev.Sim | Persuasion | Fixes | Key improvement |
|-------|---------|------------|-------|-----------------|
| 1     | B       | A          | 3     | Clarified contribution |
| 2     | A       | A          | 2     | Strengthened experiment framing |
| 3     | A+      | A+         | 1     | Sharpened conclusion |
| 3*    | — remaining gaps: Rev.Sim none; Persuasion: intro opening |
| STOP  | Strong A+ — remaining gap is cosmetic |
```

## Multi-Section Execution

Process sections in order defined by `sections` in `PAPER_CONFIG.md`. If `sections` is empty, treat the full paper as a single unit.

After all sections are complete, check for cross-section narrative coherence. If the persuasion evaluator surfaces narrative breaks across sections, fix them and re-run the affected sections' evaluators ONE more time to confirm the fix did not degrade section-level quality.

## Error Handling

| Situation | Response |
|-----------|----------|
| Grade inflation detected | Discount the inflated grade; treat it as one grade lower for termination decisions |
| All A+ but remaining gaps are substantial | Do NOT terminate. Apply the gaps as proposals for another round. |
| Evaluators propose conflicting changes | Present both options to the user with trade-off analysis |
| Round 5 without strong A+ | Present the best version so far + ranked list of remaining improvements for user decision |
| Proposals require new experiments | Block on user — present the proposals, ask how to proceed |

## Output

Save the final quality log to `{workspace_dir}/feedback_loop_log.md`:
- Per-section round history with grades and inflation flags
- All proposals (applied, skipped, deferred to user) with rationale
- Final grade matrix with remaining gaps
