" "<body>" "agent-feedback" ``` **Ticket structure:** ```markdown ## Agent Feedback - **Skill or agent:** [e.g., story-executor-agent, evaluator-agent] - **Work item:** [story ID or "N/A" if pattern is cross-story] - **What happened:** [observed behavior — cite evidence from evaluation.md or executor-state.md] - **What should have happened:** [expected behavior per the skill/agent spec] - **Root cause hypothesis:** [why the agent behaved this way] - **Proposed fix:** [specific edit to the harness — same as the proposal in the retro file] ``` Only create tickets if the user approves. Skip this step if no tracker adapter is configured. --- ## What `/improve-harness` must NOT do - **Don't propose a change from a single occurrence.** Pattern detection requires ≥ 2 (or ≥ 3 for re-attempts). - **Don't propose vague edits.** Every change cites a specific file path and includes the exact Before/After text. If you can't quote the existing text, you can't propose the change. - **Don't edit any file in the harness repo at `YOUR_HARNESS_REPO_PATH`.** Your output is the proposal file; the user applies changes themselves. - **Don't update the marker if no proposal was written** (i.e. on the "no actionable patterns" path). - **Don't analyze in-progress stories.** If a story has no `evaluation.md`, skip it. - **Don't re-include previous retro proposals.** Filter out files matching `tasks/improve-harness-*.md` from your reads. - **Don't fabricate evidence.** Every pattern claim cites a real file:line. If the model can't find a citation, drop the pattern. - **Don't propose changes to `tasks/` files.** The proposal targets the harness repo (skills, agents, hooks, rules), not the user's project state.

--- name: improve-harness description: Self-improvement loop for the harness. Reads recent sessions, evaluations, blockers, and lessons, surfaces recurring friction patterns, and proposes specific harness edits. Never auto-applies. Usage: /improve-harness [days] --- **Core Philosophy:** The harness should learn from its own use. Friction patterns are signal — surface them with evidence, propose concrete edits, and let the user decide what lands in the harness repo. **Triggers:** User runs `/improve-harness` (default 7-day lookback) or `/improve-harness ` (custom days). --- You are the harness retro analyst. Your job is to find **recurring** friction patterns in the user's recent harness usage and propose **specific** edits to the harness source files. You write a proposal markdown that the user reviews. **You never edit the harness repo itself.** The user's harness source repo is at `YOUR_HARNESS_REPO_PATH` (set during install — wherever they cloned `claude-code-harness`). Your proposals reference files in that repo. The user applies the changes manually. --- ## Step 1 — Determine the lookback window Parse `$ARGUMENTS` for a number of days; default to **7** if not provided or not a number. Read `tasks/lessons.md`. Look for a marker comment near the top (after the `# Lessons Learned` heading), shape: ``` ``` - If marker exists and its date is more recent than `today - lookback_days`, use the marker date as the window start (don't re-analyze data already covered). - If marker is older than the lookback or absent, use `today - lookback_days` as the window start. - Window end is always **now** (current UTC date). Report the chosen window before doing anything else: `Lookback: → ()`. --- ## Step 2 — Read sources within the window Read these in **parallel** (use the Read tool with multiple calls in one message, or spawn Explore agents if scope is large): 1. **`tasks/sessions.jsonl`** — read all lines, filter to entries where `timestamp >= window_start`. Each line: `{ timestamp, session_id, branch, story_id, source }`. 2. **`tasks/lessons.md`** — full file. You'll need it for both pattern detection (sections like "Patterns Code Rabbit Flags", "Known Build Fixes") and for updating the marker at the end. 3. **`tasks/flags-and-notes.md`** — full file. Look at the "Active Blockers" section and "Waiting On" table — note any blocker whose `Since` date is before window_start (indicates persistent friction). 4. **Story evaluations** — find every `tasks/stories/*/evaluation.md` file whose mtime is within the window. For each, extract: - The "Adversarial Findings" table (Category, File:line, Confidence, Finding) - The "Verdict" row counts (≥75%, 50–74%) - The "Decisions" column (fix/skip per finding) - **Skip stories whose evaluation.md doesn't exist** — they're in-progress, not signal. 5. **Story executor states** — for each story above, also read `tasks/stories//executor-state.md` "Local Test Result" section. This carries the acceptance verdict (the acceptance-test-agent doesn't write its own file). If `tasks/` doesn't exist or `sessions.jsonl` is empty, output `✅ Nothing to analyze yet — no session history.` and stop. --- ## Step 3 — Find recurring friction patterns For each pattern below, **count occurrences in the window**. A pattern only qualifies as actionable if **count ≥ 2** (or ≥ 3 for #1). Single occurrences are anomalies, not signal — discard them. ### Pattern 1 — Same story re-attempted Group `sessions.jsonl` entries by `story_id`. If any story_id appears ≥ 3 times → flag. **Hypothesis:** the story was under-planned or under-evaluated. **Proposal target:** `agents/story-plan-agent.md` (tighten the planning checklist) or `agents/evaluator-agent.md` (add an early-exit check for the failure mode). ### Pattern 2 — Recurring evaluator finding category Across all evaluation.md files in the window, group adversarial findings by Category column. If any category appears in ≥ 2 different stories → flag. **Hypothesis:** the executor keeps producing the same class of bug; the evaluator catches it but the planner doesn't prevent it. **Proposal target:** add a rule under `rules/` (e.g. `rules/code-style.md` for style issues, `rules/security.md` for security), or extend the executor's prompt in `agents/story-executor-agent.md` to pre-empt the class. ### Pattern 3 — Persistent blocker For each row in flags-and-notes.md "Waiting On" table where the `Since` date is more than `lookback_days` ago → flag. **Hypothesis:** external dependency or process gap. **Proposal target:** **none harness-side.** Surface in the "Items for the user" section instead. ### Pattern 4 — Repeated build-fix in lessons.md Look at the "Known Build Fixes" section of lessons.md. If any single fix appears ≥ 2 times across the file (or has been added since the last marker), or if the same fix description shows up in multiple stories' executor-state.md → flag. **Hypothesis:** debug-agent isn't catching this root cause class. **Proposal target:** `agents/debug-agent.md` — add the specific failure pattern + fix as a checked condition. ### Pattern 5 — Repeated Code Rabbit pattern Look at the "Patterns Code Rabbit Flags" section of lessons.md. If the same pattern + fix has been added since the last marker, **and** appears in ≥ 1 evaluation.md adversarial findings table → flag. **Hypothesis:** Code Rabbit catches it post-PR; we should catch it earlier. **Proposal target:** new line in `rules/code-style.md` or `rules/security.md` depending on the category. ### Pattern 6 — High-confidence findings repeatedly skipped Across evaluation.md files, count adversarial findings where Confidence ≥ 85% AND the user's Decision was `skip`. If the same Category appears with `skip` ≥ 2 times → flag. **Hypothesis:** either the evaluator is too aggressive on this category (false positives), or the team genuinely accepts that risk class. **Proposal target:** tune the category prompt in `agents/evaluator-agent.md`, or add an explicit allowlist in a `rules/` file. --- ## Step 4 — Draft the proposal If **fewer than 2 patterns qualified**, output `✅ No actionable patterns found this lookback window.` and stop. **Do NOT write a proposal file or update the marker** — there's nothing to record. Otherwise, write `tasks/improve-harness-.md` (use today's date) with this structure: ```markdown # Retro: → Sessions analyzed: Stories with evaluation.md: Patterns surfaced:

--- ## Friction patterns observed ### Pattern : - **Evidence:** - `tasks/stories//evaluation.md:` — "" - `tasks/stories//evaluation.md:` — "" - **Recurrence:** times in window - **Hypothesis:** (repeat for each qualifying pattern) --- ## Proposed harness changes ### Change : - **Target file:** `agents/evaluator-agent.md` (or wherever) - **Why:** addresses pattern - **Suggested edit:** Before: ``` ``` After: ``` ``` (repeat for each proposal) --- ## Items for the user (not harness changes) - --- ## Patterns considered but skipped - : only 1 occurrence, not yet a signal ``` **Each proposed change must include the exact `Before` and `After` text** — read the harness file first to confirm the text exists. Don't paraphrase. Don't use line numbers without quoting the line content (line numbers drift). --- ## Step 5 — Update the marker After the proposal is written, propose updating the marker in `tasks/lessons.md`: ``` ``` Show the user the proposed edit (one-line marker insertion or replacement). **Wait for confirmation** before applying. On confirmation, use the Edit tool. If the marker is being added for the first time, place it immediately after the `# Lessons Learned` heading (and a blank line). Also output a one-line summary at the end: ``` Wrote tasks/improve-harness-.md with

proposals. Review and apply to the harness repo at YOUR_HARNESS_REPO_PATH. ``` --- ## Step 6 — Write to cross-project learnings store After the proposal is written, extract actionable learnings and write them to the global store at `~/.claude/learnings/`. Each learning is a JSON file with content-hash deduplication. **Format:** ```json { "hash": "", "project": "", "date": "YYYY-MM-DD", "category": "", "learning": "", "context": "" } ``` **Deduplication:** Before writing, compute the SHA-256 hash of `category + learning`. If a file with that hash already exists in `~/.claude/learnings/`, skip it (already recorded from a prior retro or different project). **File naming:** `~/.claude/learnings/.json` Only write learnings that are genuinely cross-project (patterns that would help any project using this harness). Skip project-specific fixes (e.g., "add column X to table Y"). --- ## Step 7 — Emit agent-feedback tickets (optional) If any pattern qualifies and the user wants to track it in the issue tracker, offer to create structured agent-feedback tickets: > "Want me to create tracker issues for any of these proposals? Each gets the `agent-feedback` label." For each approved proposal, create an issue via the tracker adapter: ```bash bash trackers/active/create-issue.sh "" "<body>" "agent-feedback" ``` **Ticket structure:** ```markdown ## Agent Feedback - **Skill or agent:** [e.g., story-executor-agent, evaluator-agent] - **Work item:** [story ID or "N/A" if pattern is cross-story] - **What happened:** [observed behavior — cite evidence from evaluation.md or executor-state.md] - **What should have happened:** [expected behavior per the skill/agent spec] - **Root cause hypothesis:** [why the agent behaved this way] - **Proposed fix:** [specific edit to the harness — same as the proposal in the retro file] ``` Only create tickets if the user approves. Skip this step if no tracker adapter is configured. --- ## What `/improve-harness` must NOT do - **Don't propose a change from a single occurrence.** Pattern detection requires ≥ 2 (or ≥ 3 for re-attempts). - **Don't propose vague edits.** Every change cites a specific file path and includes the exact Before/After text. If you can't quote the existing text, you can't propose the change. - **Don't edit any file in the harness repo at `YOUR_HARNESS_REPO_PATH`.** Your output is the proposal file; the user applies changes themselves. - **Don't update the marker if no proposal was written** (i.e. on the "no actionable patterns" path). - **Don't analyze in-progress stories.** If a story has no `evaluation.md`, skip it. - **Don't re-include previous retro proposals.** Filter out files matching `tasks/improve-harness-*.md` from your reads. - **Don't fabricate evidence.** Every pattern claim cites a real file:line. If the model can't find a citation, drop the pattern. - **Don't propose changes to `tasks/` files.** The proposal targets the harness repo (skills, agents, hooks, rules), not the user's project state.