---
name: autotune
description: One eval-tune round. Applies approved fixes, selects tasks, runs eval, analyzes results, and updates loop state. Standalone use stops for human review; `/autotune-loop` can orchestrate repeated rounds.
---

# AutoTune

One full fix → eval → analyze round.

## When to Use

- After reviewing a previous round's `common_problems_<agent>.md` and approving the proposed next steps.
- To kick off round 0 on a fresh task set (skips the fix step).
- When `/autotune-loop` needs one orchestrated round worker.

## Modes

`/autotune` has two modes:

- **Manual mode**: default for standalone `/autotune`. Do one round, update `loop_state.json` with `status: "waiting_review"` and `last_round.recommended_action: "wait_human"`, then stop for human review.
- **Orchestrated mode**: only when invoked from `/autotune-loop`. Do the same round work, update `loop_state.json` with the round verdict, and return control to the loop controller.

Invocation context decides the mode. Do not infer orchestrated behavior from a stale `projects/autotune/meta/loop_state.json` alone.

## Loop

```
1. FIX ──► 2. PREPARE ──► 3. RUN ──► 4. ANALYZE ──► STOP
```

### Step 1 — Fix

Apply changes from the previous round's approved `## Next Steps`.

- **MUST read `references/tuning_principles.md` before making any change.** Every proposed change must pass its three gates (anti-overfit, token minimalism, generalization).
- For prompt, tool description, or app skill changes, use `/prompt-tune` to determine the correct ownership layer before editing.
- For code changes, follow a full implementation workflow: inspect the existing code, make the minimal scoped change, run relevant verification, and do not skip failure analysis.
- Commit: `feat(agent): autotune round N — <summary>`.
- **Sync & rebuild**: After committing, push to the remote and rebuild the APK so the eval runs against the new code. See Step 3 for the exact commands.
- Round 0: skip this step.

### Step 2 — Prepare

Select tasks for this round. Full universe: `eval/config/aw_fullset.txt`.

Selection rules:
1. **Directly affected**: Tasks whose failure root cause matches what was just fixed.
2. (Optional) **Regression canaries**: Only include if explicitly requested. Do not add by default.
3. (Optional) **Stuck tasks**: Re-test only if you have a new idea and the change still passes the shared tuning principles.
4. **Budget**: ~5-10 tasks normally, up to 20 for regression sweeps.

Use `projects/autotune/meta/scoreboard.json` to judge whether a targeted retry is still productive.

Remove every task listed in `eval/config/cannot_handle_group.txt`, then write the selected tasks to `eval/config/autotune_round_N.txt`.
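The filtering and budget check can be sketched as below. This is a minimal illustration, not a repo script. It assumes each `.txt` config lists one task name per line, with optional `#` comments; check `eval/aw_bridge/runner.py` for the real format. `read_task_list`, `select_round_tasks`, and `write_round_config` are hypothetical helpers. Picking the directly affected tasks is still a judgment call. The code only filters, dedupes, and enforces the budget.

```python
from pathlib import Path


def read_task_list(path: Path) -> list[str]:
    """Read one task name per line, skipping blanks and '#' comments (assumed format)."""
    tasks = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            tasks.append(line)
    return tasks


def select_round_tasks(candidates, universe, excluded, budget=10):
    """Keep candidates that are in the full set and not excluded, in order, without duplicates."""
    universe, excluded = set(universe), set(excluded)
    chosen = []
    for task in candidates:
        if task in universe and task not in excluded and task not in chosen:
            chosen.append(task)
    if len(chosen) > budget:
        raise ValueError(f"{len(chosen)} tasks exceeds budget of {budget}; trim the selection")
    return chosen


def write_round_config(round_n: int, candidates, budget=10) -> list[str]:
    """Hypothetical wrapper: filter candidates and write eval/config/autotune_round_N.txt."""
    config = Path("eval/config")
    chosen = select_round_tasks(
        candidates,
        read_task_list(config / "aw_fullset.txt"),
        read_task_list(config / "cannot_handle_group.txt"),
        budget,
    )
    (config / f"autotune_round_{round_n}.txt").write_text("\n".join(chosen) + "\n")
    return chosen
```

For a regression sweep, pass `budget=20`. Candidates that are missing from the full set are dropped silently here. Raising an error instead would catch typos in task names.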

### Step 3 — Run

**Pre-flight: sync & rebuild** (MANDATORY if Step 1 made any changes):
```bash
git push
if [[ -f .closepaw-local.env ]]; then source .closepaw-local.env; fi
: "${CLOSEPAW_REMOTE:?Set CLOSEPAW_REMOTE in .closepaw-local.env}"
: "${CLOSEPAW_REMOTE_DIR:?Set CLOSEPAW_REMOTE_DIR in .closepaw-local.env}"
REMOTE="$CLOSEPAW_REMOTE"
REMOTE_DIR="$CLOSEPAW_REMOTE_DIR"
ssh "$REMOTE" "cd $REMOTE_DIR && git pull && ./gradlew assembleDebug"
```
Skip only if this round had no code/prompt/skill changes (e.g., round 0 with no fix step). Running eval on stale code wastes an entire round.

Read `references/eval_runner.md` for the exact commands for each configuration.

| Flag | Effect |
|------|--------|
| `--remote` | Run eval on the configured remote worker instead of local machine |
| `--parallel N` | Use N emulators in parallel (currently max 2). Falls back to serial if parallel startup fails |

Monitor for stalls. If a task hangs (no output for several minutes), check that the accessibility permission is still granted on the device. If needed, stop the runner, remove completed tasks from the config, and re-run the remainder.
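After a stall, a small helper can compute the remainder. This sketch assumes a one-task-per-line config. How you determine `completed` is also an assumption: it could come from the runner's console output or from per-task folders under `eval/results/`. `remaining_tasks` and `rewrite_config` are hypothetical names. The helper keeps the original task order. It also saves the full list once to a `.bak` file, so Step 4 still knows every task in the round.

```python
from pathlib import Path


def remaining_tasks(config_tasks: list[str], completed: set[str]) -> list[str]:
    """Tasks from the round config that have not finished yet, in original order."""
    return [task for task in config_tasks if task not in completed]


def rewrite_config(config_path: Path, completed: set[str]) -> list[str]:
    """Drop completed tasks from a one-task-per-line config file (assumed format) in place."""
    text = config_path.read_text()
    tasks = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    todo = remaining_tasks(tasks, completed)
    backup = config_path.with_suffix(config_path.suffix + ".bak")
    if not backup.exists():  # keep the original full list, even across repeated re-runs
        backup.write_text(text)
    config_path.write_text("\n".join(todo) + "\n")
    return todo
```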

**Post-run: pull results to local** (MANDATORY for `--remote` runs):
```bash
# Pull eval results from remote to local
if [[ -f .closepaw-local.env ]]; then source .closepaw-local.env; fi
: "${CLOSEPAW_REMOTE:?Set CLOSEPAW_REMOTE in .closepaw-local.env}"
: "${CLOSEPAW_REMOTE_DIR:?Set CLOSEPAW_REMOTE_DIR in .closepaw-local.env}"
REMOTE="$CLOSEPAW_REMOTE"
REMOTE_DIR="$CLOSEPAW_REMOTE_DIR"
rsync -avz "$REMOTE:$REMOTE_DIR/eval/results/" eval/results/
```
All analysis in Step 4 reads from local `eval/results/`. If you skip this pull, Step 4 will either fail or analyze stale data.

**Overlap with Step 4**: You do NOT need to wait for the full run to finish. As soon as a task completes, start its `/cog-tune` analysis (Step 4.1) in parallel while remaining tasks continue running. For remote runs, pull incrementally as tasks finish.

### Step 4 — Analyze

**All analysis artifacts MUST be written locally** (not on the remote). The analysis writes to the ignored local artifact workspace under `projects/autotune/round_N/` and `projects/autotune/meta/`. If running analysis from the local machine (normal case), this happens automatically. Do NOT skip writing per-task analysis or the common problems summary.

For each task in the run (**MUST use a separate subagent per task** for cleaner context — do NOT analyze multiple tasks in one agent):
1. Run `/cog-tune` (eval entry). **MUST read `.claude/skills/cog-tune/SKILL.md` and follow the steps in its "Inspect cognition step-by-step" section.** Write per-task analysis to `projects/autotune/round_N/<run_id>/per_task/<TaskName>_<agent>.md` following the template `.claude/skills/cog-tune/assets/per_task_analysis_template.md`.
2. Add a short entry to `projects/autotune/meta/per_task/<TaskName>_<agent>.md` with the score, turn count, and a one-line behavior delta vs. the previous run. Put the newest entry on top.

Once done with all tasks:
3. Summarize into `projects/autotune/round_N/<run_id>/common_problems_<agent>.md` following `assets/common_problems_template.md`. Must include a `## Next Steps` section.
4. Run `python3.12 scripts/scoreboard.py` to regenerate `projects/autotune/meta/scoreboard.json` and `projects/autotune/meta/scoreboard.md`.
5. Run `python3.12 scripts/token_counts.py` to regenerate `projects/autotune/meta/token_counts.json` and `projects/autotune/meta/token_counts.md`.
6. Append to `projects/autotune/meta/changelog.md`.
7. Update `projects/autotune/meta/issues.md` (new issues, resolved issues, parked tasks).
8. Update `projects/autotune/meta/loop_state.json` as the final control-plane handoff for this round.
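Item 2 keeps the newest entry on top, so the write is a prepend rather than a plain append. The sketch below assumes the file may start with a single `# ` title line that should stay first. The entry layout in `format_entry` is hypothetical, so match whatever the existing per-task files already use. `prepend_entry` and `record` are also hypothetical helper names.

```python
from pathlib import Path


def format_entry(round_n: int, score: float, turns: int, delta: str) -> str:
    """Hypothetical one-line entry: score, turns, and behavior delta vs. the previous run."""
    return f"- round {round_n}: score {score}, {turns} turns. {delta}"


def prepend_entry(existing: str, entry: str) -> str:
    """Return file text with `entry` placed newest-on-top, below an optional '# ' title."""
    lines = existing.splitlines()
    if lines and lines[0].startswith("# "):
        head, body = lines[:1], lines[1:]
    else:
        head, body = [], lines
    while body and not body[0].strip():  # drop blank lines between title and old entries
        body = body[1:]
    out = head + ([""] if head else []) + [entry] + body
    return "\n".join(out) + "\n"


def record(path: Path, entry: str) -> None:
    """Read (or start) the per-task changelog and write it back with the new entry on top."""
    existing = path.read_text() if path.exists() else ""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(prepend_entry(existing, entry))
```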

Stop for deeper design review only when at least one of these is true:
- The same task cluster has failed for 2+ rounds with no progress.
- The proposed fix touches the core prompt or major tool semantics.
- A capability-gap candidate needs confirmation before being parked.
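The sketch below shows one way to make the first trigger concrete. It is only one reading of "no progress". It assumes a per-round score history for the task cluster, oldest first, and treats any score below a pass threshold as a failure. The 1.0 threshold and both function names are assumptions, not values read from `scoreboard.json`. The other two triggers remain judgment calls and are passed in as flags.

```python
def stalled_rounds(scores: list[float], pass_threshold: float = 1.0) -> int:
    """Count trailing rounds that failed without improving on the round before."""
    count = 0
    for i in range(len(scores) - 1, -1, -1):
        failed = scores[i] < pass_threshold
        no_progress = i == 0 or scores[i] <= scores[i - 1]
        if not (failed and no_progress):
            break
        count += 1
    return count


def needs_design_review(
    cluster_scores: list[float],
    touches_core_prompt_or_tools: bool,
    capability_gap_candidate: bool,
) -> bool:
    """True when at least one of the three stop-for-review conditions holds."""
    return (
        stalled_rounds(cluster_scores) >= 2
        or touches_core_prompt_or_tools
        or capability_gap_candidate
    )
```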

Note: `<agent>` is your agent name, e.g. `claude` or `codex`. Do your analysis independently: don't read other agents' analyses, even if they exist.

## Loop State Contract

`projects/autotune/meta/loop_state.json` is the only control-plane file for the current loop.

- In **manual mode**, set `status` to `waiting_review`, keep `mode` as `manual`, and set `last_round.recommended_action` to `wait_human`.
- In **orchestrated mode**, write the round verdict into `last_round` and return. `/autotune-loop` decides whether to continue or stop.
- Do not create a separate `round_verdict.json`.
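The handoff write might look like the sketch below. It assumes `loop_state.json` holds a single JSON object. Only `status`, `mode`, and `last_round.recommended_action` come from the contract above. Any other keys inside `last_round`, such as a round number, are hypothetical. `update_loop_state` and `write_loop_state` are illustrative names, not existing scripts. The write goes through a temporary file and a rename, so `/autotune-loop` never reads a half-written state.

```python
import copy
import json
from pathlib import Path


def update_loop_state(prev: dict, mode: str, last_round: dict) -> dict:
    """Return a new loop state for this round without mutating `prev`."""
    state = copy.deepcopy(prev)
    round_info = dict(last_round)
    if mode == "manual":
        state["status"] = "waiting_review"
        state["mode"] = "manual"
        round_info["recommended_action"] = "wait_human"
    elif mode != "orchestrated":
        raise ValueError(f"unknown mode: {mode!r}")
    # In orchestrated mode only the verdict is written; /autotune-loop decides what happens next.
    state["last_round"] = round_info
    return state


def write_loop_state(path: Path, state: dict) -> None:
    """Write via a temp file and rename, so readers never see a partial file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2) + "\n")
    tmp.replace(path)


# Example (manual mode, hypothetical round number):
# path = Path("projects/autotune/meta/loop_state.json")
# prev = json.loads(path.read_text()) if path.exists() else {}
# write_loop_state(path, update_loop_state(prev, "manual", {"round": 3}))
```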

### STOP

In manual mode, present to the human:
- Scoreboard diff (what improved, regressed, stuck).
- `common_problems_<agent>.md` with proposed next steps.

Wait for approval before the next `/autotune`. In orchestrated mode, skip this step and return control to `/autotune-loop`.

## Templates

- Per-task analysis: `.claude/skills/cog-tune/assets/per_task_analysis_template.md` (owned by /cog-tune)
- Common problems summary: `assets/common_problems_template.md` (owned by /autotune)

## Key Files

- Scoreboard (SOT): `projects/autotune/meta/scoreboard.json`
- Scoreboard (view): `projects/autotune/meta/scoreboard.md`
- Loop state: `projects/autotune/meta/loop_state.json`
- Global issues: `projects/autotune/meta/issues.md`
- Changelog: `projects/autotune/meta/changelog.md`
- Per-task changelogs: `projects/autotune/meta/per_task/<TaskName>_<agent>.md`
- Per-round analysis: `projects/autotune/round_N/<run_id>/`
- Task universe: `eval/config/aw_fullset.txt`
- Exclusions: `eval/config/cannot_handle_group.txt`
- Scoreboard script: `scripts/scoreboard.py`
- Token count script: `scripts/token_counts.py`
- Eval runner: `eval/aw_bridge/runner.py`
- Eval remote config: `eval/config/remote.yaml`
- Cog-tune skill: `.claude/skills/cog-tune/SKILL.md`
- Shared tuning principles: `.claude/skills/autotune/references/tuning_principles.md`
- Loop controller: `.claude/skills/autotune-loop/SKILL.md`
