---
name: dx-step-all
description: Autonomous execution loop — runs each plan step (implement + test + review + commit internally), with step-fix for failures. Stops after 2 consecutive fix failures on the same step. Use to execute the full plan hands-free.
argument-hint: "[Work Item ID or slug (optional — uses most recent if omitted)]"
context: fork
allowed-tools: ["read", "edit", "search", "write", "agent"]
---

You are a coordinator. You run the step pipeline for each step in implement.md, delegating to skills via the Skill tool.

## Progress visibility

You run in a forked context. Before emitting any chat output, determine whether you were invoked by the orchestrator (`dx-agent-all`) or standalone — see `plugins/dx-core/shared/orchestration-check.md`:

```bash
ORCHESTRATED=0
FLAG=".ai/run-context/orchestrating.flag"
if [ -f "$FLAG" ]; then
  AGE=$(( $(date +%s) - $(date -r "$FLAG" +%s) ))
  [ "$AGE" -lt 7200 ] && ORCHESTRATED=1
fi
```

- **If `$ORCHESTRATED == 1`** (orchestrator path): write all canonical artifacts to `$SPEC_DIR/<file>.md` (already documented below) and emit ONLY the `## Return` block to chat.
- **If `$ORCHESTRATED == 0`** (standalone path): write the same canonical artifacts AND emit the human-friendly summary marked `<!-- standalone-only -->` below, followed by the `## Return` block at the very end.

Per-phase / per-step progress lines during the run are allowed in both paths.

The orchestrator (`dx-agent-all`) cannot see your TaskList directly. To preserve user-visible progress, you MUST update `$SPEC_DIR/dev-all-progress.md` after each step transition:

- Mark step in_progress at the start of execution
- Mark step done | failed | healing immediately on transition
- Update within the same logical action — never batch updates

The orchestrator reads this file after this skill returns and prints a one-line status summary to the user. Skill invocations are blocking — the orchestrator does not poll mid-execution. If you skip an update, the user sees stale progress when they ask for a recap.

**Format** — one row per step in a table (see `.ai/templates/spec/dev-all-progress.md.template` if it exists, else use the existing format from prior runs):

| Step | Status | Note |
|---|---|---|
| 1: <title> | done | committed 4bf6fe5 |
| 2: <title> | in_progress | — |
| 3: <title> | — | — |

## Progress Tracking

After loading `implement.md`, use `TaskList` to check for existing tasks from a previous run. If stale tasks exist, cancel them first with `TaskUpdate` (status: `cancelled`). Then create a task for each plan step using `TaskCreate` (e.g., "Step 1: Create dialog XML"). Mark each `in_progress` when executing, `completed` when committed. On fix attempts, update the task subject (e.g., "Step 1: Create dialog XML (fix 1)"). On heal cycles, add a task: "Healing: <corrective action>".

## Flow

```dot
digraph step_all {
    "Locate spec dir + check run-state" [shape=box];
    "Ensure feature branch" [shape=box];
    "Load fix memory" [shape=box];
    "Get next pending step" [shape=box];
    "All steps done?" [shape=diamond];
    "Execute step" [shape=box];
    "Step passed?" [shape=diamond];
    "Fix attempt" [shape=box];
    "Fix succeeded?" [shape=diamond];
    "Consecutive failures < 2?" [shape=diamond];
    "Invoke step-fix (heal)" [shape=box];
    "Heal result?" [shape=diamond];
    "Execute corrective steps" [shape=box];
    "Corrective step failed after 2 fixes?" [shape=diamond];
    "Healing cycle < 2?" [shape=diamond];
    "Log progress + track fix patterns" [shape=box];
    "Completion summary + log run + promote fixes" [shape=doublecircle];
    "STOP: Human intervention needed" [shape=doublecircle];

    "Locate spec dir + check run-state" -> "Ensure feature branch";
    "Ensure feature branch" -> "Load fix memory";
    "Load fix memory" -> "Get next pending step";
    "Get next pending step" -> "All steps done?";
    "All steps done?" -> "Completion summary + log run + promote fixes" [label="yes"];
    "All steps done?" -> "Execute step" [label="no"];
    "Execute step" -> "Step passed?";
    "Step passed?" -> "Log progress + track fix patterns" [label="yes"];
    "Step passed?" -> "Fix attempt" [label="no (blocked)"];
    "Fix attempt" -> "Fix succeeded?";
    "Fix succeeded?" -> "Log progress + track fix patterns" [label="yes"];
    "Fix succeeded?" -> "Consecutive failures < 2?" [label="no"];
    "Consecutive failures < 2?" -> "Fix attempt" [label="yes, retry"];
    "Consecutive failures < 2?" -> "Invoke step-fix (heal)" [label="no, 2 strikes"];
    "Invoke step-fix (heal)" -> "Heal result?";
    "Heal result?" -> "Execute corrective steps" [label="healed"];
    "Heal result?" -> "STOP: Human intervention needed" [label="unrecoverable"];
    "Execute corrective steps" -> "Corrective step failed after 2 fixes?";
    "Corrective step failed after 2 fixes?" -> "Log progress + track fix patterns" [label="no, passed"];
    "Corrective step failed after 2 fixes?" -> "Healing cycle < 2?" [label="yes, failed"];
    "Healing cycle < 2?" -> "Invoke step-fix (heal)" [label="yes, try again"];
    "Healing cycle < 2?" -> "STOP: Human intervention needed" [label="no, 2 cycles exhausted"];
    "Log progress + track fix patterns" -> "Get next pending step";
}
```

## Node Details

### Locate spec dir + check run-state

```bash
SPEC_DIR=$(bash .ai/lib/dx-common.sh find-spec-dir $ARGUMENTS)
```

Maintain run state in `$SPEC_DIR/run-state.json`:

```json
{
  "skill": "dx-step-all",
  "ticket": "<id>",
  "started": "<ISO-8601>",
  "phase": 0,
  "step": 0,
  "completed_phases": [],
  "last_action": "",
  "files_modified": []
}
```

**On invocation:**
1. Check for existing `$SPEC_DIR/run-state.json`
2. If exists and `started` < 2 hours ago → ask user: "Previous run found at phase {phase}, step {step}. Resume or start fresh?"
3. If exists and `started` ≥ 2 hours ago → warn: "Stale run state found (started {time}). Starting fresh." Delete run-state.json.
4. If not exists → create new run-state.json

**During execution:** Update run-state.json after each phase/step completion.
**On completion:** Delete run-state.json (clean exit).

Get the plan metadata (step count and status) — do NOT read full implement.md here:

```bash
bash .ai/lib/plan-metadata.sh $SPEC_DIR
```

This outputs a compact summary (~10 tokens/step vs ~200 for full file). Workers (step, step-fix) will read the full implement.md when they need it.

Print: `Starting execution: <N> steps total, <M> pending.`

### Ensure feature branch

Before executing any steps, run the shared branch guard:

```bash
bash .ai/lib/ensure-feature-branch.sh $SPEC_DIR
```

This no-ops if already on `feature/*` or `bugfix/*`. Creates a feature branch if on any other branch. The branch name is saved to `$SPEC_DIR/.branch`.

Print: `Branch: <BRANCH> (<BRANCH_ACTION>)`

**This is a hard gate** — never execute steps on a protected branch.

### Load fix memory

Before entering the execution loop, check if the project has accumulated fix knowledge from previous runs.

If `.ai/learning/fixes.md` exists:
- Read it
- Include a note when dispatching skills: append to the prompt: `Known fix patterns for this project: <summary of fixes.md content>`
- Print: `Learning: Loaded <N> known fix patterns from previous runs.`

If it does not exist, skip silently.

### Get next pending step

Re-read plan metadata to get the latest status (skills update implement.md). Especially important after step-fix creates new steps. Never read full implement.md in the orchestrator — workers handle that.

```bash
bash .ai/lib/plan-metadata.sh $SPEC_DIR
```

Identify the next step with status `pending`.

### All steps done?

Check if any `pending` or `in-progress` steps remain.

- **yes** → proceed to "Completion summary + log run + promote fixes"
- **no** → proceed to "Execute step" with the next pending step

### Execute step

Invoke `Skill(/dx-step)` for the current step. dx-step now handles implement + test + review + commit as internal phases — no separate dispatches needed.

### Step passed?

Check the result from dx-step:
- **success** → proceed to "Log progress + track fix patterns"
- **failure** → classify the error (see **Error Classification** below), then proceed to "Fix attempt"

**Error Classification:** Before triggering healing, classify the failure using `shared/error-handling.md`:
- **TRANSIENT** → Retry the step (counts toward consecutive failure limit)
- **VALIDATION** → Let dx-step-fix attempt repair (counts as healing cycle)
- **PERMANENT** → Skip healing entirely. Mark step blocked. Report to user immediately. Proceed to "STOP: Human intervention needed".

### Fix attempt

Invoke `Skill(/dx-step-fix)` for the current step.

Track fix attempts per step.

### Fix succeeded?

Check the fix result:
- **yes** → proceed to "Log progress + track fix patterns"
- **no** → proceed to "Consecutive failures < 2?"

### Consecutive failures < 2?

Check consecutive fix failure count for this step:
- **yes, retry** → proceed to "Fix attempt" for another attempt
- **no, 2 strikes** → proceed to "Invoke step-fix (heal)"

### Invoke step-fix (heal)

Track healing cycles per step. Max 2 healing cycles per original step.

Invoke `Skill(/dx-step-fix)` with healing context:
```
/dx-step-fix <SPEC_DIR> --heal --failure-type step-blocked --blocked-step <N>
```

dx-step-fix now handles both fix and heal modes internally.

### Heal result?

Check the step-fix return:
- **healed** → re-read implement.md to find the new corrective step(s). Print a clear plan-mutation notice:
  ```
  ⚠ Plan modified by heal: <N> corrective step(s) added after Step <blocked-step>.
  New steps: <list of step numbers and titles>
  Total steps now: <updated total>
  ```
  Update TaskList to reflect the new steps. Proceed to "Execute corrective steps".
- **unrecoverable** → print: `Step <N> unrecoverable after 2 fixes + healing. Human intervention needed.` Proceed to "STOP: Human intervention needed".

### Execute corrective steps

Run the normal Execute step → Fix attempt cycle on each new corrective step created by step-fix.

### Corrective step failed after 2 fixes?

After running the corrective step through the full cycle (including up to 2 fix attempts):
- **no, passed** → proceed to "Log progress + track fix patterns"
- **yes, failed** → proceed to "Healing cycle < 2?"

### Healing cycle < 2?

Check the healing cycle count for the original step:
- **yes, try again** → proceed to "Invoke step-fix (heal)" for another healing cycle
- **no, 2 cycles exhausted** → print: `Step <N> blocked after 2 healing cycles. Human intervention needed.` Proceed to "STOP: Human intervention needed".

### Log progress + track fix patterns

After each step cycle, print:
```
Step <N>/<total> done — <step title>
```

**Track and persist learning signals** for each step that completes (or fails):
- `step_number`: which step
- `step_title`: the step's title from implement.md
- `fix_attempts`: number of fix attempts (0 if none needed)
- `fix_succeeded`: how many fix attempts succeeded
- `fix_failed`: how many fix attempts failed
- `heal_cycles`: number of healing cycles invoked (0 if none)
- `heal_succeeded`: how many heal cycles succeeded
- `heal_failed`: how many heal cycles failed
- `fix_types`: list of error types encountered (e.g., "compilation", "test-failure", "review-rejection")

**Persist immediately after each step** (not deferred to end-of-run):

```bash
mkdir -p .ai/learning/raw
```

If fix attempts occurred for this step, append one JSONL line per attempt to `.ai/learning/raw/fixes.jsonl`:
```json
{"timestamp":"<ISO-8601>","ticket":"<id>","step":<N>,"error_type":"<type>","fix_description":"<what was done>","result":"<success|failed>"}
```

**Incremental pattern promotion:** After appending, check if any pattern in `fixes.jsonl` now has 3+ successes and 0 failures. If so, promote it immediately (see "Completion summary + log run + promote fixes" for format). This ensures learning survives even if the run is interrupted.

### Completion summary + log run + promote fixes

After all steps are done:

<!-- standalone-only — emit only when $ORCHESTRATED == 0 -->

When running standalone (`$ORCHESTRATED == 0`), emit:

```markdown
## Execution Complete

**<count> steps** executed — <count> succeeded, <count> failed, <count> self-healed.

| Step | Status | Note |
|------|--------|------|
| 1: <title> | done | committed <sha> |
| 2: <title> | done | committed <sha> |
<... one row per step ...>

<If any steps failed:>
### Blocked Steps
- Step <N>: <title> — <error summary>

<If cross-repo note from implement.md:>
### Other repos required
<repo list from implement.md>

### Next step:
- `/dx-step-build` — build and verify
```

When orchestrated (`$ORCHESTRATED == 1`), skip the above block entirely. The orchestrator (`dx-agent-all`) reads `$SPEC_DIR/dev-all-progress.md` directly to summarize step counts and present next-steps. The Cross-Repo Note (when `implement.md` has "Other repos required") still belongs in the orchestrator's Final Summary — pass it through via the `## Return` block's `summary` (truncated) and `next_action` fields.

**Log run record:**

```bash
mkdir -p .ai/learning/raw
```

Append one JSONL line to `.ai/learning/raw/runs.jsonl`:
```json
{"timestamp":"<ISO-8601>","ticket":"<id>","flow":"step-all","total_steps":<N>,"steps_completed":<N>,"fix_attempts":{"succeeded":<N>,"failed":<N>},"heal_cycles":{"succeeded":<N>,"failed":<N>},"step_titles":["<title1>","<title2>"]}
```

Use Bash to append — `echo '<json>' >> .ai/learning/raw/runs.jsonl`

**Final promotion sweep:**

Run one final pattern promotion check (same logic as incremental promotion in "Log progress + track fix patterns") to catch any patterns that crossed the 3-success threshold during the last step. This is a safety net — most promotions happen incrementally.

Read `.ai/learning/raw/fixes.jsonl`. Group entries by `error_type` + `fix_description`. For each combination with **3+ successes AND 0 failures**:
- Generate slug from error type (e.g., `compilation-missing-import`)
- Skip if `.claude/rules/learned-fix-<slug>.md` already exists (promoted earlier)
- Otherwise create:
  ```markdown
  # Learned Fix: <error_type>

  When encountering: <error_type>
  Apply this fix: <fix_description>

  Evidence: <N> successful applications across <ticket list>.
  ```
- Print: `Learning: Promoted fix pattern to .claude/rules/learned-fix-<slug>.md — <fix_description>`

**Summary message:**

If fix patterns were logged:
- Print: `Learning: <N> fix patterns logged, <M> promoted to rules.`

If no fix patterns at all:
- Print: `Learning: Clean run — no fixes needed.`

### STOP: Human intervention needed

The execution loop has been halted. Either:
- A step was marked **unrecoverable** after 2 fix attempts + healing
- **2 healing cycles** were exhausted on the same original step

Print the blocked step number, the error type, and a summary of what was tried. The user must manually intervene (edit `implement.md`, fix the environment, or re-plan).

## Validation Gates

| After Phase | Gate | Fail Action |
|------------|------|-------------|
| Step execution (2a) | Step status is `done` in `implement.md` | Enter fix loop (2b-2d) |
| Fix attempt (2d) | Fix count < 2 for this step | If >=2: enter heal cycle (2d-heal) |
| Heal cycle (2d-heal) | Corrective steps added to `implement.md` | If unrecoverable: STOP with blocked step report |
| All steps complete | No `pending` or `in-progress` steps remain | Proceed to build phase |

## Examples

### Execute full plan
```
/dx-step-all 2416553
```
Reads `implement.md` from `.ai/specs/2416553-add-layout-switcher/`, finds 6 pending steps, and runs each through dx-step (which handles implement + test + review + commit internally). Prints progress after each step.

### Resume after partial completion
```
/dx-step-all 2416553
```
If steps 1-3 are already `done`, picks up at step 4 and continues. Only pending steps are executed.

### Self-healing on failure
When step 3 fails twice, triggers `dx-step-fix` in heal mode which analyzes the error, creates a corrective step 3a, executes it, then retries step 3. If healing also fails after 2 cycles, stops and reports.

## Troubleshooting

### "No pending steps" but plan isn't done
**Cause:** Steps are marked `blocked` or `in-progress` from a previous interrupted run.
**Fix:** Open `implement.md` and reset the stuck step's status to `pending`, then re-run.

### Build fails repeatedly on the same step
**Cause:** The plan step has incorrect instructions or missing context.
**Fix:** Let self-healing run (2 fix attempts + 2 heal cycles). If still failing, run `/dx-step-fix` manually to diagnose, or edit `implement.md` to fix the step instructions.

### "Cannot execute on protected branch"
**Cause:** You're on `development`, `main`, or `master` instead of a feature branch.
**Fix:** The branch guard should auto-create a feature branch. If it fails, manually create one: `git checkout -b feature/<id>-<slug>`.

## Decision Examples

### Heal: Flaky Test
**Failure:** Test passes locally but fails in step execution (timing-dependent assertion)
**Consecutive failures:** 2
**Decision:** HEAL — test is flaky. Fix: add wait/retry in test assertion.

### Escalate: Architecture Mismatch
**Failure:** Step creates a new utility, but existing utility covers the need
**Consecutive failures:** 2
**Decision:** ESCALATE — plan problem, not code problem. Report: "Step 4 violates reuse rule. Existing `forms.js:validateField()` covers this need."

## Success Criteria

- [ ] All steps in implement.md have status: done, blocked, or skipped
- [ ] No steps remain in pending or in-progress
- [ ] Build command exits 0 after final step
- [ ] run-state.json deleted (clean completion)

## Rules

- **Coordinator only** — never implement code yourself. Always delegate via Skill tool.
- **Sequential execution** — steps must run in order. Never parallelize steps.
- **2-strike then heal** — after 2 consecutive fix failures, invoke step-fix in heal mode before giving up.
- **2 healing cycles max** — after 2 healing cycles on the same original step, stop for real.
- **Progress reporting** — print status after each step so the user can follow along.
- **Re-read metadata between steps** — run `plan-metadata.sh` between steps to get the latest status (skills update implement.md). Especially important after step-fix creates new steps. Never read full implement.md in the orchestrator — workers handle that.
- **Don't skip blocked steps** — if a step is blocked and healing failed, stop. Steps have dependencies.

## Platform Compatibility

This skill uses `Skill()` tool calls which work on both Claude Code and Copilot CLI.

**Copilot CLI / VS Code Chat fallback:** If subagent skill invocation fails, run the skills manually in sequence for each plan step:
1. `/dx-step <step-number>` — execute a single step (implement + test + review + commit)
2. On failure: `/dx-step-fix <spec-dir>` — attempt repair (up to 2 tries)
3. On persistent failure: `/dx-step-fix <spec-dir> --heal` — diagnose and create corrective steps
4. Repeat from step 1 for each pending step in `implement.md`

## Return

This skill runs in a forked context. It MUST end with a `## Return` block per `plugins/dx-core/shared/skill-return-contract.md`.

Examples:

```markdown
## Return
verdict: pass
summary: Executed 4 steps; 3 commits; 1 self-heal cycle on Step 3 (lint no-shadow).
artifacts:
  - .ai/specs/2490722-microsite/dev-all-progress.md
  - .ai/specs/2490722-microsite/run-state.json
next_action: continue to Phase 4 (build)
```

```markdown
## Return
verdict: fail
summary: Step 2 blocked after 2 fix attempts — JS test pollution; manual intervention needed.
artifacts:
  - .ai/specs/2490722-microsite/dev-all-progress.md
  - .ai/specs/2490722-microsite/run-state.json
next_action: human fix Step 2; re-run /dx-step-all
```
