--- name: autopilot description: > Autonomous session-orchestration loop. Chains session-start → session-plan → wave-executor → session-end for N iterations with kill-switches (SPIRAL, FAILED wave, carryover > 50%, max-hours, sub-threshold confidence). Reads Mode-Selector output (Phase B) to decide auto-execute vs. fallback. Writes one autopilot.jsonl record per loop run. Phase C scaffold (issue #277); implementation lives in scripts/lib/autopilot.mjs (Phase C-1 follow-up). user-invocable: true tags: [phase-c, autopilot, autonomous, loop] --- # Autopilot Skill ## Status **Phase C-1 partial (2026-04-25, issue #295).** Runtime exists at `scripts/lib/autopilot.mjs` with five of the eight kill-switches enforced: `max-sessions-reached`, `max-hours-exceeded`, `resource-overload`, `low-confidence-fallback` (with iter-1-fallback / iter-2+-exit asymmetry), and `user-abort`. Atomic `autopilot.jsonl` writer (tmp+rename, schema_version 1) and silent-clamp `parseFlags` are shipped. The remaining three kill-switches — `spiral`, `failed-wave`, `carryover-too-high` — and `autopilot_run_id` propagation into `sessions.jsonl` are **deferred to Phase C-1.b**. These require wave-executor to expose `spiral_detected`, `failed_waves`, and `carryover_ratio` on its return shape; that signal- extraction work is the gating change. The loop structure for these checks is already in place via `postSessionKillSwitch()` (currently a no-op stub). Until C-1.b lands, multi-iteration autopilot runs proceed without spiral / failed-wave / carryover safeguards — use single-iteration runs (`--max-sessions=1`) or `--dry-run` previews for routine work; prefer manual `/session [type]` when those signals matter. ## Purpose Autopilot collapses the per-session attention cost when Mode-Selector is confident enough to make routine decisions autonomously. A productive day commonly ships 3–7 sessions; each manual session-start costs the user 10–60 seconds of context-switch attention. When the session is genuinely routine (mechanical refactor, post-merge housekeeping, repeated follow-ups from a planned epic), that attention cost is pure overhead. `/autopilot` reads the Mode-Selector recommendation, executes the recommended session if confidence clears the threshold, then loops — checking kill-switches between iterations. The user invokes the loop once and walks away; autopilot stops itself when work runs out or quality degrades. This is **opt-in by design**: autopilot never starts itself. The user must run `/autopilot` explicitly. Configuration thresholds (`--max-sessions`, `--max-hours`, `--confidence-threshold`) are CLI flags, not Session Config defaults — the user signals intent for THIS run, not a standing policy. ## Command Surface ``` /autopilot [--max-sessions=N] [--max-hours=H] [--confidence-threshold=0.X] [--dry-run] ``` | Flag | Default | Bounds | Meaning | |------|---------|--------|---------| | `--max-sessions` | `5` | 1..50 | Iteration cap (graceful exit when reached) | | `--max-hours` | `4.0` | 0.5..24.0 | Wall-clock budget for entire loop | | `--confidence-threshold` | `0.85` | 0.0..1.0 | Minimum `selectMode` confidence for auto-execute | | `--dry-run` | `false` | — | Print planned iterations without executing | Out-of-range values silently clamp to bounds. `--dry-run` exits after printing — never invokes session lifecycle. ## Loop Semantics ``` state := { iterations_completed: 0, started_at: now(), kill_switch: null, sessions: [] } WHILE state.iterations_completed < max-sessions: IF (now() - state.started_at) > max-hours: kill_switch := 'max-hours-exceeded'; break IF resource_verdict() == 'critical' AND peer_count() > autopilot-peer-abort: kill_switch := 'resource-overload'; break recommendation := mode-selector.selectMode() IF recommendation.confidence < confidence-threshold: IF state.iterations_completed == 0: fallback_to_manual() # iteration 1: hand off cleanly to manual /session flow ELSE: kill_switch := 'low-confidence-fallback' # iteration 2+: exit, let user decide break cap := resource_adaptive_cap() session_result := run_session(mode=recommendation.mode, agents_per_wave_cap=cap) state.sessions.append(session_result.session_id) IF session_result.spiral_detected: kill_switch := 'spiral'; break IF session_result.failed_waves > 0: kill_switch := 'failed-wave'; break IF session_result.carryover_ratio > 0.50: kill_switch := 'carryover-too-high'; break state.iterations_completed += 1 write_autopilot_jsonl(state, kill_switch) print_summary(state, kill_switch) ``` **Atomicity rule:** iteration boundaries are atomic. A session must complete (`/close` including the post-session writes) before the next iteration starts. Autopilot does NOT abort sessions mid-flight; kill-switches are checked AFTER each session completes. ## Kill-Switches | Kill-switch | Trigger | Recovery hint | |-------------|---------|---------------| | `spiral` | wave-executor spiral detection fires | Triage the spiraling wave manually; autopilot will not retry. | | `failed-wave` | Any wave reports `agent_summary.failed > 0` | Investigate failure mode (test contract drift, env issue). Re-run after fix. | | `carryover-too-high` | `carryover_ratio > 0.50` | Last session under-delivered. Reduce scope or split issues before resuming. | | `max-hours-exceeded` | Wall-clock exceeds `--max-hours` | Re-run with higher `--max-hours` or address slow waves. | | `max-sessions-reached` | `iterations_completed == --max-sessions` | Graceful — not an error. | | `resource-overload` | `verdict==critical AND peers > autopilot-peer-abort` | Wait for peer sessions to complete or close them. | | `low-confidence-fallback` | `confidence < threshold` (iteration 2+) | Re-run with lower `--confidence-threshold` or run next session manually. | | `user-abort` | Ctrl+C / Esc | Re-run when ready. | ## Resource-Adaptive Concurrency Autopilot does NOT hard-block on peer Claude processes. It adapts `agents-per-wave` cap per iteration based on the most-restrictive resource signal. | Tier | RAM free | Swap | Peers | macOS memory_pressure | cap | |------|----------|------|-------|------------------------|-----| | green | ≥ 6 GB | < 1 GB | ≤ 2 | ≥ 30% free | Session Config default | | warn | 4–6 GB | 1–2 GB | 3–4 | 15–30% free | 4 | | degraded | 2–4 GB | 2–3 GB | 5–6 | 5–15% free | 2 | | critical | < 2 GB | > 3 GB | > 6 | < 5% free | 0 (coord-direct) | **Most-restrictive-signal-wins:** `[ram=8GB, swap=0, peers=7]` → critical (peer rule wins). Defaults are conservative initial estimates. Phase C-3 follow-up calibrates the swap and memory_pressure thresholds against real autopilot-run effectiveness data. ## Telemetry One record per `/autopilot` invocation, written to `.orchestrator/metrics/autopilot.jsonl` via atomic tmp + rename. See `docs/prd/2026-04-25-autopilot-loop.md` § Output for the full schema. Each iteration's `sessions.jsonl` entry gets an additional optional field `autopilot_run_id` (string or null) so retros can join across the two files without schema changes. Manual sessions write `autopilot_run_id: null` (or omit the field — both treated identically by readers per the v1 schema additive convention). ## Integration with Other Skills - **`mode-selector.mjs::selectMode`** — sole source of mode + confidence per iteration. Autopilot does not implement its own mode logic; v1.x quirks affect autopilot exactly as they affect manual session-start. - **`resource-probe.mjs::probe + evaluate`** — extended in Phase C-2 with swap and memory_pressure signals. Existing consumers (manual session-start, wave-executor) benefit from the new signals automatically. - **`session-start` / `session-plan` / `wave-executor` / `session-end`** — invoked unmodified. Autopilot is a controller around the existing session lifecycle, not a replacement. - **`session-registry.mjs`** — peer-count signal source. Autopilot reads but does not write to the registry beyond the standard hook. - **`mode-selector-accuracy`** — autopilot iterations write accuracy learnings exactly like manual sessions (Phase B-4 contract). The `chosen` field reflects autopilot's auto-execute decision, which equals `recommendation.mode` when confidence ≥ threshold. ## Critical Rules - **Never auto-merge or auto-push beyond `/close` defaults.** `/close` already pushes to origin; autopilot does not add PR creation, merge, or force-push behavior. - **Iteration boundaries are atomic.** Never abort a running session to start a new one. - **Kill-switches checked AFTER each session.** Even if a kill-switch will fire after iteration N, iteration N completes cleanly first. - **Iteration 1 sub-threshold falls back to manual; iteration 2+ exits with kill-switch.** This asymmetry is intentional — see PRD Q8. - **`autopilot.jsonl` is the SOLE writer's responsibility of `autopilot.mjs`.** Other skills must not append to or rewrite this file. - **Mode-Selector contract is read-only here.** Autopilot does not modify `selectMode` output, does not re-rank alternatives, does not patch confidence values. ## Anti-Patterns - Do not invoke `/autopilot` from inside a running session. The skill is a top-level command; nested invocation is undefined behavior. - Do not modify `autopilot.jsonl` schema additively without bumping `schema_version`. Readers MUST treat unknown fields as a forward-compat signal, not as corruption. - Do not bypass `selectMode` to force-run a specific mode. If you want to run a specific mode, use `/session [mode]` manually — that is autopilot's fallback path. - Do not lower `--confidence-threshold` below 0.5 in production. The Mode-Selector fallback table treats `< 0.5` as suggestion-only; autopilot at that threshold becomes a random-walk over modes. - Do not implement kill-switch logic in this skill. Kill-switch enforcement is in `scripts/lib/autopilot.mjs`. The skill documents the contract; the runtime enforces it. ## References - PRD: `docs/prd/2026-04-25-autopilot-loop.md` - Implementation (Phase C-1 partial): `scripts/lib/autopilot.mjs` — exports `runLoop`, `parseFlags`, `writeAutopilotJsonl`, `KILL_SWITCHES`, `FLAG_BOUNDS`, `SCHEMA_VERSION` - Tests (Phase C-1): `tests/lib/autopilot.test.mjs` - Command file: `commands/autopilot.md` - Mode-Selector contract: `skills/mode-selector/SKILL.md` - Resource probe: `scripts/lib/resource-probe.mjs` - Session registry: `scripts/lib/session-registry.mjs` - Epic: [#271 v3.2 Autopilot — Autonomous Session Orchestration](https://gitlab.gotzendorfer.at/infrastructure/session-orchestrator/-/issues/271) - Issue: [#277 Phase C /autopilot Loop Command](https://gitlab.gotzendorfer.at/infrastructure/session-orchestrator/-/issues/277) - Phase A PRD: `docs/prd/2026-04-24-state-md-recommendations-contract.md` - Phase B PRD: `docs/prd/2026-04-25-mode-selector.md` ## Open Questions (Phase C-1 to resolve) - `--confidence-threshold=auto` — let autopilot self-tune from accumulated `mode-selector-accuracy` learnings? Requires ≥ 20 accuracy learnings before useful. - STATE.md `autopilot-active: true` field — should other Claude sessions detect via the session-registry and refuse to start during an autopilot run? Dogfooding will inform. - `failed-wave` granularity — distinguish "agent failed but was retried successfully" from "wave ended with un-recovered failures"? Requires wave-executor schema audit.