---
name: plan-delivery-audit
description: "Triangulate plan-claim / code-reality / review oracles to classify each plan into DELIVERED+REVIEWED / DELIVERED-UNREVIEWED / PARTIAL / IN-FLIGHT / ABANDONED. Run after any crash or 'did we actually finish what we think we finished?' moment."
description-budget: 175
version: 1.0.0
allowed-tools: ["Read", "Grep", "Glob", "Bash", "Agent"]
argument-hint: "[plan-glob — default: docs/plans/*.md]"
---

# Plan-Delivery Audit

> **Spec backlink:** `docs/plans/2026-05-28-archive-aware-review-oracle-and-audit-skill.md` § Chunk C5.
> **Origin:** Distilled from the 2026-05-27 holodeck audit
> (`X:\claude-unreal-holodeck\tasks\recovery\2026-05-27-plan-delivery-audit.md`) where a
> live-only review-trail read produced a false "most work unreviewed" alarm — 22 archived
> records had been moved by `/workweek-complete` Step 13 and were invisible to the live glob.

## When to invoke

- After any session crash or partition that puts shipping state in doubt.
- Mid-quarter "what shipped this month?" reconciliation.
- Post-merge audit when a chain of handoffs lost a clean stopping point.
- PM question: *"Did we actually finish what we think we finished?"*
- Any time a plan's `status: implemented` self-assertion needs independent verification.

## Out-of-scope sidecars

Sidecar files generated by the coordinator review pipeline inherit their parent plan's
`status:` frontmatter and must be **excluded** from the plan set being audited. Exclude
any file matching:

- `*.prior-art-check.md`
- `*.coverage-check.md`
- `*.docs-check.md`
- `*.review-patrik.md`
- `*.review-sid.md`
- `*.review-camelia.md`
- Any file whose basename contains `.review-` or `.check.`

Apply the exclusion at glob time. Default glob: `docs/plans/*.md` minus the patterns above.
The holodeck audit (2026-05-27) confirmed this exclusion is load-bearing — sidecars were the
source of misleading `status: reviewed` entries in the candidate set before filtering.
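
The exclusion can be sketched as a small filter applied to whatever the glob expands to. This is a minimal illustration rather than the skill's actual implementation; `list_audit_plans` is a hypothetical helper name.

```bash
# Sketch only: list_audit_plans is a hypothetical helper, not part of the skill.
# Prints every regular file among its arguments except review-pipeline sidecars.
list_audit_plans() {
  local path base
  for path in "$@"; do
    [ -f "$path" ] || continue            # skip unmatched globs and directories
    base=${path##*/}                      # basename without calling an external tool
    case "$base" in
      *.prior-art-check.md|*.coverage-check.md|*.docs-check.md) continue ;;
      *.review-*|*.check.*) continue ;;   # catch-all: basename contains .review- or .check.
    esac
    printf '%s\n' "$path"
  done
}

# Usage: list_audit_plans docs/plans/*.md
```

Filtering on the basename rather than the full path keeps a directory named like a sidecar from hiding real plans.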

## The three oracles (read each independently)

Read each oracle independently before comparing. Oracle 1 is the thing under test; Oracles 2
and 3 are the falsifiers. Never let Oracle 1's self-assertion influence your Oracle 2 or 3 read.

### Oracle 1 — Plan-claim

The plan's own frontmatter and AC status column. Read:

- `status:` field (e.g. `implemented`, `in-progress`, `draft`, `superseded`, `abandoned`)
- Any per-AC `Status` column entries in the plan body
- Execution notes, `reviewed:` fields

**Treat these as the plan's claim, not as ground truth.** Oracle 1 is the hypothesis;
Oracles 2 and 3 either confirm or falsify it. A plan confidently self-reporting
`status: implemented` with `all ACs green` is a starting point, not a verdict.
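
Reading the claim mechanically might look like the sketch below. It assumes the plan opens with a YAML frontmatter block delimited by `---` lines and that `status:` is a one-line scalar; `plan_status` is a hypothetical helper name.

```bash
# Sketch only: plan_status is a hypothetical helper name.
# Prints the frontmatter status of a plan, or nothing if there is none.
plan_status() {
  awk '
    NR == 1 && $0 != "---" { exit }   # file has no frontmatter block
    NR > 1 && $0 == "---"  { exit }   # closing delimiter reached without a status
    /^status:/ {
      sub(/^status:[ \t]*/, "")       # drop the key
      sub(/[ \t]+$/, "")              # drop trailing whitespace
      gsub(/"/, "")                   # drop optional double quotes
      print
      exit
    }
  ' "$1"
}

# Usage: plan_status docs/plans/YYYY-MM-DD-name.md
```

Stopping at the closing `---` matters: a `status:` line in the plan body (for example inside an AC table note) is not the frontmatter claim.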

### Oracle 2 — Code-reality

For each AC whose typed-prefix test can be run mechanically (`grep:`, `cited:`, `bats:`,
`pytest:`, `node:`), run it against current `HEAD`. **Ignore the AC's own `Status` note** — the
test result is the evidence, not the assertion.

- `grep:<pattern>@<file>` → run as `grep -n '<pattern>' <file>` on disk
- `cited:<file>` → verify the file exists at that path
- `bats:<file>` → run `bats <file>` (or note if bats unavailable; mark as unverifiable)
- `pytest:<file>` → run `poetry run pytest <file> -q` (or `uv run pytest`)
- `node:<file>` → run `node <file>`

**Miss on any grep or cited test = PARTIAL**, not delivered, regardless of what Oracle 1 claims.
If a plan has no typed-prefix ACs (narrative-only), note that Oracle 2 is unverifiable and
treat the plan conservatively.
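
As a rough sketch, the two prefixes that need no test runner can be evaluated as follows. `run_typed_ac` is a hypothetical helper, and it assumes the file path in a `grep:` AC contains no `@`.

```bash
# Sketch only: run_typed_ac is a hypothetical helper name.
# Prints PASS, FAIL, or UNVERIFIABLE for one typed-prefix AC string.
run_typed_ac() {
  local ac=$1 body pattern file
  case "$ac" in
    grep:*@*)
      body=${ac#grep:}
      pattern=${body%@*}     # everything before the last @
      file=${body##*@}       # everything after the last @ (assumes no @ in the path)
      if grep -n -- "$pattern" "$file" >/dev/null 2>&1; then echo PASS; else echo FAIL; fi
      ;;
    cited:*)
      if [ -e "${ac#cited:}" ]; then echo PASS; else echo FAIL; fi
      ;;
    *)
      echo UNVERIFIABLE      # bats:/pytest:/node: need their runners; narrative ACs have no test
      ;;
  esac
}

# Usage: run_typed_ac 'grep:<pattern>@<file>'
```

A missing file makes `grep` exit non-zero, so a `grep:` AC against a deleted file reports FAIL rather than erroring out.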

**Dispatch shape:** For audits covering ≥4 plans, dispatch parallel read-only Sonnet scouts
(one per plan). Each scout runs every typed-prefix test for its assigned plan and returns the
results inline; the EM synthesises them. Scouts must NOT modify files, commit, or push during
Oracle 2 runs.

### Oracle 3 — Review (archive-aware)

Does an independent review-trail record cover the plan's delivery commits?

**A commit C is covered by trail record [A..B] if and only if:**

```bash
  git merge-base --is-ancestor C B   # must succeed (exit 0): C is at or before the window end B
! git merge-base --is-ancestor C A   # must succeed (exit 0): C is NOT at or before the window start A (A itself is excluded)
```

Both clauses are required: `is-ancestor C B` AND `! is-ancestor C A`. The negative clause
prevents a review record from absorbing commits that predate the reviewed window — without it,
any record with a sufficiently old start SHA would appear to "cover" any commit in the repo.
**The `!` negation in front of the second `git merge-base` is load-bearing — omitting it inverts the
test from "after the window start" to "before the window start" and silently breaks the audit.**
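
The two-clause test can be wrapped in one function, sketched here under the hypothetical name `commit_covered`. One deliberate difference from the bare `!` form: `git merge-base --is-ancestor` exits 1 for "not an ancestor" and uses other non-zero codes for errors, so this sketch accepts only an exit status of exactly 1 on the second clause. With a bare `!`, an unresolvable start SHA would read as coverage.

```bash
# Sketch only: commit_covered is a hypothetical helper name.
# Exit 0 when commit C lies in the window A..B (after A, at or before B).
commit_covered() {
  local c=$1 a=$2 b=$3 rc=0
  git merge-base --is-ancestor "$c" "$b" || return 1   # clause 1: C at or before B
  git merge-base --is-ancestor "$c" "$a" || rc=$?      # clause 2: capture the exit status
  [ "$rc" -eq 1 ]   # 1 = not an ancestor of A; 0 = predates the window; other = git error
}

# Usage: commit_covered <commit> <start_sha> <end_sha> && echo covered
```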

**Trail records live at BOTH `tasks/review-trail/**/*.json` AND `archive/review-trail/**/*.json`.**
Read via:

```bash
list-review-trail-records.sh
```

**DO NOT glob `tasks/review-trail/` alone.** The `/workweek-complete` Step 13 archival moves
all current-week records into `archive/review-trail/<week-starting>/` on every weekly reset.
A live-only read systematically under-counts coverage for anything older than the current week.
This is the exact failure mode that prompted this skill — the 2026-05-27 holodeck audit found
22 archived records invisible in the live dir, producing a false "most work unreviewed" alarm.
See `docs/wiki/session-end-review.md` § Archive-Aware Glob.

**For each trail record returned by `list-review-trail-records.sh`:**

1. Read the record's `sha_range` field (format: `"<start>..<end>"`)
2. For each of the plan's delivery commits C: run the two-clause ancestry test above
3. If any record covers all delivery commits, Oracle 3 = COVERED; else UNCOVERED
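
The three steps above can be sketched as a loop over `sha_range` strings. `range_covers_all` and `oracle3` are hypothetical helper names, and the sketch takes the ranges on stdin because the output format of `list-review-trail-records.sh` is not specified here.

```bash
# Sketch only: range_covers_all and oracle3 are hypothetical helper names.

# range_covers_all "<start>..<end>" <commit>... : exit 0 when every commit is inside the range.
range_covers_all() {
  local range=$1 start end c rc
  shift
  [ "$#" -gt 0 ] || return 1            # no delivery commits identified: nothing to cover
  start=${range%%..*}                   # text before the ".."
  end=${range##*..}                     # text after the ".."
  for c in "$@"; do
    git merge-base --is-ancestor "$c" "$end" || return 1
    rc=0
    git merge-base --is-ancestor "$c" "$start" || rc=$?
    [ "$rc" -eq 1 ] || return 1         # 0 = predates the window, other = git error
  done
}

# oracle3 <commit>... : reads one sha_range per line on stdin, prints COVERED or UNCOVERED.
oracle3() {
  local range
  while IFS= read -r range; do
    if range_covers_all "$range" "$@"; then
      echo COVERED
      return 0
    fi
  done
  echo UNCOVERED
}
```

Assuming the script prints one record path per line and `jq` is installed (neither is confirmed by this document), the ranges could be fed in with `list-review-trail-records.sh | while IFS= read -r rec; do jq -r '.sha_range' "$rec"; done | oracle3 <commit>...`.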

If the plan's delivery commits cannot be identified from frontmatter or execution notes,
search `git log --oneline` for commits that touch files the plan's ACs describe.
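
A minimal sketch of that search, with `candidate_delivery_commits` as a hypothetical helper name. The paths passed in would be the files named by the plan's `grep:` and `cited:` ACs.

```bash
# Sketch only: candidate_delivery_commits is a hypothetical helper name.
# Prints the abbreviated hash and subject of every commit touching the given paths, newest first.
candidate_delivery_commits() {
  [ "$#" -gt 0 ] || return 1     # without paths, git log would list the whole history
  git log --oneline -- "$@"
}

# Usage: candidate_delivery_commits <file-from-AC> [<file-from-AC>...]
```

The result is a candidate list only; commits that touched the same files for unrelated reasons still need to be weeded out by hand.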

## Bucket decision tree

Apply this decision tree to each plan after running all three oracles. Every plan resolves
into **exactly one** bucket — no plan should fall into two.

| Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------------------|-----------------|-------------------|--------|
| `status: implemented` (or `shipped`) | All typed-prefix ACs pass at HEAD | ≥1 trail record covers delivery commits | **DELIVERED+REVIEWED** |
| `status: implemented` (or `shipped`) | All typed-prefix ACs pass at HEAD | No trail record covers delivery commits | **DELIVERED-UNREVIEWED** |
| `status: implemented` (or `shipped`) | ≥1 typed-prefix AC fails at HEAD, OR no typed-prefix ACs (unverifiable) | (any) | **PARTIAL** |
| `status: in-progress` / `draft` / `reviewed` (not yet implemented) | (any) | (any) | **IN-FLIGHT** |
| `status: superseded` / `abandoned` / `cancelled` | (any) | (any) | **ABANDONED** |

**Tie-breaking rules:**

- A plan with `status: implemented` but unverifiable Oracle 2 (no typed-prefix ACs) goes into
  **PARTIAL**, not DELIVERED. Self-assertion without machine-checkable evidence is not delivery.
- A plan with `status: draft` AND commit evidence of substantial shipped work still goes into
  **IN-FLIGHT** — the correct response is to flip the frontmatter, not to reclassify here.
- **DELIVERED-UNREVIEWED** is a real state, not an error. The correct follow-up is to dispatch
  `code-reviewer` against the delivery diff and record the trail entry. Do not skip the review
  or backdoor a "reviewed" claim without running the actual reviewer.
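- **ABANDONED** plans found with Oracle 2 evidence of shipped code should surface as a concern
  in the audit output. The plan may need a frontmatter flip to `superseded` with a
  `superseded_by:` pointer.
- A plan whose frontmatter still says `status: draft` but which is explicitly superseded by a
  later plan goes into **ABANDONED**, as the 05-19 row of the worked example does. Recommend the
  flip to `status: superseded` in the audit output.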
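
The table rows can be expressed as one function, sketched below. `classify_plan` is a hypothetical helper, and the `PASS` / `FAIL` / `UNVERIFIABLE` and `COVERED` / `UNCOVERED` tokens are assumed encodings of the Oracle 2 and Oracle 3 results, not names the skill defines.

```bash
# Sketch only: classify_plan is a hypothetical helper name.
# classify_plan <status> <oracle2: PASS|FAIL|UNVERIFIABLE> <oracle3: COVERED|UNCOVERED>
classify_plan() {
  local claim=$1 oracle2=$2 oracle3=$3
  case "$claim" in
    superseded|abandoned|cancelled) echo ABANDONED; return 0 ;;
    in-progress|draft|reviewed)     echo IN-FLIGHT; return 0 ;;
    implemented|shipped)            ;;   # continue to Oracles 2 and 3
    *) echo "unrecognised status: $claim" >&2; return 2 ;;
  esac
  if [ "$oracle2" != PASS ]; then
    echo PARTIAL                         # a failing or unverifiable Oracle 2 is never "delivered"
  elif [ "$oracle3" = COVERED ]; then
    echo DELIVERED+REVIEWED
  else
    echo DELIVERED-UNREVIEWED
  fi
}

# Usage: classify_plan implemented PASS UNCOVERED
```

Checking the status first mirrors the table: Oracles 2 and 3 only influence the bucket once the plan claims delivery. An unrecognised status is reported as an error rather than guessed into a bucket.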

## Output format

Emit a markdown table — one row per audited plan:

```markdown
| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------|-----------------|-----------------|-------------------|--------|
| `docs/plans/YYYY-MM-DD-name.md` | `status: X`; AC count | N/M ACs pass; misses if any | trail record `A..B` covers commits / UNCOVERED | **BUCKET** |
```

Follow the table with:

1. **Summary** — count per bucket.
2. **Catch-up review queue** — list every DELIVERED-UNREVIEWED plan with its delivery commits.
3. **Frontmatter flip recommendations** — list any ABANDONED plans with stale `status: draft`.
4. **Method notes** — flag any plan with unverifiable Oracle 2 (no typed-prefix ACs) or
   missing delivery commits.

Save to `tasks/audits/YYYY-MM-DD-plan-delivery-audit.md`. Create the `tasks/audits/` directory
if absent.
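
A minimal sketch of the save step and the per-bucket summary. `write_audit_report` and `bucket_summary` are hypothetical helper names, and the summary assumes each Bucket cell holds only the bold bucket name.

```bash
# Sketch only: write_audit_report and bucket_summary are hypothetical helper names.

# Reads finished table rows on stdin, writes the dated report, and prints its path.
write_audit_report() {
  local out
  out="tasks/audits/$(date +%Y-%m-%d)-plan-delivery-audit.md"
  mkdir -p tasks/audits                  # create the directory if absent
  {
    printf '%s\n' '| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |'
    printf '%s\n' '|------|-----------------|-----------------|-------------------|--------|'
    cat                                  # the rows supplied by the caller
  } > "$out"
  printf '%s\n' "$out"
}

# Prints "BUCKET: count" for each bucket in the report, sorted by bucket name.
bucket_summary() {
  awk -F'|' 'NR > 2 { b = $(NF-1); gsub(/[* ]/, "", b); n[b]++ }
             END { for (b in n) print b ": " n[b] }' "$1" | sort
}

# Usage: report=$(write_audit_report < rows.md) && bucket_summary "$report"
```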

## Worked example — 2026-05-27 holodeck audit

> **Staff Engineer F3 falsifiability hook.** The rows below are drawn directly from the holodeck
> audit at `X:\claude-unreal-holodeck\tasks\recovery\2026-05-27-plan-delivery-audit.md`.
> A reviewer can walk the decision tree against each row independently and verify the bucket
> assignment. Plans are in the holodeck repo (`X:\claude-unreal-holodeck\docs\plans\`).

**Plans audited (4 real plans; sidecar exclusion applied first):**

| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------|-----------------|-----------------|-------------------|--------|
| `2026-05-24-patch-and-send-back-contribution-invite.md` | `status: implemented`; cites commits `d67f7371b`, `d368b6269`, `a92447c17`, `b5b824d47`; 10 ACs | 9/10 ACs verified on disk; AC3 (URL liveness) is inherently manual, and the remaining 9 pass grep/cited checks | **COVERED (archive-aware).** `archive/review-trail/2026-05-21/2026-05-24-115430145738-08b5c444.json`, range `977a40b29..cc4c936d6`; all 4 delivery commits satisfy `is-ancestor C cc4c936d6 && ! is-ancestor C 977a40b29`. _A live-only Oracle 3 (`tasks/review-trail/` glob without `archive/review-trail/**`) would mis-classify this plan as DELIVERED-UNREVIEWED; see Key finding below._ | **DELIVERED+REVIEWED** (under archive-aware Oracle 3); **would have been DELIVERED-UNREVIEWED** under live-only Oracle 3 |
| `2026-05-26-game-dev-ownership-and-bidirectional-install-drift.md` | `status: implemented`; 9 delivery commits; 13 ACs; `reviewed: patrik 2026-05-26; code-reviewer 2026-05-26` | 12/13 ACs fully in-repo; AC7 realized as external coordinator dependency (landed) | **COVERED.** Live record `tasks/review-trail/2026-05-26-131032300171-1afa35ae.json` — range `609399fcc..HEAD`; all 9 delivery commits fall inside range | **DELIVERED+REVIEWED** |
| `2026-05-26-headless-extractor-seam-buildout.md` | `status: draft`; active workstream with recovery handoff `2026-05-27_084305_e40956a9.md` ("review-complete, ZERO C++ authored; resume by dispatching Phase 1 H-2 resolver") | Not run — no terminal close; Oracle 2 does not apply to in-flight work | Not applicable — delivery not claimed | **IN-FLIGHT** |
| `2026-05-19-headless-extraction-buildout.md` | `status: draft`; `kind: roadmap-lite`; explicitly superseded by the 05-26 seam plan | Not run — no delivery expected | Not applicable — abandoned by supersession | **ABANDONED** |

**On DELIVERED-UNREVIEWED and PARTIAL buckets:** in their resolved state, the worked example's 4 plans land only in DELIVERED+REVIEWED, IN-FLIGHT, and ABANDONED. The 05-24 row is nevertheless annotated with the bucket it would have received under a live-only Oracle 3 (`DELIVERED-UNREVIEWED`). That annotation is the falsifying demonstration of why archive-aware reading is load-bearing: the same plan, audited the same day, resolves to a different bucket depending on whether Oracle 3 globs the archive. PARTIAL applies when Oracle 1 claims `implemented` but Oracle 2 finds a cited file or symbol absent, or AC tests red on disk. None of the 4 audited plans hit that path.

**Decision tree walk for each row:**

1. **patch-and-send-back (05-24):** Oracle 1 = `implemented` → proceed to Oracle 2. Oracle 2 = 9/10 typed-prefix tests pass; AC3 unverifiable (URL liveness) but 9 others confirm → "all verifiable ACs pass". Proceed to Oracle 3. Oracle 3 = archived trail record covers all delivery commits via range-membership → COVERED. Bucket: **DELIVERED+REVIEWED**. ✓

2. **game-dev-ownership (05-26):** Oracle 1 = `implemented` → proceed to Oracle 2. Oracle 2 = 12/13 ACs in-repo (AC7 is an external dep, landed); all in-repo ACs pass grep/cited → "all verifiable ACs pass". Proceed to Oracle 3. Oracle 3 = live trail record covers all 9 delivery commits → COVERED. Bucket: **DELIVERED+REVIEWED**. ✓

3. **headless-extractor-seam (05-26):** Oracle 1 = `draft` → bucket resolves immediately at Oracle 1. No delivery claimed; active recovery handoff confirms in-flight status. Bucket: **IN-FLIGHT**. ✓ (Oracle 2 and 3 not needed.)

4. **headless-extraction-buildout (05-19):** Oracle 1 = `draft` with explicit supersession pointer → bucket resolves at Oracle 1. No delivery expected or claimed. Bucket: **ABANDONED**. ✓ (Audit recommends a frontmatter flip to `status: superseded` + `superseded_by:` for hygiene — the current `draft` falsely signals "resumable" to pickup candidates.)

**Key finding from this audit:** The handoff's alarm, *"only 4 review-trail records, most shipped work unreviewed"*, was **an archival artifact, not a coverage gap**. The `/workweek-complete` run on 2026-05-24 (commit `db151655e`) moved 22 review-trail records into `archive/review-trail/2026-05-21/`. A live-only read of `tasks/review-trail/` saw only the current week's 4 records and missed the 22 archived ones. An Oracle 3 that read ONLY the live dir would have mis-classified the 05-24 plan as **DELIVERED-UNREVIEWED**, a false verdict. The archive-aware read via `list-review-trail-records.sh` is load-bearing, not optional.

## Why a skill, not a one-shot

This shape recurs after every crash and every "did we actually ship what we think we shipped?"
moment — it has fired at least twice in the coordinator workstream and once explicitly in holodeck
(2026-05-27). Re-deriving the three-oracle protocol, the git range-membership formula, the
archive-aware glob, and the sidecar exclusion rule by hand each time is the cost this skill
exists to amortise.

Closest analogues in `skills/`: `bug-sweep` (sweep-and-classify), `architecture-audit`
(multi-oracle assessment), `validate` (typed-prefix test running). All three are precedents for
packaging a recurring structured procedure as a skill. A wiki-only home would force the dispatch
shape to be re-derived on every invocation and would not be greppable from `coordinator:plan` or
`/session-end` as a routine option.

## Dispatch sequencing

The skill orchestrates its own work in three phases:

**Phase 1 — Gather and filter (~2 min, EM does this directly)**

1. Glob the plan set: `docs/plans/*.md` (or the caller-supplied glob).
2. Filter out sidecars using the patterns in "Out-of-scope sidecars" (any filename containing
   `.prior-art-check`, `.coverage-check`, `.docs-check`, `.review-`, or `.check.`).
3. For each remaining plan, read frontmatter `status:` to rough-sort into buckets:
   - `status: superseded / abandoned / cancelled` → ABANDONED (no further oracle work needed)
   - `status: draft / in-progress / reviewed` → IN-FLIGHT (no further oracle work needed)
   - `status: implemented / shipped` → needs Oracle 2 + 3 (add to work queue)

**Phase 2 — Oracle 2 + 3 on implemented plans (~10 min, parallel scouts)**

For each implemented plan, dispatch a read-only Sonnet scout with:

> "Run all typed-prefix ACs for `<plan-path>` against current HEAD. Report each AC as PASS / FAIL /
> UNVERIFIABLE. Do NOT modify files, commit, or push. Return results inline."

Concurrently (EM-side), run Oracle 3 for each plan:

```bash
list-review-trail-records.sh   # get all live + archived records
# for each record, read sha_range and test delivery commits for range membership:
  git merge-base --is-ancestor <commit> <end_sha>    # must succeed (exit 0)
! git merge-base --is-ancestor <commit> <start_sha>  # must succeed (exit 0) — note the leading !
```

**Phase 3 — Synthesise and write output (~3 min)**

Apply the bucket decision tree to each plan. Write the output table + summary to
`tasks/audits/YYYY-MM-DD-plan-delivery-audit.md`. Surface any DELIVERED-UNREVIEWED plans
to the PM with a `code-reviewer` dispatch recommendation — do not dispatch autonomously,
as the PM may choose to defer.