---
name: engagement-protocol
domain: meta
description: |
  [PROTOCOL] Canonical contract for agency engagement lifecycle: artefact schemas, state
  files, iteration budget, escalation rules, cross-domain handoff, observability.
  Single source of truth referenced by agency-intake, leads, directors,
  and pipeline skills. Pure reference — no triggers, not invoked by the
  secretary as a router target.
---

# engagement-protocol

This is the shared contract every agency role follows. The secretary captures intake, the lead orchestrates, the director accepts. All of them read from and write to the artefacts defined here.

## Engagement = a directory

Every agency engagement lives in an `engagement/` directory under the project working directory. Files in this directory are the engagement's state. Nothing about the engagement is conversational — if it matters, it's in a file.

### Whitelist (allowed paths only)

The list below is **closed**. No files outside it may be created inside `engagement/`. Existing files outside this list are a protocol violation and the director MUST treat them as red flag (likely phantom-claims hideout).

```
engagement/
├── criteria.md                 # secretary, semi-immutable (additions/removals locked; rephrasing allowed via lead/secretary, see "Mutability rules")
├── scope-sync.md               # director, optional, append-only after create
├── plan.md                     # lead, mutable until first dispatch, then frozen
├── specs/                      # dev only — user-spec.md / tech-spec.md / research-verdict.md
├── tasks/*.md                  # ALL domains — atomic task files (mandatory L; recommended M with ≥2 specialists; skip S)
├── tasks/INDEX.md              # optional manifest; mandatory if size: L
├── brand/                      # design only — voice / tokens / guidelines
├── design-system/              # design only — components / tokens
├── ui/                         # design only — screens / flows
├── executor-reports/           # specialists, one file per specialist, append-only
│   └── {specialist}.md
├── validation-log.md           # lead, append-only — every validator run
├── validation-outputs/         # MANDATORY — JSON proof-of-run for each validator AND adversary role (incl. preliminary)
│   └── {validator|adversary-role}-iter-{N}[-preliminary]-{timestamp}.json
├── consilium-summary.md        # auto-written by consilium-synth.py (M/L only); aggregates adversary findings
├── human-directive.md          # MANDATORY on M/L between consilium and director verdict; human as supreme judge
├── codex-outputs/              # OPTIONAL — assets generated by Codex via mcp__codex__codex tool calls
│   └── {NN}-{slug}.{png|svg|jpg}
├── iteration                   # plain-text counter file (lead inc on handoff, director inc on reject)
├── screens/                    # MANDATORY for ux_heavy=true — Playwright captures
│   └── {iteration}/{theme}/    # e.g. iter-1/dark/dashboard.png
├── traces/                     # MANDATORY for ux_heavy=true — exercised flow logs
│   └── {iteration}/{flow}.json # network/console/dom snapshots
├── deploy-log.md               # dev only when deploy boundary crossed
├── docs-diff.md                # docs pipeline only
├── handoff.md                  # lead, REPLACED per iteration
└── acceptance-log.md           # director, append-only
```

### Forbidden (do NOT create)

These names appear in past engagements as "supporting docs". They duplicate whitelist content, hide phantom claims, and bloat the bookkeeping. Forbidden:

- `preview.md`, `compliance.md`, `review-log.md`, `visual-review.md`, `summary.md`, `notes.md`, `report.md`, `tldr.md`
- `rework-N-brief.md`, `iter-N-summary.md`, `wave-N-recap.md`, anything resembling a per-iteration sidecar
- `findings.md`, `issues.md`, `audit.md` outside `validation-log.md`

If you feel pressure to create one of these — the content already belongs in an existing whitelist file:
- "compliance" / "audit" / "review" → `validation-log.md`
- "preview" / "summary" / "report" → `handoff.md` (it IS the summary)
- "rework brief" → new `## Iteration N` section inside `validation-log.md` and rewritten `handoff.md`
- "exercised flow" / "smoke run" → `traces/` directory + reference inside `handoff.md` §"Exercised"

Director's red-flag scan (per iteration): `ls engagement/` — any file outside whitelist = REJECT with reason "out-of-whitelist artefact, content must move into canonical file".

## Artefact schemas

### `criteria.md` mutability rules (resolves the "immutable vs lead-sharpening" tension)

`criteria.md` is **semi-immutable**:

| Action | Allowed by | When | Audit |
|---|---|---|---|
| **Adding** a new "Done when" / "Deliverables" bullet | user only | scope-sync with director | director records in `scope-sync.md` |
| **Removing** a bullet that has independent user value | user only | scope-sync with director | director records in `scope-sync.md` |
| **Changing measurement bar** (e.g. "tests green" → "tests + 90% coverage") | user only | scope-sync with director | director records in `scope-sync.md` |
| **Rephrasing for clarity** (no scope change) | lead OR secretary autonomously | Phase 1a sharpening, or loop-to-intake | scope-sync.md auto-edit log |
| **Dropping no-value filler** (would-always-be-zero metric, never-read deliverable) | lead OR secretary autonomously | Phase 1a sharpening | scope-sync.md auto-edit log |
| **Splitting compound bullet into atoms** | lead OR secretary autonomously | Phase 1a sharpening | scope-sync.md auto-edit log |
| **Frontmatter `size`/`ux_heavy` promotion** (S→M, false→minor→true) | lead | mid-engagement when scope grows | scope-sync.md note |
| **Frontmatter demotion** | NEVER (would discard accumulated rigour) | — | — |

The directive resolves the apparent contradiction: criteria are immutable to **scope**, mutable for **clarity**. Lead does not need user touch to drop a "ошибки за период (always 0)" filler — that's not scope reduction, that's noise removal. Lead does need user touch to drop "тесты после миграции" — that has independent value.

### `criteria.md` (secretary)

```markdown
---
engagement: {engagement-name}
created: {YYYY-MM-DD}
domain: marketing | dev | design
size: S | M | L                                       # see Engagement size tier
ux_heavy: false | minor | true                        # see UX-heavy engagements
tools_required: [docker, playwright, postgres, ...]   # see Pre-flight check
# protocol_version: 4                                 # OPTIONAL — kept for future migration story; not required today
---

# Acceptance criteria — {engagement-name} — {YYYY-MM-DD}

## Scope
{one paragraph, verbatim-aligned with user brief}

## Deliverables expected
- {deliverable 1 with measurable bar}
- {deliverable 2 with measurable bar}

## Done when
- {observable condition 1}
- {observable condition 2}

## Explicitly out of scope
- {thing excluded with one-line justification}

## Review mode
`lean` | `full` | `solo`

## Iteration budget
Guidance: 2 rework cycles, then user escalation. Counter is informational — escalate immediately on repeating critique. Hard cap 4.

## Cross-domain dependency (optional)
- Primary domain: {marketing | dev | design}
- Secondary domain: {marketing | dev | design} — invoked at phase {N} by primary lead
```

### Pre-flight check (intake-time, blocking)

Before secretary hands off to lead, verify every tool in `tools_required` is reachable:

| Tool | Check | Action if missing |
|---|---|---|
| `docker` | `docker info` exits 0 | Ask user to start Docker Desktop. Block until done. |
| `playwright` | playwright binary present (project or `npx playwright --version`) | Ask user / lead to install. Block until done. |
| `postgres` / `db` | connection string in `.env` and `pg_isready` (or equivalent) succeeds | Ask user to start DB. Block. |
| `node`, `python`, `bun`, etc. | `--version` succeeds | Ask user to install. Block. |
| Secrets (e.g. `OPENAI_API_KEY`) | env var or `.env` entry present (presence only — don't read value) | Ask user to add. Block. |

Pre-flight failure → secretary records the blocking tool in `criteria.md` "out of scope" line "BLOCKED at intake until {tool} available", and pauses engagement until user resolves. Do NOT hand off to lead with known-broken validation environment — that is the root cause of CONDITIONAL accepts that never resolved.

### `scope-sync.md` (director, optional)

```markdown
# Scope sync — {engagement-name} — {YYYY-MM-DD HH:MM}

## Criteria frozen as of
{timestamp}

## Ambiguities resolved
- Q: {question raised by director or lead}
- A: {resolution}

## Criteria locked by
{director-name}
```

### `plan.md` (lead)

```markdown
# Engagement plan — {engagement-name}

## Shape
{engagement-type classification + rationale}

## Phases
1. Phase name — owner (mid-lead / specialist) — deliverable — dependencies
2. ...

## Validators planned
- {validator-name} on {artefact} — triggered by {rule}

## Cross-domain handoff (if applicable)
- Trigger: {condition}
- Secondary lead: {lead-agent-name}
- Artefact passed: {path}
```

### `validation-log.md` (lead, append-only)

Schema defined in `validation-pipeline`. Append one section per validator run.

### `handoff.md` (lead, replaced per iteration)

The first section is ALWAYS the diff. The rest is supporting evidence. If a director reads only section 1, they should already know roughly what happened — prose is for context, not for the primary source of truth.

```markdown
# Handoff — iteration {N} — {engagement-name} — {YYYY-MM-DD HH:MM}

## 1. Diff summary (PRIMARY)
For dev engagements, paste raw output:
- `git diff --stat {base}..HEAD` (file-level changes)
- `git log --oneline {base}..HEAD` (commit list)

For marketing/design engagements (no git diff available):
- Created files: list with sizes (`wc -l` or byte count)
- Modified files: before/after delta (lines added/removed or visual diff)
- Removed files: list with reason

This section is the **source of truth** for what changed. All other sections describe WHY and HOW WELL — they do not redefine what changed.

## 2. Deliverable manifest
- {artefact path 1} — {short description}
- {artefact path 2} — ...

## 3. Criteria trace
| # | Criterion | Status | Evidence (path + line/section) |
|---|---|---|---|
| 1 | {criterion text} | ✅ / ⚠️ / ❌ | {pointer} |

## 4. Executor reports
- {specialist-name} — `executor-reports/{specialist}.md`

### 4-pre. Specialist contract enforcement (lead duty)

Before composing §4 and §4a, lead verifies every dispatched specialist wrote to the canonical place:

```bash
# For each specialist dispatched:
ls engagement/executor-reports/{specialist-name}.md     # must exist
ls engagement/                                          # no rogue files
```

If a specialist:
- Wrote nothing → re-dispatch with reminder, do NOT pretend they reported.
- Wrote to wrong path (`/tmp/`, `engagement/notes.md`, project-root `report.md`) → re-dispatch + delete wrong file.
- Created a rogue file in `engagement/` (e.g. `findings-frontend.md`) → consolidate into their canonical executor-report, delete rogue.

This enforcement is the lead's job, not the director's. Director treats `engagement/` as authoritative — anything outside whitelist locations doesn't exist for acceptance purposes (and is a REJECT trigger).

### 4-iter. Iteration structure inside executor-reports

When the same specialist is re-dispatched on iter N+1 to address director feedback, they MUST append to their existing executor-report — NOT overwrite it. Each iteration in the report is structurally separated:

```markdown
# {specialist-name} executor report

## Iteration 1 — 2026-05-06 14:30
### Criteria acknowledgement
- Addresses crit-1 (foo.py exports add)

### What was done
- Implemented add() at src/foo.py L4
- Added unit test in tests/test_foo.py L12-L20

### Cross-contract claims
- Returns int, not None on edge cases (frontend should expect int)

### Anti-pattern self-disclosure
- None

## Iteration 2 — 2026-05-07 11:15
### What changed since iter-1
- Director rejected: edge-case test for negative inputs missing.
- Added negative-input test at tests/test_foo.py L22-L30.

### Anti-pattern self-disclosure
- None
```

Lead enforces the format on dispatch: include in Task prompt explicit reminder to append `## Iteration {N}` heading. Without iteration headings on iter ≥ 2, handoff-precheck flags `executor-report-iteration-missing` as REJECT trigger — earlier work would be silently overwritten and audit trail lost.

### 4a. Cross-validation table (REQUIRED if ≥2 specialists touched shared contracts)

Free-form "all match" claims are not accepted — director cannot verify them. Use this table with verbatim quotes from each specialist's executor report; without the table the cross-validation is presumed unrun and director treats it as such (REJECT).

| Contract | Specialist A — claim (verbatim) | Specialist B — claim (verbatim) | Verdict |
|---|---|---|---|
| `POST /api/items` | dev-backend-engineer.md L42: "returns 201 + {id, name}" | dev-frontend-engineer.md L18: "expects 201, body {id, name}" | MATCH |
| `User.email` field type | dev-backend-engineer.md L67: "VARCHAR(254) NOT NULL" | dev-tech-architect.md L23: "string, max 320" | DIVERGE |

DIVERGE = blocker. Resolve (re-dispatch the wrong specialist for fix, re-run code-reviewer) BEFORE handoff. Submitting with unresolved DIVERGE = REJECT.

If no shared contracts in this engagement, write: "N/A — no specialists overlap on shared contracts (only single-domain artefacts produced)." This explicit N/A is required — silence is not an answer.

## 5. Validation log
Summary — see `validation-log.md` for full entries.
- {validator}: {verdict}, {N findings resolved / N deferred}

## 6. Exercised (MANDATORY for ux_heavy, optional otherwise)
Narrative of what happened when the lead actually used each touched control. Each bullet must reference a verifiable artefact (Playwright trace, network log, screenshot, DOM snapshot path) — prose alone is reject-worthy because it can be hallucinated.

Format per bullet:
- Action → observed result → evidence path
- Example: "Clicked 'Квартал' preset at 14:32 → date popover set 2026-04-01 → 2026-04-24 (24 days). Evidence: `traces/iter-2/quarter-preset.json` line 45-52, `screens/iter-2/dark/dashboard-quarter.png`."

If section absent on a `ux_heavy: true` engagement → handoff INCOMPLETE (returned unread, no iteration burned).

## 7. Self-acceptance rehearsal (MANDATORY — structural gate, not prose)
Lead simulates director's acceptance sweep before handoff. This section is REQUIRED to be non-empty in a structured way:

### 7a. Mechanical pre-check (paste verbatim JSON output)

```
$ python ~/.claude/scripts/handoff-precheck.py engagement/ --json
{
  "engagement": "...",
  "fail_count": 0,
  "skip_count": 0,
  "status": "pass",
  "checks": [...]
}
```

`status` MUST be `"pass"`. If any check failed, STOP — fix the underlying issue and re-run. Do NOT submit handoff with a red exit code; the director's first action will be running the same script and seeing the same red.

### 7b. Honest concerns list (concerns must bind to criteria)

The point of this section is to catch the LLM-honesty hole: if you write "no concerns" you are signalling either dishonesty or absence of thought. Force yourself to find real concerns — and tag each one with the criterion it relates to, so vague "naming could be better" filler can be detected:

Format:
```
1. [crit-{N} | non-criteria | scope-creep] {concern}: {what I'd push back on if I were the director} — {why I'm submitting anyway: in-scope deferral / pre-discussed waiver / honest gap disclosed in §11 deferrals}
2. [crit-{N} | non-criteria | scope-creep] {second concern, real one}
3. (optional more)
```

Tag meanings:
- `crit-{N}` — concern about how this engagement covers criterion N (the strongest tag, hardest to fake).
- `non-criteria` — concern about something orthogonal to criteria.md (allowed but capped: at most 1 of N).
- `scope-creep` — concern that something done isn't strictly required by criteria (warning of over-engineering).

Threshold by engagement size:
- Size **S**: ≥1 concern, at least one tagged `crit-{N}` OR `scope-creep`.
- Size **M**: ≥2 concerns, at least one tagged `crit-{N}`.
- Size **L**: ≥2 concerns, plus delta from prior iteration ("what's new vs iter-{N-1}").

A rehearsal section that fails the tag-distribution check is **structurally invalid** — director rejects with `self-acceptance concerns ducked criteria.md, bad-faith pattern`. The tagging makes "naming could be better" / "docstring short" filler harder, because those are non-criteria and only one is allowed.

### 7c. ux_heavy auto-promote check

If your work touched any of `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.html`, or styling (`*.css`, `*.scss`, Tailwind classes in JSX), AND `criteria.md` has `ux_heavy: false`:
- Either ux_heavy was misclassified at intake → state explicitly "AUTO-PROMOTE: ux_heavy → true (UI files touched in diff §1)" and produce screens/traces accordingly.
- OR your work is genuinely non-UX (tooling on UI infra without visible change) → state explicitly why ux_heavy stays false.

Don't quietly skip screens just because intake said `ux_heavy: false`. This auto-promote keeps a misclassification at intake from disabling all UX defences downstream.

## 8. Deploy log
- See `deploy-log.md` OR "N/A — no deploy in this engagement"

## 9. Docs diff
- See `docs-diff.md` OR "N/A — no docs change"

## 10. Iteration counter
Current round: {N}. Prior round(s): {1, 2, ...}. (informational — escalation triggered by repeating critique, not by counter)

**Source of truth:** `engagement/iteration` plain-text file (single integer). Lead increments it before submitting handoff; director increments on REJECT. handoff-precheck verifies the counter agrees with `## Iteration N` headings in acceptance-log.md. Mismatch = REJECT (someone falsified the count).

## 11. Known deferrals
- {deferred item}: justified by {criteria.md out-of-scope | explicit user waiver | director-approved scope note}
```

### `acceptance-log.md` (director, append-only)

Schema defined in `director-acceptance-protocol`. One section per iteration.

## Protocol versioning

`protocol_version` in `criteria.md` frontmatter pins the engagement to the protocol version it was created under. handoff-precheck reads this field and:

- If version equals current (4) — proceed normally.
- If version is older than `MIN_SUPPORTED` (currently 3) — REJECT with `engagement uses obsolete protocol; run engagement-migrate.py`.
- If version is older but ≥ `MIN_SUPPORTED` — WARN; engagement continues with old rules where they differ. Migration recommended at next safe break.
- If version is newer than current script — REJECT; scripts are out of date.

Current version log:

| Version | Introduced | Major changes |
|---|---|---|
| 1 | 2026-04-20 | initial agency model |
| 2 | 2026-04-24 | post-Wave-2 fixes (binary verdict, whitelist, ux_heavy, screens/traces) |
| 3 | 2026-05-05 | machine-checked gates (preflight, paths-check, danger-scan, validator-outputs) |
| 4 | 2026-05-06 | engagement size tier (S/M/L), ux_heavy gradient, structured trace schema, abort workflow, cross-domain secondary, protocol versioning, error budget gradient |

Version bumps happen when handoff-schema sections change OR new mandatory sub-checks are added. Backward compatibility is preserved as long as old engagements declare their version explicitly.

## Engagement size tier (S / M / L)

Different engagements need different rigour. Forcing a 30-line CSS tweak through the same 11-section handoff as a multi-wave refactor invites corner-cutting. Size is set by the secretary at intake and recorded in `criteria.md` frontmatter as `size: S | M | L`.

### Sizing rules (mechanical)

| Tier | Diff size | Specialists | UI scope | Deploy |
|---|---|---|---|---|
| **S** | ≤2 changed files OR ≤50 LOC | 1 | none / `ux_heavy: false` | none |
| **M** | 3–10 files OR 50–500 LOC | 2–3 | `ux_heavy: false` or `minor` | preview/staging only |
| **L** | >10 files OR >500 LOC, OR multi-wave | ≥4, OR cross-domain | `ux_heavy: true`, OR new design system | production deploy |

If the engagement crosses thresholds during execution, lead promotes the size in `scope-sync.md` (S→M→L only, never demote). Promotion = additional rigour applies retroactively from next handoff.

### Schema relaxations by tier

| Section / gate | S | M | L |
|---|---|---|---|
| Handoff §1 Diff summary | **single line OK** ("+/- N files") | required (full git stat) | required (full git stat + commits) |
| Handoff §2 Deliverable manifest | required | required | required |
| Handoff §3 Criteria trace | **inline** (1 line per crit, no table needed) | required (table) | required (table) |
| Handoff §4 Executor reports | merged into §2 if 1 specialist | required | required |
| §4a Cross-validation table | N/A (1 specialist) | required if ≥2 specialists touch shared contract | required |
| Handoff §5 Validation log | brief summary line OK | full | full |
| Validator output JSON files | required | required | required |
| Handoff §6 Exercised | only if `ux_heavy: minor \| true` | only if `ux_heavy: minor \| true` | required if `ux_heavy: minor \| true` |
| Handoff §7 Self-acceptance | **abbreviated: ≥1 concern** | full: ≥2 concerns | full: ≥2 concerns + delta-from-iter-N-1 |
| Handoff §8 Deploy log | N/A | only if deploy crossed | required |
| Handoff §9 Docs diff | optional | mandatory if `src/` touched | mandatory if `src/` touched |
| Handoff §10 Iteration counter | optional (informational) | optional | optional |
| Handoff §11 Known deferrals | optional | required (or "None") | required (or "None") |
| Director scope sync | optional | optional | mandatory if `ux_heavy: true` or weak criteria |
| Director phase | **NONE** (producer self-attest + mechanical + human) | lightweight (judge between producer + 1 adversary) | full (judge between producer + 5-reviewer consilium) |
| Adversary pass | none | Opus adversary in fresh subprocess (`adversary.py --consilium M`) | Consilium: peer-Opus + 2× Codex + Sonnet + Haiku (`adversary.py --consilium L`) |
| Iteration budget | 1 (one shot expected) | 2 | 3 |
| Iteration budget on auto-promote | n/a | +1 (so promoted-S becomes M with 3 max) | +1 (so promoted-M becomes L with 4 max) |

### S-tier minimum-viable handoff template

For `size: S` engagements, this 5-line shape is acceptable as `handoff.md`:

```markdown
# Handoff — iter 1 — {engagement-name}

## 1. Diff summary
+/- {N} files: {paths}; {one-line of what changed}

## 2. Deliverables + criteria trace
- crit-1: {bullet text} ✅ {evidence path}
- crit-2: {bullet text} ✅ {evidence path}

## 5. Validation log
{validator}: {verdict} — see validation-outputs/

## 7. Self-acceptance
1. [crit-N|scope-creep] {one honest concern} — {justification}
```

That's it. No §3 table when 2-3 criteria fit inline. No §4 separate reports section when one specialist's work is in §2 already. No §6 unless UI. No §11 if there are no deferrals. The point: small engagement = small handoff. Director still runs `handoff-precheck.py` — schema-aware checks know about `size: S` and don't demand the larger-tier sections.
| `tasks/*.md` atomic decomposition | N/A (skip) | recommended if ≥2 specialists OR ≥3 distinct deliverables | required (`tasks/INDEX.md` mandatory) |

Tier-relaxations are the ONLY allowed deviation from the canonical schema. Any other deviation = whitelist violation = REJECT. The point: small engagements feel small, large engagements stay rigorous, and corner-cutting goes from individual lead choice to protocol-recognised path.

### Task decomposition rule (cross-domain)

`tasks/*.md` files are atomic deliverable units that let lead re-dispatch a single broken atom on iter-2 instead of re-running the whole phase. Each domain has its own decomposition methodology:

- **dev** — `task-decomposition` (tech-spec → atomic tasks with TDD anchors). Existing.
- **marketing** — `marketing-task-decomposition` (criteria + plan → atomic tasks: keyword-cluster, ad-group, landing-block, metric-pull, banner-variant, etc.). New.
- **design** — `design-task-decomposition` (criteria + plan → atomic tasks: voice-axis, logo-direction, token-group, component-spec, screen-variant, etc.). New.

Authority and timing:
- Lead invokes the matching skill at **Phase 2.5** (after `plan.md` is frozen, before any specialist dispatch).
- Skip permitted only on size: S engagements OR size: M with a single specialist.
- Director will REJECT a size: L handoff with no `tasks/` directory or with empty `tasks/INDEX.md`. Same gate as missing `validation-log.md`.

On REJECT loop: lead identifies failing `crit-N` from director verdict, finds tasks where `crit_refs` includes those crit values, re-dispatches ONLY those tasks. Specialist appends `## Iteration {N}` to their executor-report (per §4-iter rules). The whole-phase re-run is the failure mode this exists to prevent.

### How `handoff-precheck.py` adapts

The script reads `size` from `criteria.md` frontmatter and skips checks not applicable to that tier (e.g. for `size: S` with 1 specialist, `cross-val` table check is automatically skipped — not because of bypass, but because the schema declares it N/A at this size). Same logic gates the `tasks/` requirement: the precheck tolerates an empty `tasks/` on S, warns on M without it, and FAILs on L without it.

## Iteration budget

The counter is informational, not a quota. Do not use language like "slot 1/2 used" or "last attempt" — it creates psychological pressure to accept marginal work. Escalation triggers are **root-cause based**, not slot-based:

- Iteration 1 — lead's first handoff. Director writes ACCEPT or REJECT (binary, see "Verdict is binary" below).
- Iteration 2 — lead's revision. Director reviews again.
- **Repeating-critique trigger** (highest priority): if the same blocking item from iteration N appears again in iteration N+1 — escalate to user IMMEDIATELY, regardless of which iteration this is. The loop is the signal, not the count.
- **Pre-final-iteration trigger**: before starting the last allowed iteration, escalate to user with current blockers. Wait for explicit "продолжаем" or "пересматриваем scope".
- **Hard limit by tier** (tiered acceptance refactor):
  - S: 1 iteration. REJECT → human directive (rework / abandon), no auto-loop.
  - M: 2 iterations. After round 2, escalate.
  - L: 3 iterations. After round 3, escalate.
  - Auto-promoted engagements: budget +1 (promoted-S becomes M with 3 max; promoted-M becomes L with 4 max).
- Never start an iteration beyond the hard limit without user authorisation.

### Mechanical loop-to-intake trigger

"Criteria are wrong, not the work" is a real failure mode but agents miss it without an explicit rule. Trigger:

If `acceptance-log.md` shows iter-N REJECT and iter-(N+1) REJECT with the same blocker (matched by criterion ID OR by reject-reason text-similarity ≥ 70%), the lead is OBLIGATED to:

1. STOP rework. Do not start iter-(N+2) on this criterion.
2. Dispatch `agency-intake` via Task tool with payload:
   - Path to `engagement/criteria.md`.
   - Path to `engagement/acceptance-log.md`.
   - The repeating blocker text.
   - Lead's diagnosis: "criterion `{id}` is unbuildable as written because {reason}; suggest revision to {proposed-rewording}".
3. Wait for secretary to either:
   - Edit `criteria.md` (touching only the broken criterion, keeping rest intact, recording change in `scope-sync.md` with director's signature) and unblock.
   - Escalate to user: "criterion {id} cannot be met as stated, revise scope or close engagement".
4. Resume from current iteration with revised criteria.

If lead skips this and just keeps reworking → director rejects with `loop-to-intake skipped — repeating blocker not escalated`. The mechanical trigger (same blocker twice) is the signal, regardless of how confident the lead feels about a fix.

### Secretary's authority during loop-to-intake

When dispatched mid-engagement to fix criteria, secretary has authority for **non-scope edits** without user touch (matching lead's Phase 1a sharpening rules):

- Rephrasing a vague bullet to a measurable one (no value change).
- Dropping a redundant / no-value filler bullet that the lead's experience reveals.
- Splitting a compound bullet into testable atoms.

These are recorded in `scope-sync.md` as "secretary auto-edit during loop-to-intake" with diff. User sees them at ACCEPT.

User touch required only when:
- The criterion is *unbuildable for principled reasons* (not just "lead struggled twice").
- Removing a deliverable that has independent user value.
- Major scope change that shifts the engagement to a different domain.

In those cases, secretary escalates with the standard Russian template:

```
Critеrии для engagement {name} нуждаются в существенной правке. {N} итерация(и) rework не закрыли блокер:
- {blocker text}

Предложение: {proposed-rewording or scope reduction}.
Альтернатива: закрыть engagement как unresolvable.

Выбери: подтвердить правку / переформулировать / закрыть engagement.
```

When the loop is "criteria are wrong, not the work", route back to `agency-intake` for new/updated `criteria.md` instead of throwing more rework at the lead.

## Verdict is binary

The director's verdict is exactly **ACCEPT** or **REJECT**. Conditional accepts are forbidden — they defeat the agency model's purpose of reducing user QA burden:

- No `ACCEPT CONDITIONAL`, no `ACCEPT pending X`, no `ACCEPT with TODO`, no `ACCEPT — user to verify Y`.
- If any acceptance bar cannot be verified by the director with available tools (Docker not running, Playwright not installed, BD not reachable, secrets missing): the verdict is **REJECT** with reason `validation incomplete: <what was unverifiable>`. Do NOT defer the verification to the user.
- Tooling unavailability is itself a blocker. The lead and the director coordinate to make the validation environment work (escalate to user once if a manual step is required, e.g. "запусти Docker Desktop"), then continue. They do NOT submit work for user-side validation.

The only acceptable form of "deferral" is something already listed in `criteria.md` "Explicitly out of scope" or in a director-approved `scope-sync.md` waiver.

## UX-heavy engagements

`ux_heavy` is a 3-level gradient, not boolean — small UI tweaks should not pay full L-engagement overhead:

| Value | When | Mandatory artefacts |
|---|---|---|
| `false` | Backend, brand voice, SEO, copy without UI surface, infra | none |
| `minor` | CSS tweak, copy edit on existing surface, padding/color/font change, single-state visual fix | ONE screen per touched surface, single theme; NO traces required |
| `true` | New UI surface, new interactive flow, behaviour-changing UI, multi-screen design | Screens both themes + traces with structured `verdict` field per flow |

Set by secretary at intake based on signals (visual hierarchy / layout / typography / color / spacing / words of taste / mockup references). Lead can promote during execution (`false` → `minor` → `true`) with `scope-sync.md` entry; never demote.

### `ux_heavy: minor` — relaxed rules

For a small CSS-class adjustment or copy tweak, full screens-light+dark + traces is overkill and pushes leads toward corner-cutting. `minor` keeps the visible-state guarantee without overhead:

- ONE screenshot per touched surface in `engagement/screens/{iteration}/{theme}/{surface}.png`. Single theme acceptable (whichever was visually changed); both themes only if both were modified.
- `traces/` is **optional** — required only if the change has interactive behaviour (rare for CSS).
- `handoff.md` §6 Exercised is required but each bullet may reference just a screen, no trace.
- `ux-review` validator runs in lightweight mode (skips trace verification).

### `ux_heavy: true` — full rules

1. `engagement/screens/{iteration}/{theme}/` MUST contain Playwright captures of every touched UI surface, BOTH light and dark themes (if dark mode exists in the project).
2. `engagement/traces/{iteration}/{flow}.json` MUST contain structured exercised-flow records (schema below) for every exercised flow listed in handoff §6.
3. `handoff.md` §6 "Exercised" is mandatory and must reference real paths in `screens/` and `traces/`.
4. `ux-review` validator MUST be in lead's validation-log before handoff (mandatory per `director-acceptance-protocol`'s validator selection on `ux_heavy: true`). On L-tier, director MAY request a single re-run of `ux-review` on screens + traces if adversary findings identify a coverage gap (per `director-acceptance-protocol §"When you MAY request specific validator re-run"`).
5. Docker / Playwright unavailability is a blocker, never a deferral. Lead escalates to user once ("start Docker Desktop"), then proceeds. `screens/` cannot be left for the user to capture.

If `ux_heavy: true` and any of (1)–(3) absent on submission → handoff is INCOMPLETE (returned unread, doesn't burn iteration budget).

### Trace JSON schema (mandatory structured fields)

Free-form trace dumps let the lead write "result was correct" without comparing claim to reality. Forcing structure makes the lead compare numerically before submission:

```json
{
  "flow": "quarter-preset-click",
  "iteration": 2,
  "captured_at": "2026-04-24T14:32:11Z",
  "steps": [
    {
      "action": "click",
      "selector": "[data-test=preset-quarter]",
      "expected": "period.endDate - period.startDate >= 80 days (rolling 90-day window)",
      "observed": {
        "period.startDate": "2026-04-01",
        "period.endDate": "2026-04-24",
        "diff_days": 24
      },
      "verdict": "FAIL",
      "notes": "Math.floor(month/3) returns calendar quarter start, degrades to 1-3 days at month boundaries"
    }
  ],
  "network": [{"url": "...", "status": 200, "request_body": "...", "response_body": "..."}],
  "console": [{"level": "warn", "message": "..."}],
  "dom_snapshot": "engagement/traces/iter-2/quarter-preset.html"
}
```

Required fields per step:
- `action` — what user did (click / type / select / navigate).
- `selector` — DOM selector or trace anchor.
- `expected` — what SHOULD happen, written from criteria.md or user-spec language. Plain English allowed but must be falsifiable.
- `observed` — what actually happened, structured as object with concrete values.
- `verdict` — `"PASS"` or `"FAIL"`. Lead computes this themselves.

Lead-side gate: if any step has `"verdict": "FAIL"`, the lead does NOT submit handoff with this trace as evidence. Either fix the underlying behaviour, or surface as a known-issue deferral with explicit out-of-scope justification in §11. Submitting with FAIL verdicts attached = REJECT `submitted FAIL trace as evidence`.

Director-side gate: ux-review validator and director both look at `verdict` field. Missing `verdict` field → REJECT `unstructured trace cannot be verified`. Free-form prose traces that don't parse as JSON → REJECT.

## Dangerous operations registry

Some operations have heavy / irreversible consequences (data loss, history rewrite, production damage, infra teardown). Validators check correctness, not authorisation. Authorisation is a separate axis: the user must explicitly OK these ops, even if they are technically clean.

### Operations requiring explicit user OK (always)

| Class | Examples | Why user-OK required |
|---|---|---|
| Schema destruction | `DROP TABLE`, `DROP COLUMN`, `TRUNCATE`, lossy `ALTER COLUMN TYPE` | Data lost cannot be reconstituted from code. |
| Bulk row deletion | `DELETE FROM x` without `WHERE`, deletes affecting >25 files | Reversal needs backup recovery. |
| History rewrite | `git push --force`, `git reset --hard origin/...` | Teammates' work / audit trail destroyed. |
| Filesystem recursive delete | `rm -rf` on parents, `~`, `$HOME`, root paths | Path mistakes are catastrophic. |
| Migration without rollback | `down()` empty / `raise NotImplementedError` | ACCEPT becomes irreversible. |
| Production deploy | `vercel --prod`, `fly deploy`, `gh release create`, manual `kubectl apply -f prod` | Live impact, customer-visible. |
| Infrastructure teardown | `terraform destroy`, `kubectl delete -n prod` | State loss + customer impact. |
| Secret rotation/deletion | `rotate-secret`, `delete-secret`, `gh secret remove` | Downstream consumers may break silently. |
| Public publish | `npm publish`, `pip upload` (real index, not test) | Cannot fully unpublish. |

### Detection

Lead runs `~/.claude/scripts/danger-scan.py --engagement engagement/` (or `--diff-file ...`) before handoff. The script covers the patterns above plus bulk-delete heuristic. JSON output lists every match with severity (critical/high) and rationale.

`handoff-precheck.py` calls `danger-scan.py` automatically as one of its sub-checks. Non-empty findings → handoff blocked until either (a) each finding has a paired user-OK entry in `scope-sync.md`, OR (b) the operation is removed from the diff.

### User-OK protocol

When `danger-scan` finds a dangerous op, the lead surfaces it to the user via director-mediated escalation BEFORE merging / applying:

```
Опасная операция в этом engagement:
- {operation kind}: {snippet}
- Последствия: {msg from danger-scan}

Подтверждаешь? (y / n / разъяснить)
```

Lead waits for explicit "y" / "да" / "ок" — silence is NOT consent. On "y": lead writes a confirmation entry in `scope-sync.md`:

```
## Dangerous-op user OK — {YYYY-MM-DD HH:MM}
- {operation id}: {snippet} — user OK on {timestamp} via {user-message reference}
- Backup verified before apply: {yes/no/N-A — backup not applicable for this op}
```

Without this entry, director rejects with `dangerous operation submitted without user-OK in scope-sync.md`.

### Backups

For schema destruction and bulk delete: lead must verify a recent backup exists BEFORE applying. Mechanical check goes in `scope-sync.md` "Backup verified" line: timestamp + backup location. No backup → operation cannot proceed even with user OK; lead asks user to create backup first.

### What does NOT count as dangerous

- Adding columns / tables / indexes (forward-compatible).
- Test fixture changes.
- Editing source files in any quantity (revertible by git).
- Local `rm` of build artefacts inside project dir (`dist/`, `node_modules/`).
- Deploys to staging / preview environments.
- Engagement archival (`mv engagement engagement-archived/...`).

These do not require user touch and stay autonomous.

## Engagement abort (user pulls the plug mid-engagement)

When the user says "стоп / забей / закрой" mid-engagement (any iteration, even before iter 1 handoff):

1. The role currently holding the floor (lead, secretary, or director) writes a stub `acceptance-log.md`:

   ```markdown
   ## Iteration {current} — {YYYY-MM-DD HH:MM}

   ### Verdict: ABORTED

   Reason: user requested abort.
   User message reference: "{verbatim quote of user's stop directive}"

   State at abort:
   - Iteration: {N}
   - Last director verdict (if any): {ACCEPT / REJECT / none}
   - Specialists dispatched: {list}
   - Artefacts produced: {list}
   ```

2. Run archival with abort flag:

   ```bash
   python ~/.claude/scripts/engagement-archive.py --reason aborted
   ```

   This moves `engagement/` → `engagement-archived/{date}-{name}-aborted/primary/`. Sanity checks (criteria.md exists, ACCEPT verdict present) are skipped under `--reason aborted`.

3. Print one user-facing line confirming archive:

   ```
   Engagement {name} прерван и заархивирован: engagement-archived/{date}-{name}-aborted/
   ```

4. No further iteration/handoff/director action. Engagement is closed.

### When NOT to abort

If user says "не уверен / подожди / давай по-другому" — that is NOT an abort. That's a scope clarification, route to:
- secretary if criteria are wrong (loop-to-intake protocol)
- lead if approach is wrong (request fresh dispatch with new constraints)

Abort is reserved for explicit termination: "стоп", "забей", "закрой engagement", "не делай это", "отменяю".

## Engagement archival (after ACCEPT)

After the director writes ACCEPT in `acceptance-log.md`, the engagement directory is archived to free the slot for the next engagement. Without archival, leftover artefacts pollute whitelist scans and create cross-engagement confusion (old screens look like current ones, stale plan.md misleads new lead).

Director performs archival via the canonical script — never by hand-rolling `mv`. Script handles name extraction, collision-suffix, sanity checks (criteria.md exists + ACCEPT verdict present):

```bash
python ~/.claude/scripts/engagement-archive.py
```

Order matters: the verdict and user-facing summary go FIRST, archival LAST. If archival fails (permissions, collision unresolvable), it never affects the user-delivery message — they already got the verdict. Director retries archival or surfaces the failure as an internal note.

For aborted / rejected engagements (rare — only when user closes engagement without ACCEPT), use `--force`. Default refuses to archive without an ACCEPT verdict to prevent accidental loss of in-progress state.

### Order of operations on ACCEPT (and what to do if archival fails)

Strictly: **verdict → user-facing summary → archival**. Archival is the LAST action, not the first. If archival fails, the user has already received the verdict and the deliverables — no rollback needed; the engagement is logically closed.

1. Director writes ACCEPT verdict in `engagement/acceptance-log.md` with criteria trace.
2. Director runs `python ~/.claude/scripts/handoff-paths-check.py engagement/acceptance-log.md` — if any phantom path in evidence column, fix verdict before proceeding.
3. Director composes user-facing summary (Russian, lists deliverables verbatim).
4. Director runs `python ~/.claude/scripts/engagement-archive.py`.
5. If archival succeeds: print archived-to path, done.
6. If archival fails (collision unresolvable, permissions, disk full):
   - **Do NOT undo the verdict.** The user has the deliverables; engagement is closed.
   - Log the archival error in a new section `## Archival pending — {timestamp}` at the end of `acceptance-log.md`.
   - User is notified once: "Engagement принят, артефакты доставлены. Архивация не сработала: {error}. Повторю после {fix}."
   - Director (or any later session) retries `engagement-archive.py` once the blocker is cleared. Script is idempotent — re-running on already-archived state is a no-op.

### Archival never blocks user delivery

The user-facing message goes BEFORE archival. If you swap the order, an archival hiccup becomes a perceived failure of the engagement itself. The agency-model contract is: deliverables + verdict are user-visible artefacts; archive is internal bookkeeping.

Archive directory:
- `engagement-archived/` lives in the project working directory.
- One subdirectory per completed engagement: `{YYYY-MM-DD}-{engagement-name}/`.
- Contents preserved as-is (criteria.md, handoff.md, acceptance-log.md, executor-reports/, validation-log.md, screens/, traces/) — kept for audit-trail and future cross-reference.
- Never delete archives without user instruction.

If a new engagement starts in the same project: `agency-intake` creates a fresh `engagement/` directory; archival happened on the previous one already, so there are no collisions.

Pre-archival sanity check (director runs before mv):
- `criteria.md` exists (we're archiving a real engagement, not random folder).
- `acceptance-log.md` exists with at least one ACCEPT verdict (don't archive engagements that never reached accept).
- No new engagement is mid-flight in this directory (no `engagement/` already exists at target archive path).

If the user asks to re-open / reference an archived engagement: read `engagement-archived/{date}-{name}/` directly. Do not unarchive (move back) — that would make the new engagement collide with the old.

## Task-tool prompts: minimum viable content

Long Task-tool prompts (200+ lines) duplicate what the dispatched agent already knows from its skills frontmatter (`engagement-protocol`, domain methodology, Engagement-mode contract). Verbose prompts:
- inflate token usage with no signal gain
- delay agent's first action (more tokens to parse before responding)
- look like "stalled" to operators waiting for first heartbeat

### Minimum-viable Task prompt (canonical)

```
Engagement: {engagement-name}
Iteration: {N}
Criteria: {absolute path to engagement/criteria.md}
Review mode: {lean | full | solo}

Read criteria.md first. Engagement context, source paths, and constraints
are inside it — do not re-paste here.

Begin {phase-name}. Heartbeat per phase per protocol.

Return summary on completion (or escalation).
```

**That's it.** 5–8 lines.

### What NOT to include in Task prompts

- Full criteria text — agent reads `criteria.md` itself.
- Source artefact paths — listed in `criteria.md`.
- Protocol reminders ("don't create files outside whitelist", "use heartbeat", "11-section handoff") — those are in `engagement-protocol` skill which the agent loads via frontmatter. Re-pasting them in prompts is noise.
- Anti-pattern lists — already in agent's `## Anti-patterns` section.
- Backstory about why the engagement exists — irrelevant to execution.

### When MORE prompt content IS justified

- Specific phase-only constraint (e.g. "this iteration is rework — focus only on F-02 token mismatch from previous reject").
- One-off override that contradicts protocol default (e.g. "skip docs-pipeline for this engagement — out of scope per scope-sync").
- New convention the agent's skills don't yet teach (very rare; if it recurs, document in skill instead).

Even in these cases, additional content stays under 30 lines.

### Why this matters

In the actual `landing-hybrid-header-hero` test we ran: a 200-line dispatch prompt to design-lead caused the agent to spend its first 20-40 seconds parsing instead of acting. Operator interpreted this as a stall and interrupted. Friction was self-inflicted by verbose prompt, not by agent behaviour.

## Resume policy (interrupted iterations)

If a lead's previous session was interrupted mid-iteration (Task tool cancelled, context compaction, manual user stop) and a new dispatch resumes the same iteration N:

1. **Inspect existing engagement state first.** Files from prior session may include partial executor-reports, traces, screens, partial hybrid build, etc. They are not invalid by virtue of being from an interrupted session — but they are not automatically valid either.

2. **Decide per-artefact: reuse or regenerate.** Heartbeat at the resume point MUST list which artefacts you reuse vs regenerate:

   ```markdown
   ## Heartbeat — Phase Resume — completed {ts}
   - iteration: {N} (resumed)
   - artefacts reused-from-prior-session: traces/iter-N/nav-anchor-click.json (validated still represents current code), executor-reports/X.md (Phase 3 work intact)
   - artefacts regenerated: screens/iter-N/light/*.png (re-captured against latest code), validation-outputs/* (validators re-run on resume)
   - artefacts deleted: scope-sync.md (legacy from prior intent that was superseded)
   - next phase: {name}
   ```

3. **Reuse criteria:**
   - Trace JSON is reusable IFF its `captured_at` is newer than the most recent code change AND the artefact under test is unchanged.
   - Screens are reusable IFF the surface they capture is unchanged since they were taken.
   - Validator outputs are reusable IFF the artefact they validated is unchanged.
   - Executor reports are reusable IFF the specialist's work isn't being redone.

4. **Stale = regenerate, no shortcut.** Reusing a trace or screen that doesn't represent current state is the silent-failure mode this rule prevents. If unsure → regenerate.

5. **Trace `captured_at`:** validators (ux-review) and `trace-schema-check.py` warn if `captured_at` is older than the file mtime of any artefact the trace references. Lead must regenerate or update the trace.

## Lead heartbeat (mandatory — every lead, every phase)

Long-running orchestrator-agents (top-leads, mid-leads) dispatched via Task tool can stop streaming tokens for many minutes while doing internal planning, file edits, or sub-dispatches. The user/operator has no signal whether the agent is alive or stuck.

To prevent silent stalls: **every lead appends one heartbeat line to `engagement/validation-log.md` after each completed phase**. The presence + recency of these lines is the heartbeat.

### Heartbeat entry format

```markdown
## Heartbeat — Phase {phase-name} — completed {YYYY-MM-DD HH:MM:SS}
- iteration: {N}
- artefacts updated: {list of paths touched in this phase}
- next phase: {name | "handoff" | "done"}
```

Inserted at the top of validation-log.md (newest first), so observers `tail validation-log.md` get latest signal.

### Cadence

- **Top-lead phases:** intake-understanding, plan, dispatch (per mid-lead), cross-cutting validation, docs-pipeline, self-acceptance, handoff. → 6-8 heartbeats per iteration.
- **Mid-lead phases:** intake-from-top, plan, dispatch (per specialist), aggregate, return-up. → 4-5 heartbeats per dispatch.
- **Specialists** do NOT heartbeat — they are short-lived and return synchronously to mid-lead.

Heartbeats are append-only, never modified. They are NOT validation findings — `validation-log.md` retains its existing role as validator output index, the Heartbeat sections live alongside the per-validator sections.

### When user / operator can confirm "stuck"

If `validation-log.md`'s most recent Heartbeat is older than:
- 5 minutes for an `S` engagement
- 15 minutes for `M`
- 30 minutes for `L` (mid-lead may legitimately spend ≥30min in a single complex phase)

→ engagement is presumed stalled. Recommended action: open the engagement directory, check git status / file mtimes for any artefact created since last heartbeat. If lead made progress without writing heartbeat — that's a protocol violation (REJECT trigger). If no progress visible — abort and re-dispatch from last completed phase.

`engagement-doctor.py --check stalled` automates this signal (Tier 11.6).

### Anti-patterns

- Don't backfill heartbeats — write them when the phase actually completes, not in handoff.md after the fact.
- Don't heartbeat mid-phase — only at phase boundaries.
- Don't heartbeat from specialists — they have no phases.

### Token budget guard (Tier 14)

Each tier has an empirical token envelope (lead + director combined per iteration):

| size | budget per iter | trigger |
|---|---|---|
| S | 100,000 | one-shot, abnormal if more than 1 iter |
| M | 500,000 | M baseline; 500k×2 = 1M is the realistic cap before user-touch |
| L | 1,500,000 | L baseline; rare engagement should exceed cumulative 4-5M before scope-sync escalation |

Lead runs after every heartbeat from Phase 4 onward (when costly subagent waves happen):

```bash
python ~/.claude/scripts/token-budget.py engagement/ --json
```

Exit codes:
- `0`: usage ≤ 80% × cumulative_budget — on track.
- `0` with `status: warn`: 80% < usage ≤ 100% — finishing strategy, no panic.
- `1` with `status: fail`: usage > 100% — over budget. Lead chooses one:
  1. **Auto-promote tier** (S→M, M→L) via `size-detect.py --auto-promote` if scope grew naturally.
  2. **Scope-sync escalation** with the user via director (per criteria.md mutability rules) — propose deferrals.
  3. **Accept partial**: ship completed atoms, list rest in handoff §11 Known deferrals.

The guard reads `~/.claude/projects/{project}/metrics.jsonl` for `tokens_used` per iteration. When that field is absent (older director runs), the script falls back to `duration_s × 800` token proxy. Director must start including `tokens_used` field per the per-iter metric line.

The guard is a budget signal, not a hard kill switch. Lead can override one iteration with explicit user-touch authorisation in `scope-sync.md`. Demoting tier or under-reporting tokens to dodge the guard = protocol violation; director rejects on next sweep.

### Size auto-promote check (every heartbeat after Phase 2)

After Phase 2 (`plan.md` frozen) and on every heartbeat thereafter, lead runs:

```bash
python ~/.claude/scripts/size-detect.py engagement/ --mode runtime --auto-promote --json
```

The script measures the engagement against tier thresholds (executor-reports count, tasks count, ui_surfaces, diff files/LOC, deploy crossed) and:

- **Exit 0:** observed ≤ current tier. No action — heartbeat as usual.
- **Exit 1:** observed > current tier. With `--auto-promote`, the script:
  1. Rewrites `criteria.md` frontmatter `size:` to the new tier (S→M, M→L; demotion forbidden).
  2. Appends a `## Auto-promote — size X → Y — {ts}` block to `scope-sync.md` documenting the trigger measurements.

Lead acks the promote in the same heartbeat: `next phase: {name}; auto-promoted: {old}→{new}` and adjusts subsequent rigour to match the new tier (e.g. tasks/ now mandatory if promoted to L, §11 now required if promoted to M).

The check is cheap (~50ms, no subprocess fork-out beyond directory scan + handoff.md grep). Skipping it once is fine; never running it across an engagement is the failure mode where lead silently treats an L engagement as M and gets REJECTed by director's `handoff-precheck.py --mode ready`.

`director-sweep.py` re-runs the same check during acceptance — if it sees current still S/M but observations are clearly L, that's a REJECT with reason "size drift, lead failed to promote".

#### Retroactive consequences of auto-promote

Promotion shifts the **acceptance protocol** to the new tier retroactively. Specifically:

1. **Iteration budget bumps +1** on the promoted engagement (S→M makes M-budget 3 instead of 2; M→L makes L-budget 4 instead of 3). One extra round absorbs the adversary findings on work that was produced under lighter-tier mode.
2. **Adversary pass is required** at handoff regardless of when promotion occurred. S engagements that auto-promote to M will receive Opus adversary review; M engagements promoted to L will receive full consilium. Producer cannot skip adversary by claiming "the work was done under S/M mode".
3. **`scope-sync.md` documents the budget revision**:
   ```markdown
   ## Auto-promote S → M @ 2026-05-09T14:23:00Z
   Trigger metrics: 3 specialists invoked, 8 diff_files, ux_surfaces=2
   New acceptance protocol: M-tier (Opus adversary required at handoff)
   Iteration budget revised: 2 → 3 (one extra round budgeted for adversary findings)
   ```
4. **Validators applicable to the new tier must run** before handoff — if M-tier requires accessibility-validator on ux_heavy and the engagement was produced as S without it, lead must run validator now and fix gaps before submitting.

Lead must address gaps that the new tier requires **before handoff**, not at acceptance. Auto-promote should not surprise the director at acceptance time — heartbeat-driven detection means there is always at least one phase between promote and handoff for the lead to catch up. If lead submits an auto-promoted engagement without addressing the new tier's requirements, `handoff-precheck.py` for the new tier will FAIL on missing artefacts (e.g. `tasks-decomposition` check fails on M→L without `tasks/INDEX.md`).

## Machine-checked gates

Two distinct points where machine checks fire. Both use scripts at `~/.claude/scripts/`:

### At intake (secretary, BEFORE handoff to lead)

```bash
python ~/.claude/scripts/preflight.py --criteria engagement/criteria.md --json --auto-fix
```

Purpose: ensure validation environment is ready *before* the engagement starts. Wave-2-style "Docker missing for the whole engagement, then user is asked to validate by hand" is what this prevents.

### Before handoff (lead, then director on receipt)

```bash
python ~/.claude/scripts/handoff-precheck.py engagement/ --json
```

Purpose: ensure the *engagement state itself* is ready for review — whitelist clean, paths exist, validators ran, no banned language, no undisclosed dangerous ops, etc. Includes a re-run of preflight (environment may have degraded mid-engagement: Docker crashed, secret rotated, package broke).

**Hard-gate tier dispatch (tiered acceptance refactor):**
- S-tier: 6 critical checks (criteria-frontmatter, whitelist, preflight, handoff-paths, danger-scan, verdict-canonical)
- M-tier: 11 checks (S + handoff-sections, self-acceptance-thinness, iteration-counter, validator-outputs, size-drift)
- L-tier: 19 checks (all)

The script reads `size:` from `criteria.md` frontmatter and runs only the tier-applicable checks (no soft-skip overhead). Use `--all-checks` for debug or `--override-checks NAME1,NAME2` for ad-hoc additional checks.

### At acceptance (M/L tiers only — S has no director phase)

```bash
# M-tier: single Opus adversary
python ~/.claude/scripts/adversary.py engagement/ --consilium M

# L-tier: 5-reviewer consilium (peer-Opus + 2× Codex + Sonnet + Haiku)
python ~/.claude/scripts/adversary.py engagement/ --consilium L

# Aggregate:
python ~/.claude/scripts/consilium-synth.py engagement/

# Human reads consilium-summary.md as supreme judge, writes human-directive.md.
# THEN director acts per directive (PROCEED_TO_VERDICT | REJECT_NOW | DIRECTED_VERDICT).

# After director writes verdict, mechanical post-check:
python ~/.claude/scripts/handoff-precheck.py engagement/
# (M/L tier sets include `human-directive` and `director-verdict` checks.)
```

**Two-pass adversary protocol (filesystem-isolated):** each role runs Pass 1 in subprocess on a CURATED copy of engagement/ that physically excludes handoff.md, acceptance-log.md, consilium-summary.md, human-directive.md, and validation-outputs/. Pass 1 writes preliminary findings. Pass 2 runs on full engagement with preliminary findings injected; reads handoff and finalizes verdict. Outputs:
- `engagement/validation-outputs/{role}-iter-N-preliminary-{ts}.json` (Pass 1)
- `engagement/validation-outputs/{role}-iter-N-{ts}.json` (Pass 2 final)

`consilium-synth.py` aggregates Pass 2 outputs into `engagement/consilium-summary.md` with two-stage dedup (cluster by evidence_path first, then issue similarity), convergent findings, cross-family disagreements, naive-layer catches, similarity matrix.

**Human-as-supreme-judge step:** human reads consilium-summary.md, writes `engagement/human-directive.md` with `Decision: PROCEED_TO_VERDICT | REJECT_NOW | DIRECTED_VERDICT`. This filters adversary noise (false positives, filler) before director processes. Director then acts per directive — adjudicating remaining signals with explicit markers (SUSTAINED/OVERRULED, SIDED WITH X, REAL/FALSE_POSITIVE, ACKNOWLEDGED). `director-verdict-check.py` mechanically verifies markers exist for every consilium signal.

See `director-acceptance-protocol` skill for full per-tier procedure.

### One canonical entry point bundles every handoff machine check

```bash
python ~/.claude/scripts/handoff-precheck.py engagement/ --json
```

The wrapper runs:
1. **whitelist** — `ls engagement/` against the closed allowed list (no rogue `preview.md` / `compliance.md`).
2. **criteria-frontmatter** — `criteria.md` has `engagement`, `domain`, `ux_heavy`, `tools_required`.
3. **preflight** — every tool in `tools_required` is reachable (delegates to `preflight.py`).
4. **handoff-paths** — every path cited in `handoff.md` exists on disk (delegates to `handoff-paths-check.py`).
5. **handoff-sections** — required sections §1–§5, §7, §8, §11 present; §6 present if `ux_heavy: true`.
6. **self-acceptance-thinness** — §7 concerns sublist has ≥2 real items (catches LLM "no concerns" bad-faith).

Non-zero exit = INCOMPLETE handoff. Never submit / accept with red exit codes — these checks exist precisely because the underlying bug classes (broken tooling, phantom paths, rogue files, thin rehearsal) cannot be reliably caught by reading prose.

The individual sub-scripts (`preflight.py`, `handoff-paths-check.py`) can also be run directly if you want a focused check (e.g. mid-engagement Docker sanity), but at gate time always go through the wrapper.

## Escalation template (director → user, Russian)

```
После двух кругов правок директор не принимает работу. Текущая блокировка:
- {blocking item 1}
- {blocking item 2}

Продолжать ещё круг или пересмотреть scope / criteria?
```

## Cross-domain handoff

Two-domain engagements only. Secretary classifies the primary domain and declares secondary as a downstream dependency in `criteria.md`. Three-domain engagements are rejected at intake — user must split them.

### State location (collision-free)

Primary engagement uses `engagement/` (the canonical path). Secondary engagement uses `engagement-secondary/{domain}/` to avoid overwriting primary state.

```
project-root/
├── engagement/                      # primary (e.g. design)
│   ├── criteria.md
│   ├── handoff.md
│   └── ...
└── engagement-secondary/
    └── marketing/                   # secondary (e.g. marketing)
        ├── criteria.md              # derived from primary's accepted output
        ├── handoff.md
        └── ...
```

`handoff-precheck.py engagement-secondary/{domain}` works on the secondary the same way as on primary. All scripts (preflight, paths-check, danger-scan, archive) accept any directory — they don't hardcode `engagement/`.

### Workflow

1. Secretary writes `engagement/criteria.md` with primary domain + `cross-domain.secondary: <domain>` note.
2. Primary lead completes phases.
3. Primary director ACCEPTs.
4. Primary lead (NOT director) initiates secondary using **`secondary-init.py`** (deterministic, scripted):

   ```bash
   python ~/.claude/scripts/secondary-init.py \
     --domain marketing \
     --primary engagement/ \
     --scope "Запустить кампанию на лендинге, созданном в primary engagement"
   ```

   The script:
   - Creates `engagement-secondary/marketing/`
   - Reads primary's `criteria.md` to inherit `engagement` name, project metadata
   - Writes a fresh `criteria.md` with frontmatter (`domain: marketing`, `parent_engagement: ../engagement/`, fresh `size`, `ux_heavy`, `tools_required` defaults)
   - Initialises `iteration` file at 1
   - Creates `executor-reports/`, `validation-outputs/` directories
   - Writes a `<!-- inherits from {primary path} -->` audit comment

5. Primary lead dispatches secondary lead via Task tool with the new `engagement-secondary/{domain}/criteria.md` path.
6. Secondary lead runs its own full engagement cycle in `engagement-secondary/{domain}/`.
7. Secondary director ACCEPTs (or REJECTs into secondary's own iter loop — primary is already done).
8. After secondary ACCEPT, primary director composes unified user message referencing both engagements.

### Archival

On final user delivery (after both ACCEPT'ed):

```bash
python ~/.claude/scripts/engagement-archive.py             # archives primary
python ~/.claude/scripts/engagement-archive.py --root . --secondary marketing   # archives secondary
```

Both end up under one dated archive folder:

```
engagement-archived/
└── 2026-05-06-acme-launch/
    ├── primary/             # mv of engagement/
    └── secondary-marketing/ # mv of engagement-secondary/marketing/
```

### Anti-patterns

- Don't write secondary state into `engagement/` — collision with primary, irrecoverable.
- Don't dispatch secondary lead before primary director ACCEPTs — the secondary depends on primary's accepted output, not work-in-progress.
- Don't archive primary while secondary is still iterating — final unified message comes after both ACCEPT.

## Observability

Every director iteration appends one JSON line to `~/.claude/projects/{project}/metrics.jsonl`:

```json
{
  "ts": "2026-04-20T15:30:00Z",
  "engagement": "{engagement-name}",
  "domain": "marketing|dev|design",
  "director": "{director-agent}",
  "iter": 2,
  "verdict": "accept|reject",
  "blocking_count": 3,
  "reran_validators": ["reality-checker", "skeptic"],
  "duration_s": 180
}
```

No live dashboard. Raw JSONL for retrospective analysis.

## Role boundaries (authoritative)

| Stage | Secretary | Lead | Director | User |
|---|---|---|---|---|
| Intake capture | ✓ | — | — | source |
| criteria.md draft | ✓ | — | reads | approves |
| Scope sync (optional) | — | raises Q | writes Q/A + freeze | — |
| Planning | — | ✓ | — | — |
| Specialist dispatch | — | ✓ | — | — |
| Cross-cutting pipelines | — | ✓ | re-runs on sweep | — |
| Handoff package | — | ✓ | — | — |
| Accept/reject verdict | — | — | ✓ | — |
| User-facing delivery | — | — | ✓ (on accept) | receives |
| Escalation | — | — | ✓ (at round 3) | decides |

## Anti-patterns

- **Don't invent new artefact sections.** Schemas are fixed. Add fields via this skill's next revision, not per-engagement.
- **Don't create files outside the whitelist.** Forbidden list (preview/compliance/review-log/etc.) is closed. Wanting one of those = signal that content belongs in an existing whitelist file. Director treats out-of-whitelist files as REJECT.
- **Don't skip `criteria.md`.** Directors cannot accept against fuzzy intent.
- **Don't let the lead write `acceptance-log.md`.** That's self-acceptance.
- **Don't let the director write `handoff.md`.** That's author-as-reviewer.
- **Don't talk to the user mid-engagement** (lead / director) — lead surfaces via director; director surfaces only on accept or escalation. Pre-flight tooling unavailability is the only mid-engagement user touch allowed (and it's blocker disclosure, not work consultation).
- **Don't issue `ACCEPT CONDITIONAL`.** Verdict is binary. Tool unavailability = REJECT with reason `validation incomplete`, not a deferral to user.
- **Don't defer Playwright capture / exercised flow logs to the user.** UX validation artefacts are part of hand-off. If they're missing, hand-off is incomplete.
- **Don't silently exceed iteration budget.** Escalate before round 3, OR earlier on repeating critique.
- **Don't use slot language.** "Slot 1/2 used", "last attempt", "final round" — banned. Escalation is root-cause based.
