---
name: rules-from-coding-agent-failures
description: Use when you operate coding-agent infrastructure with a feedback signal — eval suites, run-level telemetry, A/B baselines, or a skill catalog under test — and want to capture observed agent failures in a per-file reflection log, promote recurring patterns into AGENTS.md rules, hooks, or CI gates via the three-entry floor, and score Level 4-5 (Specification Architecture, Sovereign Engineering) maturity. Triggers on 'reflection log', 'agent failure log', 'promote this pattern into a rule', 'post-incident eval case', 'harness maturity Level 4 or 5', 'set up a feedback loop for agent failures'. Do NOT use for first-pass agent-readiness scaffolding when no feedback signal exists yet (use harden-repo-for-coding-agents), or to instrument an AI product's eval and optimization loops (use agent-evals).
license: MIT
---

# Evidence-Driven Agent Rules

Capture observed agent failures, promote the recurring ones into rules, and score
advanced (Level 4–5) agent-readiness. For repos with a *feedback signal*: eval suites,
run-level telemetry, A/B baselines, or a skill catalog under test. No signal yet — no
benchmark, no telemetry baseline, no "we watched the agent do X" stream? Use
`harden-repo-for-coding-agents` alone; promotion here needs evidence to drive it.

**Produces:** `capture` writes the reflection-log scaffold (`docs/reflection-log/README.md`,
`_template.md`, and a repo-root `README.md §Agents` pointer); `promote` proposes the
smallest rule / hook / CI-gate diff plus the entries it cites; `assess-l4l5` scores
Level 4–5 maturity with a ceiling and prioritized gaps. Tracked `assess-l4l5` runs also
emit `rules-from-coding-agent-failures-findings-ledger-<date>-<slug>.md` and `rules-from-coding-agent-failures-workflow-state-<date>-<slug>.json`.

## Core principle

**Single observations are cheap to record; rules cost trust to ship.** Recording is
low-friction and always worth it once you can write a non-trivial *What to do
differently* line. Promotion — turning logged entries into an `AGENTS.md` rule, hook, or
CI gate — is gated on **three or more entries describing the same gap**, because
scaffolding from one observation overfits to plausible boilerplate (W1, Mündler et al.,
arXiv:2602.11988, Feb 2026: autogenerated context files drop task success ~3% and inflate
cost >20%). The recording bar and the promotion bar are deliberately different; the
earlier single-file log conflated them and agents self-filtered entries worth keeping.
When in doubt, record. `promote` searches later.

## Quickstart

Begin with `capture` to scaffold the log:

> `Run capture: scaffold the reflection log for this repo.`

It loads `references/playbooks/reflection-log.md`, writes `docs/reflection-log/README.md`
and `_template.md`, and adds a repo-root `README.md §Agents` pointer so the log is
discoverable from an always-loaded surface. **Precondition:** if `AGENTS.md` exists but
does not link the log, `capture` refuses. Add one line under its references section and
re-run:

```markdown
- [docs/reflection-log/](./docs/reflection-log/) — per-failure entries; rules in this file may cite them.
```

`promote` and `assess-l4l5` come later: they assume Stage 0 (this log) and Stage 1
(AGENTS.md from `harden-repo-for-coding-agents`) are already in place. Full staging in
[`references/bootstrap-order.md`](./references/bootstrap-order.md).

## Activation

- **Bare invocation** (`"set up reflection log"`, `"use rules-from-coding-agent-failures"`): show the intent
  menu (`capture` / `promote` / `assess-l4l5`) inline and wait. No file inspection, network
  calls, or writes. This skill's scope is narrow enough that the SKILL.md body is the
  router — no separate CSV.
- **Concrete invocation** with intent inferable: skip to Workflow step 2.
- **Concrete invocation with ambiguous scope**: ask one blocker question to fix the intent
  first; do not inspect private systems before then.

## Workflow

1. **Pick intent.** `capture` (scaffold the log + README pointer), `promote` (find three
   same-gap entries and propose the rule/hook/gate that closes them), or `assess-l4l5`
   (score Level 4–5 maturity). Ambiguous → ask once.
2. **Load context.** Always: `references/empirical-warnings-w1.md` and
   `references/playbooks/reflection-log.md`. For `promote`, read the whole log
   (`docs/reflection-log/[0-9]*.md`); if the closing change is a hook, CI check, static
   validator, or branch-protection rule, also load `references/playbooks/gate-hardening.md`.
   For `assess-l4l5`, also load `references/core/maturity-rubric.md`.
3. **capture.** Scaffold `docs/reflection-log/README.md` and `_template.md` from
   `templates/artifacts/reflection-log/`; add the `README.md §Agents` pointer if absent;
   refuse if `AGENTS.md` exists without a link to the log.
4. **promote.** Group entries by `sub-surface:` (`grep -l 'sub-surface: <name>'
   docs/reflection-log/[0-9]*.md`). For any group of three or more, present the entries,
   propose the smallest closing change, and confirm before writing. Refuse below the
   floor. For a gate, require the **gate-hardening variant matrix** and a regression
   fixture before calling it done.
5. **assess-l4l5.** Assume Levels 1–3 are already scored by `harden-repo-for-coding-agents` (confirm
   with the user); score Levels 4–5 against the rubric; report ceiling and gaps with
   stable IDs (`ED-L4L5-NNN`) per `references/trackable-findings.md`.
6. **Emit, citing the entries you drew on** (filenames plus their *What to do
   differently* lines):
   - `capture` → files written, the §Agents pointer diff, a validation checklist.
   - `promote` → the proposed rule/hook/gate, the three-plus entries that justify it, and
     the AGENTS.md/hook diff for confirmation.
   - `assess-l4l5` → maturity score per layer, ceiling, prioritized gaps.
7. **Create tracking state.** For `assess-l4l5` with 7+ gaps, a level-ceiling blocker, or
   a save/track request, write both: a ledger at
   `docs/audits/rules-from-coding-agent-failures-findings-ledger-<YYYY-MM-DD>-<scope-slug>.md` and workflow
   state at `docs/audits/rules-from-coding-agent-failures-workflow-state-<YYYY-MM-DD>-<scope-slug>.json` (fall
   back to `audit-artifacts/rules-from-coding-agent-failures-{findings-ledger|workflow-state}-<YYYY-MM-DD>-<scope-slug>.{md|json}`
   if `docs/audits/` is unwritable). Report both paths. Roadmaps, issues, and promotion
   changes still need confirmation.

## Modes

Guided Draft (default), Autopilot, Grill Me — contract in
[`references/modes.md`](./references/modes.md). For `promote`, default to **Grill Me**:
shipping a rule is expensive trust-wise, so open the trade-offs rather than infer them.

## Output requirements

Every output cites the reflection-log entries it draws on — filenames plus their
*What to do differently* lines — names the intent, and carries that intent's load-bearing
section (Workflow step 6). For `promote`, that includes the three-entry floor and, for a
gate, the gate-hardening variant matrix and a regression fixture; for `assess-l4l5`,
stable `ED-L4L5-NNN` IDs.

## Reference map

- `references/empirical-warnings-w1.md` — the W1 three-entry floor and Mündler citation (owned here).
- `references/empirical-warnings.md`, `references/lenses.md`, `references/modes.md`,
  `references/trackable-findings.md` — symlinks into `skills/_shared/`.
- `references/core/maturity-rubric.md` — Levels 4–5, extending `harden-repo-for-coding-agents`'s 1–3.
- `references/playbooks/reflection-log.md` — the one sub-surface this skill owns.
- `references/playbooks/gate-hardening.md` — adversarial variant matrix and fixture
  requirements for promoted hooks, checks, and CI gates.
- `references/bootstrap-order.md` — the Stage 0/1 staging dependency.
- `templates/artifacts/reflection-log/` — README (with the recording-vs-promotion callout)
  and the per-entry `_template.md`.
- `templates/findings-ledger.md`, `templates/workflow-state.json` — saved tracking for `assess-l4l5`.
- `evals/` — static checks, trigger evals, activation cases.

## See also

- `harden-repo-for-coding-agents` — scaffolds the project-context AGENTS.md that this skill's `promote`
  adds rules to. Pair them.
- `design-for-agent-users` — the umbrella AX discipline; this is its evidence-and-feedback-loop arm.
