---
name: crucible-meta-governance
description: Use when deciding whether to pivot or persist, whether to ship or hold, whether to scope-narrow a partial pass, how to manage context-window budget, or when to record a supersedes link. Also for sensing "the investigation is confused" vs "on track". Covers the 6 decision-layer patterns that kept the session on a productive arc — physical-constraint first pivot, incremental artifact promotion, gate-failure scopes not kills, agent disagreement as signal, context-budget discipline, and supersede-not-rewrite. TRIGGERS - should we pivot, ship this, kill or refine, scope narrow, partial pass, supersede finding, superseded by, context budget, conversation too long, agents disagree, promote artifact, physical constraint, stop or continue.
allowed-tools: Read, Grep, Glob
---

# Meta-governance — 6 decision-layer patterns

> **Self-Evolving Skill**: If any pattern here misled decisions, update the section AND append to `references/evolution-log.md`. Don't defer.

These patterns are meta-level — they're about the investigation itself, not its content. Invoke when a decision must be made: pivot vs persist, kill vs narrow, ship vs hold.

---

## 1. Physical-constraint-first pivot

When brute force yields null, extract the **execution constraint** and redesign the hypothesis class to fit it. Don't iterate on a hypothesis that ignores reality.

Session example: 17 directional-signal null campaigns → user pivoted:

> "What's the best strategy for a highly random walk market?"
> "I can only trade on a traditional MT5 broker that allows hedging positions."

From this came the synthetic straddle (BUY_STOP + SELL_STOP pending orders, OCO). Constraint-driven design unlocked the strategy class. The math (diffusive displacement in random walks: `E[|ΔS|] > 0`) was always available; what was missing was honoring the execution venue.
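The `E[|ΔS|] > 0` claim can be checked exactly for the simplest case. The sketch below assumes a symmetric ±1 random walk, which is a simplification and not the session's price model; `displacement_moments` is a hypothetical helper name. It shows that the expected signed displacement is zero while the expected absolute displacement is positive and grows roughly like `sqrt(2n/π)`.

```python
from math import comb, pi, sqrt


def displacement_moments(n_steps: int) -> tuple[float, float]:
    """Return (E[S_n], E[|S_n|]) for a symmetric +/-1 random walk, computed exactly."""
    total = 2 ** n_steps
    mean = 0.0
    mean_abs = 0.0
    for ups in range(n_steps + 1):
        position = 2 * ups - n_steps        # net displacement after `ups` up-moves
        prob = comb(n_steps, ups) / total   # binomial probability of exactly `ups` up-moves
        mean += position * prob
        mean_abs += abs(position) * prob
    return mean, mean_abs


n = 100
mean, mean_abs = displacement_moments(n)
print(f"E[S_n]   = {mean:.4f}")
print(f"E[|S_n|] = {mean_abs:.4f}  (large-n approximation: {sqrt(2 * n / pi):.4f})")
```

A directional signal needs the first quantity to be non-zero; a straddle only needs the second.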

**Ask yourself**:

- What execution venue is the user actually on?
- What types of orders are possible?
- What's the realistic slippage / commission / spread?
- What position-sizing constraints apply?

If the hypothesis doesn't survive these questions, pivot the hypothesis, not the statistics.
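
The four questions above can be sketched as a pre-flight check. Everything here is hypothetical: `VenueConstraints`, `Hypothesis`, `constraint_violations`, and the numbers in the usage lines are illustrative only, and all costs are assumed to be expressed in the same price units as the edge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VenueConstraints:
    """Hypothetical description of what the execution venue allows and charges."""
    order_types: frozenset[str]
    spread: float        # round-trip spread cost, in price units
    commission: float    # round-trip commission, in price units
    slippage: float      # expected round-trip slippage, in price units
    max_position: float  # largest position size the account can carry


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical summary of a strategy idea."""
    required_orders: frozenset[str]
    gross_edge: float    # expected gross profit per trade, in price units
    position_size: float


def constraint_violations(h: Hypothesis, venue: VenueConstraints) -> list[str]:
    """List every venue constraint the hypothesis fails; an empty list means it survives."""
    problems = []
    missing = h.required_orders - venue.order_types
    if missing:
        problems.append(f"unsupported order types: {sorted(missing)}")
    cost = venue.spread + venue.commission + venue.slippage
    if h.gross_edge <= cost:
        problems.append(f"gross edge {h.gross_edge} does not exceed cost {cost}")
    if h.position_size > venue.max_position:
        problems.append("position size exceeds venue limit")
    return problems


# Illustrative numbers only, not session data.
venue = VenueConstraints(frozenset({"BUY_STOP", "SELL_STOP"}), 0.2, 0.1, 0.1, 1.0)
straddle = Hypothesis(frozenset({"BUY_STOP", "SELL_STOP"}), gross_edge=1.0, position_size=0.5)
print(constraint_violations(straddle, venue))
```

A non-empty result is a reason to redesign the hypothesis class, not to rerun the statistics.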

---

## 2. Incremental artifact promotion (/tmp → repo early)

Move findings from `/tmp/` to the persistent repo (`audits/YYYY-MM-DD-slug/`) **as soon as a result survives two independent tests**, not "when done".

Session anti-pattern: reproducers written in `/tmp/` during exploration were lost on reboot, and reproducibility went with them. A result should have been promoted the moment it passed Gate C (OOS), not after the full gate suite completed.

**Promotion triggers** (at least one required):

- Result passed shuffled-null z > 3 AND hasn't been contradicted
- An agent synthesized a verdict that supersedes an earlier one
- A reproducer script ran successfully twice
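
The triggers above reduce to a single OR. A minimal sketch, assuming a hypothetical `ResultStatus` record (the field names are not from the session tooling):

```python
from dataclasses import dataclass


@dataclass
class ResultStatus:
    """Hypothetical bookkeeping for one exploratory result."""
    shuffled_null_z: float = 0.0
    contradicted: bool = False
    superseding_verdict: bool = False  # an agent verdict replaced an earlier one
    reproducer_successes: int = 0      # successful end-to-end runs of the reproducer


def should_promote(status: ResultStatus) -> bool:
    """True as soon as any one promotion trigger fires."""
    passed_null = status.shuffled_null_z > 3 and not status.contradicted
    reran_twice = status.reproducer_successes >= 2
    return passed_null or status.superseding_verdict or reran_twice
```

Calling `should_promote` after every test run, rather than at the end of the suite, is what keeps promotion incremental.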

**Mechanics**:

```bash
# Replace "slug" with a short topic name
AUDIT_DIR="findings/evolution/audits/$(date +%Y-%m-%d)-slug"
mkdir -p "$AUDIT_DIR"
cp /tmp/reproducer.py /tmp/artifact.json "$AUDIT_DIR"/
# Write CLAUDE.md navigator + verdict.md
# Append to evolution.jsonl
```
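
The same mechanics, including the ledger append, can be sketched in Python. This is an assumption-heavy sketch: `promote` is a hypothetical helper, the record shape written to `evolution.jsonl` is invented for illustration (the real ledger schema is not shown here), and placing `evolution.jsonl` directly under `root` is a guess.

```python
import json
import shutil
from datetime import date
from pathlib import Path


def promote(artifacts: list[Path], slug: str, root: Path = Path("findings/evolution")) -> Path:
    """Copy artifacts into a dated audit folder and append one line to evolution.jsonl."""
    audit_dir = root / "audits" / f"{date.today():%Y-%m-%d}-{slug}"
    audit_dir.mkdir(parents=True, exist_ok=True)
    for src in artifacts:
        shutil.copy2(src, audit_dir / src.name)
    entry = {  # hypothetical record shape, not the real ledger schema
        "date": date.today().isoformat(),
        "audit_dir": str(audit_dir),
        "files": [p.name for p in artifacts],
    }
    # Append-only: earlier ledger lines are never touched.
    with (root / "evolution.jsonl").open("a", encoding="utf-8") as ledger:
        ledger.write(json.dumps(entry) + "\n")
    return audit_dir
```

The navigator and `verdict.md` still need to be written by hand; the sketch only covers the copy and the append.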

What's impermanent gets lost.

---

## 3. Gate-failure scopes not kills

When a signal fails one of the serial gates (see Skill B §2), downgrade its **scope**, don't kill it outright.

| Failed gate                    | Action                                                         |
| ------------------------------ | -------------------------------------------------------------- |
| Gate A (directional breakdown) | Learn which side — often simplify to one-side                  |
| Gate B (mirror symmetry)       | Note asymmetry; record as "direction-biased" feature           |
| **Gate C (OOS time-split)**    | **Kill.** No scope-narrowing rescues in-sample overfit.        |
| Gate D (cross-asset)           | Downgrade to `<asset>-specific`; keep                          |
| Gate E (per-year)              | Flag bad years as "regime-unfavorable"; explore regime filters |

NGRAM3FU-STRADDLE-001 failed Gate D (XAUUSD, GBPUSD) but passed A/B/C/E. Status downgraded to `eur-only`, NOT killed. A year later, if XAUUSD develops different microstructure, it could be retested — this is the `resurrect_if:` trigger (see Skill D).

**Principle**: scope-narrowing preserves optionality. Hard kills lose negative knowledge.
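
The table can be encoded as a lookup so the kill-vs-narrow call is mechanical. A minimal sketch; `GATE_ACTIONS` and `triage` are hypothetical names, and the note strings paraphrase the table rather than quote any session tooling.

```python
GATE_ACTIONS = {
    "A": ("narrow", "simplify to the side that works"),
    "B": ("narrow", "record as direction-biased"),
    "C": ("kill", "in-sample overfit; no scope rescues it"),
    "D": ("narrow", "downgrade to asset-specific"),
    "E": ("narrow", "flag regime-unfavorable years"),
}


def triage(failed_gates: set[str]) -> tuple[str, list[str]]:
    """Return ('keep' | 'narrow' | 'kill', notes) for a set of failed gates."""
    if not failed_gates:
        return "keep", []
    unknown = failed_gates - set(GATE_ACTIONS)
    if unknown:
        raise ValueError(f"unknown gates: {sorted(unknown)}")
    notes = [f"Gate {g}: {GATE_ACTIONS[g][1]}" for g in sorted(failed_gates)]
    # A single Gate C failure overrides every scope-narrowing option.
    killed = any(GATE_ACTIONS[g][0] == "kill" for g in failed_gates)
    return ("kill" if killed else "narrow"), notes


print(triage({"D"}))
```

The notes list is the negative knowledge worth keeping, so store it with the downgraded entry.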

---

## 4. Agent-lens disagreement as signal

When parallel agents DISAGREE, the disagreement itself is diagnostic.

Session example: 4 agents reported "lower rejection at bottom → 67.8% UP" as a signal. Agent 5 (hidden-signal hunter, critic) flagged it as label leakage. The disagreement pointed precisely at the bug.

**When agents disagree**:

1. Don't average or vote — map WHAT they disagree about
2. Check: does one agent's evidence involve an implicit assumption the other rejects?
3. Disagreement about mechanism → investigate mechanism (may be label leakage, confound, or real but lens-bound effect)
4. Disagreement about significance → check each agent's multiple-testing burden
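
For step 4, one way to compare multiple-testing burdens is a Bonferroni correction. That choice is an assumption made here because it is simple and conservative; the session may have used a different correction. The agent names and numbers below are illustrative only.

```python
def bonferroni_significant(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True if p_value clears the Bonferroni-corrected threshold alpha / n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return p_value < alpha / n_tests


# Hypothetical reports: the same raw p-value, very different search effort.
reports = {
    "agent_a": {"p_value": 0.004, "n_tests": 200},
    "agent_b": {"p_value": 0.004, "n_tests": 5},
}
for agent, r in reports.items():
    verdict = bonferroni_significant(r["p_value"], r["n_tests"])
    print(agent, "significant after correction:", verdict)
```

Two agents can report the same p-value and still deserve different levels of trust, depending on how many hypotheses each one searched.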

**Anti-pattern**: picking the agent that gives the answer you want. If the critic-agent disagrees with the proposer-agents, the critic is usually right.

---

## 5. Context-budget discipline

Conversation and data context are scarce. Reserve them for the most ambiguous questions; compress known-good findings ruthlessly.

**Hierarchy of compression**:

- Raw bars (not for agents; 67 MB)
- Token-rendered bar sequences (60 KB; good for one agent)
- Stats tables (60 KB; consumable by 5 parallel agents) — PREFERRED
- Ledger entries (1 KB; tracks findings)

**When context feels tight**:

1. Emit a fresh audit folder with artifacts; future sessions load that, not the transcript
2. Drop detailed raw data from agent prompts; use markdown summaries
3. If you must hand off mid-session, write a handoff file in `.planning/` at the project root (outside plugin scope)

**Signal that context is BLOCKED**: you re-read the same file within one session, agents ask for re-briefings, or you can't recall what was decided 10 turns ago. When any of these happens, compress to an audit folder.
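
The re-read signal is the easiest of the three to detect mechanically. A minimal sketch, assuming a hypothetical `read_log` list of file paths in the order they were opened during the session (no such log is described in the source):

```python
from collections import Counter


def reread_files(read_log: list[str], threshold: int = 2) -> list[str]:
    """Return files read at least `threshold` times in one session, most-read first."""
    counts = Counter(read_log)
    return [path for path, n in counts.most_common() if n >= threshold]


print(reread_files(["verdict.md", "stats.md", "verdict.md"]))
```

Any file this returns is a candidate for a short summary in the audit folder.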

---

## 6. Supersede-not-rewrite

When a later finding replaces an earlier one, **add** a new ledger entry with `supersedes: "OLD-ID"`; **update** the old entry with `superseded_by: "NEW-ID"`. Never rewrite or delete.

**Why**:

- Future auditors need the trail, not the final answer
- A superseded finding may contain negative knowledge (why it failed) that informs future work
- Deletions create "mysterious silences" that agents can't interpret

**Canonical chain** from session:

```
NGRAM3FU-STRADDLE-001                     preliminary-positive
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-GATES               gates-validated (Gate D failed → eur-only)
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FULL-HISTORY        confirmed at 7.18M bars
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FILTERED            Phase-L filter validated
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FULL-STACK          Phase-L + Phase-M final
```

Note that this chain links entries with `supplements`, not `supersedes`: a supplement EXTENDS the earlier entry, while a supersede REPLACES it. Pick the right relationship.

**Anti-pattern**: editing an old ledger entry because the finding "got better". That's rewriting history. Add a new entry.
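
The add-and-link rule at the top of this section can be sketched against an in-memory ledger. This assumes the ledger is a dict keyed by finding ID, which is an illustration and not the real storage format; `supersede` and the `FINDING-00x` IDs are hypothetical.

```python
def supersede(ledger: dict[str, dict], old_id: str, new_id: str, new_entry: dict) -> None:
    """Add new_entry under new_id and cross-link it with old_id; nothing is deleted."""
    if old_id not in ledger:
        raise KeyError(f"unknown finding: {old_id}")
    if new_id in ledger:
        raise ValueError(f"{new_id} already exists; add a new ID instead of rewriting")
    if "superseded_by" in ledger[old_id]:
        raise ValueError(f"{old_id} is already superseded by {ledger[old_id]['superseded_by']}")
    ledger[new_id] = {**new_entry, "supersedes": old_id}
    ledger[old_id]["superseded_by"] = new_id  # the only change made to the old entry


ledger = {"FINDING-001": {"status": "preliminary-positive"}}
supersede(ledger, "FINDING-001", "FINDING-002", {"status": "refuted"})
print(ledger)
```

The old entry keeps its original status and content; only the back-link is added, so the trail stays readable in both directions.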

---

## Confirmation counts

| Pattern                      | Confirmed | Notes                                               |
| ---------------------------- | --------- | --------------------------------------------------- |
| 1. physical-constraint pivot | 1         | The session-defining pivot (directional → straddle) |
| 2. artifact promotion        | Multiple  | Every /tmp → audit folder move                      |
| 3. gate-failure scopes       | 1         | NGRAM3FU-STRADDLE Gate D → eur-only                 |
| 4. disagreement as signal    | 2         | Act-2 label leakage catch; Phase L agent variance   |
| 5. context-budget            | Implicit  | Used every time we preferred stats tables over raw  |
| 6. supersede-not-rewrite     | 5         | NGRAM3FU-STRADDLE chain, 5 entries                  |

---

## Post-Execution Reflection

After invoking this skill:

1. Did a pattern save you from a bad decision? Increment its `Confirmed` count; note it in `references/evolution-log.md`.
2. Did a pattern produce the wrong call? Demote it; record the context and link to where it misled.
3. Did a new decision pattern emerge that isn't here? Draft a section.
4. Could a pattern be better phrased for future agents? Edit the text directly; log why.
