---
name: nocap-systematic-analysis
description: Defines a four-phase investigation protocol for any problem, claim, error, anomaly, or system failure the user presents. Prevents premature conclusion-jumping and guess-and-check thrashing. Use when the user presents a problem to diagnose, an anomaly to explain, a claim to evaluate, or a situation where the cause is unclear. Integrates with nocap.
---

Author: HyperWorX (https://github.com/HyperWorX)
License: MIT

# Systematic Analysis Protocol

Read this document when the user presents a problem to diagnose,
an anomaly to explain, a system failure to understand, or any
situation where identifying the actual cause matters before
proposing responses, fixes, or interpretations.

Adapted from obra/superpowers systematic-debugging (Claude Code
framework) for general analytical chat contexts.

---

## The Iron Law

NO CONCLUSIONS OR PROPOSED ACTIONS WITHOUT ROOT CAUSE
INVESTIGATION FIRST.

If Phase 1 is not complete, no positions are taken and no
recommendations are offered. Symptom-level responses are a
failure mode, not helpful output.

**Enforcement checkpoint.** Before emitting any visible
recommendation, fix, or root-cause claim, run this check in
thinking:

1. Has Phase 1 reached its completion condition (specific
   failure point located, observed-vs-expected delta stated,
   scope conditions characterised)? Yes/No.
2. If No: the visible output MUST NOT contain "the cause is",
   "the fix is", "try X", or any equivalent conclusion or
   recommendation. Permitted output: evidence gathered so far,
   what remains unclear, specific next investigation step, or
   a surfaced ambiguity to the user.
3. If Yes: proceed to Phase 2.

The checkpoint is structural: it runs before output-generation
every turn this protocol is active, not as an optional check.
The trained tendency is to jump to plausible explanations; this
checkpoint is what blocks that jump. If the checkpoint output
would be "No", and the visible response nonetheless contains a
conclusion, that is a protocol violation: note it in the
process trace and retract the conclusion before sending.
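
The checkpoint can be pictured as a small gate function. The
sketch below is illustrative only: `Phase1State`,
`checkpoint_allows`, and `CONCLUSION_PATTERN` are hypothetical
names, and a regular expression is a crude stand-in for judging
what counts as an "equivalent conclusion or recommendation".

```python
import re
from dataclasses import dataclass

# Crude stand-in for "any equivalent conclusion or recommendation".
# Matching the bare word "try" over-matches on purpose.
CONCLUSION_PATTERN = re.compile(
    r"\b(the cause is|the fix is|try)\b", re.IGNORECASE
)


@dataclass
class Phase1State:
    failure_point_located: bool
    delta_stated: bool
    scope_characterised: bool

    def complete(self) -> bool:
        # All three completion conditions must hold.
        return (self.failure_point_located
                and self.delta_stated
                and self.scope_characterised)


def checkpoint_allows(state: Phase1State, draft: str) -> bool:
    """Return True if the draft may be sent under the Iron Law."""
    if state.complete():
        return True  # checkpoint answer is Yes: proceed to Phase 2
    # Checkpoint answer is No: block drafts that state a conclusion.
    return CONCLUSION_PATTERN.search(draft) is None
```

A draft that fails the gate is rewritten to report evidence,
open questions, or the next investigation step instead.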

---

## When to Apply

Apply this protocol for ANY of the following:

- User presents an error, anomaly, or unexpected outcome
- User describes a situation where something is not working as
  intended
- User asks what caused something
- User asks what to do about a problem where the cause is
  unclear
- User presents a claim or explanation and asks whether it holds
- The first plausible explanation is obvious but unverified

Apply ESPECIALLY when:
- Time pressure makes jumping to conclusions tempting
- "The obvious answer" seems apparent
- Multiple prior attempts at resolution have failed
- The situation is partially understood but not fully

Do not apply for:
- Lookup tasks with no diagnostic component
- Tasks where the full context is provided and the cause is
  stated
- Execution tasks that involve no diagnosis

---

## The Four Phases

Each phase must be completed before proceeding to the next. No
phase is optional because the problem seems simple.

---

## Phase 1: Evidence Gathering and Root Cause Investigation

Before forming any hypothesis or proposing any response:

**1.1 Read the available information completely**

Do not skim. Every detail in what the user has provided is
potentially relevant. Note specific quantities, timings,
conditions, and qualifiers. Do not paraphrase away precision.

**1.2 Characterise the problem exactly**

State what is observed vs. what was expected. State these
separately. They are not the same thing and conflating them
is a source of diagnostic error.

"Observed: X. Expected: Y. Delta: Z."

If the user has not stated what was expected, ask one question
to establish it before proceeding.
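
The observed/expected/delta statement can be held as a small
record so that a missing expectation is visible rather than
silently assumed. This is a minimal sketch; `ProblemStatement`
and its fields are hypothetical names, and the example values
in the usage note are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemStatement:
    observed: str
    expected: Optional[str] = None  # None until the user has stated it
    delta: Optional[str] = None

    def render(self) -> str:
        if self.expected is None:
            # Step 1.2: ask one question rather than guess the expectation.
            return "What result did you expect instead?"
        if self.delta is None:
            raise ValueError("state the delta once the expectation is known")
        return (f"Observed: {self.observed}. "
                f"Expected: {self.expected}. Delta: {self.delta}.")
```

For example, `ProblemStatement("job ran 45 s", "job finishes
within 30 s", "15 s over").render()` yields the three-part
statement, while `ProblemStatement("job ran 45 s").render()`
yields the clarifying question.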

**1.3 Establish reproducibility and scope**

- Is this a one-off occurrence or does it recur?
- Under what specific conditions does it occur?
- What conditions are NOT associated with it?
- Has it ever worked as expected? If so, what changed?

**1.4 Check what changed**

Most problems have a precipitating change. What changed before
the problem appeared? This includes:
- Configuration, parameters, inputs, data
- Environmental conditions (dependencies, upstream systems)
- Process or procedural changes
- Time-based factors (version changes, expiry, drift)

If nothing apparently changed: assume something did change
and has not yet been identified. This is almost always the
case.

**1.5 Locate the failure in the system**

For multi-component situations: trace where the failure
actually occurs vs. where it is observed. These are often
different. The symptom surfaces downstream; the cause lives
upstream.

Map the relevant components and trace data or state through
them to identify which boundary fails:

"Component A produces X. Component B receives X and produces
Y. Component C receives Y. The problem is observed at output
of C. The question is whether C failed, or whether it received
a bad Y from B, or whether B received a bad X from A."

Do not investigate all components. Identify which layer to
investigate based on evidence.
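
The A/B/C trace above can be sketched as a walk over component
boundaries that stops at the first bad output. Everything here
is hypothetical (the stage names, transforms, and checks), and
the linear walk is only for brevity: in a real investigation,
evidence should narrow which boundaries are checked.

```python
from typing import Any, Callable, List, Optional, Tuple

# Each stage is (name, transform, check). `check` reports whether
# the stage's output is acceptable to the next component.
Stage = Tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


def first_failing_stage(stages: List[Stage], value: Any) -> Optional[str]:
    """Return the first stage whose output fails its check,
    or None if every boundary passes."""
    for name, transform, check in stages:
        value = transform(value)
        if not check(value):
            return name
    return None


# Hypothetical three-component pipeline: trim, split, parse.
stages: List[Stage] = [
    ("A", lambda raw: raw.strip(), lambda x: isinstance(x, str)),
    ("B", lambda x: x.split(","), lambda y: all(part != "" for part in y)),
    ("C", lambda y: [int(part) for part in y], lambda z: len(z) > 0),
]
```

With the input `" 1,,3 "`, the walk reports `"B"`: B emits an
empty field. Without the boundary check, the error would only
surface in C when it tries to parse that field, which is the
downstream-symptom, upstream-cause pattern described above.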

**Phase 1 is complete when:** The specific failure point is
located, the observed-vs-expected delta is stated, and the
scope conditions are characterised. A hypothesis has not yet
been formed.

"Specific failure point" means: named to a single component
boundary that can be pointed to in code, configuration, or
system architecture. "The API layer" is not specific.
"Request-body deserialization in the /api/v3/auth handler" is
specific. "The network" is not specific. "The TLS handshake
between service A and service B at port 8443" is specific.

**When Phase 1 cannot complete.** If the evidence gathered
leaves multiple candidate failure points and none can be
eliminated without new information, Phase 1 is not complete
and must NOT advance to Phase 2. The options are:

- Gather more evidence at one of the candidate points (loop
  within Phase 1), OR
- Surface the ambiguity to the user: "Evidence so far is
  consistent with failure at [A, B, or C]. To narrow this I
  would need [specific additional evidence]. Can you supply
  it or authorise investigation at one of the points?"

Advancing to Phase 2 with multiple live failure points is a
protocol violation: Phase 2's pattern analysis assumes a
single locus. Surface the ambiguity rather than picking
arbitrarily.

---

## Phase 2: Pattern Analysis

Before forming a hypothesis, establish the baseline:

**2.1 Find the working reference**

Identify what works correctly that is structurally similar to
what is broken. If nothing similar works, note that and proceed.

**2.2 Enumerate the differences**

List every difference between the working reference and the
failing case, however small. Do not dismiss differences as
irrelevant at this stage. That assessment comes later.
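
When the working reference and the failing case can both be
written down as key-value settings, the enumeration can be made
mechanical. A minimal sketch, assuming flat dictionaries;
`enumerate_differences` and the sample keys are hypothetical.

```python
from typing import Any, Dict, Tuple

# Marker for a key present on one side only. A string is used so the
# result prints readably; it would collide with a real value "<missing>".
MISSING = "<missing>"


def enumerate_differences(
    working: Dict[str, Any], failing: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing key to (working value, failing value).
    Nothing is filtered as irrelevant; that judgment comes later."""
    diffs: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(working) | set(failing)):
        w = working.get(key, MISSING)
        f = failing.get(key, MISSING)
        if w != f:
            diffs[key] = (w, f)
    return diffs


working = {"timeout_s": 60, "region": "eu", "tls": True}
failing = {"timeout_s": 30, "region": "eu", "retries": 3}
differences = enumerate_differences(working, failing)
```

Here `differences` lists `retries`, `timeout_s`, and `tls`,
including the keys that exist on only one side.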

**2.3 Read the relevant specification, documentation, or
established behaviour completely**

If the problem involves an interface, protocol, or known
system, read the specification as it applies to this case.
Do not adapt or infer from partial reading. Read it fully
before applying it.

**2.4 Identify dependencies and assumptions**

What does the failing component depend on that may not be
satisfied? What assumptions does it make about its inputs or
environment that may not hold?

**Phase 2 is complete when:** A structured comparison exists
between working and failing cases, and any relevant
specification has been read in full.

**2.5 Deliberative Pattern Analysis**

The assessment gate (nocap Section 14.1)
applies MANDATORILY when ANY of the following measurable triggers
hold after Phase 2.1-2.4:

- Phase 2.2 enumerated 5 or more candidate differences between
  working reference and failing case.
- Phase 2.4 identified dependencies on 3 or more distinct
  subsystems (services, layers, environmental factors).
- Multiple causal frameworks could explain the evidence with
  equivalent explanatory power (e.g., "concurrency" vs
  "resource exhaustion" vs "protocol error" all fit).
- Phase 1 surfaced ambiguity about which of 2+ components is
  the failure point (see Phase 1 ambiguity handling above).

If any trigger holds, run the assessment gate and follow the
classification. If none hold, proceed to Phase 3 directly.
Skipping the gate when a trigger holds is a protocol violation;
the trained tendency to minimise agent spawning must be
counteracted here by the explicit mandatory-trigger list.
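
The trigger list reduces to a disjunction over four counts. The
sketch below uses the thresholds stated above; `Phase2Summary`
and `gate_is_mandatory` are hypothetical names, and deciding
how many frameworks have "equivalent explanatory power" remains
a judgment that the code simply takes as input.

```python
from dataclasses import dataclass


@dataclass
class Phase2Summary:
    candidate_differences: int     # count from step 2.2
    dependent_subsystems: int      # count from step 2.4
    competing_frameworks: int      # frameworks that fit equally well
    candidate_failure_points: int  # components still live after Phase 1


def gate_is_mandatory(summary: Phase2Summary) -> bool:
    """True if any measurable trigger holds."""
    return (summary.candidate_differences >= 5
            or summary.dependent_subsystems >= 3
            or summary.competing_frameworks >= 2
            or summary.candidate_failure_points >= 2)
```

A single satisfied trigger is enough; the function never weighs
triggers against each other.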

If classified (a) deliberative or (d) hybrid:

1. Run FCoP (nocap Section 14.2) to
   determine agent count.
2. Spawn generation panel agents, each analysing the same
   evidence from a different angle:
   - Different causal frameworks (mechanical, environmental,
     interaction-based, timing-dependent).
   - Different assumption sets about the system's behaviour.
   - Different hypothesis classes (what if the problem is in
     component X vs. component Y vs. the interface between them).
3. Collect all analyses. Spawn arbitration panel
   (nocap Section 14.6) to synthesise
   findings and rank hypotheses by explanatory power before
   proceeding to Phase 3.

All agents must follow the Protocol Inheritance Template
(nocap Section 14.8). The assessment gate
determines whether this step applies; it is not a judgment call.

---

## Phase 3: Hypothesis Formation and Testing

**3.1 Form a single, specific hypothesis**

State clearly: "I think the root cause is X because of
evidence Y and difference Z from Phase 2."

A hypothesis names a specific mechanism, not a general area.

"The configuration is wrong" is not a hypothesis.
"The timeout parameter in X defaults to 30s but the upstream
process takes 45s" is a hypothesis.

Write it out. Vague hypotheses produce vague tests.

**3.2 Design the minimal test**

State what would be observed if the hypothesis is correct.
State what would be observed if it is wrong. These must be
distinguishable. If they are not distinguishable, the
hypothesis is not testable and must be refined.

"If the hypothesis is correct, then [specific observable
consequence]. If it is wrong, then [different observable
consequence]."

**3.3 Test one variable at a time**

If multiple things could be changed to test the hypothesis,
change the most diagnostic one first. Do not bundle changes.
Bundled changes cannot isolate causation.

**3.4 Evaluate the result**

- Result confirms hypothesis: proceed to Phase 4.
- Result disconfirms hypothesis: form a new hypothesis. Do
  not add fixes on top of a failed hypothesis. Return to
  Phase 1 with the new information. "Return to Phase 1" means
  a TARGETED re-investigation of the failure point in light
  of the disconfirmed hypothesis, not a full restart. Revisit
  steps 1.3-1.5 (scope, change-detection, locate) with the
  new constraint that the disconfirmed hypothesis is now
  ruled out; keep 1.1-1.2 (prior evidence, observed-expected
  delta) if still valid. Only restart Phase 1 from 1.1 if the
  disconfirmation invalidates the previously identified
  failure point itself (e.g., you discover the failure is at
  a different component boundary than you first located).
- Result is ambiguous: identify what additional evidence is
  needed to disambiguate. Gather it before forming a
  new hypothesis.
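
Steps 3.2 and 3.4 can be sketched together: a test carries two
distinguishable predictions, and the observed result maps to
one of three outcomes. `MinimalTest` and `Outcome` are
hypothetical names, and exact string comparison is a
deliberately naive stand-in for comparing real observations.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CONFIRMED = "proceed to Phase 4"
    DISCONFIRMED = "return to Phase 1 (targeted, steps 1.3-1.5)"
    AMBIGUOUS = "gather disambiguating evidence"


def _norm(text: str) -> str:
    return text.strip().lower()


@dataclass
class MinimalTest:
    hypothesis: str
    if_correct: str  # observation predicted when the hypothesis holds
    if_wrong: str    # observation predicted when it does not

    def __post_init__(self) -> None:
        # Step 3.2: identical predictions cannot discriminate.
        if _norm(self.if_correct) == _norm(self.if_wrong):
            raise ValueError("predictions are not distinguishable; "
                             "refine the hypothesis")

    def evaluate(self, observation: str) -> Outcome:
        seen = _norm(observation)
        if seen == _norm(self.if_correct):
            return Outcome.CONFIRMED
        if seen == _norm(self.if_wrong):
            return Outcome.DISCONFIRMED
        return Outcome.AMBIGUOUS
```

An observation that matches neither prediction is treated as
ambiguous rather than forced into confirm or disconfirm.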

**3.5 When you do not know**

State "I don't have enough information to identify the root
cause of X specifically." Do not fabricate a plausible
hypothesis to fill the gap. Fabricated hypotheses waste
investigation time and produce wrong conclusions.

**Phase 3 is complete when:** A tested hypothesis is
confirmed, OR multiple hypotheses have been eliminated and
the reason for the difficulty is stated.

---

## Phase 4: Resolution

**4.1 Address the root cause, not the symptom**

The fix must be applied at the failure point identified in
Phase 1, not at the observation point. Symptom-level fixes
produce recurrence.

**4.2 One change at a time**

Apply a single resolution. Do not bundle with adjacent
improvements, refactors, or opportunistic changes. Bundling
makes it impossible to attribute success or failure.

**4.3 Verify**

State explicitly what would confirm resolution. Check that.
"It seems fixed" is not verification. State the specific
condition that confirms the root cause is gone.

**4.4 If a resolution does not work**

Stop. Count how many resolution attempts have been made.

If fewer than 3: return to Phase 1 with the new information
from the failed resolution. The failed resolution is data.

If 3 or more resolutions have failed: stop attempting
incremental fixes. The problem is likely structural, not
locational. Ask:

- Is the underlying model or approach to this problem sound?
- Are we fixing symptoms of a misframed problem?
- Does the framing itself need revision?

Surface this to the user before attempting further
resolution. "Three resolution attempts have failed. The
pattern suggests the issue is structural rather than
locational. The question is whether the approach itself
needs to change, not just the implementation of it."
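
The counting rule can be written as a single branch on the
number of failed attempts. A minimal sketch;
`next_step_after_failure` and its return labels are
hypothetical names.

```python
def next_step_after_failure(failed_attempts: int) -> str:
    """Decide what follows a failed resolution (step 4.4)."""
    if failed_attempts < 1:
        raise ValueError("call this only after a resolution has failed")
    if failed_attempts < 3:
        # The failed resolution is data: feed it back into Phase 1.
        return "return_to_phase_1"
    # Three or more failures: stop incremental fixes, ask the user.
    return "surface_structural_question"
```

The branch has no "one more attempt" path: every failure leads
either back to investigation or to the user.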

**Phase 4 is complete when:** The root cause is resolved and
verified, OR the structural issue is surfaced for user
decision.

---

## Red Flags: Stop and Return to Phase 1

Stop immediately if any of the following are occurring:

- Proposing a resolution before Phase 1 is complete
- "It's probably X, let me just try changing that"
- Bundling multiple changes to test at once
- "One more attempt" after two or more have already failed
- Each attempt reveals a different problem in a different place
  (this pattern indicates a structural issue, not a local bug)
- The conclusion is formed before evidence is gathered
- The symptom and the root cause are being conflated

---

## Common Rationalisations: Addressed

| Rationalisation | Actual status |
|-----------------|---------------|
| "The problem is simple, no need for the protocol" | Simple problems have root causes too. The protocol is faster for simple cases than thrashing. |
| "Time pressure means skip the investigation" | Systematic investigation is faster than repeated failed attempts. Skip it and you pay the time cost in rework. |
| "The obvious answer is probably right" | The most statistically likely completion is not the same as the correct answer. Test the obvious answer; do not act on it untested. |
| "We've already looked at this area" | Looking and establishing root cause are different things. Return to Phase 1. |
| "95% of cases like this are X" | This case may be in the 5%. Hypothesise X, test it, confirm before acting. |
| "One more attempt" after 2+ failures | After two failures, return to Phase 1 with the new data instead of retrying. Three failures indicate a structural problem, not a stubborn local bug. Surface the structural question. |

---

## Integration with nocap

This protocol integrates with the process trace requirement
(nocap Section 9). At the end of any systematic analysis
response, log:

[Phase completed: 1 / 1-2 / 1-3 / 1-4.
 Failure located at: [specific point].
 Hypothesis: [stated or not yet formed].
 Evidence basis: [what was used].
 What was not examined: [gaps and why].
 Confidence: [low / medium / high with basis].
 What would change this: [specific conditions].]

Phase 4 completions additionally log:

[Resolution applied: [specific].
 Verification: [what was checked, what was observed].
 Root cause confirmed resolved: yes / no / uncertain with basis.
 Structural issues surfaced: yes / no.]

**Multi-turn investigations.** When an investigation spans
multiple turns (user provides new evidence, or model requests
further input), emit an intermediate trace at the end of every
turn where work was done, not only at final resolution. Format
for intermediate traces:

[Phase active: [current phase and sub-step].
 Progress this turn: [what advanced].
 Evidence added: [new evidence gathered or received].
 Still needed: [specific next input or action].
 Iron Law state: Phase 1 complete? [yes/no/partial].]

The intermediate trace preserves continuity across turns so
the user can see where the investigation stands without
re-deriving state. If compaction or context loss occurs
between turns, the intermediate trace is what allows the
investigation to resume coherently.
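
Emitting the intermediate trace from a fixed template keeps its
fields stable from turn to turn. A minimal sketch that follows
the format above; `render_intermediate_trace` and its parameter
names are hypothetical.

```python
ALLOWED_STATES = ("yes", "no", "partial")


def render_intermediate_trace(
    phase: str,
    progress: str,
    evidence: str,
    still_needed: str,
    phase1_complete: str,
) -> str:
    """Render the five-line intermediate trace for one turn."""
    if phase1_complete not in ALLOWED_STATES:
        raise ValueError(f"phase1_complete must be one of {ALLOWED_STATES}")
    return (
        f"[Phase active: {phase}.\n"
        f" Progress this turn: {progress}.\n"
        f" Evidence added: {evidence}.\n"
        f" Still needed: {still_needed}.\n"
        f" Iron Law state: Phase 1 complete? {phase1_complete}.]"
    )
```

Rejecting any state other than yes, no, or partial prevents a
vague Iron Law status from reaching the user.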

---

## Relationship to nocap-robust-review

Systematic analysis handles diagnosis and root cause work.
nocap-robust-review handles claims evaluation and document review.

For problems that involve both (e.g., a claim about why
something failed), run nocap-systematic-analysis on the diagnostic
component, then nocap-robust-review (Mode B) on any claim being
made about the cause.

They compose; they do not compete.