---
name:        behavioral-equivalence
description: "Proves the new stack output is identical to the old stack output across four techniques — shadow mode, contract snapshots, differential fuzzing, and database read equivalence — before traffic is permanently cut over."
metadata:
  phase:           5
  source_stack:    "Any — technique-agnostic"
  target_stack:    "Any — technique-agnostic"
  effort_estimate: L
  last_updated:    2026-04-04
---

# 1. Purpose

This skill systematically proves that the new stack produces identical outputs to the old
stack for all meaningful inputs before any traffic is permanently cut over. "Identical" is
defined precisely: same HTTP status codes, same response body shapes (including element order
in JSON arrays), same response headers excluding instrumentation-only headers, and same
database state after the same mutation inputs. Four complementary techniques are applied
in combination, because no single technique provides complete coverage: shadow mode exercises
real production traffic but is biased toward happy paths that users actually send; contract
snapshot testing covers every response variant the old stack was observed to produce but only
for inputs a human thought to record; differential fuzzing finds crashes and divergences on
inputs that no human anticipated; and database read equivalence proves that the new stack's
data access layer returns the same rows as the old one for the same queries, even when the
response serialization layer obscures the difference. The skill is not complete until all four
applicable techniques report zero divergence — a partial pass is not a pass. The rollback
trigger is explicit and automatic: a shadow mode divergence rate exceeding 0.1% over any
one-hour window on production traffic is a hard stop, not a warning.

---

# 2. Trigger Conditions

**Use when:**
- At least one migration unit has completed Phase 4 and its per-route contract tests (from the Phase 4 skill) pass on the current commit.
- Both old and new stacks are deployed simultaneously in staging and both respond to health checks on their respective URLs.
- The strangler-fig proxy from Phase 3 is configured to route or mirror the same request to both stacks — shadow mode is impossible without this.
- `AUDIT.md` from Phase 1 is available with performance baselines (p50/p99 latency, bundle size) — needed to evaluate whether the new stack regresses on non-functional requirements.
- The `test-coverage-baseline` skill (Phase 3) has been run for `migration_unit` — the golden fixtures in this skill extend that baseline, and without it there is no committed "before" state to compare against.

**Do NOT use when:**
- Phase 4 contract tests for `migration_unit` are still failing — behavioral equivalence cannot be proven on top of broken contracts. Fix contracts first.
- Only one stack is available in staging — shadow mode requires simultaneous operation; without it, only snapshot testing is possible, which is insufficient as a standalone gate for cut-over.
- The migration unit is still partially migrated (some routes on old stack, some on new) — validate a complete unit boundary, not a partial one. An incomplete boundary produces divergences that cannot be attributed to migration failures vs. routing gaps.
- The divergence threshold has already been exceeded in a previous run of this skill and the root cause has not been resolved — re-running without fixing the divergence inflates the measurement window and conceals the true divergence rate.

---

# 3. Inputs

**Required:**

| Input | Type | Description |
|-------|------|-------------|
| `migration_unit` | string | Identifier of the unit being validated — matches the `--unit` flag used in the Phase 4 skill (e.g., `auth-service`, `api/users`). Used to scope fixture paths and name output artifacts. |
| `repo_root` | file-path | Absolute path to the repository root. All commands run from here. |
| `old_stack_url` | string | Base URL of the old stack in staging (e.g., `http://staging-old.internal:3000`). Must be reachable from the machine running this skill. |
| `new_stack_url` | string | Base URL of the new stack in staging (e.g., `http://staging-new.internal:3001`). Must be reachable and serving the migrated `migration_unit`. |
| `golden_fixtures_dir` | file-path | Repo-root-relative directory for golden response fixtures (e.g., `tests/golden/<migration_unit>/`). Created if absent. Fixtures are committed to the repo. |
| `proxy_config` | file-path | Path to the strangler-fig proxy configuration (nginx, in-process, or Envoy). Read in Step 5 to configure request mirroring for shadow mode. |

**Optional:**

| Input | Type | Default | Description |
|-------|------|---------|-------------|
| `shadow_duration_hours` | integer | `1` | Hours to run shadow mode before reading the divergence rate. The Done Criteria require a 24-hour clean run; use 1 for iterative validation during development. |
| `divergence_threshold_pct` | float | `0.1` | Maximum acceptable divergence rate (percent of requests with any diff). Exceeding this threshold over any one-hour window triggers rollback. Lower for high-risk units (auth, payments). |
| `fuzz_rounds` | integer | `1000` | Number of randomized input combinations to generate per route in differential fuzzing. Raise to 5000 for routes that handle financial data or session state. |
| `fuzz_seed` | integer | auto | Seed for the random input generator. Set explicitly to reproduce a previous fuzzing run. Printed to the migration log on every run. |
| `ignore_headers` | string | `x-request-id,x-trace-id,date,server,x-powered-by` | Comma-separated response headers excluded from diff comparisons. These legitimately differ between stacks (timestamps, tracing IDs, framework banners). Do not add business-logic headers here. |
| `db_comparison_queries` | file-path | `none` | Repo-root-relative path to a file of read queries (SQL or ORM query strings) for database read equivalence testing. Required if `migration_unit` has database read paths not covered by HTTP responses. Skip (`none`) for stateless API units. |
| `techniques` | string | `shadow,snapshots,fuzzing,db` | Comma-separated list of techniques to run. Override when a technique is inapplicable (e.g., `snapshots,fuzzing,db` for a background-worker unit, where the decision table marks shadow mode as not applicable). See the technique decision table in Step 1. |

<!--
  STOP CONDITIONS:
  - If `old_stack_url` is unreachable (curl exits non-zero), halt:
    "Old stack at <old_stack_url> is not responding. Behavioral equivalence requires both
     stacks to be live simultaneously. Confirm staging is deployed before running this skill."
  - If `new_stack_url` is unreachable, halt with the same message for the new stack URL.
  - If the Phase 4 contract test suite for `migration_unit` does not exist or exits non-zero, halt:
    "Phase 4 contract tests for <migration_unit> are failing or absent. Run the Phase 4
     migration skill and confirm contracts pass before validating equivalence."
-->

---

# 4. Steps

1. Read the technique decision table below. Cross-reference with the characteristics of `migration_unit` to determine which of the four techniques apply. Record the selected techniques in the migration log. If `techniques` input overrides the table, log the override and reason.

   **Technique Decision Table**

   | Characteristic of `migration_unit` | Shadow Mode | Contract Snapshots | Differential Fuzzing | DB Read Equivalence |
   |-------------------------------------|-------------|-------------------|---------------------|---------------------|
   | HTTP API routes (any method) | **Required** | **Required** | Recommended | If routes read from DB |
   | Write routes (POST / PUT / PATCH / DELETE) | **Required** | **Required** | **Required** | **Required** |
   | Auth / session routes | **Required** | **Required** | **Required** | **Required** |
   | Read-only routes (GET / HEAD) | **Required** | **Required** | Optional | If routes read from DB |
   | Background workers / queue consumers | Not applicable | **Required** (message output shape) | Recommended | If worker reads/writes DB |
   | Database migration layer only | Not applicable | Not applicable | Recommended | **Required** |
   | Frontend bundle / SSR routes | Recommended | **Required** (HTML snapshots) | Not applicable | Not applicable |
   | gRPC / WebSocket surface | **Required** | **Required** | Recommended | If applicable |

   Any technique marked **Required** for a characteristic that applies to `migration_unit` must be run. Optional techniques are skipped only if `migration_unit` has no inputs that could exercise them.

2. Capture **Golden Fixtures** for Contract Snapshot Testing. For every route in `migration_unit`, send at least four representative requests against `old_stack_url` and record the full response:
   - **Happy path**: a valid input that produces a 2xx response.
   - **Validation error**: an invalid input that produces a 4xx response with an error body.
   - **Auth failure**: a request missing or carrying a bad auth credential (401 or 403).
   - **Edge case**: a boundary input — empty body, maximum-length field, zero/negative numeric value, Unicode in string fields, null optional fields.

   For each request, capture and write to `<golden_fixtures_dir>/<route-slug>-<case-name>.json`:
   ```json
   {
     "request": {
       "method": "POST",
       "path": "/api/users",
       "headers": { "content-type": "application/json", "authorization": "Bearer <redacted>" },
       "body": { "email": "alice@example.com", "name": "Alice" }
     },
     "response": {
       "status": 201,
       "headers": {
         "content-type": "application/json; charset=utf-8",
         "cache-control": "no-store"
       },
       "body": { "id": "user_abc123", "email": "alice@example.com", "name": "Alice" }
     },
     "captured_at": "2026-04-04T12:00:00Z",
     "stack": "old",
     "migration_unit": "<migration_unit>"
   }
   ```

   **Header capture rules:**
   - Include every header the old stack sets, including `content-type`, `cache-control`, `location` (on redirects), `set-cookie` (sanitize the value — record the attribute structure `HttpOnly; Secure; SameSite=Strict; Max-Age=3600`, not the session value itself), and any custom business headers (`x-ratelimit-remaining`, `x-correlation-id` if deterministic).
   - Exclude headers in `ignore_headers`.
   - Do not exclude a header because it "seems unimportant" — if the old stack sets it, the new stack must set it identically or the exclusion must be documented.

   **Array ordering rule:** For response bodies containing JSON arrays, record the exact order returned by the old stack. If the order is non-deterministic (e.g., a `SELECT` without `ORDER BY`), add an `"array_order": "non-deterministic"` annotation to the fixture and compare only the set of elements, not their order. Document non-deterministic arrays in the migration log.

   Run: `npx jest tests/golden/<migration_unit>/ --updateSnapshot` or equivalent for the language's snapshot runner. Commit the fixtures.

3. Assert **Contract Snapshots** against the new stack. For each fixture in `<golden_fixtures_dir>`:
   - Replay the recorded `request` against `new_stack_url`.
   - Compare the actual response to the recorded `response` field.
   - Compare: status code (exact), body (deep equal, order-sensitive for array elements unless annotated `non-deterministic`), headers excluding `ignore_headers` (exact value match on each recorded header key).
   - Write each comparison result — pass or fail with a diff — to `output/behavioral-equivalence-snapshots-<timestamp>.md`.

   For each failing fixture:
   - Classify the divergence: `status-code-diff` | `body-field-missing` | `body-field-value-diff` | `body-array-order-diff` | `header-diff` | `extra-field-in-new` | `crash` (5xx where old returned non-5xx).
   - `extra-field-in-new` is **not** automatically acceptable — an extra field in the new stack's response changes the API contract for consumers who fail-fast on unknown fields. Get explicit sign-off before treating it as non-breaking.
   - `crash` classification is a hard stop: STOP the skill, record the crashing request in the migration log as a critical finding, and do not proceed to shadow mode until it is resolved.

4. Run **Differential Fuzzing**. For each route in `migration_unit`, generate `fuzz_rounds` randomized inputs and send each to both `old_stack_url` and `new_stack_url`. Compare the responses from both stacks.

   The fuzzing goal is not to find crashes in isolation — it is to find inputs where the two stacks *disagree*. A 500 from both stacks on the same input is not a divergence (both are broken consistently). A 200 from the old stack and a 500 from the new stack on the same input is a critical divergence.

   **JavaScript / TypeScript (fast-check):**
   ```typescript
   // tests/fuzz/<migration_unit>/<route-slug>.fuzz.test.ts
   import fc from 'fast-check';
   import axios from 'axios';

   const OLD = process.env.OLD_STACK_URL!;
   const NEW = process.env.NEW_STACK_URL!;

   describe('Differential fuzz: POST /api/users', () => {
     it('produces identical responses for any valid-schema input', async () => {
       await fc.assert(
         fc.asyncProperty(
           // Arbitraries: generate the full input space including edge cases
           fc.record({
             email: fc.emailAddress(),
             name:  fc.string({ minLength: 0, maxLength: 255 }),
             // Add fields for every optional parameter the route accepts
           }),
           async (body) => {
             const [oldRes, newRes] = await Promise.all([
               axios.post(`${OLD}/api/users`, body, {
                 headers:          { 'content-type': 'application/json', authorization: 'Bearer test-token' },
                 validateStatus:   () => true,   // never throw on 4xx/5xx
                 timeout:          5000,
               }),
               axios.post(`${NEW}/api/users`, body, {
                 headers:          { 'content-type': 'application/json', authorization: 'Bearer test-token' },
                 validateStatus:   () => true,
                 timeout:          5000,
               }),
             ]);

             // Both stacks must agree on status code.
             if (oldRes.status !== newRes.status) {
               throw new Error(
                 `Status divergence on input ${JSON.stringify(body)}: ` +
                 `old=${oldRes.status} new=${newRes.status}`
               );
             }

             // For 2xx: body shapes must match (ignore id fields that are inherently unique).
             if (oldRes.status >= 200 && oldRes.status < 300) {
               const oldBody = { ...oldRes.data, id: undefined };
               const newBody = { ...newRes.data, id: undefined };
               if (JSON.stringify(oldBody) !== JSON.stringify(newBody)) {
                 throw new Error(
                   `Body divergence on input ${JSON.stringify(body)}: ` +
                   `old=${JSON.stringify(oldBody)} new=${JSON.stringify(newBody)}`
                 );
               }
             }
           }
         ),
         { numRuns: <fuzz_rounds>, seed: <fuzz_seed>, verbose: true }
       );
     });
   });
   ```

   **Python (Hypothesis):**
   ```python
   # tests/fuzz/test_<route_slug>_fuzz.py
   import os

   import requests
   from hypothesis import given, seed, settings, HealthCheck
   from hypothesis import strategies as st

   OLD = os.environ['OLD_STACK_URL']
   NEW = os.environ['NEW_STACK_URL']

   @seed(<fuzz_seed>)  # reproduces a previous run; omit to let Hypothesis choose
   @given(
       email=st.emails(),
       name=st.text(min_size=0, max_size=255),
   )
   @settings(
       max_examples=<fuzz_rounds>,
       deadline=None,  # two network round-trips exceed the default per-example deadline
       suppress_health_check=[HealthCheck.too_slow],
   )
   def test_differential_post_users(email, name):
       body = {'email': email, 'name': name}
       headers = {'Authorization': 'Bearer test-token'}

       # Explicit timeouts so a hung stack fails the test instead of blocking it.
       old_r = requests.post(f'{OLD}/api/users', json=body, headers=headers, timeout=5)
       new_r = requests.post(f'{NEW}/api/users', json=body, headers=headers, timeout=5)

       assert old_r.status_code == new_r.status_code, (
           f'Status divergence: old={old_r.status_code} new={new_r.status_code} body={body}'
       )

       # For 2xx: bodies must match once the inherently unique id field is stripped.
       if 200 <= old_r.status_code < 300:
           old_body = {k: v for k, v in old_r.json().items() if k != 'id'}
           new_body = {k: v for k, v in new_r.json().items() if k != 'id'}
           assert old_body == new_body, (
               f'Body divergence: old={old_body} new={new_body} body={body}'
           )
   ```

   **Fuzzing exclusion rules** — exclude from comparison (and document exclusions):
   - Fields whose values are inherently unique per-request: `id`, `createdAt`, `traceId`, `requestId`. Strip before comparing.
   - Fields in `ignore_headers` on the response.
   - Do NOT exclude error message strings — divergent error messages are a divergence. The new stack must produce the same error text as the old stack for the same invalid input, or the change is documented and accepted as intentional.

   Write a summary of fuzzing results to `output/behavioral-equivalence-fuzz-<timestamp>.md`: total rounds, number of divergences found, number of crashes (5xx on new stack where old returned non-5xx), and the minimized failing input for each divergence (fast-check and Hypothesis both shrink failing inputs automatically).

5. Configure and run **Shadow Mode**. Shadow mode sends every real request to both stacks simultaneously; only the old stack's response is returned to the client. The new stack's response is logged and compared asynchronously.

   Read `proxy_config`. Add or enable the mirror directive:

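   **nginx mirror (proxy-level HTTP shadow):**
   ```nginx
   # skills/05-validate/behavioral-equivalence: shadow mode config
   # Remove after the shadow mode window closes; do not leave in prod permanently.

   # log_format is only valid in the http context, so declare it outside the server block.
   # CAVEAT: in the access log of the parent request, $status and $upstream_status both
   # describe the primary (old) upstream, because nginx discards mirror responses. To
   # record the new stack's status, also log the mirror subrequest (log_subrequest on;
   # in /_shadow/) and correlate the two entries, or use the middleware or Diffy below.
   log_format shadow_diff
     '[$time_iso8601] $remote_addr "$request" '
     'old_status=$status old_body_bytes=$body_bytes_sent '
     'shadow_status=$upstream_status shadow_time=$upstream_response_time';

   # Add inside the server block that handles <migration_unit> routes.
   location /<migration_unit_path>/ {
     proxy_pass          http://old-stack:3000;   # Primary: client receives this response
     mirror              /_shadow/;
     mirror_request_body on;
     access_log          /var/log/nginx/shadow.log shadow_diff;
   }

   location /_shadow/ {
     internal;
     # $request_uri forwards the original path and query string; without it the new
     # stack would receive the literal /_shadow/ URI. Because proxy_pass now contains
     # a variable, new-stack must be an upstream{} name or resolvable via `resolver`.
     proxy_pass http://new-stack:3001$request_uri;   # Shadow: response is discarded by nginx
   }
   ```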

   **In-process shadow middleware (Node.js — use when nginx is unavailable):**
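   ```typescript
   // src/middleware/shadow.ts
   // Register before the primary handler or proxy so the 'finish' listener is attached
   // before the response is sent.
   import axios from 'axios';
   import type { Request, Response, NextFunction } from 'express';

   // Assumption: a structured logger with log() and error(); replace with the project's own.
   const shadowLogger = {
     log:   (entry: object) => console.log(JSON.stringify(entry)),
     error: (entry: object) => console.error(JSON.stringify(entry)),
   };

   export function shadowMiddleware(newStackUrl: string, ignoreHeaders: string[]) {
     const ignored = new Set(ignoreHeaders.map((h) => h.toLowerCase()));
     return (req: Request, res: Response, next: NextFunction) => {
       // Run the shadow call only after the primary response has been sent: res.statusCode
       // is final at that point and the shadow request cannot delay the client.
       res.on('finish', async () => {
         try {
           // Drop content-length: axios re-serializes the body and sets its own.
           const { 'content-length': _length, ...forwardHeaders } = req.headers;
           const shadowRes = await axios({
             method:         req.method,
             url:            `${newStackUrl}${req.path}`,
             data:           req.body,
             headers:        { ...forwardHeaders, host: new URL(newStackUrl).host },
             params:         req.query,
             validateStatus: () => true,
             timeout:        10_000,
           });
           // Names of primary response headers whose values differ, minus ignored ones.
           const headerDiffs = Object.keys(res.getHeaders()).filter(
             (h) => !ignored.has(h) && String(res.getHeader(h)) !== String(shadowRes.headers[h]),
           );
           // Log both responses; full body comparison happens asynchronously downstream.
           shadowLogger.log({
             requestId:   req.headers['x-request-id'],
             path:        req.path,
             method:      req.method,
             oldStatus:   res.statusCode,
             newStatus:   shadowRes.status,
             oldBody:     res.locals.responseBody,   // captured by a response-capture middleware
             newBody:     shadowRes.data,
             headerDiffs,
             diverged:    res.statusCode !== shadowRes.status,   // status only at this stage
             timestamp:   new Date().toISOString(),
           });
         } catch (err) {
           shadowLogger.error({ path: req.path, error: (err as Error).message });
         }
       });
       next();
     };
   }
   ```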

   **Diffy (drop-in shadow proxy for any HTTP stack):**
   ```bash
   # Diffy runs as a sidecar proxy; it sends requests to both primary and candidate,
   # diffs the responses, and serves a UI at http://localhost:31900.
   docker run -p 31900:31900 -p 9000:9000 -p 9100:9100 \
     ghcr.io/opendiffy/diffy:latest \
     -candidate=http://new-stack:3001 \
     -master.primary=http://old-stack:3000 \
     -master.secondary=http://old-stack:3000 \
     -service.protocol=http \
     -serviceName=<migration_unit> \
     -allowHttpSideEffects=true
   ```

   Start shadow mode. Monitor for `shadow_duration_hours`. Every 15 minutes, compute the divergence rate:
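   ```bash
   # From nginx shadow log. Extract only the two status values: splitting the line on
   # the key names would also compare the trailing fields and flag every line as diverged.
   log=/var/log/nginx/shadow.log
   total=$(grep -c 'shadow_status=' "$log")
   diverged=$(sed -nE 's/.*old_status=([0-9-]+).*shadow_status=([0-9-]+).*/\1 \2/p' "$log" | awk '$1 != $2' | wc -l)
   if [ "$total" -gt 0 ]; then   # avoid dividing by zero before any traffic arrives
     rate=$(echo "scale=4; $diverged * 100 / $total" | bc)
     echo "Divergence rate: ${rate}% ($diverged / $total)"
   fi
   ```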

   **Rollback trigger:** If `rate` exceeds `divergence_threshold_pct` (default 0.1%) over any one-hour measurement window: immediately set `feature_flag_key = false` (or revert the proxy config), stop shadow mode, write the divergence log to `output/behavioral-equivalence-shadow-divergences-<timestamp>.jsonl`, and STOP with:
   > "Shadow mode rollback: divergence rate <N>% exceeded threshold <divergence_threshold_pct>% in the window <start>–<end>. New stack rolled back. Investigate divergence log before resuming."

   → Hand off to `equivalence-validator` (see Section 5. Agent Handoffs) to analyze the divergence log and categorize each divergence. Wait for the divergence report before deciding whether to fix and retry or escalate.

6. Run **Database Read Equivalence** (if `db_comparison_queries` ≠ `none` and `db` is in `techniques`). For each query in `db_comparison_queries`:
   - Execute the query against the old stack's database.
   - Execute the same query against the new stack's database (which should have been seeded from the same data or a snapshot).
   - Compare result sets row by row and column by column.

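   ```bash
   # PostgreSQL example: run each query on both databases and diff the CSV output
   OLD_DB_URL="postgres://old-staging/appdb"
   NEW_DB_URL="postgres://new-staging/appdb"
   QUERY_FILE="<db_comparison_queries>"
   DIFF_FILE="output/behavioral-equivalence-db-<timestamp>.diff"

   set -o pipefail      # a psql failure must not be masked by the exit status of sort
   : > "$DIFF_FILE"     # create the file up front so an all-pass run leaves it empty

   while IFS= read -r query; do
     [ -z "$query" ] && continue
     # sort makes the comparison order-insensitive; remove it for queries whose
     # ORDER BY is part of the contract.
     if ! old_result=$(psql "$OLD_DB_URL" -c "$query" --csv --tuples-only | sort) ||
        ! new_result=$(psql "$NEW_DB_URL" -c "$query" --csv --tuples-only | sort); then
       # Without this check, two failed queries would compare as equal and report PASS.
       echo "ERROR: $query" | tee -a "$DIFF_FILE"
       continue
     fi

     if diff_out=$(diff <(echo "$old_result") <(echo "$new_result")); then
       echo "PASS: $query"
     else
       echo "FAIL: $query"
       printf '%s\n%s\n' "-- $query" "$diff_out" >> "$DIFF_FILE"
     fi
   done < "$QUERY_FILE"
   ```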

   Any row-level diff is a hard failure. Common causes:
   - ORM generates different SQL (different JOIN order produces different implicit sort).
   - New stack's migration changed a column's default value.
   - New stack's ORM eager-loads a relation the old stack lazy-loaded, producing extra columns.
   - Timezone handling difference (`timestamp` vs. `timestamptz`).

   Document the root cause for each diff in the migration log before resolving. Do not patch the query to hide the diff — the query is the contract.

7. Compute the final **divergence summary** across all four techniques. Write to `output/behavioral-equivalence-summary-<timestamp>.md`:

   ```markdown
   # Behavioral Equivalence Summary — <migration_unit>

   | Technique | Applied | Rounds / Duration | Divergences | Verdict |
   |-----------|---------|------------------|-------------|---------|
   | Contract Snapshots | Yes/No | N fixtures | N | PASS / FAIL |
   | Differential Fuzzing | Yes/No | N rounds | N | PASS / FAIL |
   | Shadow Mode | Yes/No | N hours | N (N%) | PASS / FAIL |
   | DB Read Equivalence | Yes/No | N queries | N | PASS / FAIL |

   **Overall Verdict:** PASS | FAIL | INDETERMINATE

   **Confidence:** High | Medium | Low
   **Assumptions:**
   1. ...
   ```

   **Verdict rules:**
   - `PASS`: all applied techniques show 0 divergences. Shadow mode ran for ≥ `shadow_duration_hours`.
   - `FAIL`: any applied technique shows ≥ 1 divergence.
   - `INDETERMINATE`: a technique could not be run to completion (timeout, infra failure, insufficient traffic for shadow mode). Treat as `FAIL` for cut-over gate purposes — do not proceed to Phase 6 on an INDETERMINATE verdict.

8. Write all outputs declared in Section 7. Run every Equivalence Test in Section 6 and record results in `output/behavioral-equivalence-equiv-<timestamp>.md`. Evaluate every item in Section 9 Done Criteria; report pass/fail inline, then print the final verdict.
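
The verdict rules in Step 7 can be expressed as a small function. The following is a minimal Python sketch, not part of the skill itself; `TechniqueResult`, `overall_verdict`, and `cut_over_allowed` are hypothetical names, and treating an empty set of applied techniques as `INDETERMINATE` is an assumption the rules do not state.

```python
from dataclasses import dataclass


@dataclass
class TechniqueResult:
    applied: bool          # technique was selected in Step 1
    completed: bool = True  # ran to completion (for shadow mode: full duration)
    divergences: int = 0


def overall_verdict(results: dict) -> str:
    """Combine per-technique results into PASS, FAIL, or INDETERMINATE."""
    applied = [r for r in results.values() if r.applied]
    if not applied:
        return "INDETERMINATE"  # assumption: nothing ran, so nothing was proven
    # Any divergence in any applied technique is a FAIL, even if another
    # technique did not finish.
    if any(r.divergences > 0 for r in applied):
        return "FAIL"
    if any(not r.completed for r in applied):
        return "INDETERMINATE"
    return "PASS"


def cut_over_allowed(results: dict) -> bool:
    """INDETERMINATE is treated as FAIL for the cut-over gate."""
    return overall_verdict(results) == "PASS"
```

A stateless unit that skipped `db` would pass `TechniqueResult(applied=False)` for that technique, which leaves it out of the verdict.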

---

# 5. Agent Handoffs

## equivalence-validator

- **File:** `agents/equivalence-validator.md`
- **Triggered by:** Step 5 (shadow mode divergence log analysis)
- **Prompt template:**
  ```
  ORIGINAL:    Old stack serving <migration_unit> at <old_stack_url>
  MIGRATED:    New stack serving <migration_unit> at <new_stack_url>
  REPO_ROOT:   <repo_root>
  TEST_SUITE:  tests/golden/<migration_unit>/
  OUTPUT_FILE: output/behavioral-equivalence-divergence-report-<timestamp>.md
  ENV:         staging
  TASK:        Analyze the shadow mode divergence log at
               output/behavioral-equivalence-shadow-divergences-<timestamp>.jsonl.
               For each divergence entry, classify it as:
                 CRITICAL   — new stack returned 5xx where old returned non-5xx, or
                              missing required response field, or status code differs
                 REGRESSION — response body shape differs in a way that breaks consumers
                 COSMETIC   — header-only diff on a non-business header, or extra field
                              in new stack response with no consumer impact
               Group by classification and by root cause if determinable.
               Produce a VERDICT: PASS (zero CRITICAL or REGRESSION) |
                                  FAIL (any CRITICAL or REGRESSION) |
                                  INDETERMINATE (insufficient data to classify).
               List the top 3 most frequent divergences and the minimized request
               that reproduces each one.
  ```
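
A first-pass triage of the divergence log can be automated before the handoff. The sketch below is illustrative only: the entry keys (`old_status`, `new_status`, `missing_fields`, `changed_fields`, `extra_fields`, `header_diffs`) and the `BUSINESS_HEADERS` set are assumptions, since the log schema is not fixed by this skill. It also assumes that a diff on a business header counts as REGRESSION, and it marks extra fields as `UNCLASSIFIED` because consumer impact cannot be decided mechanically.

```python
import json
from collections import Counter

# Assumption: headers whose differences matter to consumers; tune per unit.
BUSINESS_HEADERS = {"content-type", "cache-control", "location", "set-cookie"}


def classify(entry: dict) -> str:
    """First-pass class for one divergence log entry (hypothetical field names)."""
    if entry["old_status"] != entry["new_status"]:
        return "CRITICAL"      # includes 5xx from new where old returned non-5xx
    if entry.get("missing_fields"):
        return "CRITICAL"      # a response field the old stack sent is absent
    if entry.get("changed_fields"):
        return "REGRESSION"    # body differs in a way consumers can observe
    headers = {h.lower() for h in entry.get("header_diffs", [])}
    if headers & BUSINESS_HEADERS:
        return "REGRESSION"
    if entry.get("extra_fields"):
        return "UNCLASSIFIED"  # consumer impact of an extra field needs review
    return "COSMETIC"


def verdict(lines) -> tuple:
    """Return (verdict, per-class counts) for an iterable of JSONL lines."""
    counts = Counter(classify(json.loads(line)) for line in lines if line.strip())
    if counts["CRITICAL"] or counts["REGRESSION"]:
        return "FAIL", counts
    if counts["UNCLASSIFIED"]:
        return "INDETERMINATE", counts
    return "PASS", counts
```

Entries that land in `UNCLASSIFIED` are the ones worth passing to `equivalence-validator` with the most context, since they are what keeps the verdict from resolving.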

---

# 6. Equivalence Tests

<!--
  This skill IS the equivalence validation layer for Phase 5.
  Section 6 here verifies that the validation process was run correctly and its outputs
  are trustworthy — not that two stacks agree (that is the skill's primary output).
-->

| Test Name | Input | Expected Output | Tool |
|-----------|-------|-----------------|------|
| `snapshots-all-pass` | `npx jest tests/golden/<migration_unit>/` (Step 3 result) | Exit code 0. Zero fixture failures. Any single fixture failure means a divergence exists on the new stack. | Bash — result captured in Step 3. |
| `fuzz-zero-divergence` | Fuzz run result at `output/behavioral-equivalence-fuzz-<timestamp>.md` (Step 4 result) | Zero divergences (`divergences: 0`) and zero crashes (`crashes: 0`) recorded. | Read — grep `divergences: 0` and `crashes: 0` in summary file. |
| `shadow-divergence-rate` | Shadow mode divergence rate computed in Step 5 over full `shadow_duration_hours` window | Rate < `divergence_threshold_pct` (default 0.1%) for the entire measurement window. Not just at the end — every 15-minute sub-window must be below threshold. | Bash: compute rate from shadow log file; verify no 15-min window exceeds threshold. |
| `db-read-zero-diff` | Diff file at `output/behavioral-equivalence-db-<timestamp>.diff` (Step 6 result) | File is empty — zero row-level differences across all queries in `db_comparison_queries`. Only checked if `db_comparison_queries` ≠ `none`. | Bash: `wc -c output/.../db-*.diff` must return 0. N/A if `db` not in `techniques`. |
| `verdict-not-indeterminate` | Summary file at `output/behavioral-equivalence-summary-<timestamp>.md`, `Overall Verdict` line | Verdict is `PASS` or `FAIL`, not `INDETERMINATE`. An INDETERMINATE verdict means the validation was incomplete; it must be resolved before this test can be evaluated. | Read: grep for `Overall Verdict:\*\* INDETERMINATE`; must return no match. The summary template writes the label in bold (`**Overall Verdict:**`), so a plain `Overall Verdict: INDETERMINATE` pattern would never match. |
| `divergence-report-produced` | `output/behavioral-equivalence-divergence-report-<timestamp>.md` (equivalence-validator output from Step 5) | File exists and contains a VERDICT line. Only required if shadow mode was run and logged divergences. | Read — if shadow log had zero entries, mark N/A. |
| `fixtures-committed` | `git log --oneline -- tests/golden/<migration_unit>/` | Golden fixture files appear in git history — they are committed as behavioral contracts, not ephemeral test data. | Bash |
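
The `shadow-divergence-rate` check requires every sub-window to stay under the threshold, which a single rate over the whole log cannot show. The following Python sketch computes per-window rates; it assumes one `(timestamp, diverged)` record per shadowed request with timezone-aware timestamps, and `breached_windows` is a hypothetical helper. It uses fixed, epoch-aligned buckets, so a burst that straddles a bucket boundary is diluted compared with a true sliding window.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def breached_windows(records, threshold_pct=0.1, window=timedelta(minutes=15)):
    """Return (window_start, rate_pct, total) for every window not below threshold.

    records: iterable of (timestamp, diverged) pairs, one per shadowed request.
    Timestamps must be timezone-aware datetimes.
    """
    size = window.total_seconds()
    buckets = defaultdict(lambda: [0, 0])  # window start -> [diverged, total]
    for ts, diverged in records:
        # Floor the timestamp to the start of its fixed-size window.
        start = datetime.fromtimestamp(ts.timestamp() // size * size, tz=timezone.utc)
        buckets[start][0] += int(diverged)
        buckets[start][1] += 1

    breaches = []
    for start in sorted(buckets):
        bad, total = buckets[start]
        rate = bad * 100 / total
        if rate >= threshold_pct:  # the check requires rate < threshold
            breaches.append((start, round(rate, 4), total))
    return breaches
```

An empty return value means every window passed; passing `window=timedelta(hours=1)` evaluates the one-hour rollback window instead.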

---

# 7. Outputs

| Artifact | Path Pattern | Format | Description |
|----------|-------------|--------|-------------|
| Golden fixtures | `tests/golden/<migration_unit>/<route-slug>-<case-name>.json` | json | One file per route × test case combination captured from the old stack in Step 2. Committed to the repo; reviewed in PRs like code — a fixture change means behavior changed. |
| Snapshot test results | `output/behavioral-equivalence-snapshots-<timestamp>.md` | markdown | Per-fixture pass/fail with classification of any divergence (status-code-diff, body-field-missing, etc.). Consumed by the equivalence-validator and by Done Criteria. |
| Fuzz results | `output/behavioral-equivalence-fuzz-<timestamp>.md` | markdown | Total rounds, divergence count, crash count, and minimized failing inputs for each divergence. Consumed by Done Criteria gate `fuzz-zero-divergence`. |
| Shadow divergence log | `output/behavioral-equivalence-shadow-divergences-<timestamp>.jsonl` | jsonl | One JSON line per divergent request pair from shadow mode. Each entry: timestamp, request path/method, old status, new status, body diff hash, header diffs. Consumed by equivalence-validator and by Done Criteria. |
| Divergence report | `output/behavioral-equivalence-divergence-report-<timestamp>.md` | markdown | Produced by equivalence-validator. Classifies shadow mode divergences as CRITICAL / REGRESSION / COSMETIC, with a VERDICT. Only produced if shadow mode logged divergences. |
| DB read diff | `output/behavioral-equivalence-db-<timestamp>.diff` | diff | Raw diff output of old vs. new database query results. Empty file = zero divergences. Only produced if `db_comparison_queries` ≠ `none`. |
| Equivalence summary | `output/behavioral-equivalence-summary-<timestamp>.md` | markdown | Four-technique summary table with per-technique verdict and overall PASS / FAIL / INDETERMINATE verdict, confidence level, and assumptions list. The document that Phase 6 (Cut-over) reads as its go/no-go input. |
| Migration log | `output/behavioral-equivalence-log-<timestamp>.md` | markdown | Technique selection decisions (with decision table rationale), non-deterministic array annotations, excluded headers with justification, rollback events if any, fuzz seed used, confidence level, and numbered assumptions list. |
| Equivalence test results | `output/behavioral-equivalence-equiv-<timestamp>.md` | markdown | Pass/fail verdict for every row in Section 6. Required by Done Criteria. |
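
The divergence classes recorded in the snapshot test results come from the Step 3 comparison. A minimal Python sketch of that comparison follows, assuming each replayed response has been normalized to a dict with `status`, `headers`, and `body` keys matching the fixture layout; `classify_fixture` and `_diff_body` are hypothetical names, and nested arrays are compared only as whole values.

```python
import json
from typing import Any

DEFAULT_IGNORE_HEADERS = frozenset(
    {"x-request-id", "x-trace-id", "date", "server", "x-powered-by"}
)


def classify_fixture(fixture: dict, actual: dict,
                     ignore_headers=DEFAULT_IGNORE_HEADERS) -> list:
    """Return the divergence classes for one replayed fixture ([] means pass)."""
    expected = fixture["response"]
    # crash: 5xx from the new stack where the old stack returned non-5xx.
    if actual["status"] >= 500 and expected["status"] < 500:
        return ["crash"]
    found = []
    if actual["status"] != expected["status"]:
        found.append("status-code-diff")

    # Headers: exact value match on every recorded key not in ignore_headers.
    actual_headers = {k.lower(): v for k, v in actual.get("headers", {}).items()}
    for key, value in expected.get("headers", {}).items():
        if key.lower() not in ignore_headers and actual_headers.get(key.lower()) != value:
            found.append("header-diff")
            break

    unordered = fixture.get("array_order") == "non-deterministic"
    found.extend(_diff_body(expected.get("body"), actual.get("body"), unordered))
    return sorted(set(found))


def _diff_body(old: Any, new: Any, unordered: bool) -> list:
    if isinstance(old, dict) and isinstance(new, dict):
        out = []
        for key in old:
            if key not in new:
                out.append("body-field-missing")
            else:
                out.extend(_diff_body(old[key], new[key], unordered))
        if new.keys() - old.keys():
            out.append("extra-field-in-new")
        return out
    if isinstance(old, list) and isinstance(new, list):
        if old == new:
            return []
        # Same elements in a different order: acceptable only when annotated.
        canon = lambda items: sorted(json.dumps(i, sort_keys=True) for i in items)
        if canon(old) == canon(new):
            return [] if unordered else ["body-array-order-diff"]
        return ["body-field-value-diff"]
    return [] if old == new else ["body-field-value-diff"]
```

Returning a list rather than a single class keeps multiple findings per fixture visible, for example a status change together with a missing field.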

---

# 8. References

- `references/strangler-fig-pattern.md` — shadow mode is the "Shadow Mode" mechanism described in §3 of this document. The mirror directive and the two-stack coexistence requirement are the concrete implementation of the strangler-fig shadow phase. Read this before configuring nginx mirror or the in-process middleware.
- `references/migration-anti-patterns.md` — "Testing Only the Happy Path" (§3 extension): the fixture case matrix in Step 2 (happy, validation-error, auth-failure, edge-case) is the direct countermeasure. "Skipping Equivalence Validation" (§3) is why this skill exists as a hard gate before Phase 6.
- `skills/03-prepare/test-coverage-baseline/` — the golden fixtures in this skill extend the behavioral baseline established there. Without that baseline, there is no committed "before" state to compare against.
- `skills/01-audit/codebase-audit/` — `AUDIT.md` performance baselines (Section 5) are the reference numbers for evaluating whether the new stack regresses on p50/p99 latency during shadow mode.
- `skills/06-cut-over/` — reads `output/behavioral-equivalence-summary-<timestamp>.md` as its go/no-go input. The Overall Verdict must be `PASS` before Phase 6 begins.
- `agents/equivalence-validator.md` — invoked in Step 5; its VERDICT field in the divergence report is what the Done Criteria gate `divergence-report-produced` checks.
- `https://github.com/opendiffy/diffy` — Diffy, a drop-in shadow proxy that handles request mirroring and response diffing as a sidecar. Recommended when the old stack is not Node.js or when adding in-process middleware is too invasive.
- `https://fast-check.dev/` — fast-check documentation for the property-based fuzzing implementation in Step 4. The `record`, `emailAddress`, `string`, and `oneof` arbitraries cover most REST API input shapes.
- `https://hypothesis.readthedocs.io/` — Hypothesis documentation for the Python fuzzing implementation in Step 4. The `@given` + `@settings` pattern with `max_examples` maps directly to `fuzz_rounds`.

---

# 9. Done Criteria

<!--
  Claude evaluates each item and reports pass/fail before declaring this skill complete.
  Any unchecked item means Phase 6 (Cut-over) may NOT begin.
  This is the hardest gate in the migration lifecycle — its threshold is deliberately absolute.
-->

- [ ] Shadow mode ran for at least 24 continuous hours on production traffic without a rollback event. The 24-hour minimum applies to this gate even though the skill can be run with `shadow_duration_hours=1` for iterative validation. Confirm by reading the shadow log's earliest and latest timestamps: `head -1` and `tail -1` of `output/behavioral-equivalence-shadow-divergences-<timestamp>.jsonl`.
- [ ] Shadow mode divergence rate is 0.0% over the full 24-hour window — not below `divergence_threshold_pct`, but precisely 0. The rollback threshold (0.1%) is for iterative validation; the Done Criteria require zero divergence for cut-over. Compute: `total divergences in shadow log / total requests mirrored × 100`.
- [ ] `snapshots-all-pass` equivalence test recorded as **pass** — all golden fixtures match the new stack's responses exactly, with zero divergences of any classification.
- [ ] `fuzz-zero-divergence` equivalence test recorded as **pass** — zero divergences and zero crashes across all `fuzz_rounds` × routes. If any divergence was found and fixed during the validation run, fuzzing must be re-run from scratch (same seed) after the fix to confirm.
- [ ] `db-read-zero-diff` equivalence test recorded as **pass** (or N/A if `db_comparison_queries` is `none` — N/A must be documented in the migration log with justification for skipping). A non-empty diff file is a hard fail even if the rows are "close."
- [ ] `verdict-not-indeterminate` equivalence test recorded as **pass**: the overall verdict in `output/behavioral-equivalence-summary-<timestamp>.md` is `PASS`, not `FAIL` or `INDETERMINATE`.
- [ ] `fixtures-committed` equivalence test recorded as **pass** — golden fixture files are in git history and will survive a re-clone of the repo.
- [ ] The equivalence summary's **Confidence level is High** — Medium or Low confidence means the validation relied on too many assumptions (e.g., insufficient traffic for shadow mode, limited fuzzing rounds) and must be re-run with higher coverage before cut-over.
- [ ] No divergence is classified as CRITICAL or REGRESSION in the equivalence-validator divergence report — COSMETIC divergences are acceptable with explicit documentation. If the divergence report was not produced (shadow log had zero entries), this gate passes automatically.
- [ ] The migration log documents the technique selection decisions (which techniques were applied, which were skipped, and why) — verify by reading `output/behavioral-equivalence-log-<timestamp>.md` for the decision table row citations.
- [ ] All output files listed in Section 7 exist at their declared paths — verify each with a file read. DB diff file and divergence report are only required when their respective conditions apply.
- [ ] Every equivalence test in Section 6 has a recorded result in `output/behavioral-equivalence-equiv-<timestamp>.md` — N/A entries are acceptable for inapplicable techniques; absent entries are not.
- [ ] No equivalence test in Section 6 is recorded as **fail** — grep the results file for `fail`; N/A entries do not count as fail.
- [ ] The equivalence summary includes a confidence level (High / Medium / Low) — grep `output/behavioral-equivalence-summary-<timestamp>.md` for `Confidence:`.
- [ ] The migration log includes a numbered assumptions list — grep `output/behavioral-equivalence-log-<timestamp>.md` for `Assumptions:`.
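The first two gates in this list can be checked mechanically. The following is a minimal sketch, assuming each log entry carries an ISO 8601 `timestamp` field with an explicit UTC offset and that the total number of mirrored requests is available from the mirroring layer's own counters. `shadow_gate`, `total_mirrored`, and the returned keys are hypothetical names, not part of this skill. Because the log holds only divergent pairs, a run with zero divergences leaves it empty. In that case the sketch reports `window_hours` as `None`, and the 24-hour window has to be confirmed from another record of the run.

```python
import json
from datetime import datetime


def shadow_gate(log_lines, total_mirrored, min_hours=24.0):
    """Evaluate the shadow-mode window and divergence-rate gates.

    log_lines: iterable of JSONL strings from the shadow divergence log.
    total_mirrored: count of mirrored requests (assumed to come from the proxy).
    """
    entries = [json.loads(line) for line in log_lines if line.strip()]
    divergences = len(entries)

    # Divergence rate as defined in the checklist:
    # total divergences / total requests mirrored x 100.
    rate_pct = divergences / total_mirrored * 100 if total_mirrored else None

    # Equivalent of `head -1` / `tail -1`: first and last logged timestamps.
    window_hours = None
    if entries:
        first = datetime.fromisoformat(entries[0]["timestamp"])
        last = datetime.fromisoformat(entries[-1]["timestamp"])
        window_hours = (last - first).total_seconds() / 3600

    return {
        "divergences": divergences,
        "rate_pct": rate_pct,
        "window_hours": window_hours,
        # The cut-over gate is zero divergences, not "below the threshold".
        "zero_divergence": divergences == 0 and total_mirrored > 0,
        "window_ok": window_hours is not None and window_hours >= min_hours,
    }


sample_log = [
    '{"timestamp": "2026-04-01T00:00:00+00:00", "path": "/a"}',
    '{"timestamp": "2026-04-02T00:30:00+00:00", "path": "/b"}',
]
result = shadow_gate(sample_log, total_mirrored=10_000)
print(result)
```

In this sample the window gate passes (24.5 hours between the first and last entry) while the zero-divergence gate fails, because two divergent pairs were logged. A rate below the 0.1% rollback threshold is still not a pass for cut-over.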