---
name: core-code-audit
description: Stack-agnostic code audit methodology -- tiers (Light/Full), interleaved validation loop, mandatory external validation triggers, repro gate for Critical findings, self-check before emitting output
user-invocable: false
---

## What I do

Guide audits of uncommitted code changes with validation interleaved into the analysis -- every suspicion is researched and cross-checked the moment it arises. No unverified claim ships. I own the *methodology*. Stack-specific hot-spots (what to look for in Python async code vs Vue reactivity vs SSE wire format) live in the `stack-*` skills loaded alongside.

Load me when an audit is needed on uncommitted changes, a pull request diff, or a scoped subset of files. Pair with a `stack-*` skill matching the detected language(s).

## Core principle: interleaved validation

Do NOT accumulate a list of suspicions and validate them later. The moment a possible problem appears, pause the walk and run the validation loop (below) on it. The suspicion only survives as a finding if it holds up. This replaces the old "review then validate" two-phase flow.

## Audit Tier (choose up front)

Pick the tier from the scope BEFORE starting -- it changes what "enough evidence" means. State the tier and a one-line justification in the Audit Summary.

### Light (code-citation is enough, external research optional)

- <= ~50 LoC changed
- No change to observable contracts (API response shape, SSE event shape, public types, CLI flags, DB migration, stored data shape)
- No security-sensitive surface (auth, input parsing, file I/O with user paths, crypto, token handling)
- Internal refactor, rename, comment, test-only change

### Full (external validation + repro attempt required for Critical findings)

- Anything that touches a contract (HTTP / SSE / public type / schema / CLI arg)
- Security, concurrency, error handling, or data-loss surfaces
- Runtime behavior of browser / library / framework APIs (ResizeObserver, asyncio, event loops, GC, reactivity, etc.)
- > ~50 LoC changed OR > 3 files touched

**When in doubt, escalate to Full.**

## Process

### 1. Context gathering

Run in parallel:
- `git status` -- changed files
- `git diff` -- unstaged
- `git diff --cached` -- staged

For untracked files in `git status`, read them directly (they do not appear in diffs).

Apply the scope filter (e.g. "frontend" -> only `frontend/**`). If scope matches no uncommitted change, say so and stop. Do not invent findings.

Determine the stack(s) from the files in scope and load the matching `stack-*` skill(s). Those supply the hot-spot checklists you apply during the walk.

### 2. Walk the changes, validating inline

Evaluate each file against these axes. The moment anything looks wrong, switch to the **Validation Loop** before continuing:

**Code quality**
- Self-documenting names, (rare) comments, consistent style
- DRY, SRP, reasonable function sizes, clear separation of concerns
- Type safety: strict types, no loose casts, correct return/error types
- Error handling: specific exceptions, edge cases, graceful degradation

**Architecture**
- Matches existing project structure and naming conventions
- Uses the same patterns as the rest of the codebase
- Respects layer boundaries
- Appropriate (not over-engineered) patterns; reasonable coupling

**Security**
- Injection (SQL / command / HTML), XSS, auth bypass, sensitive data exposure, hard-coded secrets, insecure deps
- Input validation at boundaries, output encoding, secure defaults, least privilege

**Performance**
- N+1 queries, unnecessary loops, large in-memory structures, blocking ops in async contexts, missed caching
- Avoid false positives: O(n) on small n is fine; no premature caching; no theoretical bottlenecks without evidence

**Dead code / backwards-compat**
- No unused imports / vars / functions / classes
- No commented-out code
- No unused parameters or branches for removed logic
- No backwards-compat shims or deprecated wrappers (unless the user explicitly asked for them)

**Stack hot-spots** -- the loaded `stack-*` skill has a targeted checklist for the language/framework. Apply it here.

### 3. Validation loop (per suspicion, before writing it down)

1. **Trace implications in the codebase**
   - Read callers, callees, surrounding module
   - Check whether the problem is already handled elsewhere (upstream guard, framework invariant, config default)
   - Confirm the pattern is genuinely inconsistent with the codebase, not just unfamiliar to you
   - If the claim is "X happens," verify X *can* happen given real call sites

2. **Consult external sources**
   - `context7` (`resolve-library-id` -> `query-docs`) for library / framework contracts and version-specific behavior
   - `WebSearch` for current best practices, third-party opinions, known pitfalls, consensus on similar situations
   - Prefer primary sources (official docs, library source) over blog posts when they conflict

   **MANDATORY when:**
   - The claim invokes runtime behavior of a browser / library / framework API (e.g. "the browser does not fire scroll when scrollTop is unchanged", "asyncio cancels children on...", "Pydantic validates X in order Y", "Vue loses reactivity when...")
   - The suspicion depends on a version-specific contract (deprecated API, behavior change between versions)
   - The finding would be classified as Critical -- no Critical ships on code-reading alone in Full tier

   **Code-citation-only is acceptable when:**
   - The claim is purely syntactic / structural logic internal to the patch ("branch unreachable because guard above returns", "import unused")
   - All callers of the affected symbol are in the patch or adjacent and visible
   - The behavior is fully determined by code in the repo, not by external runtime

3. **Attempt to reproduce (Critical findings only, Full tier)**
   - A minimal script / curl / console snippet that triggers the bug
   - Python: `uv run python -c "..."` or a targeted test
   - TS/JS: a short snippet in a Node or browser console; for frontend runtime, a contrived input fed into the pure function
   - If repro is impractical (full env / real data / timing race), say so explicitly and record the *assumed failure path* -- this becomes a known weakness of the finding, not a hidden one

4. **Classify**
   - **Real Problem** -- confirmed by documentation, adjacent code, or reproducible reasoning. Keep it.
   - **False Positive** -- dissolves under research. Drop it; optionally record in Dismissed with the evidence that killed it.
   - **Depends** -- real only under project-specific constraints. Flag the assumption explicitly in the finding.

5. **Reject overengineering**
   - If the "fix" adds complexity without a concrete payoff the user cares about, the suspicion does not survive.
   - "Could theoretically break under unusual conditions" is not enough -- show the realistic path.

Every finding that survives must cite evidence: which file/line confirmed the codebase implication, which doc/URL confirmed the external claim. If research is inconclusive, say so explicitly or drop the finding.

### 4. Propose fixes (Real Problems only)

Before writing the fix into the output:

1. **Search for all callers / consumers** of every symbol the fix touches (`Grep` for the name). Confirm the fix does not silently break an adjacent code path.
2. **Check for parallel uses** of the same pattern (e.g. if fixing an SSE wrapper, check every SSE consumer).
3. **Record side-effects investigated** in the finding under "Callers checked" so the reviewer sees what you considered.

Then:
- Simplest change that resolves the issue
- Minimal diff; no drive-by refactors or new dependencies
- Keep existing patterns; cite the source that justifies the fix
- Effort estimate:
  - Trivial: < 5 min, 1-2 lines
  - Small: 15-30 min, single file
  - Medium: 1-2 hours, multiple files
  - Large: half day+, architectural change

### 5. Self-check before emitting output

Before writing the final report, answer these. If any answer is "no / not done / unsure", validate now or downgrade the finding.

- For every Critical: did I reproduce the bug OR explicitly record that I couldn't?
- For every Critical whose root cause involves runtime/library behavior: did I consult docs or the web, not just the code?
- For every proposed fix: did I grep for callers / consumers of the symbols I am changing?
- For every claim of "X doesn't happen" or "Y always happens": is that backed by a specific source, or by my own assumption?
- Are there any suspicions I dropped silently without putting them in Dismissed?

Unvalidated claims that survive this step are labeled `(unverified)` in the output and downgraded from Critical to Suggestion.

### 6. Prioritization

- **Critical (Fix Now)**: security with exploit path; data loss / corruption; production-breaking bugs
- **Important (Fix Soon)**: perf issues at realistic scale; error handling that hides bugs; missing validation at trust boundaries
- **Nice-to-Have**: readability, minor optimizations, better naming
- **Dismiss**: false positives (with evidence), premature optimization, theoretical concerns without real impact

## Output format

```
## Audit Summary
Scope: <what was audited>
Tier: <Light | Full> -- <one-line justification>
Stack(s): <python | typescript-vue | mixed>
<2-3 sentence overview of changes and overall health>

## Validated Critical Issues
### <Issue Name>
- **Location**: file:line
- **Problem**: <what and why it matters>
- **Evidence**:
  - Codebase: <file:line that confirms the implication>
  - External: <doc URL / context7 result / advisory -- required in Full tier>
- **Repro**: <script / curl / console snippet that triggers it, OR "not reproduced -- assumed failure path: ...">
- **Callers checked**: <symbols grepped and where they appear; confirms fix scope>
- **Fix**: <minimal solution; snippet if helpful>
- **Effort**: <Trivial | Small | Medium | Large>

## Improvement Suggestions
### <Suggestion>
- **Location**: file:line
- **Current**: <what exists>
- **Proposed**: <better approach>
- **Benefit**: <why>

## Dismissed Concerns
- <Concern>: <what research / adjacent code killed it + source>

## Context-Dependent
### <Issue>
- **Depends On**: <factors>
- **If A**: <fix approach>
- **If B**: <leave as-is because ...>
- **Question for developer**: <what to clarify>

## Implementation Plan
1. <Fix 1> -- Est: X
2. <Fix 2> -- Est: X
Total: ~X
```

Omit empty sections. If there are no Critical issues, say so in the summary and skip that section.

## Tone

- **Constructive**: improvement-focused, not criticism
- **Educational**: explain WHY, not just WHAT
- **Collaborative**: offer options, discuss trade-offs
- **Pragmatic**: respect project context and scope
- **Honest**: if something is fine, say it's fine -- and if research is inconclusive, say that too
- **Scope-disciplined**: never report on files outside the requested scope, even if you notice something

## Anti-patterns

**Don't:**
- Nitpick style issues (linters handle that)
- Suggest enterprise patterns for simple scripts
- Recommend premature optimization
- Impose personal preferences as rules
- Overwhelm with too many low-value issues
- Emit findings without validation evidence
- Treat "obvious" as an excuse to skip verification -- use the tier system instead

**Do:**
- Focus on impactful, evidence-backed improvements
- Provide concrete code snippets
- Prioritize security and correctness over style
- Respect existing architecture decisions
- Acknowledge good work alongside issues

## Source trust hierarchy

**High trust**
- Official language / framework documentation
- CVE databases, OWASP, security advisories
- Framework maintainer posts / talks
- RFCs, language specifications

**Medium trust**
- Well-maintained open source projects
- Established tech blogs (check dates)
- Conference talks from known experts

**Low trust (verify independently)**
- Stack Overflow answers (check date and votes)
- Random blog posts
- AI-generated content
- "This worked for me" anecdotes

## Common false positives to watch for

**Performance**
- "Loop is slow" -> O(n) fine when n is small
- "Use caching" -> premature unless measured
- "Avoid regex" -> modern engines are fast for simple patterns
- "Too many DB queries" -> 3 is fine; 300 is not

**Security**
- "SQL injection" -> ORM / parameterized queries already safe
- "XSS" -> framework auto-escapes (Jinja2, Vue template interpolation)
- "Hard-coded secret" -> actually a public constant or default

**Architecture**
- "Violates DRY" -> intentional duplication for clarity
- "Needs abstraction" -> YAGNI for current scope
- "Should use pattern X" -> overkill for the problem size
- "Too many responsibilities" -> one clear workflow with sequential steps