---
name: audit-prompt
description: Check prompt files for quality issues — wasted tokens, poor positioning, vague instructions. Use when reviewing changes to commands, skills, agents, SKILL.md, or any .md containing LLM instructions.
disable-model-invocation: true
---

<objective>
Audit changed prompt-related files against @references/prompt-quality-guide.md. The audit covers commands, workflows, agents, skills, templates, references — any file containing LLM instructions.
</objective>

<context>
**Uncommitted changes (staged + unstaged):**
!`git diff HEAD --name-only`

**Untracked files:**
!`git ls-files --others --exclude-standard`

**Target files (from arguments):** $ARGUMENTS
</context>

<process>

1. **Identify files to audit:**
   - If `$ARGUMENTS` contains file paths, use those exclusively
   - If `$ARGUMENTS` is empty, combine uncommitted + untracked files from context. If both are empty, run `git show --name-only --format="" HEAD` to get files from the last commit
   - Filter to prompt-related files — any `.md` or `.yaml` that contains LLM instructions (slash commands, workflows, agent definitions, skills, templates, references, CLAUDE.md files). When unclear, check for XML tags, YAML frontmatter, or behavioral instructions
   - If no prompt-related files found, report "No prompt-related changes detected" and stop
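
   A sketch of the fallback selection, assuming bash and no explicit `$ARGUMENTS` (the extension match is only the coarse first pass; the content-level check above still applies):

   ```bash
   # Sketch: collect audit candidates when $ARGUMENTS is empty.
   files=$( { git diff HEAD --name-only;                # staged + unstaged
              git ls-files --others --exclude-standard; } | sort -u )
   if [ -z "$files" ]; then
     # Working tree is clean: fall back to the last commit.
     files=$(git show --name-only --format="" HEAD)
   fi
   # Coarse filter to .md/.yaml; content still decides what counts as a prompt.
   printf '%s\n' "$files" | grep -E '\.(md|yaml)$'
   ```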

2. **For each file, read the full content.** For uncommitted files, also run `git diff HEAD -- <file>` to isolate what changed. Focus the audit on changed sections, but flag pre-existing issues only if they cause incorrect behavior (wrong tool invoked, skipped steps, misrouted logic) or major budget waste (>10 lines of pure fluff).
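
   A minimal per-file loop, assuming bash and the `files` list from step 1:

   ```bash
   # Sketch: isolate changed hunks per file (output is empty for untracked files).
   while IFS= read -r f; do
     git diff HEAD -- "$f"
   done <<< "$files"
   ```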

3. **Determine the target model class.** Before evaluating, identify what model will execute the prompt being audited. Check (in order): explicit model context from the user's `$ARGUMENTS`, model references in the file itself (model names, API endpoints, Ollama configs), or the surrounding codebase (e.g., a Python script calling a specific model); a grep sketch follows this list. Classify as:
   - **Frontier** (Claude Opus/Sonnet, GPT-4o, Gemini Pro): Apply the guide's default principles — minimize, remove waste, start sparse.
   - **Small/local** (sub-10B: Qwen 4B, Phi-3 mini, Gemma 2B, Llama 3.2 3B, quantized variants): Read `references/small-model-guide.md` and apply its principles and audit-behavior overrides instead of the main guide's frontier-oriented defaults.
   - **Unknown**: Default to frontier principles but flag the assumption — recommend the user verify with their target model.
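
   One crude signal for the in-file check, assuming bash (the name list is illustrative, not exhaustive):

   ```bash
   # Sketch: surface model references in the audited file.
   grep -inE 'opus|sonnet|gpt-4|gemini|qwen|phi-3|gemma|llama|ollama' path/to/file.md
   ```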

   Also determine the **task type**. If the prompt performs text transformation or filtering (cleaning transcripts, reformatting documents, removing patterns while preserving content), apply the **Text Transformation and Filtering Tasks** section of the guide. Key checks:
   - Flag few-shot examples that contain the pattern being removed (priming risk)
   - Flag abstract filter categories ("remove filler words") — recommend explicit pattern lists with contrastive keep/remove pairs
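
   An illustrative shape for such a pattern list (the patterns themselves are hypothetical):

   ```
   Remove: "um", "uh", standalone "you know", standalone "I mean"
   Keep:   "you know what I mean?" (substantive question)
           "I mean it" (assertion, not filler)
   ```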

4. **Evaluate against the quality guide.** Apply The Reliability Test to each instruction (using the small-model variant from the guide when applicable). Check against the Common Waste and Common Value tables. **XML boundary verification:** When an XML structural issue appears at the first or last line of Read output, verify the tag exists in the file with Grep before reporting (a verification sketch follows the category list) — the Read tool's `</output>` framing is easily confused with file content in XML-heavy files. Map findings to these categories:
   - `Budget waste` → Common Waste table (fluff, filler, verbose restatements, unlikely negations)
   - `Positioning` → Positional Attention Bias (critical constraints buried in middle, success criteria ordering)
   - `Context efficiency` → Context Is a Shared, Depletable Resource + Progressive Disclosure (eager vs lazy loading)
   - `Specificity` → Specificity Over Abstraction + Patterns and Anti-Patterns (vague instructions, missing contrastive examples, abstract filter categories, few-shot priming risk in removal tasks)
   - `Structure` → project conventions (semantic XML tags, plan format, output format specs)
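
   The boundary verification above, as a minimal sketch (tag and path are placeholders):

   ```bash
   # Sketch: confirm the suspect tag exists in the file itself, not just in Read framing.
   grep -n '</output>' path/to/file.md || echo "absent: Read-tool framing, not file content"
   ```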

   **Classify content type before flagging for removal.** Before flagging content as "Budget waste" or "Context efficiency", classify it: reference data (tables, lookup lists, schemas), structural markers, behavioral instructions, or corrective rationale. Reference data has low interference per the guide — only flag if genuinely irrelevant to the execution context. When flagging a multi-instruction block, verify each item individually — a section can be 80% redundant while one instruction carries unique semantics. Extract the unique content, remove the rest.

   **Compression must preserve contrastive structure.** When recommending that enumerated lists be replaced with a shorter form, check whether the lists form contrastive pairs (positive vs negative examples). Preserve at least inline contrastive anchors in the compressed version — the contrast is the mechanism, not the volume.

   **Success criteria: prefer merging over removing.** Multi-step behaviors (ask user → act on answer), optional/conditional steps, and post-completion actions are inherently skip-prone. Merge related criteria rather than deleting them; don't remove a criterion solely to hit a count target.

5. **Report per file:**

   ```
   ### path/to/file.md

   **N issues found**

   1. **[Category]** (line ~N): [specific issue]
      → [concrete fix: "change X to Y" or "remove this line"]

   2. ...
   ```

   Categories: `Budget waste` | `Positioning` | `Context efficiency` | `Specificity` | `Structure`

   Clean files: `**path/to/file.md** — Clean`

6. **Summary:**
   - Files audited: N
   - Findings by category (counts)
   - Top 3 highest-impact fixes

7. **Next steps:** After presenting the summary, use the `AskUserQuestion` tool to ask the user how to proceed. Offer these options:
   - **Fix top 3**: Apply only the top 3 highest-impact fixes. Minimal changes, lowest regression risk.
   - **Fix all issues**: Address every finding from the audit. More comprehensive but still generally safe.
   - **Report only**: No fixes needed — the user just wanted the audit.

   The tool's built-in free-text input lets the user specify a custom scope (e.g., picking specific issues or disputing findings).

   Then apply the selected fixes directly to the files. For any removal spanning more than 3 lines, re-read the target section before deleting and verify that every line is either a duplicate or accounted for in the fix. If any line carries unique semantics, extract it to the appropriate location — don't silently drop it.

</process>

<success_criteria>
- [ ] Every finding cites a specific quality guide principle — no subjective opinions
- [ ] Suggestions are concrete ("change X to Y"), not vague ("could be improved")
- [ ] Valid patterns (peripheral reinforcement, corrective rationale, contrastive examples) not flagged as waste
- [ ] Success criteria removals justified by skip-risk assessment, not solely by count
- [ ] Merge recommendations verified to preserve all trigger patterns and corrective rationale
- [ ] Section removals verified per-instruction — no unique semantics lost in wholesale removal
</success_criteria>
