---
name: system-prompts
description: Write system prompts, tool docs, and agent definitions. Combines research-backed prompt engineering (+15-30% measured improvements) with project XML conventions. Covers tag hierarchy, structural templates, high-impact interventions, anti-patterns.
---

# System Prompt Engineering

Empirically-validated techniques + consistent XML structure. Every recommendation backed by benchmarks or production data.

## High-Impact Interventions (+15-30% measured improvement)

1. **Persistence**: "Keep going until fully resolved" — prevents premature termination
2. **Tool verification**: "Use tools to verify; do not guess" — reduces hallucination
3. **Planning**: "Plan approach before acting" — improves complex task success
4. **Context positioning**: Critical instructions at START and END — middle content degrades 20%+
5. **Urgency framing**: "This matters" / "Get this right" — 8-115% improvement (EmotionPrompt)
6. **Edit format**: SEARCH/REPLACE beats line-numbers 3X on code generation

**Minimal prompting wins.** Every instruction must justify its token cost.

---

## Tag Hierarchy

Tags encode enforcement level. Use consistently throughout:

| Tag | Enforcement   | When to Use                                          |
| --- | ------------- | ---------------------------------------------------- |
| ``  | Inviolable    | Safety constraints, must-follow rules, repeat at END |
| ``  | Forbidden     | Actions that cause harm, never acceptable            |
| ``  | High priority | Important to follow                                  |
| ``  | Operational   | How to use a tool, perform a task                    |
| ``  | Contextual    | When rules apply, trigger criteria                   |
| ``  | Anti-patterns | What not to do, prefer alternatives                  |

**Context positioning rule**: Place `` at START for immediate priming, repeat at END for recency. Middle content suffers 20%+ degradation in long contexts.

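
The context positioning rule can be sketched in a few lines of Python. This is a minimal illustration, not part of any project tooling: `build_prompt` and the plain-text "Critical rules" labels are hypothetical names chosen for the example, and a real prompt would wrap the rules in whichever enforcement tag the project uses.

```python
def build_prompt(critical_rules: list[str], body_sections: list[str]) -> str:
    """Place critical rules at the start and repeat them at the end.

    Hypothetical helper: the labels below are placeholders, not project tags.
    """
    if not critical_rules:
        raise ValueError("at least one critical rule is required")
    rules_block = "\n".join(f"- {rule}" for rule in critical_rules)
    opening = f"Critical rules:\n{rules_block}"
    closing = f"Critical rules (repeated):\n{rules_block}"
    # Everything else sits in the middle, where recall is weakest.
    return "\n\n".join([opening, *body_sections, closing])


prompt = build_prompt(
    ["You MUST NOT modify files."],
    ["Role: read-only codebase scout.", "Procedure: locate, read, report."],
)
print(prompt)
```

Repeating the block verbatim costs tokens, so it is probably worth reserving for the few rules that must never be missed.
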

---

## Standard Tags

### Structure Tags

```
Agent identity and expertise (first element)
Background, situation, audience
Numbered step-by-step workflows
Bulleted operating instructions
Input specifications, types
Return value documentation
What the agent/tool excels at
Available operations (for multi-op tools like LSP)
```

### Special Tags

```
Core values, ultimate objectives
Communication style, attitude
What the agent commits to doing
Domain-specific mindset/context
Behavioral rules, tool precedence
```

### Example Tags

Always use `name` attribute with lowercase-kebab descriptive names:

```xml
Clear, correct usage
What to avoid — show the mistake explicitly
Domain-specific example
Platform/context-specific
```

Naming patterns:

- `name="single"` / `name="multi-part"` — complexity variants
- `name="good"` / `name="bad"` — correctness contrast
- `name="linux"` / `name="windows-cmd"` — platform-specific
- `name="create"` / `name="update"` / `name="delete"` — operation types

---

## Structural Templates

### Tool Documentation

```markdown
# Tool Name

One-line description of what the tool does.

- How to use it (bulleted, imperative)
- Key parameters and their effects
- Common patterns

What the tool returns. Include:
- Success format
- Truncation limits (e.g., "truncated at 50KB")
- Error conditions

Must-follow rules. Safety constraints. When to ALWAYS or NEVER use this tool.

High-priority notes that aren't safety-critical.

tool {"param": "value"}

tool {"param": "value", "option": true}

- Anti-pattern 1 — why it's bad
- Anti-pattern 2 — what to do instead
```

### Agent Definition

```markdown
---
name: agent-name
description: One-line for spawning UI (imperative: "Fast read-only codebase scout")
tools: read, grep, find, bash
model: pi/slow, gpt-5.2, codex
output:
  properties:
    field_name:
      metadata:
        description: What this field contains
      type: string
---

Senior [role] doing [task]. Your goal: [concrete outcome].

Inviolable constraints first.
READ-ONLY if applicable — list prohibited actions explicitly.

- What this agent excels at
- Core capabilities

- Operating instruction 1
- Operating instruction 2
- Spawn parallel tool calls wherever possible

## Phase 1: Understand
1. Step one
2. Step two

## Phase 2: Execute
1. Step one
2. Step two

What to return. Schema requirements.
Call `yield` with findings when done.

Repeat critical constraints at end.
Keep going until complete. This matters.
```

### System Prompt (Main Agent)

```markdown
XML tags in this prompt are system-level instructions. They are not suggestions.

Tag hierarchy (by enforcement level):
- `` — Inviolable. Failure to comply is a system failure.
- `` — Forbidden. These actions will cause harm.
- `` — High priority. Important to follow.
- `` — How to operate. Follow precisely.
- `` — When rules apply. Check before acting.
- `` — Anti-patterns. Prefer alternatives.

You are a [specific role with credentials].

Domain-specific context and mindset. What to notice, what traps exist.

Communication style. Correctness over politeness. Brevity over ceremony.

## Tool Precedence
Specialized tools → Python → Bash
...

## Verification
External proof: tests, linters, type checks.
...

## Before action
1. CHECKPOINT — pause, assess parallelism
2. Plan if task has weight
3. State intent before each tool call

Core values. What ultimately matters.

Actions that cause harm.

Repeat most important rules.
Keep going until finished. The work is done when it is correct.
```

---

## Writing Style

### Voice

**Direct and imperative.** Research shows direct tone improves accuracy 4%+ over polite hedging.

```
Bad:  "You might want to consider using..."
Good: "Use X when Y."

Bad:  "It would be helpful if you could..."
Good: "Do X."

Bad:  "Please note that this is important..."
Good: "Critical: X."
```

**Urgency framing** (8-115% improvement):

```
"This matters. Get it right."
"Be thorough."
"Keep going until fully resolved."
```

### Normative Language (RFC 2119)

All prompt prose that prescribes behavior MUST use RFC 2119 key words in **full caps**. This removes ambiguity about whether an instruction is absolute or advisory.

| Keyword | Meaning | Replaces |
| --- | --- | --- |
| **MUST** / **REQUIRED** | Absolute requirement | "always", "make sure", "ensure", "do" |
| **MUST NOT** / **PROHIBITED** | Absolute prohibition | "never", "do not", "don't", "strictly prohibited" |
| **SHOULD** / **RECOMMENDED** | Strong preference; deviation allowed with known tradeoffs | "prefer", "recommend", "it's best to" |
| **SHOULD NOT** / **NOT RECOMMENDED** | Strong discouragement; deviation allowed with known tradeoffs | "avoid", "try not to" |
| **MAY** / **OPTIONAL** | Truly optional | "can", "may", "you could" |

```
Bad:  "Never edit from a grep snippet alone"
Good: "You MUST NOT edit from a grep snippet alone"

Bad:  "Prefer unit tests over mocks"
Good: "You SHOULD prefer unit tests over mocks"

Bad:  "Make sure to run lsp references before modifying a symbol"
Good: "You MUST run lsp references before modifying any symbol"
```

**What not to convert**: factual/descriptive sentences (what a tool returns, what a parameter does), code blocks, examples, schema definitions, Handlebars template syntax. Only prescriptive prose gets RFC treatment.
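
As a rough aid for applying the keyword table, the Python sketch below flags informal phrasings and suggests the matching RFC 2119 keyword. `SUGGESTIONS` and `lint_normative` are hypothetical names invented for this example, and the phrase list is only a subset of the "Replaces" column. It does not skip code blocks, examples, or descriptive sentences, so a person still has to decide which findings are really prescriptive prose.

```python
import re

# Informal phrasings mapped to the RFC 2119 keyword from the table above.
# Very short words such as "do" and "can" are left out on purpose because
# they would match too much ordinary prose.
SUGGESTIONS = {
    "make sure": "MUST",
    "always": "MUST",
    "ensure": "MUST",
    "never": "MUST NOT",
    "do not": "MUST NOT",
    "don't": "MUST NOT",
    "try not to": "SHOULD NOT",
    "avoid": "SHOULD NOT",
    "it's best to": "SHOULD",
    "prefer": "SHOULD",
    "you could": "MAY",
}


def lint_normative(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, informal_phrase, suggested_keyword) findings."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for phrase, keyword in SUGGESTIONS.items():
            # Whole-word, case-insensitive match of the informal phrase.
            if re.search(rf"\b{re.escape(phrase)}\b", line, flags=re.IGNORECASE):
                findings.append((line_no, phrase, keyword))
    return findings


draft = (
    "Never edit from a grep snippet alone\n"
    "You MUST run lsp references before modifying any symbol\n"
    "Prefer unit tests over mocks"
)
for line_no, phrase, keyword in lint_normative(draft):
    print(f"line {line_no}: '{phrase}' -> consider {keyword}")
```

The second draft line already uses MUST and produces no finding; the first and third are reported with MUST NOT and SHOULD respectively.
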

### Positive Framing

Models process "Always do Y" better than "Don't do X":

```
Bad:  "Don't use grep via bash"
Good: "ALWAYS use Grep tool for search—NEVER invoke grep via Bash"

Bad:  "Don't guess"
Good: "Use tools to verify; do not guess"
```

When negation is necessary, pair with positive alternative.

### Specificity

**Role specificity spectrum** (effectiveness increases →):

```
"You are a lawyer"
    ↓
"You are a corporate M&A lawyer"
    ↓
"You are General Counsel at a Fortune 500 tech company, 15 years in SaaS licensing"
```

**Constraint specificity**:

```
Bad:  "Keep it short"
Good: "3 bullets, <50 words each"

Bad:  "Be careful with large files"
Good: "Truncated at 50KB or 2000 lines, whichever comes first"
```

### Formatting

```
# H1 for tool/agent name only
## H2 for major sections
### H3 sparingly

- Bullets for unordered lists inside tags
1. Numbers for ordered procedures
| Tables | For | Structured reference data |
`inline code` for commands, paths, parameters, values
```

Code blocks with language:

````markdown
```typescript
const example = "always specify language";
```
````

---

## Technique Reference

### Chain of Thought

**Use when**: Multi-step reasoning, math, analysis, complex decisions

**Avoid when**: Simple tasks, reasoning models (o1/o3 do internal CoT)

```xml
Before answering:
1. Identify the core question
2. List relevant constraints
3. Consider 2-3 approaches
4. Select best with rationale

Then provide your answer.
```

**Token-efficient variant** (Chain of Draft):

```
Think step-by-step, keeping only 5-word notes per step.
Output final answer after ####.
```

### Few-Shot Examples

**Use when**: Enforcing specific output format, classification, smaller models

**Avoid when**: Advanced models (Claude 3.5+, GPT-4+) on clear tasks — adds noise

When using: 3-5 diverse examples covering edge cases.

```xml
Input: X
Output: Y

Input: X'
Output: Y'
```

### Long Context Handling

**"Lost in the Middle"**: Beginning and end retain; middle degrades 20%+.

1. Documents at TOP, instructions AFTER
2. Quote grounding: "Quote relevant passages in ``, then analyze"
3. Critical instructions at START and END
4. Chunk >100K tokens → process parallel → synthesize

```xml
{{LONG_CONTENT}}

1. Find and quote passages relevant to {{QUERY}} in
2. Analyze based on quoted evidence in
```

### Verification Patterns

**Self-correction without external feedback does not work.**

Effective:

```
1. Generate solution
2. Execute verification (tests, lint, typecheck)
3. On failure: analyze error → fix → re-verify
4. Iterate until pass
```

Ineffective:

```
1. Generate solution
2. "Critique your solution" ← detection is the bottleneck
3. "Improve based on critique" ← feels productive, doesn't help
```

### Prompt Chaining

**Use when**: Single prompt drops steps, distinct phases, verification needed between.

```
Prompt 1: Analyze →
Prompt 2: → Plan →
Prompt 3: → Execute →
Prompt 4: → Verify → final
```

---

## Anti-Patterns (Measured Degradation)

| Pattern                                   | Problem                                 |
| ----------------------------------------- | --------------------------------------- |
| "Would you be so kind..."                 | +perplexity, -4% accuracy               |
| "I'll tip $2000"                          | No improvement, sometimes worse         |
| Explicit CoT on reasoning models (o1/o3)  | -36%, conflicts with internal reasoning |
| Few-shot on advanced models + clear tasks | Introduces noise/bias                   |
| "Always end with Progress/Questions"      | Degrades task performance               |
| "Be efficient with tokens"                | Premature task abandonment              |
| "Don't do X" without positive alternative | "Always do Y" processes better          |
| Verbose explanations of obvious concepts  | Context bloat; model already knows      |
| Self-critique without external feedback   | Detection is bottleneck, not correction |
| Critical instructions only in middle      | 20%+ degradation vs start/end           |

---

## Complete Examples

### Tool Doc Example

```markdown
# Grep

Fast regex search built on ripgrep.

- Full regex: `log.*Error`, `function\\s+\\w+`
- Filter: `glob` (e.g., `*.js`) or `type` (e.g., `js`, `py`)
- Cross-line: `multiline: true` for patterns like `struct \\{[\\s\\S]*?field`

Depends on `output_mode`:
- `content`: Lines with paths and line numbers (default limit: 100)
- `files_with_matches`: Paths only
- `count`: Match counts per file

Truncated results reference `artifact://` for full output.

ALWAYS use Grep for search—NEVER invoke `grep` or `rg` via Bash.

grep {"pattern": "function\\s+\\w+", "glob": "*.ts"}

grep {"pattern": "struct \\{[\\s\\S]*?field", "multiline": true}

- Open-ended searches requiring multiple rounds—use Task tool
- Raw bash grep/rg invocation
```

### Agent Example

```markdown
---
name: explore
description: Fast read-only codebase scout returning compressed context for handoff
tools: read, grep, find, bash
model: pi/smol, haiku-4.5, haiku-4-5, gemini-flash-latest, gemini-3-flash, zai-glm-4.7, glm-4.7-flash, glm-4.5-flash, gpt-5.1-codex-mini, haiku, flash, mini
output:
  properties:
    query:
      type: string
    files:
      elements:
        properties:
          path: { type: string }
          line_start: { type: number }
          line_end: { type: number }
          description: { type: string }
    architecture:
      type: string
---

File search specialist. Investigate codebase, return structured findings for handoff.

READ-ONLY. You are STRICTLY PROHIBITED from:
- Creating, editing, deleting files
- Using redirect operators (>, >>)
- Running state-changing commands (git add, npm install)

- Rapid file discovery via find patterns
- Regex search with grep
- Tracing imports and dependencies

- Spawn parallel tool calls wherever possible
- Return absolute paths
- Communicate findings directly—do NOT create files

1. grep/find to locate relevant code
2. Read key sections (not entire files)
3. Identify types, interfaces, key functions
4. Note dependencies between files
5. Call `yield` with findings

Read-only. Call `yield` when done. This matters.
```

---

## Deployment Checklist

- [ ] **Tag hierarchy**: Enforcement level matches content?
- [ ] **Critical at edges**: Most important rules at START and END?
- [ ] **Named examples**: All `` tags have `name` attribute?
- [ ] **Positive framing**: "Do Y" not just "Don't X"?
- [ ] **Direct tone**: No hedging, no filler, urgency where appropriate?
- [ ] **Specificity**: Exact formats, limits, constraints—not vague?
- [ ] **Token efficiency**: Each sentence justifies its cost?
- [ ] **Verification**: External feedback loop if correctness matters?
- [ ] **RFC 2119 normative language**: All prescriptive sentences use MUST/MUST NOT/SHOULD/MAY in caps?
- [ ] **Persistence**: "Keep going until complete" for complex tasks?

**High-impact interventions: persistence, tool verification, planning, context positioning, urgency.**

---

## Quick Reference

### Tag Names

```
Enforcement:
Structure:
Capability:
Examples:
Data:
Special:
```

### Example Name Patterns

```
Correctness:  name="good", name="bad"
Complexity:   name="single", name="multi-part", name="basic", name="advanced"
Operations:   name="create", name="update", name="delete", name="rename"
Platforms:    name="linux", name="windows-cmd", name="macos"
Domains:      name="rate-limiting", name="auth", name="validation"
```

### Task → Technique

| Task                | Primary                         | Secondary            |
| ------------------- | ------------------------------- | -------------------- |
| Simple extraction   | Clear constraints               | Prefilling           |
| Classification      | 3-5 examples                    | XML structure        |
| Complex analysis    | Structured reasoning            | Role + urgency       |
| Code generation     | SEARCH/REPLACE                  | Verification loop    |
| Long document       | Docs at top, quote-then-analyze | XML structure        |
| Multi-step workflow | Prompt chaining                 | Planning instruction |
| Domain expertise    | Specific role + credentials     | Examples             |
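
The verification loop listed for code generation in the table above (generate, verify externally, feed the error back, retry) might look roughly like the Python sketch below. Every name here is an assumption made for illustration: `solve_with_verification`, `toy_generate`, and `run_tests` are hypothetical, and `toy_generate` stands in for a real model call.

```python
from typing import Callable, Optional, Tuple


def solve_with_verification(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> str:
    """Generate, verify with an external check, and retry with the error."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        ok, error = verify(candidate)
        if ok:
            return candidate
        feedback = error  # the external signal, not a self-critique
    raise RuntimeError(
        f"verification still failing after {max_attempts} attempts: {feedback}"
    )


def toy_generate(feedback: Optional[str]) -> str:
    # Stand-in for a model call: the first draft is buggy, the retry is fixed.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def run_tests(candidate: str) -> Tuple[bool, str]:
    # External check: execute the candidate and test its behavior.
    namespace: dict = {}
    exec(candidate, namespace)
    result = namespace["add"](2, 3)
    if result == 5:
        return True, ""
    return False, f"add(2, 3) returned {result}, expected 5"


solution = solve_with_verification(toy_generate, run_tests)
print(solution)
```

The point of the design is that the retry is driven by the verifier's error message, which is the external feedback the Verification Patterns section calls for; the attempt cap is an added safeguard so a generator that never passes does not loop forever.
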