Identifies error-prone APIs, dangerous configurations, and footgun designs that enable security mistakes.
Run a Semgrep static analysis scan on a codebase using parallel subagents. Supports two scan modes: "run all" (full ruleset coverage) and "important only" (high-confidence security…
Detects fail-open insecure defaults (hardcoded secrets, weak auth, permissive security) that allow apps to run insecurely in production.
Performs security-focused differential review of code changes (PRs, commits, diffs). Adapts analysis depth to codebase size, uses git history for context, calculates blast radius,…
Scans a codebase for security vulnerabilities using CodeQL's interprocedural data flow and taint tracking analysis.
Use when [specific triggering conditions - what symptom or situation activates this skill]
Use for code implementation and refactoring, architecting or designing systems, process and workflow improvements, and error handling and validation.
Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc.
Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for…
Create isolated git worktrees with smart directory selection and safety verification
Help the user systematically identify and categorize failure modes in an LLM pipeline by reading traces.
Use multiple Claude agents to investigate and fix independent problems concurrently
Help address review and issue comments on the open GitHub PR for the current branch using the gh CLI; verify gh authentication first and prompt the user to authenticate if not logged in.
Coverage analysis measures code exercised during fuzzing. Use when assessing harness effectiveness or identifying fuzzing blockers.
Enables ultra-granular, line-by-line code analysis to build deep architectural context before vulnerability or bug finding.
Find similar vulnerabilities and bugs across codebases using pattern-based analysis. Use when hunting bug variants, building CodeQL/Semgrep queries, analyzing security…
Apply writing rules to any documentation that humans will read. Makes your writing clearer, stronger, and more professional.
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any…
Create MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools.
Repository-grounded threat modeling that enumerates trust boundaries, assets, attacker capabilities, abuse paths, and mitigations, and writes a concise Markdown threat model.
Create diverse synthetic test inputs for LLM pipeline evaluation using dimension-based tuple generation.
Design LLM-as-Judge evaluators for subjective criteria that code-based checks cannot handle. Use when a failure mode requires interpretation (tone, faithfulness, relevance,…
Creates specs before coding. Use when starting a new project, feature, or significant change and no specification exists yet.
Receive and act on code review feedback with technical rigor, not performative agreement or blind implementation
Find bugs, security vulnerabilities, and code quality issues in local branch changes. Use when asked to review changes, find bugs, security review, or audit code on the current…
Analyze and resolve Sentry comments on GitHub Pull Requests. Use this when asked to review or fix issues identified by Sentry in PR comments.
Analyze git repositories to build a security ownership topology (people-to-file), compute bus factor and sensitive-code ownership, and export CSV/JSON for graph databases and…
Use for self-correcting implementation. Implements the reflexion loop: implement, validate, self-critique, retry (max 3 iterations).
Provides guidance for property-based testing across multiple languages and smart contracts. Use when writing tests, reviewing code with serialization/validation/parsing patterns,…
Postgres performance optimization and best practices from Supabase. Use this skill when writing, reviewing, or optimizing Postgres queries, schema designs, or database…
Based on the Recursive Language Models (RLM) research by Zhang, Kraska, and Khattab (2025), this skill provides strategies for handling tasks that exceed comfortable context…
Cost-optimize AI agent operations by routing tasks to appropriate models based on complexity. Use this skill when: (1) deciding which model to use for a task, (2) spawning…