Security work breaks differently from other engineering work. A wrong refactor produces a failing test; a wrong security call produces a quiet breach six months later. That asymmetry is why I run a deliberately narrow stack of Claude Code skills on my security boxes — fewer, sharper tools that surface evidence rather than autopilot fixes.
This guide walks through the eleven skills I actually keep installed, the three end-to-end workflows I run weekly (PR security review, incident triage, dependency CVE triage), the anti-patterns I refuse to touch, and the signals that tell you it's time to write your own skill instead of installing one.
Security is one of the few areas where Claude's natural caution actually helps. The right skills lean into that. In most domains, Claude's tendency to hedge, ask for confirmation, and surface its uncertainty is friction. In security work, that same posture is exactly what you want: a reviewer that says "this looks like a SQL injection sink, but the input may be sanitised three frames up the stack — let me trace it" rather than a tool that confidently flags a false positive and moves on.
The mental model I use: a Claude Code skill is a Markdown file (SKILL.md) with YAML frontmatter that gets loaded into Claude's context when its trigger conditions match. It's not a plugin running compiled code; it's a prompt fragment plus optional tool allowlist plus optional scripts. That structure has two consequences for security work. First, every skill is auditable — you read the SKILL.md the same way you read a runbook, and you know exactly what behaviour you've installed. Second, skills compose — running a dependency-scanner skill alongside a threat-modeller skill doesn't require any integration work; they share Claude's context and reason over each other's output.
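To make the auditability point concrete, here is a minimal sketch of reading a skill's frontmatter before installing it. It assumes the frontmatter is a flat block of `key: value` lines between two `---` delimiters; a real YAML parser would be more robust. The example skill file is hypothetical and written only for illustration.

```python
from typing import Dict

def parse_frontmatter(skill_md: str) -> Dict[str, str]:
    """Return the top-level `key: value` pairs from a SKILL.md frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        # Skip nested, indented, or commented lines; this sketch reads flat keys only.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

# Hypothetical skill file, written for this example only.
EXAMPLE = """---
name: review-iam-changes-aws
description: Review IAM policy changes in Terraform. Use only when asked for an IAM review.
allowed-tools: Read, Grep, Bash(git diff:*)
---
# Review steps
Read the changed policy documents and list wildcard actions.
"""

fields = parse_frontmatter(EXAMPLE)
for key in ("name", "description", "allowed-tools"):
    print(f"{key}: {fields.get(key, '<missing>')}")
```

Printing those three fields for every installed skill gives a one-screen summary of what behaviour is loaded and which tools it may call.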
The flip side: a poorly-written security skill is worse than no skill at all. A skill that auto-applies fixes to requirements.txt the moment it sees a CVE is going to ship broken software the first time a transitive pin matters. A skill that tells Claude to "suppress noisy warnings from the SAST tool" will suppress legitimate findings on the day it counts. The picks below are the ones I keep installed because they surface evidence and let me decide; the anti-stack section calls out the patterns I rip out on sight.
One more framing note: this stack is opinionated toward review and triage, not toward replacing your security tooling. Semgrep, Trivy, gitleaks, Snyk, and the rest of the static-analysis ecosystem still do the heavy lifting. The skills here are the glue that turns raw tool output into prioritised, contextualised work — and the second-opinion layer when a reviewer is on call alone at 2am.
Before the picks, the five criteria I apply when I'm deciding whether to install a security-flavoured skill. None of these are about cleverness; they're about whether the skill is safe to leave running on a real codebase.
An iam-audit skill that fires on every commit involving the word "role" will drown you in noise; a skill that says "only activate when the user explicitly asks for an IAM review or when changes touch IAM policy JSON" is one I trust.
The allowed-tools field lists only what the skill needs: usually Read, Grep, Bash(git diff:*), never an unbounded Bash. A security skill with full shell access is a footgun.
Every pick below passes all five. The anti-stack section later in the guide enumerates the failure modes you'll see when you skip these checks.
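The tool-allowlist check is easy to automate. The sketch below assumes the allowlist is a comma-separated string like the one quoted above, and it treats a bare `Bash` or `Bash(*)` entry as unbounded; both the function names and that definition of "unbounded" are my own assumptions, not part of any skill specification.

```python
from typing import List

def split_tools(allowed_tools: str) -> List[str]:
    """Split a comma-separated tool list, keeping commas inside parentheses."""
    tools: List[str] = []
    current, depth = "", 0
    for ch in allowed_tools:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            tools.append(current.strip())
            current = ""
        else:
            current += ch
    tools.append(current.strip())
    return [t for t in tools if t]

def unbounded_shell(allowed_tools: str) -> List[str]:
    """Return entries that grant Bash without a scoping pattern (assumed definition)."""
    return [t for t in split_tools(allowed_tools) if t in ("Bash", "Bash(*)")]

print(unbounded_shell("Read, Grep, Bash(git diff:*)"))  # scoped shell access only
print(unbounded_shell("Read, Bash"))                    # bare Bash gets flagged
```

An empty result means every shell grant carries a scope; anything returned is worth reading by hand before the skill goes near a real codebase.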
I'll cover each pick with a one-line rationale, an install command, and a use snippet showing how I actually invoke it. Slugs reference real catalog entries you can install today. The install command is the same shape for all of them — pull the SKILL.md down, drop it in ~/.claude/skills/<slug>/, restart Claude Code.
dependency-cve-triage. Reads package-lock.json, poetry.lock, go.sum, or Cargo.lock, cross-references against the local OSV database, and outputs a prioritised list grouped by exploitability rather than CVSS score. Crucially: it tells you why a CVE matters for your specific call graph, not just that it exists.
claude install dependency-cve-triage
# usage
claude "triage the CVEs in our lockfile and flag anything reachable from the public API"
secret-detection-review. Wraps gitleaks with context. Runs the scan, then for each hit Claude reads the surrounding code to determine whether the match is a real secret, a test fixture, an example in documentation, or a false positive. Drops the noise rate by 60-80% on most codebases.
claude install secret-detection-review
claude "scan the repo for secrets and tell me which ones are real"
code-review-security. The PR-review workhorse. Focused on the OWASP Top 10 plus a configurable list of project-specific patterns (e.g. "never call eval on user input", "all SQL goes through the ORM"). Outputs inline-comment-shaped findings.
threat-modeller. Walks a design doc or architecture diagram (Mermaid, PlantUML, or plain prose) and produces a STRIDE-categorised threat list. Doesn't pretend to replace a real threat-modelling session; does drastically speed up the prep work.
claude install threat-modeller
claude "threat-model this design doc, focus on the trust boundaries between services"
incident-triage-checklist. The skill I'm most grateful for at 3am. Walks you through SEV-classification, blast-radius assessment, comms cadence, and evidence-preservation in a fixed order. Has explicit anti-triggers so it doesn't fire on the word "incident" in unrelated contexts.
timeline-reconstruction. Ingests a Slack channel export, a log bundle, and a list of commits, then produces a unified timeline with confidence annotations. Marks every event as directly-observed, inferred, or reported.
postmortem-drafter. Generates a blameless postmortem skeleton from the timeline output. Five required sections, no embellishment, leaves the contributing-factors analysis to you.
iam-audit-aws. Reads Terraform or CloudFormation, surfaces wildcard permissions, role-assumption chains, and privilege-escalation paths. Specifically refuses to operate on live IAM via the AWS API; it is read-only against your IaC source.
claude install iam-audit-aws
claude "audit IAM in this terraform module, flag any policy that grants iam:PassRole broadly"
Reviews Dockerfiles against a hardening baseline: non-root user, pinned base image digests, no ADD from URLs, multi-stage builds where appropriate. Doesn't auto-fix; explains each finding.
log-triage. Takes a log bundle (JSON, syslog, or CloudWatch export) and clusters anomalies. Useful as the first pass during an active incident: it drops you from thousands of lines to a dozen clusters worth investigating.
Reviews a draft policy document (vendor security questionnaire, SOC 2 control narrative, internal AUP) against a checklist of what reviewers actually look for. Surfaces vague language and missing scope.
PR security review is the workflow I run most often: every PR that touches authentication, authorization, input handling, or anything in infrastructure/ gets a pass through it before I approve. The whole loop takes 3-8 minutes per PR depending on size.
Step 1: initial pass. With code-review-security installed, I run:
git fetch origin pr/1234
git checkout pr/1234
claude "do a security review of the diff against main, output inline comments grouped by severity"
The output shape is critical here. I want findings with file:line references, the exact pattern matched, and a one-sentence rationale. I do not want a narrative summary at the top; those tend to be confidently wrong and bias my read of the actual findings.
Step 2: cross-check with secrets. The diff might introduce a new .env.example or rotate a test fixture. I run secret-detection-review against the diff only:
claude "scan the diff for secrets, ignore anything already on main"
This catches the case where someone copy-pastes a real key into an example file. About 1 in 40 PRs in my experience has at least one finding here; about 1 in 200 has a real secret.
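The "diff only" part is what keeps this step quiet. As a rough sketch of the idea (not how secret-detection-review or gitleaks is implemented), the code below looks only at lines a unified diff adds and applies a single well-known pattern, the AWS access key ID shape. The function names are illustrative, and the key in the sample is the example value from AWS's own documentation.

```python
import re
from typing import List, Tuple

# AWS access key IDs have a recognisable shape; real scanners carry many more rules.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def added_lines(unified_diff: str) -> List[Tuple[str, str]]:
    """Return (path, text) for every line the diff adds."""
    path = ""
    out: List[Tuple[str, str]] = []
    for line in unified_diff.splitlines():
        if line.startswith("+++ "):
            path = line[4:].removeprefix("b/")  # file header, not content
        elif line.startswith("+"):
            out.append((path, line[1:]))
    return out

def candidate_secrets(unified_diff: str) -> List[Tuple[str, str]]:
    """Return (path, match) pairs found only in added lines."""
    return [(path, match)
            for path, text in added_lines(unified_diff)
            for match in AWS_KEY_ID.findall(text)]

SAMPLE_DIFF = """\
diff --git a/.env.example b/.env.example
--- a/.env.example
+++ b/.env.example
@@ -1,2 +1,2 @@
 AWS_REGION=eu-west-1
-AWS_ACCESS_KEY_ID=changeme
+AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
"""

for path, match in candidate_secrets(SAMPLE_DIFF):
    print(f"{path}: {match}")
```

Removed and unchanged lines never reach the pattern match, so anything already on main stays out of the report. Deciding whether a hit is a real key or a fixture is still the reviewing step.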
Step 3: triage and respond. I take the findings into the PR review UI manually. Claude doesn't post to GitHub directly in this workflow — I've found that one human-eye filter step prevents the noisy-reviewer reputation that gets your bot ignored. For each finding I either: leave an inline comment with the suggested fix, dismiss it as a false positive with a one-line reason, or escalate it as a blocking review.
Step 4: re-review after fix. Once the author pushes a fix, I re-run the same skill against the new diff:
git pull
claude "re-review the security findings, did the latest commits address them?"
The skill is configured to read the previous findings (it persists them to a local cache keyed by PR number) and explicitly check whether each one is resolved, partially-resolved, or unchanged. This step catches the failure mode where an author fixes the immediate finding but introduces an equivalent vulnerability one frame up the call stack.
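A sketch of that bookkeeping, under assumptions of my own: findings are stored as JSON in one file per PR, and each finding carries a stable `id`. The file layout, the id format, and the function names are all hypothetical. The "partially-resolved" verdict needs judgement and is not modelled here; this version adds a "new" bucket instead, which is where an equivalent finding elsewhere in the call stack would show up.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

def save_findings(cache_dir: Path, pr_number: int, findings: List[dict]) -> None:
    """Persist findings for a PR so the next review can compare against them."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"pr-{pr_number}.json").write_text(json.dumps(findings, indent=2))

def load_previous(cache_dir: Path, pr_number: int) -> List[dict]:
    """Load the findings stored for a PR, or an empty list on first review."""
    path = cache_dir / f"pr-{pr_number}.json"
    return json.loads(path.read_text()) if path.exists() else []

def compare(previous: List[dict], current: List[dict]) -> Dict[str, List[str]]:
    """Bucket finding ids into resolved, unchanged, and new."""
    before = {f["id"] for f in previous}
    after = {f["id"] for f in current}
    return {
        "resolved": sorted(before - after),
        "unchanged": sorted(before & after),
        "new": sorted(after - before),
    }

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # First review of the PR: two hypothetical findings.
    save_findings(cache, 1234, [{"id": "sqli:api/users.py"}, {"id": "eval:tools/run.py"}])
    # After the author's fix: one gone, one still there, one new.
    latest = [{"id": "eval:tools/run.py"}, {"id": "sqli:api/db.py"}]
    report = compare(load_previous(cache, 1234), latest)

print(report)
```

A non-empty "new" bucket after a fix commit is the signal to read the diff again by hand rather than trusting that the original finding is closed.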
Step 5: approve or request changes. If all findings are resolved or dismissed-with-reason, I approve. If anything is still red, the PR doesn't merge. The skill never auto-approves — that's a hard rule, and any skill that offers to auto-approve PRs based on its own review gets uninstalled immediately.
The incident workflow is more time-sensitive than the PR workflow, so the skill setup is deliberately minimal: four skills, one runbook, no surprises. The whole thing is designed to be runnable by whoever's on call, not just the security team.
Phase 1: triage (first 15 minutes). Page fires, on-call acks, opens Claude Code in a fresh terminal. The first command is always:
claude "run the incident triage checklist for a possible <type> incident"
Here, <type> is one of: data-exposure, account-compromise, availability, integrity, supply-chain. The incident-triage-checklist skill walks through SEV classification (what's the user impact?), blast radius (how many users, what data classes?), and immediate containment options (revoke tokens? rotate keys? isolate hosts?). Output is a numbered checklist with a recommended SEV. I've never had it overstate severity; it occasionally understates and gets corrected on the next iteration.
Phase 2: evidence gathering (next 30-60 minutes). While I'm executing the checklist (revoking, rotating, isolating), a parallel Claude session pulls relevant logs. log-triage ingests the bundle:
aws logs tail /aws/lambda/auth --since 6h --format short > /tmp/auth-logs.txt
claude "triage these logs, cluster anomalies, flag anything that looks like the incident pattern"
The skill outputs a small number of clusters (usually 5-15) with sample lines from each. I pin the ones that look related and discard the rest.
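For intuition about what "clustering" means here, the sketch below shows one simple approach: mask the variable parts of each line (IPs, long hex strings, numbers), group lines by the resulting template, and list the rarest templates first. This is an illustration of the technique under my own assumptions, not a description of how log-triage works internally; the masking rules and sample lines are made up.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Mask the parts of a line that vary between otherwise identical events.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]

def template(line: str) -> str:
    """Reduce a log line to its constant skeleton."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line.strip()

def cluster(lines: List[str], samples: int = 2) -> List[Tuple[str, int, List[str]]]:
    """Group lines by template; return (template, count, sample lines), rarest first."""
    counts: Counter = Counter()
    examples: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        key = template(line)
        counts[key] += 1
        if len(examples[key]) < samples:
            examples[key].append(line)
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return [(key, n, examples[key]) for key, n in ordered]

LOG_LINES = [
    "login ok user=41 ip=10.0.0.5",
    "login ok user=77 ip=10.0.0.9",
    "token refresh failed user=41 ip=203.0.113.7",
]

clusters = cluster(LOG_LINES)
for key, n, sample_lines in clusters:
    print(f"{n:>4}  {key}")
```

Sorting rarest-first is a design choice for incident work: the routine traffic collapses into a few large clusters, and the unusual lines float to the top.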
Phase 3: timeline reconstruction (after containment). Once the bleeding has stopped, I export the relevant Slack channel, gather the log bundles, list the commits in the relevant window, and feed everything to timeline-reconstruction:
claude "reconstruct the timeline from the slack export, the auth-logs.txt bundle, and these commits: <sha list>. mark each event with confidence."
The output is a Markdown timeline with one row per event, columns for timestamp, source, description, and confidence. The confidence annotations are what make this useful: "directly-observed in CloudTrail" is treated differently from "inferred from absence of expected log line."
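The shape of that output is simple enough to sketch. The code below merges events from several sources, sorts them by time, and renders the four columns described above, with the three confidence levels as an enum. The type names and the sample events are hypothetical; only the column set and the confidence labels come from the workflow itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List

class Confidence(Enum):
    DIRECTLY_OBSERVED = "directly-observed"
    INFERRED = "inferred"
    REPORTED = "reported"

@dataclass(frozen=True)
class Event:
    timestamp: datetime  # must be timezone-aware so sources can be merged safely
    source: str
    description: str
    confidence: Confidence

def render_timeline(events: List[Event]) -> str:
    """Render events as a Markdown table, oldest first, normalised to UTC."""
    rows = ["| Timestamp (UTC) | Source | Description | Confidence |",
            "| --- | --- | --- | --- |"]
    for e in sorted(events, key=lambda e: e.timestamp):
        ts = e.timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        rows.append(f"| {ts} | {e.source} | {e.description} | {e.confidence.value} |")
    return "\n".join(rows)

# Hypothetical events, invented for this example.
EVENTS = [
    Event(datetime(2026, 1, 5, 2, 14, tzinfo=timezone.utc), "slack",
          "On-call reports failed logins", Confidence.REPORTED),
    Event(datetime(2026, 1, 5, 2, 3, tzinfo=timezone.utc), "cloudtrail",
          "Access key used from a new IP", Confidence.DIRECTLY_OBSERVED),
    Event(datetime(2026, 1, 5, 2, 9, tzinfo=timezone.utc), "auth-logs",
          "Expected heartbeat line missing", Confidence.INFERRED),
]

table = render_timeline(EVENTS)
print(table)
```

Keeping confidence as a closed set of values rather than free text makes it easy to filter the timeline down to directly-observed events when the postmortem needs hard evidence only.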
Phase 4: postmortem skeleton. Within 24-72 hours of resolution, I feed the timeline to postmortem-drafter:
claude "draft a blameless postmortem skeleton from this timeline, fill in the sections you can support from evidence, leave the rest as TODO"
The skill produces a five-section skeleton (summary, timeline, impact, contributing factors, action items) with the timeline and impact pre-filled. Contributing factors and action items are left as TODOs because those are judgment calls that belong to the responders, not the tool. I take it into a doc, fill in the TODOs with the team, and circulate for review.
The CVE workflow is the least time-sensitive of the three but the most likely to produce noise without good tooling. The default dependency-scanner output is a flat list of CVEs ranked by CVSS, which is useless: a CVSS 9.8 in a transitive dev dependency that's only loaded during tests is less urgent than a CVSS 6.5 in a runtime auth library.
Step 1: scan and prioritise. I run dependency-cve-triage against the lockfile:
claude "triage CVEs in package-lock.json, prioritise by reachability from the production entry points in src/server/"
The skill reads the lockfile, queries the OSV database for each pinned version, then (and this is the bit that matters) reads the source to determine whether the vulnerable code path is actually reachable from the application. A CVE in a code path your app never calls is downgraded to informational; a CVE in a hot path is escalated regardless of CVSS.
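The ordering rule is worth seeing in code, because it inverts what a CVSS-sorted list gives you. This is a minimal sketch assuming reachability has already been decided for each finding (that analysis is the hard part and is not shown); the type, the field names, and the two sample findings are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Finding:
    cve_id: str
    package: str
    cvss: float
    reachable: bool  # is the vulnerable path reachable from production entry points?

def priority(f: Finding) -> str:
    """Reachability decides the bucket; CVSS does not."""
    return "escalate" if f.reachable else "informational"

_ORDER = {"escalate": 0, "informational": 1}

def triage(findings: List[Finding]) -> List[Finding]:
    """Sort by bucket first, then by CVSS (highest first) within a bucket."""
    return sorted(findings, key=lambda f: (_ORDER[priority(f)], -f.cvss))

# Hypothetical findings mirroring the dev-dependency vs runtime-library contrast.
FINDINGS = [
    Finding("CVE-A (hypothetical)", "test-only-transitive-dep", 9.8, reachable=False),
    Finding("CVE-B (hypothetical)", "runtime-auth-lib", 6.5, reachable=True),
]

ranked = triage(FINDINGS)
for f in ranked:
    print(f"{priority(f):<14} {f.cvss:>4}  {f.package}")
```

The 6.5 in the runtime library sorts above the 9.8 in the test-only dependency, which is the opposite of the flat CVSS ranking and the whole point of the reachability pass.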
Step 2: impact assessment for each high-priority finding. For each finding the skill rates as reachable + high-severity, I ask for a fuller assessment:
claude "deep-dive on CVE-2026-12345: which of our code paths reach the vulnerable function, what's the data flow, what's the worst case if exploited in our context?"
This second pass is where you find out whether a nominally-critical CVE is actually exploitable in your application or whether your input validation upstream makes it a non-issue. The skill cites specific files and line numbers in your code; I won't ticket anything without that grounding.
Step 3: fix-or-defer decision. For each finding I make one of three calls. Fix now: bump the dependency, run the test suite, ship it. Defer with rationale: ticket it with the reachability analysis and a re-evaluation date. Won't fix: document in a security log with the reason (not reachable, mitigated by other controls, accepted risk).
Step 4: ticket creation. For findings I'm tracking but not fixing immediately, I generate a ticket draft:
claude "draft a Jira ticket for CVE-2026-12345 with the reachability analysis, affected packages, suggested fix, and a 30-day re-evaluation date"
I review the ticket before filing, because Claude tends to be slightly verbose in the description field and I prefer terse tickets that engineers will actually read. The skill outputs a draft I can copy-paste into Jira; it does not file the ticket directly, because automated ticket creation from a security tool is a recipe for a 200-issue backlog nobody triages.
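For a sense of how terse such a draft can be, here is a small sketch that assembles the four fields named in the prompt and computes the re-evaluation date. The field layout, function name, package name, and reachability text are hypothetical; it produces plain text to paste and makes no call to any ticketing API.

```python
from datetime import date, timedelta

def ticket_draft(cve_id: str, package: str, reachability: str, suggested_fix: str,
                 opened: date, reevaluate_after_days: int = 30) -> str:
    """Build a short plain-text ticket body with a computed re-evaluation date."""
    reevaluate_on = opened + timedelta(days=reevaluate_after_days)
    return "\n".join([
        f"Summary: {cve_id} in {package}",
        f"Reachability: {reachability}",
        f"Suggested fix: {suggested_fix}",
        f"Re-evaluate on: {reevaluate_on.isoformat()}",
    ])

draft = ticket_draft(
    cve_id="CVE-2026-12345",             # placeholder id from the prompt above
    package="example-auth-lib",          # hypothetical package name
    reachability="reachable from src/server/ via the login handler",  # hypothetical
    suggested_fix="bump to the first patched release",
    opened=date(2026, 1, 5),
)
print(draft)
```

Four lines is roughly the length I trim Claude's drafts down to; the full reachability analysis goes in an attachment or linked note rather than the description field.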
This whole workflow runs weekly on a quiet Friday afternoon. The triage step takes 5-10 minutes on a typical codebase; the per-finding deep-dives add 2-5 minutes each, and there are usually 3-8 of those per scan.
The picks above are deliberate. The skills below are equally deliberate omissions — patterns I've seen go wrong on production codebases. If you've installed something matching one of these descriptions, consider whether it's actually earning its keep.
Auto-fix dependency bumpers. Skills that watch your lockfile, detect a CVE, and automatically open a PR with the patched version. The failure mode: a transitive bump breaks a peer-dependency constraint or introduces a semver-minor breaking change, and the auto-PR sails through CI on a green test suite because the failure surfaces only in production. If you want automated dependency PRs, use Dependabot or Renovate — purpose-built tools with mature configuration for excluding paths, batching, and version constraints. A Claude skill that re-implements 10% of Renovate badly is worse than no skill at all.
Warning suppressors. Anything that tells Claude to "filter out low-confidence findings" or "suppress noisy linter warnings." The whole point of running a security tool is to surface the warnings. If your SAST tool is noisy, tune the SAST tool's config; don't add a layer that hides findings before you see them. The day a real finding gets suppressed because it pattern-matched a "noisy" template, you'll wish you'd left the noise in.
SAST replacers. Skills that claim to "replace your static analysis tool" with pure-LLM review. SAST tools work because they parse code into ASTs and apply deterministic taint-tracking; LLM review is good at the squishy patterns SAST misses, but it's not a substitute. Run both. The right Claude skill is one that consumes SAST output and adds context — not one that pretends to do the SAST itself.
Auto-merging review bots. Any skill offering to auto-approve PRs based on its own review. The reasoning is independent of skill quality: an auto-approving security reviewer is a single point of failure with no second-pair-of-eyes. Even if the skill is perfect, the workflow is wrong.
Live-environment IAM editors. Skills that connect to AWS / GCP / Azure APIs and edit IAM policies directly. Audit against your IaC source instead — Terraform, CloudFormation, Pulumi. The blast radius of a Claude session mis-editing a production IAM policy is too large to mitigate with any amount of confirmation prompting.
"Vulnerability scanner" megamods. Skills that bundle 15 unrelated checks under one name ("complete security audit") and fire on every commit. They produce 200-finding reports that nobody reads. Install focused skills with explicit triggers; let composition do the integration work.
Anything without anti-triggers. If a skill's SKILL.md doesn't have a "When NOT to use" section, it's going to fire in contexts you didn't intend. The good security skills are conservative by default and let you invoke them explicitly when needed.
The 11 skills above cover most of the general-purpose security work I do. Roughly 30% of my actual day-to-day is covered by skills I wrote myself, for patterns specific to my stack. Three signals tell me it's time to write rather than install.
Signal 1: the catalog has nothing for your stack. If you work on Erlang, Rust embedded firmware, or a niche industrial protocol, the public catalogs are thin. Write a skill that encodes the patterns you and your team already check for. A SKILL.md is essentially a runbook with YAML on top — if you have a wiki page describing "how we review changes to the ACL module," that's a SKILL.md draft already.
Signal 2: a repeated correction. If you find yourself giving Claude the same correction three times in a week — "no, our convention is to use the safe_query helper, not raw SQL" — that's a project-specific skill waiting to be written. Codify the convention in a SKILL.md, drop it in .claude/skills/ in the repo, and every future Claude session in that repo loads it automatically.
Signal 3: an internal tool with an API. If your team has an internal vulnerability database, a custom SAST tool, or a homegrown audit log, write a skill that wraps it. The skill becomes the documentation: anyone joining the team can read the SKILL.md to understand how to use the tool, and Claude can invoke it competently from day one.
The structure I follow for a security-flavoured skill, in order:
review-iam-changes-aws is better than security-helper.
aws_iam_policy Terraform resources."
Read and Grep are enough.
That structure produces skills that compose well with the rest of your stack and that you'll still trust six months from now. The catalog browser at /category/security/ has dozens of examples to crib from; read three or four before you write your first one.
A few practical points that don't fit neatly into the workflows but matter for keeping this stack healthy in production.
Update cadence. Security skills age faster than other categories — new CVE classes appear, IAM service additions create new attack surfaces, container runtimes ship new defaults. I re-review my installed security skills every quarter, uninstall any that haven't been updated upstream in 6+ months, and check for new entries that supersede mine. The catalog's lastmod field on each skill is the signal to watch.
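The six-month rule is a one-liner once the dates are in hand. This sketch assumes you have already collected each skill's lastmod value into a dictionary (how you fetch it from the catalog is not shown); the 183-day threshold is my approximation of six months and the dates are invented.

```python
from datetime import date
from typing import Dict, List

STALE_AFTER_DAYS = 183  # roughly six months; an assumed threshold

def stale_skills(lastmod: Dict[str, date], today: date) -> List[str]:
    """Return the names of skills not updated within the staleness window."""
    return sorted(name for name, modified in lastmod.items()
                  if (today - modified).days >= STALE_AFTER_DAYS)

# Hypothetical lastmod values for three installed skills.
LASTMOD = {
    "iam-audit-aws": date(2025, 11, 20),
    "threat-modeller": date(2025, 3, 2),
    "log-triage": date(2025, 9, 14),
}

stale = stale_skills(LASTMOD, today=date(2026, 1, 5))
print(stale)
```

Anything the check returns goes on the quarterly review list: either a maintained replacement exists, or the skill gets re-read line by line before it stays installed.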
Skill conflicts. Two skills with overlapping triggers can fight for context. If you install both a general code-review skill and code-review-security, the security-specific one should have tighter triggers so it only fires when the diff actually warrants security focus. Read the trigger sections of any two skills you're running side-by-side; if they overlap, decide which one wins and tighten the other.
Logging and audit trail. Claude Code keeps a session transcript per project — for security work, treat those transcripts as audit artefacts. I keep them for 90 days minimum, longer if they contain incident-response activity. The skill outputs become part of the evidence trail; treat them with the same hygiene as any other security log.
Air-gapped or restricted environments. If you're working in an environment where Claude Code can't reach the internet (regulated industries, classified work), the catalog skills won't help — you can't install them. Build a vetted internal subset: pick 5-10 skills from the public catalog, review every line of every SKILL.md, ship them through your normal software supply chain, and load them from ~/.claude/skills/ on the locked-down host. Treat each one as a third-party library subject to the same review process as anything else from outside your perimeter.
Onboarding. When a new engineer joins the team, the SKILL.md files in the project's .claude/skills/ directory are the fastest way to teach them the security conventions of the codebase. The skills serve a dual purpose: they make Claude behave correctly, and they document the conventions in machine-readable form. New hires get up to speed faster reading SKILL.mds than they do reading wikis, in my experience, because the SKILL.mds describe what to do, not what to know.
What I check monthly. I run each installed skill against a known-good test repo and verify that the output is what I expect. This catches the case where a skill's upstream maintainer pushed an update that subtly changed behaviour. Five minutes per skill, once a month, has saved me from at least three regressions in the past year.
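A rough way to make that monthly check mechanical is a golden-file comparison. The sketch assumes you have reduced each skill's output to a stable, sorted list of findings first, since raw model output varies between runs and an exact text comparison would be too brittle. The function name and the sample finding strings are hypothetical.

```python
import difflib
from typing import List

def output_drift(expected: str, actual: str) -> List[str]:
    """Return unified-diff lines between the stored and current output; empty if equal."""
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="expected", tofile="actual", lineterm="",
    ))

# Hypothetical normalised findings: one "path:line rule" entry per line.
EXPECTED = "api/users.py:42 sql-injection\ntools/run.py:7 eval-on-input"
ACTUAL = "api/users.py:42 sql-injection"

drift = output_drift(EXPECTED, ACTUAL)
for line in drift:
    print(line)
```

An empty result means the skill still finds what it found last month; a removed line, as in this sample, is the kind of silent regression worth chasing upstream.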