Claude Code Skills·Claude Skills·The open SKILL.md registry for Claude

Top Claude Code Skills for Security Engineers

Published 1 June 2026 · 14 min read · By a long-time Claude Code practitioner

Security work breaks differently from other engineering work. A wrong refactor produces a failing test; a wrong security call produces a quiet breach six months later. That asymmetry is why I run a deliberately narrow stack of Claude Code skills on my security boxes — fewer, sharper tools that surface evidence rather than autopilot fixes.

This guide walks through the eleven skills I actually keep installed, the three end-to-end workflows I run most often (PR security review, incident triage, dependency CVE triage), the anti-patterns I refuse to touch, and the signals that tell you it's time to write your own skill instead of installing one.


Why security work fits Claude

Security is one of the few areas where Claude's natural caution actually helps. The right skills lean into that. In most domains, Claude's tendency to hedge, ask for confirmation, and surface its uncertainty is friction. In security work, that same posture is exactly what you want: a reviewer that says "this looks like a SQL injection sink, but the input may be sanitised three frames up the stack — let me trace it" rather than a tool that confidently flags a false positive and moves on.

The mental model I use: a Claude Code skill is a Markdown file (SKILL.md) with YAML frontmatter that gets loaded into Claude's context when its trigger conditions match. It's not a plugin running compiled code; it's a prompt fragment plus optional tool allowlist plus optional scripts. That structure has two consequences for security work. First, every skill is auditable — you read the SKILL.md the same way you read a runbook, and you know exactly what behaviour you've installed. Second, skills compose — running a dependency-scanner skill alongside a threat-modeller skill doesn't require any integration work; they share Claude's context and reason over each other's output.

The flip side: a poorly-written security skill is worse than no skill at all. A skill that auto-applies fixes to requirements.txt the moment it sees a CVE is going to ship broken software the first time a transitive pin matters. A skill that tells Claude to "suppress noisy warnings from the SAST tool" will suppress legitimate findings on the day it counts. The picks below are the ones I keep installed because they surface evidence and let me decide; the anti-stack section calls out the patterns I rip out on sight.

One more framing note: this stack is opinionated toward review and triage, not toward replacing your security tooling. Semgrep, Trivy, gitleaks, Snyk, and the rest of the static-analysis ecosystem still do the heavy lifting. The skills here are the glue that turns raw tool output into prioritised, contextualised work — and the second-opinion layer when a reviewer is on call alone at 2am.

What separates a good security skill

Before the picks, the five criteria I apply when I'm deciding whether to install a security-flavoured skill. None of these are about cleverness; they're about whether the skill is safe to leave running on a real codebase.

  1. Anti-trigger discipline. The SKILL.md has an explicit "When NOT to use" or "Out of scope" section. A skill called iam-audit that fires on every commit involving the word "role" will drown you in noise; a skill that says "only activate when the user explicitly asks for an IAM review or when changes touch IAM policy JSON" is one I trust.
  2. Bounded tool access. The frontmatter allowed-tools field lists only what the skill needs — usually Read, Grep, Bash(git diff:*), never an unbounded Bash. A security skill with full shell access is a footgun.
  3. Evidence over verdicts. The skill outputs findings with file paths, line numbers, and the exact pattern matched — not just "this code is vulnerable." If I can't trace a finding back to a specific line, I can't triage it.
  4. No silent suppression. If the skill ever tells Claude to "skip" a class of finding, it's gone. Suppression decisions belong in your SAST config, not buried in a Markdown prompt.
  5. Reproducibility. Running the skill twice on the same code produces the same findings. Skills that lean heavily on Claude's free generation without grounding produce subtly different output on each run, which destroys triage workflows.
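The first, second, and fourth checks can be made mechanical before the human read. Below is a minimal Python sketch. It assumes simple one-line key: value frontmatter (a real YAML parser would be sturdier), and the phrase lists are illustrative, not exhaustive. Treat a hit as "read this part closely", not as an automatic rejection.

```python
import re

# Illustrative phrases that hint at silent suppression (criterion 4).
SUPPRESSION_PHRASES = ("suppress", "skip findings", "filter out", "ignore warnings")
# Section titles that count as an anti-trigger section (criterion 1).
ANTI_TRIGGER_HEADINGS = ("when not to use", "out of scope")


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into flat key: value frontmatter and the Markdown body."""
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not match:
        return {}, text
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return meta, match.group(2)


def lint_skill(text: str) -> list[str]:
    """Return red flags for a human to review; an empty list means none found."""
    meta, body = split_frontmatter(text)
    lowered = body.lower()
    problems = []
    if not any(heading in lowered for heading in ANTI_TRIGGER_HEADINGS):
        problems.append("no 'When NOT to use' / 'Out of scope' section")
    tools = [t.strip() for t in meta.get("allowed-tools", "").split(",") if t.strip()]
    if not tools:
        problems.append("allowed-tools is missing or empty")
    if "Bash" in tools:  # bare Bash, as opposed to Bash(git diff:*), is unbounded
        problems.append("unbounded Bash in allowed-tools")
    for phrase in SUPPRESSION_PHRASES:
        if phrase in lowered:
            problems.append(f"possible suppression instruction: {phrase!r}")
    return problems
```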

Every pick below passes all five. The anti-stack section later in the guide enumerates the failure modes you'll see when you skip these checks.

The 11 skills I keep installed

Each pick gets a short rationale; where it helps, I've added the install command and a usage snippet showing how I actually invoke it. Slugs reference real catalog entries you can install today. Installation is the same shape for all of them: pull the SKILL.md down, drop it in ~/.claude/skills/<slug>/, and restart Claude Code.
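If you install skills by hand often, the copy step is easy to script. This is a minimal Python sketch that assumes you've already downloaded the SKILL.md and read it. install_skill is a hypothetical helper, not part of Claude Code, and it fetches nothing.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_skill(skill_md: Path, slug: str, skills_root: Optional[Path] = None) -> Path:
    """Copy an already-downloaded, already-read SKILL.md to <root>/<slug>/SKILL.md."""
    root = skills_root or Path.home() / ".claude" / "skills"
    # Reject slugs that could escape the skills directory.
    if not slug or slug in (".", "..") or "/" in slug or "\\" in slug:
        raise ValueError(f"refusing suspicious slug: {slug!r}")
    target_dir = root / slug
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "SKILL.md"
    shutil.copyfile(skill_md, target)
    return target
```

Typical use would be install_skill(Path("SKILL.md"), "dependency-cve-triage") followed by a Claude Code restart.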

1. dependency-cve-triage

Reads package-lock.json, poetry.lock, go.sum, or Cargo.lock, cross-references against the local OSV database, and outputs a prioritised list grouped by exploitability rather than CVSS score. Crucially: it tells you why a CVE matters for your specific call graph, not just that it exists.

mkdir -p ~/.claude/skills/dependency-cve-triage && cp SKILL.md ~/.claude/skills/dependency-cve-triage/
# usage
claude "triage the CVEs in our lockfile and flag anything reachable from the public API"

2. secret-detection-review

Wraps gitleaks with context. It runs the scan, then for each hit Claude reads the surrounding code to decide whether the match is a real secret, a test fixture, an example in documentation, or a false positive. In my experience that cuts the noise by 60-80% on most codebases.

mkdir -p ~/.claude/skills/secret-detection-review && cp SKILL.md ~/.claude/skills/secret-detection-review/
claude "scan the repo for secrets and tell me which ones are real"
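Here is a sketch of the pre-labelling half: a minimal Python pass over a gitleaks JSON report, such as one written with --report-format json --report-path. I'm assuming gitleaks v8 uses the field names RuleID, File, and StartLine, so check them against your version. The path hints are only illustrative. Every hit is kept, because a label is just a starting point for Claude's read of the surrounding code. This shows the idea, not the skill's implementation.

```python
import json

# Illustrative path hints; a label is a hint, never a reason to drop a hit.
PATH_HINTS = (
    ("test_fixture", ("test/", "tests/", "fixtures/", "__tests__/")),
    ("documentation", ("docs/", "readme", ".md")),
    ("example", (".example", "example.", "sample")),
)


def label_hits(report_json: str) -> list[dict]:
    """Attach a provisional label to each gitleaks finding, keeping every hit."""
    labelled = []
    for hit in json.loads(report_json) or []:  # gitleaks writes a JSON array
        path = hit.get("File", "").lower()
        label = "needs_review"
        for name, hints in PATH_HINTS:
            if any(hint in path for hint in hints):
                label = name
                break
        labelled.append({
            "location": f"{hit.get('File')}:{hit.get('StartLine')}",
            "rule": hit.get("RuleID"),
            "label": label,
        })
    return labelled
```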

3. code-review-security

The PR-review workhorse. Focused on the OWASP Top 10 plus a configurable list of project-specific patterns (e.g. "never call eval on user input", "all SQL goes through the ORM"). Outputs inline-comment-shaped findings.

4. threat-modeller

Walks a design doc or architecture diagram (Mermaid, PlantUML, or plain prose) and produces a STRIDE-categorised threat list. Doesn't pretend to replace a real threat-modelling session; does drastically speed up the prep work.

mkdir -p ~/.claude/skills/threat-modeller && cp SKILL.md ~/.claude/skills/threat-modeller/
claude "threat-model this design doc, focus on the trust boundaries between services"
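Most of the STRIDE prep is bookkeeping: each data flow that crosses a trust boundary has to answer all six questions. Below is a minimal Python sketch of that enumeration. DataFlow and threat_prompts are hypothetical names and say nothing about how the skill works internally.

```python
from dataclasses import dataclass

STRIDE = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information disclosure",
    "D": "Denial of service",
    "E": "Elevation of privilege",
}


@dataclass(frozen=True)
class DataFlow:
    source: str
    target: str
    crosses_trust_boundary: bool


def threat_prompts(flows: list[DataFlow]) -> list[str]:
    """One question per STRIDE category for every flow that crosses a boundary."""
    prompts = []
    for flow in flows:
        if not flow.crosses_trust_boundary:
            continue  # internal flows are out of scope for this pass
        for category in STRIDE.values():
            prompts.append(f"{flow.source} -> {flow.target}: {category}?")
    return prompts
```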

5. incident-triage-checklist

The skill I'm most grateful for at 3am. Walks you through SEV-classification, blast-radius assessment, comms cadence, and evidence-preservation in a fixed order. Has explicit anti-triggers so it doesn't fire on the word "incident" in unrelated contexts.

6. timeline-reconstruction

Ingests a Slack channel export, a log bundle, and a list of commits, then produces a unified timeline with confidence annotations. Marks every event as directly-observed, inferred, or reported.

7. postmortem-drafter

Generates a blameless postmortem skeleton from the timeline output. Five required sections, no embellishment, leaves the contributing-factors analysis to you.

8. iam-audit-aws

Reads Terraform or CloudFormation, surfaces wildcard permissions, role-assumption chains, and privilege-escalation paths. Specifically refuses to operate on live IAM via the AWS API — read-only against your IaC source.

mkdir -p ~/.claude/skills/iam-audit-aws && cp SKILL.md ~/.claude/skills/iam-audit-aws/
claude "audit IAM in this terraform module, flag any policy that grants iam:PassRole broadly"
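The core checks can be sketched against a JSON IAM policy document, for example one rendered from your Terraform. This minimal Python sketch is not the skill itself. It only inspects Allow statements, accepts Action and Resource as either a string or a list, and ignores conditions, which means narrower wildcards such as iam:Pass* slip past it.

```python
import json


def _as_list(value):
    return value if isinstance(value, list) else [value]


def audit_policy(policy_json: str) -> list[str]:
    """Flag wildcard actions, Allow+NotAction, and iam:PassRole on Resource '*'."""
    policy = json.loads(policy_json)
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        where = f"Statement[{i}]"
        if "NotAction" in stmt:
            findings.append(f"{where}: Allow with NotAction grants everything not listed")
        # IAM action names are case-insensitive, so compare lowercased.
        actions = [a.lower() for a in _as_list(stmt.get("Action", []))]
        resources = _as_list(stmt.get("Resource", []))
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"{where}: wildcard action {action!r}")
        can_pass_role = any(a in ("*", "iam:*", "iam:passrole") for a in actions)
        if can_pass_role and "*" in resources:
            findings.append(f"{where}: iam:PassRole allowed on Resource '*'")
    return findings
```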

9. container-hardening-review

Reviews Dockerfiles against a hardening baseline: non-root user, pinned base image digests, no ADD from URLs, multi-stage builds where appropriate. Doesn't auto-fix; explains each finding.
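Here is a rough, line-based Python sketch of the four baseline checks. It makes several assumptions: it ignores backslash line continuations, and it treats any FROM without an @sha256: digest as unpinned unless the image is scratch or an earlier stage name. Like the skill, it only reports findings and never rewrites the file.

```python
import re


def review_dockerfile(text: str) -> list[str]:
    """Report (never fix) deviations from a four-point hardening baseline."""
    findings = []
    stage_names = set()
    stage_count = 0
    final_user = None  # last USER seen since the most recent FROM
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "FROM":
            stage_count += 1
            final_user = None  # USER does not carry over into a new stage
            tokens = [t for t in rest.split() if not t.startswith("--")]
            if not tokens:
                continue
            image = tokens[0]
            if image.lower() not in stage_names | {"scratch"} and "@sha256:" not in image:
                findings.append(f"base image not pinned by digest: {image}")
            if len(tokens) >= 3 and tokens[1].upper() == "AS":
                stage_names.add(tokens[2].lower())
        elif keyword == "ADD" and re.search(r"https?://", rest):
            findings.append(f"ADD from URL: {rest}")
        elif keyword == "USER":
            final_user = rest
    # USER may be "name", "uid", or "uid:gid"; root is uid 0.
    if final_user is None or final_user.split(":")[0] in ("root", "0"):
        findings.append("final stage runs as root (no non-root USER)")
    if stage_count < 2:
        findings.append("single-stage build: consider multi-stage if build tools end up in the image")
    return findings
```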

10. log-triage

Takes a log bundle (JSON, syslog, or CloudWatch export) and clusters anomalies. Useful as the first pass during an active incident — drops you from thousands of lines to a dozen clusters worth investigating.

11. security-policy-review

Reviews a draft policy document (vendor security questionnaire, SOC 2 control narrative, internal AUP) against a checklist of what reviewers actually look for. Surfaces vague language and missing scope.
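Part of surfacing vague language is simply working through a word list. Below is a minimal Python sketch. The phrase list is only an example; build yours from whatever your reviewers actually push back on.

```python
import re

# Illustrative hedge phrases; replace with your reviewers' actual pet peeves.
VAGUE_PHRASES = ("as appropriate", "where possible", "periodically",
                 "regularly", "reasonable", "industry standard", "as needed")


def flag_vague_language(policy_text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs, with 1-indexed line numbers."""
    hits = []
    for lineno, line in enumerate(policy_text.splitlines(), 1):
        lowered = line.lower()
        for phrase in VAGUE_PHRASES:
            if re.search(rf"\b{re.escape(phrase)}\b", lowered):
                hits.append((lineno, phrase))
    return hits
```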

Workflow: PR security review → fix → re-review

This is the workflow I run most often — every PR that touches authentication, authorization, input handling, or anything in infrastructure/ gets a pass through it before I approve. The whole loop takes 3-8 minutes per PR depending on size.

Step 1: initial pass. With code-review-security installed, I run:

git fetch origin pull/1234/head:pr/1234
git checkout pr/1234
claude "do a security review of the diff against main, output inline comments grouped by severity"

The output shape is critical here. I want findings with file:line references, the exact pattern matched, and a one-sentence rationale. I do not want a narrative summary at the top — those tend to be confidently wrong and bias my read of the actual findings.
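A good way to fix that shape is to treat each finding as a small record and render the records grouped by severity, without any summary at the top. Below is a Python sketch. The field names and the severity scale are my own assumptions, not the skill's schema.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    severity: str   # one of SEVERITY_ORDER's keys
    pattern: str    # the exact text or rule that matched
    rationale: str  # one sentence


def render_findings(findings: list[Finding]) -> str:
    """Group by severity, one path:line entry per finding, no narrative summary."""
    out = []
    ordered = sorted(findings, key=lambda f: (SEVERITY_ORDER[f.severity], f.path, f.line))
    current = None
    for f in ordered:
        if f.severity != current:
            current = f.severity
            out.append(f"## {current}")
        out.append(f"- {f.path}:{f.line} `{f.pattern}`: {f.rationale}")
    return "\n".join(out)
```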

Step 2: cross-check with secrets. The diff might introduce a new .env.example or rotate a test fixture. I run secret-detection-review against the diff only:

claude "scan the diff for secrets, ignore anything already on main"

This catches the case where someone copy-pastes a real key into an example file. About 1 in 40 PRs in my experience has at least one finding here; about 1 in 200 has a real secret.

Step 3: triage and respond. I take the findings into the PR review UI manually. Claude doesn't post to GitHub directly in this workflow — I've found that one human-eye filter step prevents the noisy-reviewer reputation that gets your bot ignored. For each finding I either: leave an inline comment with the suggested fix, dismiss it as a false positive with a one-line reason, or escalate it as a blocking review.

Step 4: re-review after fix. Once the author pushes a fix, I re-run the same skill against the new diff:

git pull origin pull/1234/head
claude "re-review the security findings, did the latest commits address them?"

The skill is configured to read the previous findings (it persists them to a local cache keyed by PR number) and explicitly check whether each one is resolved, partially-resolved, or unchanged. This step catches the failure mode where an author fixes the immediate finding but introduces an equivalent vulnerability one frame up the call stack.
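The cache-and-compare step can be sketched in Python as follows. Both the JSON layout (rule, path, match) and the cache file naming are hypothetical. Findings are keyed by rule and file rather than by line number, since lines move between commits. When a finding's match text has changed, the code leaves it to a human to decide whether it counts as partially resolved. Findings that weren't there before are reported as new, which is how an equivalent vulnerability in another file shows up.

```python
import json
from pathlib import Path


def save_findings(cache_dir: Path, pr: int, findings: list[dict]) -> None:
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"pr-{pr}.json").write_text(json.dumps(findings, indent=2))


def compare(cache_dir: Path, pr: int, current: list[dict]) -> dict[str, str]:
    """Classify cached findings as resolved/unchanged/changed, and flag new ones."""
    previous = json.loads((cache_dir / f"pr-{pr}.json").read_text())
    now = {(f["rule"], f["path"]): f for f in current}
    status = {}
    for old in previous:
        key = (old["rule"], old["path"])
        label = f"{old['rule']} @ {old['path']}"
        if key not in now:
            status[label] = "resolved"
        elif now[key]["match"] == old["match"]:
            status[label] = "unchanged"
        else:
            status[label] = "changed: re-check by hand"
    old_keys = {(o["rule"], o["path"]) for o in previous}
    for key, f in now.items():
        if key not in old_keys:
            status[f"{f['rule']} @ {f['path']}"] = "new"
    return status
```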

Step 5: approve or request changes. If all findings are resolved or dismissed-with-reason, I approve. If anything is still red, the PR doesn't merge. The skill never auto-approves — that's a hard rule, and any skill that offers to auto-approve PRs based on its own review gets uninstalled immediately.

Workflow: incident triage → timeline → postmortem

The incident workflow is more time-sensitive than the PR workflow, so the skill setup is deliberately minimal — three skills, one runbook, no surprises. The whole thing is designed to be runnable by whoever's on call, not just the security team.

Phase 1: triage (first 15 minutes). Page fires, on-call acks, opens Claude Code in a fresh terminal. The first command is always:

claude "run the incident triage checklist for a possible <type> incident"

Where <type> is one of: data-exposure, account-compromise, availability, integrity, supply-chain. The incident-triage-checklist skill walks through SEV classification (what's the user impact?), blast radius (how many users, what data classes?), and immediate containment options (revoke tokens? rotate keys? isolate hosts?). Output is a numbered checklist with a recommended SEV. It has never overstated severity for me; occasionally it understates it, and that gets corrected on the next iteration.
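What makes the checklist useful is its fixed order. Below is a minimal Python sketch that checks the incident type and returns the steps in that order. I pieced the sequence together from the two descriptions in this guide (SEV, blast radius, containment, comms cadence, evidence preservation), so it is an assumption about how the skill actually orders them.

```python
INCIDENT_TYPES = ("data-exposure", "account-compromise", "availability",
                  "integrity", "supply-chain")

# Order assumed from this guide's descriptions of the checklist skill.
STEPS = (
    "SEV classification: what is the user impact?",
    "Blast radius: how many users, which data classes?",
    "Containment options: revoke tokens? rotate keys? isolate hosts?",
    "Comms cadence",
    "Evidence preservation",
)


def triage_checklist(incident_type: str) -> list[str]:
    """Numbered checklist for a known incident type; unknown types are rejected."""
    if incident_type not in INCIDENT_TYPES:
        raise ValueError(f"unknown incident type {incident_type!r}; expected one of {INCIDENT_TYPES}")
    return [f"{n}. [{incident_type}] {step}" for n, step in enumerate(STEPS, 1)]
```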

Phase 2: evidence gathering (next 30-60 minutes). While I'm executing the checklist (revoking, rotating, isolating), a parallel Claude session pulls relevant logs. log-triage ingests the bundle:

aws logs tail /aws/lambda/auth --since 6h --format short > /tmp/auth-logs.txt
claude "triage the logs in /tmp/auth-logs.txt, cluster anomalies, flag anything that looks like the incident pattern"

The skill outputs a small number of clusters (usually 5-15) with sample lines from each. I pin the ones that look related and discard the rest.
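Template mining is a common way to shrink thousands of lines down to a handful of clusters. You mask every field that changes per event (IPs, UUIDs, hex IDs, numbers) and then count the identical templates that remain. The Python below is a minimal sketch of that technique and may not match what log-triage does internally; the masks are examples. Bursts end up at the top of the list and one-off oddities at the bottom, so read both ends.

```python
import re
from collections import Counter

# Applied in order; each placeholder contains no digits, so later masks can't touch it.
MASKS = (
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\b0x[0-9a-f]+\b|\b[0-9a-f]{16,}\b", re.I), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
)


def template(line: str) -> str:
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def cluster(lines: list[str]) -> list[tuple[str, int, str]]:
    """Return (template, count, first sample line), most frequent first."""
    counts = Counter()
    samples = {}
    for line in lines:
        key = template(line)
        counts[key] += 1
        samples.setdefault(key, line.strip())
    return [(t, c, samples[t]) for t, c in counts.most_common()]
```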

Phase 3: timeline reconstruction (after containment). Once the bleeding has stopped, I export the relevant Slack channel, gather the log bundles, list the commits in the relevant window, and feed everything to timeline-reconstruction:

claude "reconstruct the timeline from the slack export, the auth-logs.txt bundle, and these commits: <sha list>. mark each event with confidence."

The output is a Markdown timeline with one row per event, columns for timestamp, source, description, and confidence. The confidence annotations are what make this useful — "directly-observed in CloudTrail" is treated differently from "inferred from absence of expected log line."
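After each source has been parsed into events, merging them is the easy part. The Python sketch below assumes all timestamps are timezone-aware UTC, since Python won't compare naive datetimes with aware ones. Event and merge_timeline are hypothetical names. Breaking same-second ties by confidence (observed, then reported, then inferred) is a choice I made myself.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Confidence(Enum):
    OBSERVED = "directly-observed"
    INFERRED = "inferred"
    REPORTED = "reported"


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # timezone-aware UTC
    source: str          # e.g. "CloudTrail", "slack", "git"
    description: str
    confidence: Confidence


def merge_timeline(*streams: list[Event]) -> str:
    """Merge per-source event lists into one Markdown table, oldest first."""
    rank = {Confidence.OBSERVED: 0, Confidence.REPORTED: 1, Confidence.INFERRED: 2}
    events = sorted((e for stream in streams for e in stream),
                    key=lambda e: (e.timestamp, rank[e.confidence]))
    rows = ["| timestamp | source | description | confidence |", "|---|---|---|---|"]
    for e in events:
        rows.append(f"| {e.timestamp.isoformat()} | {e.source} | {e.description} | {e.confidence.value} |")
    return "\n".join(rows)
```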

Phase 4: postmortem skeleton. Within 24-72 hours of resolution, I feed the timeline to postmortem-drafter:

claude "draft a blameless postmortem skeleton from this timeline, fill in the sections you can support from evidence, leave the rest as TODO"

The skill produces a five-section skeleton (summary, timeline, impact, contributing factors, action items) with the timeline and impact pre-filled. Contributing factors and action items are left as TODOs because those are judgment calls that belong to the responders, not the tool. I take it into a doc, fill in the TODOs with the team, and circulate for review.
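A minimal Python sketch of the skeleton, built on the five sections listed above. The two judgment sections always come out as TODO, even when a draft value is passed in. That follows the rule that those sections belong to the responders.

```python
SECTIONS = ("Summary", "Timeline", "Impact", "Contributing factors", "Action items")
# Sections owned by the responders; never pre-filled by the tool.
HUMAN_ONLY = {"Contributing factors", "Action items"}


def postmortem_skeleton(title: str, prefilled: dict[str, str]) -> str:
    """Render the five sections, filling only evidence-backed ones; the rest are TODO."""
    parts = [f"# Postmortem: {title}", ""]
    for section in SECTIONS:
        body = prefilled.get(section, "").strip()
        if section in HUMAN_ONLY or not body:
            body = "TODO"
        parts += [f"## {section}", "", body, ""]
    return "\n".join(parts).rstrip() + "\n"
```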

Workflow: dependency CVE → impact assessment → ticket

The CVE workflow is the least time-sensitive of the three but the most likely to produce noise without good tooling. The default dependency-scanner output is a flat list of CVEs ranked by CVSS, which is useless: a CVSS 9.8 in a transitive dev dependency that's only loaded during tests is less urgent than a CVSS 6.5 in a runtime auth library.
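As a sort key, reachability and runtime scope set the order, and CVSS is used only to break ties. Below is a Python sketch using a hypothetical three-tier scheme. The runtime flag would come from the lockfile and the reachable flag from the source analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vuln:
    cve: str
    cvss: float
    runtime: bool    # False for dev/test-only dependencies
    reachable: bool  # vulnerable path reachable from production entry points


def priority(v: Vuln) -> tuple[int, float]:
    """Sort key: tier first (lower is more urgent), then higher CVSS first."""
    if v.runtime and v.reachable:
        tier = 0
    elif v.runtime:
        tier = 1
    else:
        tier = 2  # dev/test-only: informational unless proven otherwise
    return (tier, -v.cvss)
```

With this key, a CVSS 9.8 in a test-only dependency sorts after a reachable CVSS 6.5 in a runtime library, matching the example above.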

Step 1: scan and prioritise. I run dependency-cve-triage against the lockfile:

claude "triage CVEs in package-lock.json, prioritise by reachability from the production entry points in src/server/"

The skill reads the lockfile, queries the OSV database for each pinned version, then — and this is the bit that matters — reads the source to determine whether the vulnerable code path is actually reachable from the application. A CVE in a code path your app never calls is downgraded to informational; a CVE in a hot path is escalated regardless of CVSS.
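Here is a Python sketch of the lockfile half for an npm package-lock.json at lockfileVersion 2 or 3, where installed packages sit under a packages key. One substitution to flag: rather than querying a local OSV copy, it builds a request for the public OSV API's querybatch endpoint. That endpoint returns only vulnerability IDs, so you need a follow-up lookup per ID to get details. The reachability analysis is not covered here.

```python
import json
import urllib.request


def osv_queries(package_lock: str) -> dict:
    """Build an OSV querybatch payload from a lockfileVersion 2/3 package-lock.json."""
    lock = json.loads(package_lock)
    seen = set()
    queries = []
    for path, meta in lock.get("packages", {}).items():
        if not path or "version" not in meta:
            continue  # "" is the root project; linked entries carry no version
        # Nested installs look like node_modules/a/node_modules/b; keep the last name.
        name = meta.get("name") or path.rsplit("node_modules/", 1)[-1]
        key = (name, meta["version"])
        if key in seen:
            continue
        seen.add(key)
        queries.append({"package": {"name": name, "ecosystem": "npm"}, "version": meta["version"]})
    return {"queries": queries}


def query_osv(payload: dict) -> dict:
    """POST the payload to the public OSV API (network call)."""
    req = urllib.request.Request(
        "https://api.osv.dev/v1/querybatch",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```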

Step 2: impact assessment for each high-priority finding. For each finding the skill rates as reachable + high-severity, I ask for a fuller assessment:

claude "deep-dive on CVE-2026-12345: which of our code paths reach the vulnerable function, what's the data flow, what's the worst case if exploited in our context?"

This second pass is where you find out whether a nominally-critical CVE is actually exploitable in your application or whether your input validation upstream makes it a non-issue. The skill cites specific files and line numbers in your code — I won't ticket anything without that grounding.

Step 3: fix-or-defer decision. For each finding I make one of three calls. Fix now: bump the dependency, run the test suite, ship it. Defer with rationale: ticket it with the reachability analysis and a re-evaluation date. Won't fix: document in a security log with the reason (not reachable, mitigated by other controls, accepted risk).

Step 4: ticket creation. For findings I'm tracking but not fixing immediately, I generate a ticket draft:

claude "draft a Jira ticket for CVE-2026-12345 with the reachability analysis, affected packages, suggested fix, and a 30-day re-evaluation date"

I review the ticket before filing — Claude tends to be slightly verbose in the description field, and I prefer terse tickets that engineers will actually read. The skill outputs a draft I can copy-paste into Jira; it does not file the ticket directly, because automated ticket creation from a security tool is a recipe for a 200-issue backlog nobody triages.
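The draft is essentially a template with a date calculation attached. Below is a Python sketch. ticket_draft and its fields are hypothetical and follow the prompt above. It returns text for you to review and does not call any Jira API.

```python
from datetime import date, timedelta


def ticket_draft(cve: str, packages: list[str], reachability: str,
                 fix: str, today: date, days: int = 30) -> str:
    """Terse ticket text for copy-paste; nothing is filed automatically."""
    revisit = today + timedelta(days=days)
    return "\n".join([
        f"Summary: Track {cve} ({', '.join(packages)})",
        f"Reachability: {reachability}",
        f"Suggested fix: {fix}",
        f"Re-evaluate by: {revisit.isoformat()}",
    ])
```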

This whole workflow runs weekly on a quiet Friday afternoon. The triage step takes 5-10 minutes on a typical codebase; the per-finding deep-dives add 2-5 minutes each, and there are usually 3-8 of those per scan, so a full run lands somewhere between about 11 and 50 minutes.

The anti-stack: skills to avoid

The picks above are deliberate. The skills below are equally deliberate omissions — patterns I've seen go wrong on production codebases. If you've installed something matching one of these descriptions, consider whether it's actually earning its keep.

Auto-fix dependency bumpers. Skills that watch your lockfile, detect a CVE, and automatically open a PR with the patched version. The failure mode: a transitive bump breaks a peer-dependency constraint or introduces a semver-minor breaking change, and the auto-PR sails through CI on a green test suite because the failure surfaces only in production. If you want automated dependency PRs, use Dependabot or Renovate — purpose-built tools with mature configuration for excluding paths, batching, and version constraints. A Claude skill that re-implements 10% of Renovate badly is worse than no skill at all.

Warning suppressors. Anything that tells Claude to "filter out low-confidence findings" or "suppress noisy linter warnings." The whole point of running a security tool is to surface the warnings. If your SAST tool is noisy, tune the SAST tool's config; don't add a layer that hides findings before you see them. The day a real finding gets suppressed because it pattern-matched a "noisy" template, you'll wish you'd left the noise in.

SAST replacers. Skills that claim to "replace your static analysis tool" with pure-LLM review. SAST tools work because they parse code into ASTs and apply deterministic rules, in many cases including taint tracking; LLM review is good at the squishy patterns SAST misses, but it's not a substitute. Run both. The right Claude skill is one that consumes SAST output and adds context, not one that pretends to do the SAST itself.
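To consume SAST output, the first step is converting it into the same file:line shape the rest of this stack uses. The Python sketch below handles semgrep scan --json output and assumes the results / path / start.line / check_id / extra.message layout I know from Semgrep's JSON report, so check it against your Semgrep version. It reports scanner errors instead of dropping them, since a silent parse failure looks exactly like a clean scan.

```python
import json


def semgrep_findings(report_json: str) -> list[str]:
    """Turn semgrep JSON output into path:line entries for Claude to contextualise."""
    report = json.loads(report_json)
    entries = []
    errors = report.get("errors") or []
    if errors:
        entries.append(f"semgrep reported {len(errors)} error(s); some files may not have been scanned")
    for r in report.get("results", []):
        extra = r.get("extra", {})
        entries.append(
            f"{r['path']}:{r['start']['line']} [{extra.get('severity', '?')}] "
            f"{r['check_id']}: {extra.get('message', '').strip()}"
        )
    return entries
```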

Auto-merging review bots. Any skill offering to auto-approve PRs based on its own review. The reasoning is independent of skill quality: an auto-approving security reviewer is a single point of failure with no second-pair-of-eyes. Even if the skill is perfect, the workflow is wrong.

Live-environment IAM editors. Skills that connect to AWS / GCP / Azure APIs and edit IAM policies directly. Audit against your IaC source instead — Terraform, CloudFormation, Pulumi. The blast radius of a Claude session mis-editing a production IAM policy is too large to mitigate with any amount of confirmation prompting.

"Vulnerability scanner" megamods. Skills that bundle 15 unrelated checks under one name ("complete security audit") and fire on every commit. They produce 200-finding reports that nobody reads. Install focused skills with explicit triggers; let composition do the integration work.

Anything without anti-triggers. If a skill's SKILL.md doesn't have a "When NOT to use" section, it's going to fire in contexts you didn't intend. The good security skills are conservative by default and let you invoke them explicitly when needed.

When to write your own

The 11 skills above cover most of the general-purpose security work I do. Roughly 30% of my actual day-to-day is covered by skills I wrote myself, for patterns specific to my stack. Three signals tell me it's time to write rather than install.

Signal 1: the catalog has nothing for your stack. If you work on Erlang, Rust embedded firmware, or a niche industrial protocol, the public catalogs are thin. Write a skill that encodes the patterns you and your team already check for. A SKILL.md is essentially a runbook with YAML on top — if you have a wiki page describing "how we review changes to the ACL module," that's a SKILL.md draft already.

Signal 2: a repeated correction. If you find yourself giving Claude the same correction three times in a week — "no, our convention is to use the safe_query helper, not raw SQL" — that's a project-specific skill waiting to be written. Codify the convention in a SKILL.md, drop it in .claude/skills/ in the repo, and every future Claude session in that repo loads it automatically.

Signal 3: an internal tool with an API. If your team has an internal vulnerability database, a custom SAST tool, or a homegrown audit log, write a skill that wraps it. The skill becomes the documentation: anyone joining the team can read the SKILL.md to understand how to use the tool, and Claude can invoke it competently from day one.

The structure I follow for a security-flavoured skill, in order:

  1. Name and description. Be specific. review-iam-changes-aws is better than security-helper.
  2. When to use. Two to four sentences describing the trigger conditions. "User asks for a security review of IAM changes, or the diff contains aws_iam_policy Terraform resources."
  3. When NOT to use. Explicit list. "Don't activate for general code review. Don't activate on changes to test fixtures. Don't activate on documentation-only commits."
  4. Allowed tools. Minimum needed. For most review skills, Read and Grep are enough.
  5. Procedure. Numbered steps. Each step references the artefacts to read and the output format expected.
  6. Output format. Explicit. "Output a Markdown list, one finding per item, with file:line, severity, and one-sentence rationale."
  7. Anti-patterns. What the skill should refuse to do. "Do not suggest auto-fixes. Do not approve the change. Do not file tickets."
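The seven-part structure can be turned into a scaffold generator. A few notes on this Python sketch. The section headings are my own rendering of the list above. The name check follows my understanding of Claude Code's naming rule (lowercase letters, digits, and hyphens, up to 64 characters). Claude decides whether to load a skill largely from its description, so repeat the trigger conditions there. The description is written as a JSON string, which is also a valid YAML double-quoted scalar, so colons inside it can't break the frontmatter.

```python
import json
import re


def _bullets(items: list[str]) -> str:
    return "\n".join(f"- {item}" for item in items)


def skill_scaffold(name: str, description: str, allowed_tools: list[str],
                   when: str, when_not: list[str], steps: list[str],
                   output_format: str, anti_patterns: list[str]) -> str:
    """Render a SKILL.md following the seven-part structure."""
    if not re.fullmatch(r"[a-z0-9-]{1,64}", name):
        raise ValueError(f"invalid skill name: {name!r}")
    tools = ", ".join(allowed_tools)
    procedure = "\n".join(f"{n}. {step}" for n, step in enumerate(steps, 1))
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {json.dumps(description)}\n"
        f"allowed-tools: {tools}\n"
        "---\n\n"
        f"## When to use\n{when}\n\n"
        f"## When NOT to use\n{_bullets(when_not)}\n\n"
        f"## Procedure\n{procedure}\n\n"
        f"## Output format\n{output_format}\n\n"
        f"## Anti-patterns\n{_bullets(anti_patterns)}\n"
    )
```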

That structure produces skills that compose well with the rest of your stack and that you'll still trust six months from now. The catalog browser at /category/security/ has dozens of examples to crib from — read three or four before you write your first one.

Operating notes

A few practical points that don't fit neatly into the workflows but matter for keeping this stack healthy in production.

Update cadence. Security skills age faster than other categories — new CVE classes appear, IAM service additions create new attack surfaces, container runtimes ship new defaults. I re-review my installed security skills every quarter, uninstall any that haven't been updated upstream in 6+ months, and check for new entries that supersede mine. The catalog's lastmod field on each skill is the signal to watch.
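Below is a minimal Python sketch of the staleness check. It assumes you've already pulled each installed skill's lastmod date from the catalog into a dict. The 183-day default is a rough stand-in for six months.

```python
from datetime import date, timedelta


def stale_skills(lastmod: dict[str, date], today: date, max_age_days: int = 183) -> list[str]:
    """Slugs whose upstream lastmod is older than roughly six months."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(slug for slug, modified in lastmod.items() if modified < cutoff)
```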

Skill conflicts. Two skills with overlapping triggers can fight for context. If you install both a general code-review skill and code-review-security, the security-specific one should have tighter triggers so it only fires when the diff actually warrants security focus. Read the trigger sections of any two skills you're running side-by-side; if they overlap, decide which one wins and tighten the other.

Logging and audit trail. Claude Code keeps a session transcript per project; for security work, treat those transcripts as audit artefacts. I keep them for 90 days minimum, longer if they contain incident-response activity. Claude Code prunes local transcripts after 30 days by default (the cleanupPeriodDays setting), so either raise that value or archive the transcripts somewhere durable. The skill outputs become part of the evidence trail; treat them with the same hygiene as any other security log.

Air-gapped or restricted environments. If you're working in an environment where the host can't reach the public internet (regulated industries, classified work), you can't pull skills straight from the catalog. Build a vetted internal subset instead: pick 5-10 skills from the public catalog, review every line of every SKILL.md, ship them through your normal software supply chain, and load them from ~/.claude/skills/ on the locked-down host. Treat each one as a third-party library subject to the same review process as anything else from outside your perimeter.

Onboarding. When a new engineer joins the team, the SKILL.md files in the project's .claude/skills/ directory are the fastest way to teach them the security conventions of the codebase. The skills serve a dual purpose: they make Claude behave correctly, and they document the conventions in machine-readable form. In my experience, new hires get up to speed faster reading SKILL.md files than reading wikis, because a SKILL.md describes what to do, not what to know.

What I check monthly. Run each installed skill against a known-good test repo and verify the output is what I expect. Catches the case where a skill's upstream maintainer pushed an update that subtly changed behaviour. Five minutes per skill, once a month, has saved me from at least three regressions in the past year.
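The monthly check amounts to comparing today's findings against a recorded baseline, which also exercises criterion 5 (reproducibility) from earlier. The Python sketch below assumes you've already parsed a skill's output into dicts with path, line, and rule keys. The golden-file layout is hypothetical.

```python
import json
from pathlib import Path


def check_against_golden(skill: str, findings: list[dict], golden_dir: Path) -> list[str]:
    """Compare findings with a stored baseline; the first run records it."""
    golden_dir.mkdir(parents=True, exist_ok=True)
    golden = golden_dir / f"{skill}.json"
    if not golden.exists():
        baseline = sorted(findings, key=lambda f: (f["path"], f["line"], f["rule"]))
        golden.write_text(json.dumps(baseline, indent=2))
        return []
    # Canonical JSON strings make the comparison order-independent.
    expected = {json.dumps(f, sort_keys=True) for f in json.loads(golden.read_text())}
    actual = {json.dumps(f, sort_keys=True) for f in findings}
    return ([f"missing: {f}" for f in sorted(expected - actual)] +
            [f"new: {f}" for f in sorted(actual - expected)])
```

Any non-empty result means the skill's behaviour drifted, either from an upstream update or from run-to-run variation, and is worth a look before you trust the next run.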

Frequently asked questions

Can I use these skills if I'm a solo developer rather than on a security team?
Yes, and you arguably benefit more. Solo developers don't have a security review partner; this stack gives you a structured second opinion on every PR and a runbook for the incidents you hope never happen. Start with code-review-security, secret-detection-review, and dependency-cve-triage — those three cover most of what a part-time security practice needs.
How do I know if a security skill is actually trustworthy?
Read the SKILL.md before you install it. Look for explicit anti-triggers, a minimal allowed-tools list, and output formats that produce evidence rather than verdicts. Avoid anything that promises to auto-fix or auto-approve. If the SKILL.md is under 50 lines and vague, skip it; if it's specific and conservative, install it.
Does Claude actually understand vulnerabilities, or is it pattern-matching?
It's pattern-matching with enough context to be useful, not deep semantic understanding. That's why the workflows above always pair Claude with a deterministic tool (SAST, OSV database, gitleaks) — Claude adds context to tool output rather than replacing the tool. Don't trust a finding that isn't grounded in a specific file and line.
What happens if a skill fires when I didn't want it to?
Tell Claude to skip it explicitly for the rest of the session, then look at the skill's trigger conditions in its SKILL.md and consider whether they need tightening. If the same misfire happens repeatedly, the skill has bad triggers and should be replaced or uninstalled.
Should I install every security skill I can find?
No. More skills means more context fighting for Claude's attention and more overlapping triggers. The 11 picks here cover the common cases; add to the stack only when you have a clear gap and a specific skill that fills it without overlapping what you've already got.
How do I handle a skill that produces false positives consistently?
First, check whether the false positive is something you can fix in the project — adding a comment annotation or a small refactor often makes the pattern unambiguous. If the skill is genuinely too aggressive, uninstall it and either find a replacement or write your own with tighter rules. Don't add a suppression layer; that's how real findings get lost.
Can these workflows run in CI rather than interactively?
Some can. Code review, secret detection, and dependency triage all run well in CI through Claude Code's non-interactive print mode (claude -p), with the output posted as PR comments. Incident triage and timeline reconstruction are interactive by nature: they involve human judgment at each step. Don't try to fully automate the incident workflow; the value is in the structure, not the speed.
