How ClaudSkills grades skill security
Every skill in the catalog ships with a free A–F security grade — a static scan of its SKILL.md against the threat patterns documented in the OWASP Agentic Skills Top 10 and the 2026 agent-skill security research. It's shown to everyone, on every skill page, no account required. Here is exactly what it checks, how the letter is computed, and — just as important — what it does not guarantee.
SKILL.md is loaded straight into your agent's context and its instructions are executed with your permissions. The agent-skill supply chain has a documented malware problem — Snyk's ToxicSkills study found prompt injection in 36% of skills on one low-barrier registry. A safety signal that sits behind a paywall would be worse than useless, so the grade is free. The Pro subscription covers the Quality Score and one-click install — evaluation and convenience, not safety.
The grade scale
Each skill starts at 100 points. Every finding deducts points by severity, and the remaining score maps to a letter. Findings inside Markdown code fences (illustrative examples) are penalised at half weight, because a dangerous pattern shown as an example is far less risky than the same instruction in the skill's prose — the prose is what the agent acts on. Low-confidence rules are halved again.
| Severity | Penalty | Examples |
|---|---|---|
| Critical | −25 | Reverse shell, env-dump piped to a URL, reading ~/.ssh and sending it out |
| High | −15 | curl … | sh, explicit jailbreak framing, destructive rm -rf |
| Medium | −8 | Base64-decode-and-run, persistence hooks, command execution |
| Low | −3 | Hardcoded IPs, miscellaneous network references |
The ten threat categories
The scan groups its rules into ten categories, drawn from the OWASP Agentic Skills Top 10 and published research like Snyk's "From SKILL.md to Shell Access in Three Lines of Markdown."
| Category | What it looks for |
|---|---|
| Prompt injection | Instruction override ("ignore previous instructions"), jailbreak framing ("developer mode"), covert-behaviour directives, unicode smuggling |
| Data exfiltration | Reading a local secret store (~/.ssh, ~/.aws, .env) and piping it to the network; full env dumps to a URL |
| Supply chain | Remote-exec pipes (curl … | sh), runtime installs from URLs, postinstall hooks |
| Reverse shell | bash -i >& /dev/tcp/…, nc -e, socat exec |
| Credentials | Reads of SSH keys, AWS credentials, .netrc, keychain dumps |
| Execution | eval/exec, os.system, subprocess(shell=True), child_process |
| Filesystem | Destructive operations (rm -rf /), path traversal, sensitive-file access |
| Persistence | Cron jobs, launchctl/systemctl, shell-rc writes |
| Obfuscation | Base64, hex/char-code encoding, zero-width characters |
| Network | Hardcoded IPs and other miscellaneous outbound references (low severity) |
What the grade means — and what it doesn't
An A grade means a static text scan found no risk patterns. It is a useful filter, not a safety guarantee. Three honest limits:
- Static analysis can't watch runtime. A skill that fetches and executes remote code at run time can hide its real behaviour from a text scan. Treat the grade as a first pass, not a clearance.
- Destination is unknowable. The scan can't tell an attacker's server from a legitimate API — so a normal authenticated API call is not flagged as exfiltration. We anchor on the secret source (reading
~/.ssh) rather than the destination, which keeps false positives down but means a novel exfil channel can slip through. - Dual-use is real.
curl … | shis the canonical supply-chain risk and the way half the world installs legitimate tools. We grade it High, not Critical, and surface it for you to judge.
The single highest-leverage thing you can do is still the oldest advice: skim the SKILL.md before you install it — look for unexplained URL fetches and environment-variable references — and run untrusted skills in an isolated environment with no production secrets.
What the catalog looks like
We ran this scan across the entire catalog. The result is reassuring: the overwhelming majority of skills carry no risk patterns at all — far cleaner than the 36% figure from the ClawHub supply-chain study. The full breakdown, the category counts, and the reproducible methodology are written up in We scanned 97,000 Claude skills for security risks →
A note on MCP servers
The grade above scans a skill's SKILL.md — the text your agent loads and acts on. The Arcade also catalogs MCP servers and dev tools, whose executed content is the server's own source code, which the catalog doesn't hold. So an MCP entry can't get the same content scan. Instead, every MCP server in the Arcade carries a structural risk profile: its execution scope — Local (stdio) (runs code on your machine with your shell environment and credentials), Container (Docker-isolated from your host), or Remote (hosted by the provider, so requests and data you send reach their infrastructure) — alongside provenance (reference/official vs. where it's published) and its license. It's an honest map of how a server runs and where it comes from — not a claim that we audited its code.
Methodology & reproducibility
The scanner is a pure static pass over each skill's SKILL.md — no network calls, no code execution. Rules are regular-expression patterns grouped by the ten categories above; each is tagged with a severity and a confidence level. Penalties are computed at the point of match, halved inside code fences and again for low-confidence rules, then summed and subtracted from 100. The grade is recomputed every time a skill's content changes, and a grade change re-renders only that skill's page. The signal is content-derived and unforgeable: there is nothing a skill author can do to influence the grade except write safer content. The grade is part of the open dataset — you can read it on every /skills/<slug>/ page and in /data/skills.json.
FAQ
- Is the security grade free?
- Yes — shown to everyone on every skill page, no account or subscription. Security is a safety signal and belongs in the open. Pro covers the Quality Score and one-click install.
- Does an A grade mean a skill is safe?
- No. An A means a static scan found no risk patterns. It can't detect every behaviour or follow runtime activity. Skim the SKILL.md and use an isolated environment for untrusted skills.
- Can an author pay to improve their grade?
- No. The grade is computed entirely from the skill's own content. There is no fee and no way to influence it other than writing safer content.
- How often is it recomputed?
- Every catalog rebuild. A grade change re-renders only the affected skill page.