The Claude Code Skill Quality Checklist

Published 1 June 2026 · 14 min read · By a long-time Claude Code practitioner

You've written a SKILL.md file. Maybe it works for you, maybe a teammate has tried it once. Before you publish to GitHub, push to a catalog, or share the link in a Discord, run it through this checklist. None of these checks require running anything more exotic than a fresh Claude Code session.

The checklist is roughly 30 items split across four phases: frontmatter, body, discovery, and distribution. Each item has a concrete pass/fail criterion. The goal isn't perfection; it's catching the specific defects that make otherwise-good skills get ignored, mis-triggered, or quietly skipped by Claude.

In this guide

Why this checklist exists
Frontmatter checks (10)
Body checks (12)
Discovery & invocation checks
Distribution & licence checks
Common failure patterns this catches
Fixing the most common failures
The printable checklist

Why this checklist exists

Published skills compete for Claude's attention. When a user types a request, Claude scans the descriptions of every available skill, decides whether any of them fit, and either invokes one or proceeds without help. Two things follow from that.

First, your skill is one of dozens — sometimes hundreds — of candidates per request. If the description is vague, if the frontmatter is half-filled, if the body has no examples, Claude often picks something else or nothing at all. The skill you spent two hours writing simply never fires. There's no error log. There's no notification. It just doesn't get picked.

Second, the median bar across public catalogs is genuinely low. A large fraction of published SKILL.md files have one of these problems: description over 200 characters, no Use when opener, body that's a wall of prose without an Examples section, no anti-trigger guidance (the single biggest signal that an author thought about scope), or frontmatter missing tags, license, or allowed-tools. None of these are technically broken. They just make the skill harder to discover, harder to evaluate, and easier to skip.

If you pass this checklist cleanly, you're in the top 20% of published skills. That's not a theoretical estimate — it's what you find when you read a few hundred public SKILL.md files in a row. Most fail at least three checks. A few fail ten or twelve. The handful that pass all of them are the skills people actually install and keep.

The checklist is also designed to be fast. You should be able to run it on a finished draft in 15 minutes, fix the fails in another 30, and ship a tighter skill the same afternoon. It's not a code review. It's a pre-publish smoke test.

One thing this checklist is not: a catalog scoring rubric. Different catalogs (ClaudSkills, awesome lists, internal company registries) score and rank skills differently. What this checklist does is catch the universal defects — the ones that hurt your skill regardless of where it's published. If you're aiming at a specific catalog, read their submission notes too. But run this first.

The four phases below are ordered by impact. Frontmatter defects are catastrophic, because the frontmatter is how Claude picks your skill in the first place. Body defects are slow leaks: they make your skill work badly when it does fire. Discovery checks are the ones authors skip most often. Distribution checks are about being a good citizen.

Frontmatter checks

The frontmatter is the first thing Claude reads and often the only thing it reads carefully. Get this right and the rest of the skill has a chance.

1. name is set, kebab-case, ≤40 characters. Pass: name: review-typescript-pr. Fail: missing field, snake_case, CamelCase, or a 60-character sentence-with-spaces. Claude uses this as the canonical identifier; long names get truncated in surfaces.

2. description starts with "Use when". This is the strongest single signal you can give. "Use when reviewing a TypeScript PR for type-safety bugs" beats "A skill for code review" by a wide margin. Claude pattern-matches on intent verbs; leading with Use when gives it a clean handle.

3. description is ≤200 characters. Pass: one sentence, maybe two short ones. Fail: a paragraph. Long descriptions get truncated in skill-picker surfaces, and the truncated tail is often the part with the actual specificity. Move detail into the body.

4. description names the artefact or input. Compare "Use when a user shares a TypeScript diff or PR URL" with "Use when reviewing code". The first tells Claude what input triggers the skill; the second is so vague it competes with every other code-review skill in the catalog.

5. tags is present and has 3-8 entries. Tags help users find the skill via tag landing pages and help Claude pattern-match secondary intent. Fewer than 3 tags is under-described; more than 8 is tag-spam and gets diluted. Mix dimensions: language (lang:typescript), purpose (type:code-review), tool (tool:github).

6. model declared if the skill needs a specific tier. If your skill genuinely needs Opus-level reasoning (deep refactors, complex planning), say so: model: opus. If it's fine on Haiku (simple lookups, formatting), say that. Skip the field if the skill works on any tier — don't pin Opus by reflex.

7. allowed-tools is narrowed appropriately. If your skill only needs to read files, don't grant Bash. If it only writes one file type, don't grant Edit across the whole tree. Narrow allowed-tools is both a security signal and a discoverability signal — Claude prefers skills whose tool surface matches the request.

8. license is noted. Even just license: MIT. Skills without a licence field create ambiguity for anyone redistributing or commercialising. If you're publishing under a permissive licence, say so. If you want copyleft, say that too. Silence is worse than either.

9. version is set if the skill has shipped before. Semantic versioning is fine: version: 1.2.0. This matters for skills that change behaviour over time — users pinning to a known-good version need a handle.

10. No frontmatter typos. YAML is silent about unknown keys. tag: instead of tags: just gets ignored, and you won't know until you wonder why your skill isn't surfacing in tag searches. Validate the YAML, eyeball the keys, and double-check spelling on the standard fields.
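Several of the frontmatter checks above are mechanical enough to script. The following is a minimal Python sketch, assuming the frontmatter has already been parsed into a dict by whatever YAML library you use. The function name check_frontmatter and the KNOWN_KEYS set are illustrative choices drawn from the fields this guide mentions, not an official schema.

```python
import re
from difflib import get_close_matches

# Field names taken from the checks in this guide; an assumption, not an official schema.
KNOWN_KEYS = {"name", "description", "tags", "model", "allowed-tools",
              "license", "version", "source", "repository"}

KEBAB_CASE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def check_frontmatter(meta: dict) -> list[str]:
    """Return human-readable failures for an already-parsed frontmatter dict."""
    fails = []

    name = meta.get("name", "")
    if not KEBAB_CASE.fullmatch(name) or len(name) > 40:
        fails.append("name must be kebab-case and at most 40 characters")

    description = meta.get("description", "")
    if not description.startswith("Use when"):
        fails.append('description should start with "Use when"')
    if len(description) > 200:
        fails.append(f"description is {len(description)} characters (limit 200)")

    tags = meta.get("tags", [])
    if not 3 <= len(tags) <= 8:
        fails.append(f"tags has {len(tags)} entries (expected 3-8)")

    if "license" not in meta:
        fails.append("license is missing")

    # YAML will not complain about unknown keys, so flag likely typos ourselves.
    for key in meta:
        if key not in KNOWN_KEYS:
            guess = get_close_matches(key, KNOWN_KEYS, n=1)
            hint = f" (did you mean {guess[0]!r}?)" if guess else ""
            fails.append(f"unknown key {key!r}{hint}")

    return fails


# A draft with the tag/tags typo described in check 10.
draft = {
    "name": "review-typescript-pr",
    "description": "Use when reviewing a TypeScript PR for type-safety bugs",
    "tag": ["lang:typescript", "type:code-review", "tool:github"],
    "license": "MIT",
}

for failure in check_frontmatter(draft):
    print(failure)
```

The typo surfaces twice here: once as a missing tags list and once as an unknown key with a suggested spelling. That double signal is the point, since YAML itself reports neither.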

Body checks

Frontmatter gets you picked. The body determines whether the picked skill actually does its job. Twelve checks here; the anti-trigger section (check 4) is the most important of them.

1. There's an ## Instructions section. Not ## How it works, not ## Usage — call it Instructions, in that exact phrasing. Claude is trained on this convention; a section labelled Instructions gets weighted as the operational core. Other section titles work but get less attention.

2. Instructions are imperative and ordered. Pass: "1. Read the diff. 2. Identify type-safety issues. 3. Group findings by severity." Fail: "This skill reads diffs and tries to identify problems by analysing types." Imperative voice tells Claude what to do; descriptive voice tells Claude what the skill is for. You want both, but instructions specifically should be imperatives.

3. There's an ## Examples section with 2+ examples. Each example shows a realistic input and the expected output shape. Skills without examples force Claude to guess at output format and frequently guess wrong. Two examples is the minimum that demonstrates a pattern; one example reads as a special case.

4. There's an anti-trigger section. This is the single biggest quality signal. Call it ## When NOT to use, ## Skip this skill if, or ## Out of scope. List 3-5 cases where the skill should NOT fire. For example: "Skip if the user is asking for a code review of a single file under 50 lines; use a manual review instead." Authors who add this section have demonstrably thought about scope. Authors who skip it usually haven't.

5. No inline secrets, API keys, or tokens. Replace with $API_KEY or <your-token>. This sounds obvious, but a non-trivial number of public skills leak real keys. Once a key is committed to a public repo, treat it as burned even after deletion.

6. No PII in examples. Use fake emails on a reserved domain such as example.com, fake names, fake addresses. Don't use a real colleague's name even as a joke. If your skill processes user data, the examples should make the redaction expectation explicit.

7. Scope is sane. A skill that does code review, deployment, monitoring, and writes Jira tickets is doing too much. Break it up. Skills with clear single responsibility get picked correctly more often than mega-skills. If your description has multiple "and"s, you probably have multiple skills.

8. No instruction conflicts with Claude's defaults. Don't write "Always commit changes without asking" — Claude's default safety posture won't comply, and the conflict surfaces as confusing behaviour. Frame instructions as enhancements to defaults, not overrides.

9. External commands have safe-by-default flags. If your skill runs rm, scope it with a path. If it runs git push, default to --dry-run first or require explicit confirmation. The user can always opt into destructive mode; they can't opt out of a destructive default.

10. Code blocks are fenced with language tags. ```typescript not ```. Claude uses the language tag to pick the right syntax model when interpreting your examples. Untagged blocks get parsed as plain text and lose specificity.

11. The body references real tools or files by name. If your skill uses jq, say so. If it expects a package.json at repo root, say so. Generic skills that talk in abstractions ("the configuration file", "the package manager") are harder for Claude to invoke correctly.

12. No hidden surprises. If the skill makes a network call, say so. If it writes to disk outside the working tree, say so. If it depends on an external service being up, say so. Surprise behaviour is the fastest way to lose user trust.
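The structural body checks (sections present, fences tagged, no obvious secrets) can be approximated with plain string handling. This is a rough Python sketch under stated assumptions: check_body is a hypothetical helper, the secret pattern is a deliberately simple illustration rather than a real scanner, and headings that happen to sit inside code fences are not filtered out.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example easy to embed

ANTI_TRIGGER_HEADINGS = ("## When NOT to use", "## Skip this skill if", "## Out of scope")

# Illustrative pattern only; real secret scanners cover far more formats.
SECRET_PATTERN = re.compile(
    r"(api[_-]?key|token|secret)\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}",
    re.IGNORECASE,
)


def check_body(body: str) -> list[str]:
    """Return failures for the mechanical subset of the body checks."""
    fails = []
    lines = body.splitlines()
    headings = [line.strip() for line in lines if line.startswith("## ")]

    if "## Instructions" not in headings:
        fails.append('missing "## Instructions" section')
    if "## Examples" not in headings:
        fails.append('missing "## Examples" section')
    if not any(h.startswith(ANTI_TRIGGER_HEADINGS) for h in headings):
        fails.append("missing anti-trigger section")

    # Fences alternate open/close; only an opening fence should carry a language tag.
    inside_fence = False
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not inside_fence and stripped == FENCE:
                fails.append(f"line {number}: code fence has no language tag")
            inside_fence = not inside_fence

    if SECRET_PATTERN.search(body):
        fails.append("possible inline secret; replace with a placeholder such as $API_KEY")

    return fails


sample = "\n".join([
    "## Instructions",
    "1. Read the diff.",
    "## Examples",
    FENCE,
    "const x: any = 1",
    FENCE,
])

for failure in check_body(sample):
    print(failure)
```

Checks such as sane scope, imperative voice, and hidden surprises still need a human read; a script like this only clears the mechanical items so the review time goes to the judgment calls.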

Discovery & invocation checks

The body can be perfect and the skill can still fail at the discovery stage. Three tests, each of which takes about five minutes.

Fresh-session test. Open a fresh Claude Code session — no prior context, no recent messages about your domain. Type a request that should trigger your skill, phrased the way a user would phrase it (not the way you'd phrase it). Did Claude pick your skill? Did it pick a different skill? Did it proceed without any skill?

Do this three times with three different phrasings. If Claude picks your skill 0 or 1 out of 3 times, your description isn't doing its job. The most common fix is rewriting the Use when opener to match user vocabulary, not author vocabulary. Authors say "semantic versioning bump"; users say "new version number". Use the user phrasing in the description.

Description-matches-intent test. Read your description out loud. Imagine you're a developer who has never seen your skill before. Does the description tell you, in 10 seconds, whether this skill is for your current problem?

Pass: "Use when reviewing a TypeScript PR for type-safety issues: checks for any, unsafe casts, and missing return types." In 10 seconds you know: it's for code review, it's TypeScript-specific, it's PR-shaped (not single-file), and it has three specific check types.

Fail: "A skill for analysing code with comprehensive checks." In 10 seconds you've learned nothing actionable. Which language? Which checks? Which input shape? A description like this means Claude has to guess at every dimension.

No-collision-with-other-skills test. Search the catalog you're publishing to for skills with similar names, similar tags, or similar descriptions. Is there already a skill that does what yours does?

If yes, you have three options. First, differentiate explicitly — make your description name the dimension you're different on ("Use when reviewing TypeScript PRs that touch React components specifically"). Second, supersede — if your skill is genuinely better, reach out to the original author. Third, abandon — if the existing skill is fine and yours adds nothing, don't publish yours. Tag-and-name collision is the fastest way to make both skills harder to find.

For the collision test specifically, search by the operative noun-verb pair, not by your skill's full name. If your skill is review-react-pr, search for "react pr review", "react code review", "pr review typescript". If five other skills come back and three of them have higher star counts and better descriptions, ask yourself whether the world needs your sixth.
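If the catalog you publish to can be exported as a list of names and descriptions, a crude word-overlap score can shortlist the skills worth reading by hand. The sketch below uses Jaccard similarity over word sets. The catalog entries, the stopword list, and the 0.5 threshold are all made up for illustration, and word overlap is only a rough proxy for how Claude or a human would judge similarity.

```python
import re

# Small, hand-picked stopword list; an assumption, tune it for your catalog.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "and", "or", "in",
             "use", "when", "that", "with"}


def tokens(text: str) -> set[str]:
    """Lowercased word set with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words across both descriptions."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def likely_collisions(mine: str, catalog: dict[str, str], threshold: float = 0.5):
    """Return (score, name) pairs at or above the threshold, highest first."""
    scored = [(jaccard(mine, description), name) for name, description in catalog.items()]
    return sorted([pair for pair in scored if pair[0] >= threshold], reverse=True)


mine = "Use when reviewing a TypeScript PR with React component changes for type-safety issues"

# Hypothetical catalog entries.
catalog = {
    "review-ts-pr": "Use when reviewing a TypeScript PR for type-safety issues",
    "format-sql": "Use when formatting SQL queries in a migration file",
}

for score, name in likely_collisions(mine, catalog):
    print(f"{name}: {score:.2f}")
```

A high score is a prompt to read the other skill and decide between differentiating, superseding, or abandoning; it is not a verdict on its own.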

One last discovery check: the cold install test. Install your own skill in a fresh Claude Code environment from the URL you're about to publish, not from your dev tree. Does it work? Does the install command in your README actually run? Does it leave any state behind that a user wouldn't expect? Catching install bugs before publishing is much cheaper than catching them in the issue tracker.

Distribution & licence checks

The skill works, the discovery looks clean, the body is tight. Three core checks remain before you publish, followed by a bonus and a final version tag.

Licence is obvious to a reader. Not just in the frontmatter — also as a LICENSE or LICENSE.md file at the repo root with the full text. Frontmatter license: MIT is a hint; the LICENSE file is the binding statement. Anyone redistributing your skill (catalogs, awesome lists, internal company registries) will look for the file. Without it, conservative redistributors skip your skill entirely.

If you're not sure which licence to use, MIT is the safest default for skills that want maximum reach. Apache 2.0 adds an explicit patent grant if you care. GPL variants are fine but reduce the number of places that'll redistribute. Avoid "all rights reserved" — it makes the skill effectively unredistributable, which usually isn't what you want.

Source-of-truth URL is findable. Where is the canonical version of this skill? GitHub repo? Gist? Personal site? The URL should be in the frontmatter (source: or repository:), in the README, and ideally in the skill body as a comment at the top. When a user wants to file a bug, suggest a feature, or check for updates, they should be able to find your canonical location in under 30 seconds.

Avoid the case where your skill lives in three places — a personal repo, an awesome list fork, and a forgotten gist from a year ago. Pick one canonical home. Make the others redirect or get archived.

Contact information is present. An email, a GitHub username with a public profile, a Mastodon handle, anything. This isn't for marketing — it's so people can report security issues, ask for clarification, or offer to maintain the skill if you move on. The single most common reason a high-quality skill gets forked or duplicated is the original author becoming uncontactable.

If you don't want your personal email exposed, use a project-specific address with a mail filter. A GitHub username is fine if your profile is reasonably current. What you want to avoid is a skill where the only contact path is filing a public issue on a repo you haven't touched in a year.

Bonus: a SECURITY.md or equivalent. If your skill could plausibly be misused (anything that runs commands, accesses network, processes user data), include a one-paragraph note about responsible disclosure. "If you find a security issue with this skill, please email <your-security-address> or open a private security advisory on GitHub. Please don't file a public issue for security bugs." Skills with this are rare. Skills with this and a real response history are even rarer, and they get treated with much more trust by serious users.

Final check on distribution: tag a version. Even if it's v0.1.0. Unversioned skills can't be pinned, which means anyone depending on a specific behaviour has to vendor the whole file or hope you don't change anything. Cheap to do, valuable for users.

Common failure patterns this checklist catches

After running this checklist on a few dozen skills, you start to see the same defects recurring. Here are the five most common, each with the specific checks that catch it.

The Generic Reviewer. Description: "A skill for reviewing code." Tags: code-review. Body: 800 words of prose about software quality. No examples, no anti-trigger, no specifics. Caught by: frontmatter check #4 (description names the artefact), body check #3 (Examples section), body check #4 (anti-trigger section). Fix: pick a specific code-review angle (React PRs? Rust unsafe blocks? Python type hints?) and rewrite the description and examples around that angle.

The Mega-Skill. Description that contains "and" three times. Frontmatter has 12 tags. Body is 3,000 words covering planning, execution, monitoring, and reporting. Caught by: frontmatter check #5 (3-8 tags), body check #7 (sane scope). Fix: split into three or four skills, each with a single responsibility. The mega-skill almost never gets picked correctly, because Claude can't tell which slice of behaviour the user wants.

The Silent Surprise. Skill works in the author's environment because they have jq installed, GITHUB_TOKEN set, and a specific directory structure. Body doesn't mention any of these. Caught by: body check #11 (real tools by name), body check #12 (no hidden surprises). Fix: add a Prerequisites section listing every external tool, environment variable, and file the skill assumes.

The Default-Override. Body contains instructions like "Always commit without asking" or "Skip the confirmation prompt" or "Use force flag by default". Caught by: body check #8 (no conflict with Claude's defaults). Fix: rewrite instructions so destructive actions are explicit opt-ins from the user, never defaults from the skill.

The Orphan. No licence file, no contact info, no source URL in the frontmatter, last commit 18 months ago, the README is still {{cookiecutter.skill_name}}. Caught by: distribution checks. Fix: actually fill in the metadata. The orphan pattern is depressingly common because authors finish the interesting work (the body) and lose interest before doing the boring work (the metadata).

There's a sixth pattern worth flagging, even though no single check catches it: the aspirational skill. The body describes what the author wishes the skill could do, not what it actually does. "This skill analyses your codebase for architectural drift" — in practice it greps for a few file patterns. "This skill optimises your CI pipeline" — in practice it suggests adding a cache step.

Aspirational skills don't fail any single check, but they fail the trust test: users install them expecting the description, find the actual behaviour, and either uninstall or just stop trusting your other skills. The fix is to rewrite the description to match actual capabilities precisely, even if that makes the skill sound less impressive. Honest descriptions outperform aspirational ones by a wide margin in long-term retention.

Fixing the most common failures

Three specific rewrites that fix the top failure patterns. Each takes 10-20 minutes and lifts the skill into the passing tier.

Rewriting a vague description. Start by writing down, in plain language, the most specific user request your skill is designed for. Not "review code" but "review a TypeScript pull request that touches at least one React component and flag any uses of any or unsafe casts". That's your target.

Now compress that target into the Use when frame. "Use when reviewing a TypeScript PR with React component changes for type-safety issues." That is 87 characters, well under the 200 limit, and it names the language, the artefact (PR), the scope (React components), and the purpose (type-safety). Compare against your current description. If yours is vaguer, swap it.

The hard part is killing your darlings. You'll have phrases in your current description that you're attached to — adjectives like "comprehensive", framings like "a powerful skill for". Cut all of them. Every adjective that doesn't change the trigger condition is making the description worse.
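A quick way to compare a current description against a rewrite is to measure both the same way. The Python sketch below reports length, the Use when opener, and any filler words. The FILLER_WORDS set is seeded only with the two words named above and is meant to be extended; the "before" string is a made-up example of a vague description.

```python
import re

# Seed list from the adjectives mentioned in the text; extend it with your own.
FILLER_WORDS = {"comprehensive", "powerful"}


def review_description(description: str, limit: int = 200) -> dict:
    """Summarise the properties the description checks care about."""
    words = re.findall(r"[a-z]+", description.lower())
    return {
        "length": len(description),
        "within_limit": len(description) <= limit,
        "starts_with_use_when": description.startswith("Use when"),
        "filler": sorted(set(words) & FILLER_WORDS),
    }


before = "A powerful skill for analysing code with comprehensive checks."
after = "Use when reviewing a TypeScript PR with React component changes for type-safety issues."

for label, text in (("before", before), ("after", after)):
    print(label, review_description(text))
```

The report is only a prompt for the rewrite. Whether a word is filler depends on whether it changes the trigger condition, which a word list cannot decide.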

Adding an anti-trigger section. Open a fresh document and brainstorm 5-10 cases where you would NOT want your skill to fire. Be specific. "Skip if the PR is a docs-only change" beats "Skip if not appropriate". "Skip if the user has already asked for a manual review" beats "Skip if user wants something different".

Once you have 5-10 candidates, pick the 3-5 most likely false-positive triggers and write them up as bullets in a ## When NOT to use section. Each bullet should be one sentence and name a specific scenario, not a category.

If you can't think of cases where the skill shouldn't fire, that's a signal your scope is too broad — go back to body check #7 and consider splitting. A skill with no plausible anti-triggers is doing too much.

Tightening allowed-tools. Audit your current allowed-tools list. For each tool, ask: "Does this skill, in its happy path, actually use this tool?" If the answer is no or only sometimes, consider removing it.

A skill that reads files and produces a report doesn't need Bash. A skill that runs a single specific command (say, npm test) can declare a narrowly scoped Bash permission rather than the full surface. A skill that writes one file doesn't need Edit across the whole tree; it can declare just the specific file or directory it touches.

The trade-off is that narrow tool surfaces sometimes need expansion when you add new features. That's fine. Bump the version, expand the surface intentionally, update the description if the new capability changes the trigger profile. Users prefer skills whose tool footprint matches what the skill actually does.

One last note on rewrites: don't rewrite for the checklist. Rewrite for the user. The checklist is a proxy for "does this skill respect the user's time and attention". If your rewrites pass the checklist but feel wrong for actual users, trust the user feeling and ignore the checklist for that specific item. The checklist is opinionated, not infallible.

The printable checklist

Copy this into your editor, your README, your team's skill-publishing template — wherever it'll get used. Each line is a yes/no question. If you can't answer yes, the skill isn't ready to publish.

# Pre-publish skill checklist

## Frontmatter (10)
[ ] name set, kebab-case, ≤40 chars
[ ] description starts with "Use when"
[ ] description ≤200 chars
[ ] description names the artefact or input
[ ] tags present, 3-8 entries, mixed dimensions
[ ] model declared (if a specific tier is needed) or omitted (if any tier works)
[ ] allowed-tools narrowed to what skill actually uses
[ ] license noted in frontmatter
[ ] version set (if skill has shipped before)
[ ] no typos in frontmatter keys

## Body (12)
[ ] Instructions section labelled exactly "## Instructions"
[ ] Instructions are imperative and ordered
[ ] Examples section with 2+ realistic examples
[ ] Anti-trigger section (When NOT to use / Skip if / Out of scope)
[ ] No inline secrets, API keys, or tokens
[ ] No PII in examples
[ ] Single responsibility — no "and...and...and"
[ ] No instruction conflicts with Claude's defaults
[ ] External commands have safe-by-default flags
[ ] Code blocks have language tags (```typescript not ```)
[ ] Real tools and files named explicitly
[ ] No hidden surprises (network, disk writes, external services)

## Discovery (3)
[ ] Fresh-session test: 2+ of 3 trigger phrasings pick this skill
[ ] Description-matches-intent test: 10-second reader gets it
[ ] No-collision test: differentiated from existing catalog skills
[ ] Bonus: cold install test from public URL works

## Distribution (5)
[ ] LICENSE file at repo root with full text
[ ] Source-of-truth URL in frontmatter and README
[ ] Contact information findable (email, GH profile, etc.)
[ ] Version tagged (even v0.1.0)
[ ] Bonus: SECURITY.md if skill could plausibly be misused

Run this in 15 minutes on any finished draft. Fix the fails, ship the tighter skill. If you're publishing a lot of skills (an internal company catalog, a personal skill library, a team workflow), turn this into a PR template or a CI check. The discovery tests don't automate, but the frontmatter and body checks largely do.
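As a starting point for the CI route, here is a minimal Python sketch that splits a SKILL.md string into frontmatter and body, runs a handful of the mechanical checks, and returns a non-zero status on failure. The frontmatter parser is deliberately naive (top-level key: value lines only), so a real pipeline would likely swap in a proper YAML library. The names split_skill, run_checks, and main are illustrative.

```python
def split_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md string into (frontmatter, body).

    Naive on purpose: only top-level `key: value` lines are read, so nested
    YAML, lists, and multi-line values are ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" in line and not line.startswith((" ", "#", "-")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def run_checks(text: str) -> list[str]:
    """Run a small mechanical subset of the checklist."""
    meta, body = split_skill(text)
    description = meta.get("description", "")
    fails = []
    if not description.startswith("Use when"):
        fails.append('description does not start with "Use when"')
    if len(description) > 200:
        fails.append("description exceeds 200 characters")
    if "## Instructions" not in body:
        fails.append('body has no "## Instructions" section')
    if "## Examples" not in body:
        fails.append('body has no "## Examples" section')
    return fails


def main(path: str) -> int:
    """Print failures for the file at `path`; return 1 if any, else 0."""
    with open(path, encoding="utf-8") as handle:
        fails = run_checks(handle.read())
    for failure in fails:
        print(f"FAIL: {failure}")
    return 1 if fails else 0
```

A CI step could call sys.exit(main("SKILL.md")) so that any failure blocks the merge. The discovery tests stay manual either way.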

The checklist is deliberately strict in places where the median public skill is loose — particularly the anti-trigger section, the 200-character description cap, and the requirement for a real Examples section. These are the levers that move skills from "competent but skippable" to "genuinely picked and used". Authors who land all three see noticeable lifts in how often their skills fire correctly.

If you're shipping your first skill, run this checklist twice: once on the draft, once a day later after sleeping on it. The second pass catches things the first one misses — particularly the aspirational-description trap and the scope-creep trap, both of which are easier to see when you're not still excited about the skill.

And one closing thought: this checklist will get stricter over time as the median skill quality rises. Anti-trigger sections used to be rare; now they're table stakes for top-tier skills. The next thing to become table stakes will probably be explicit input-schema declarations, or version-compatibility matrices, or something else that's currently a power-user nicety. Pay attention to what the best skills you install do that the median ones don't — that's your roadmap for the next version of this list.

Frequently asked questions

How long should this checklist actually take to run?

About 15 minutes for the checks themselves on a finished draft, plus 30 minutes to fix the fails. The fresh-session and cold install tests add another 10-15 minutes each but pay back the time by catching defects that would otherwise show up in user reports.

What if my skill genuinely needs more than 200 characters in the description?

It almost certainly doesn't. Long descriptions are usually a sign the skill is doing too much, or the author hasn't found the precise framing yet. Try writing the description three different ways under 200 chars; one of them will be better than your current long one.

Is the anti-trigger section really that important?

Yes. It's the single most reliable proxy for "the author thought about scope". Skills with anti-trigger sections behave better when picked, get picked correctly more often, and get uninstalled less often. If you only fix one thing on this list, fix this.

What if I disagree with one of the checks?

Ignore that check. The list is opinionated, not normative. If you have a specific reason to break one (for example, you write skills with longer descriptions for an internal audience that values context over compression), that's fine. The list catches the common case; your skill might not be the common case.

Should I run this on existing published skills, not just new ones?

Yes, and it's often worth doing. Pick your most-used skill, run the checklist, and you'll probably find 2-3 fails. Fixing them, especially description tightness and anti-trigger sections, usually lifts how often the skill gets picked correctly.

Does the checklist work for non-SKILL.md formats like Claude Code plugins?

The frontmatter checks are SKILL.md-specific, but the body, discovery, and distribution checks apply directly to plugin manifests, agent prompts, and most other Claude Code authoring formats. Adapt the frontmatter checks to your format's metadata equivalent and the rest carries over.

What's the single biggest mistake authors make?

Vague descriptions that don't name the input artefact. "A skill for reviewing code" loses to "Use when reviewing a TypeScript PR with React component changes" every single time. Naming the artefact is the cheapest, highest-impact rewrite available.

Found a bug or want a topic covered? Email [email protected] or open an issue via GitHub.
