Claude Code Skills·Claude Skills·The open SKILL.md registry for Claude
HomeLearn › Debugging a Claude Code Skill When Claude Won't Use It

Debugging a Claude Code Skill When Claude Won't Use It

Published 1 June 2026 · 12 min read · By a long-time Claude Code practitioner

You wrote a skill. You dropped it in ~/.claude/skills/. You restarted Claude Code. And then nothing happened — Claude either ignored the skill entirely, fired the wrong one, or invoked yours but failed halfway through. This guide walks through every layer where things break, in roughly the order they fail in practice, with the diagnostic commands and edits I actually run when one of my own skills misbehaves.

By the end, you will have a five-minute checklist you can run against any silent skill, plus a mental model of why Claude makes the decisions it does — which matters more than memorising fixes, because the fix is almost always "change the description" and the hard part is knowing which words to change.

In this guide

The discovery and loading layer

Before Claude can decide whether to use your skill, the loader has to find it, parse it, and accept it. This layer fails silently more often than any other, because Claude Code doesn't print a noisy error when a skill fails to load — it just acts as if the file isn't there.

The canonical user-global path is ~/.claude/skills/<slug>/SKILL.md. The slug is the directory name; the file is always called SKILL.md with that exact capitalisation. A common first mistake is putting the skill at ~/.claude/skills/<slug>.md directly, without the wrapping folder. That file is never picked up, regardless of its contents.

Verify the file is in the right place and parses cleanly:

ls -la ~/.claude/skills/my-skill/
# expect: SKILL.md, possibly a README, optional reference files

head -20 ~/.claude/skills/my-skill/SKILL.md
# expect: --- on line 1, then YAML, then --- closing the frontmatter

The frontmatter is YAML between two --- fences. At minimum it needs name: and description:, both non-empty strings. If name: is missing or YAML-invalid, the loader drops the skill. The most common subtle break is an unquoted description containing a colon — YAML reads description: For X: when you need Y as a nested mapping and the parse fails. Wrap descriptions with internal punctuation in single quotes.

Watch for a UTF-8 byte-order mark at the start of the file. If you edited the SKILL.md in a Windows editor or saved it via a script that injected a BOM, the first three bytes of the file are EF BB BF before the opening ---, and the YAML parser sees a malformed first line. Run file ~/.claude/skills/my-skill/SKILL.md — if it reports "UTF-8 Unicode (with BOM) text", strip it with sed -i '' '1s/^\xEF\xBB\xBF//' SKILL.md on macOS or dos2unix on Linux.

The fastest end-to-end check that loading actually succeeded is to ask Claude directly. Open Claude Code and type something like:

List every skill you currently have access to, with the description of each. Don't invoke any of them — just enumerate.

If your skill is in the list, the discovery layer is fine and you can move on. If it isn't, the file isn't being parsed — go back and check the path, the YAML, and the BOM. Don't proceed to the matching layer until your skill shows up in that enumeration; you'd be debugging an invisible skill.

The matching layer: description-as-trigger

Once a skill is loaded, Claude decides whether to invoke it based almost entirely on the description: field. The body of the SKILL.md is loaded into context only after the skill is selected; for the selection itself, only the frontmatter matters. This is the single biggest insight people miss when writing skills.

If Claude isn't invoking your skill on a request you think should match, the description is almost certainly the problem. It is either too narrow (Claude sees no overlap with the user's phrasing), too broad (Claude has no reason to prefer it over its own native reasoning), or too vague (Claude can't tell what the skill actually does).

Here is a real before/after from a deployment-automation skill that wasn't firing:

description: Deploy script helper

That description matches almost nothing. "Deploy" is a verb that appears in dozens of contexts, "script helper" doesn't tell Claude what kind of work the skill performs, and there's no signal about when the skill is appropriate. After rewriting:

description: Use when the user wants to deploy a Node.js or Python app to
  Fly.io, Railway, or Render. Generates the platform-specific config file,
  validates required env vars, and runs the deploy command. Skip for AWS,
  GCP, Azure, Vercel, or Netlify deployments.

That version has four properties the first version lacked: it names the user intent (deploy a Node.js or Python app), names the specific platforms it supports, describes what the skill actually does, and — critically — explicitly lists what it does not handle. The exclusion clause is what stops Claude from invoking it on the wrong job.

A useful framing: the description is a job posting Claude reads when deciding whether to hire your skill. Vague job postings get ignored or attract the wrong candidates. Specific postings with clear exclusions get the right work routed to them. Aim for 30 to 80 words. Lead with the trigger condition, then the deliverable, then exclusions.

One pattern that consistently fails: descriptions that describe the skill's internals rather than its use case. description: A skill that uses regex and AST parsing to refactor code tells Claude how the skill works but gives no signal about when to invoke it. Rewrite as description: Use when the user wants to rename a symbol across a TypeScript codebase — same skill, but now Claude can match it against user requests.

When Claude picks the wrong skill

The opposite failure mode: Claude does invoke a skill, but it's not the one you wanted. This usually means two skills have overlapping descriptions, and Claude picked the one whose description happens to match the user's wording slightly better — or worse, picked one essentially at random and never even considered the other.

Diagnose this by asking Claude to walk through its reasoning before invoking anything:

I'm going to ask you to do something. Before you act, tell me which skills you considered, which one you'd pick, and why. Don't invoke anything yet.

Then make the request that should have triggered your skill. Claude's reasoning will reveal whether it saw your skill at all, whether it considered it but rejected it, or whether it picked a competing skill. Three common resolutions, in order of preference:

Narrow the descriptions of both skills. If you have refactor-typescript and refactor-python, both with descriptions starting "Use when the user wants to refactor code", they will fight every refactor request. Rewrite them to lead with the language: description: Use when the user wants to refactor a TypeScript or JavaScript file. The shared prefix is the problem; differentiate at the start of the description, not buried at the end.

Add anti-trigger sections to the body. Even though Claude only reads frontmatter for the initial decision, the body kicks in once the skill is invoked. A short "When NOT to use this skill" section near the top of the body lets the invoked skill bail out early if it realises the wrong one was selected:

## When NOT to use this skill

Skip this skill and use `refactor-python` instead if:
- The file extension is .py or the imports start with `import` (without `from`)
- The user mentions Python, pip, Django, Flask, or FastAPI
- The code uses snake_case for function names

This costs almost nothing in tokens but turns wrong-skill invocations into clean handoffs.

Merge the skills. Sometimes two skills are really one skill with a branch. If refactor-typescript and refactor-python share 80 percent of their logic, write a single refactor-code skill whose body detects the language and dispatches internally. One description, no overlap, no ambiguity for Claude to navigate.

A symptom that often indicates the wrong-skill problem rather than the no-skill problem: Claude invokes a skill, the user reports the result is wrong, and you look at the logs and find a different skill name in the trace than you expected. That's not the skill misbehaving — that's the matching layer picking the wrong tool.

The allowed-tools failure

If your skill needs to read files, run shell commands, or call a sub-agent, the tools it can use are governed by the allowed-tools: field in the frontmatter. Get this wrong and the skill loads, fires, and then fails mid-execution with errors that look like the skill is broken when really the permission is missing.

The default behaviour depends on your Claude Code configuration, but the safe assumption is that a skill with no allowed-tools: field gets no tools at all beyond plain text generation. If your SKILL.md body says "Run npm test and report the results" but the frontmatter doesn't grant Bash access, the skill will produce a response describing the command without actually running it, and the user will get a hallucinated test output.

The frontmatter syntax accepts a YAML list:

---
name: ci-doctor
description: Use when CI is failing and the user wants to diagnose why.
  Reads the CI config, runs the failing command locally, and proposes a fix.
allowed-tools:
  - Bash
  - Read
  - Grep
  - Edit
---

Common mistakes I see repeatedly: typoed tool names (bash in lowercase doesn't match Bash), tools listed as a comma-separated string instead of a YAML list, and missing Edit when the skill is supposed to modify files. The last one is particularly nasty because the skill will read the file, draft the change, then silently skip the write step and report success as if it had edited.

The diagnostic when a skill seems to half-work: ask Claude what tools the currently-running skill has access to.

You just invoked the ci-doctor skill. Without running anything, list every tool that skill is permitted to call right now.

If the list is missing a tool the skill obviously needs, the frontmatter is wrong. If the list looks right but the skill still won't perform an action, you have a different problem — usually a prompt-side issue where the body is asking Claude to do something it doesn't recognise as a tool call.

Be conservative with allowed-tools:. Granting Bash to a skill that only needs Read is the most common over-permission, and it widens the blast radius if the skill misfires. A documentation-summarisation skill should not be able to run rm -rf. Audit your most-installed skills periodically and prune tools they don't actually use.

One subtle gotcha: if your skill's body invokes a sub-agent via the Task tool, the sub-agent inherits the skill's allowed-tools by default. If you want the sub-agent restricted to a subset, you have to specify that in the Task call itself — the frontmatter is not enough.

The model-mismatch failure

Some skills include a model: pin in their frontmatter, locking them to a specific Claude model. This is sometimes necessary (the skill was tuned against a particular model's capabilities) and sometimes accidental (the author copied a template and forgot to remove or update the pin).

A skill pinned to a model the user doesn't have access to, or to an older model that has been retired, will either fail to invoke or invoke into an empty response. The error messages are not always obvious — Claude Code may silently fall back to the default model, run the skill, and produce subtly worse output than the author intended. Or it may refuse to run the skill at all and the user has no idea why.

Check whether your skill pins a model:

grep -E '^model:' ~/.claude/skills/my-skill/SKILL.md

If the pin names a model that has been deprecated, update it. If the pin is unnecessary — the skill works fine on whatever the user's default is — remove it. The general rule: only pin a model when the skill demonstrably requires the capabilities of a specific tier (long context, vision, extended thinking) and would silently degrade without them.

A subtler version of this failure: your skill behaves differently for different users because their default models are different. A skill that relies on long-context reasoning will work fine when the user is on a flagship model and disappoint on a smaller one. If you can't pin (because you want broad compatibility) and you can't gracefully degrade (because the skill genuinely needs the capability), document the requirement clearly in the body:

## Requirements

This skill expects extended thinking to be enabled and a model with at least
200K context. On smaller models it will produce a summary instead of a full
refactor — that is a feature, not a bug.

Setting expectations in the body prevents the "it worked yesterday, it doesn't work today" support thread when the user switches models or hits a context boundary.

One pattern worth adopting: if your skill takes a model-dependent code path, branch explicitly in the body rather than letting the model implicitly degrade. "If the codebase is over 50K tokens, summarise modules individually before composing the refactor" gives the smaller-model run a sensible path; leaving it implicit means the smaller model just truncates and produces broken output.

The body-too-long failure

SKILL.md bodies are part of the prompt context when the skill fires. A 5,000-word body consumes meaningful context window, and when Claude is already partway through a long conversation, loading a giant skill can push the active turn over the model's limit. The visible failure mode is the skill working perfectly on a fresh chat and falling apart mid-session — same skill, same request, different conversation state.

Practical word budgets, calibrated against real catalog data:

The pattern for splitting a long skill: keep the SKILL.md focused on the entry-point logic and the decision tree, and move reference material (error code tables, configuration examples, edge-case catalogues) into sibling files in the skill directory. The body can then say Read reference/error-codes.md for the full list of error mappings, and Claude loads that file lazily only when it's needed.

The diagnostic for suspected truncation: ask Claude to repeat back the last paragraph of the SKILL.md body. If it can't, or it paraphrases, or it confabulates content that isn't in the file, you are past the loading boundary. Cut the body or split it.

Watch for accidental bloat from copy-pasted examples. A skill that includes three full example outputs of 800 words each is carrying 2,400 words of arguably-useful illustration. Trim those to representative excerpts and trust Claude to extrapolate. The catalog data shows that high-performing skills almost never include multiple long verbatim examples — they show one short example, explain the principle, and stop.

One last note on bodies: leading with the procedure rather than the rationale. Readers (and Claude, when it reloads the skill mid-session) want to see what to do in the first 200 words. Save the "why this skill exists" preamble for the bottom, if you include it at all.

Project-local vs user-global

The classic frustration: the skill works for you, fails for a teammate. Almost always, this is project-local versus user-global state drift. Claude Code reads skills from two locations: ~/.claude/skills/ for the user globally, and ./.claude/skills/ for the current project. Project-local skills override user-global ones with the same name.

Check what's actually being loaded in the failing environment:

ls -la ~/.claude/skills/my-skill/ ./.claude/skills/my-skill/ 2>/dev/null
# Look at modification times and sizes. If both exist with different
# content, the project-local one wins.

Common drift causes: you updated the skill globally on your machine but never committed the project-local copy in the repo; or your teammate committed a project-local version that shadows everyone's global copy with a stale variant; or the skill depends on a sibling reference file that exists in your ~/.claude/skills/my-skill/ directory but isn't in the repo's ./.claude/skills/my-skill/ directory.

The fix that prevents this class of bug: treat project-local skills as committable artefacts, not local conveniences. If a skill is project-specific, it should live in the repo, be reviewed in pull requests, and ship as part of the project. If it's user-specific, it should never have a project-local copy at all.

A related failure: environment variables and file paths that exist on your machine and not the teammate's. A skill that says Run /Users/alice/scripts/build.sh in the body will fail for anyone whose username isn't alice. Either parameterise the path (the body asks the user for the path on first run) or use a project-relative reference (./scripts/build.sh works for everyone who clones the repo).

One thing worth checking when a teammate reports the skill not loading at all: their Claude Code version. The skill loader has evolved, and behaviour around tool naming, frontmatter strictness, and reference-file loading has changed across versions. claude --version on both machines is a five-second check that occasionally reveals the whole problem. If versions match and behaviour still differs, suspect cached state — ~/.claude/state/ sometimes holds stale entries that a restart of Claude Code clears.

For shared skills you want everyone on a team to have the same version of, publish them through a catalog or commit them to a shared repo with a clear install procedure documented in the README. Don't rely on Slack messages saying "copy this SKILL.md into your skills folder" — that's how drift starts.

The five-minute diagnostic

When a skill is misbehaving and you don't know which layer is broken, run this checklist in order. Each step takes under a minute, and they're ordered so that earlier steps catch the more common failures.

  1. Confirm the file is where you think it is. ls -la ~/.claude/skills/<slug>/SKILL.md on the global path, and ls -la ./.claude/skills/<slug>/SKILL.md from the project root. If the project-local version exists, that's what's loading.
  2. Confirm the frontmatter parses. head -20 the file. The first line should be ---, the YAML should have non-empty name: and description:, and the closing --- should be present. Check for a BOM with file.
  3. Confirm Claude sees it. Ask Claude to enumerate every loaded skill. If yours isn't listed, the loader rejected it — go back to step 2 and recheck the YAML.
  4. Test the matching layer. Make a request that should trigger the skill, and ask Claude before it acts which skill it would pick and why. If Claude doesn't mention yours, the description is the problem — rewrite it to be more specific about the trigger condition.
  5. Test the wrong-skill case. Same request as step 4, but now check whether a competing skill is winning. If so, narrow both descriptions or add an anti-trigger clause to the body of the winner.
  6. Check allowed-tools. If the skill invokes but fails mid-execution, ask Claude which tools the running skill has access to, and compare against what the body needs.
  7. Check the model pin. grep model: on the frontmatter. If pinned to a model the user doesn't have, that's the bug.
  8. Check body length. wc -w on the SKILL.md. If it's over 3,000 words, suspect truncation in long sessions and split into reference files.

If all eight steps pass and the skill still misbehaves, the problem is almost certainly in the body's logic rather than the wiring — and at that point, you debug it the way you'd debug any prompt: shrink it to a minimal failing case, change one thing at a time, and see what moves.

The cross-references on this page are organised the same way: /learn/writing-a-skill-md-file/ covers authoring from scratch, /learn/installing-claude-code-skills/ covers the install and load mechanics, and /learn/claude-code-skill-quality-checklist/ covers the pre-publication review pass. If you've reached this guide because something broke, start with the checklist above; if you're writing a new skill, start with the authoring guide and run the quality checklist before you ship.

Frequently asked questions

Why does Claude ignore my skill even though the file is in the right place?
If the file is in the right place and the frontmatter parses, the cause is almost always the description. Claude uses the description as the trigger signal; if it's vague, generic, or doesn't mention the user-side intent in words that match the request, Claude has no reason to invoke the skill.
How do I check whether Claude actually loaded my skill?
Ask Claude directly to enumerate every skill it currently has access to, with descriptions. If yours appears in the list, the discovery layer is working. If not, the loader rejected the file — usually a YAML error, missing required field, or a stray BOM.
Two of my skills do similar things and Claude keeps picking the wrong one. What do I do?
Narrow both descriptions so they don't share a generic prefix, and add a short 'When NOT to use this skill' section near the top of each body. If the skills are 80 percent the same logic, consider merging them into a single skill that branches internally.
My skill works on a fresh conversation but fails mid-session. Why?
Almost certainly body-length truncation. As the conversation grows, loading a long SKILL.md pushes the active turn over the model's context limit. Trim the body under 1,500 words or split reference material into sibling files the skill loads on demand.
Do I need allowed-tools in the frontmatter?
If your skill needs to read files, run commands, or edit code, yes. A skill without allowed-tools is restricted to plain text generation, so a body that says 'run npm test' will produce hallucinated output instead of actually running the command. List only the tools the skill genuinely uses.
Why does my skill behave differently for a teammate?
Project-local skills override user-global ones with the same name. If you updated the global copy and never committed the project-local version, your teammate is running the stale one. Treat project-local skills as committable artefacts that ship with the repo.
Should I pin my skill to a specific Claude model?
Only when the skill demonstrably requires the capabilities of a specific tier — long context, vision, extended thinking — and would silently degrade without them. Pins for other reasons cause the skill to fail or misbehave for users who don't have access to that model.

Found a bug or want a topic covered? Email [email protected] or open an issue via GitHub.