The dirty secret of every open catalog — including this one — is that most entries shouldn't be installed. Not because the authors are careless, but because the bar for "a skill exists" is far lower than the bar for "a skill is worth invoking inside a Claude Code session."
This piece is the honest tour: the median problem nobody talks about, the five failure modes I see again and again when I audit skills before installing them, the patterns that distinguish the keepers, and a 90-second test you can run yourself. It's pointed in places. It's also the post I wish someone had written before I installed my first forty skills and quietly uninstalled thirty of them.
Every open catalog of Claude Code skills — the registry-style sites, the awesome-list GitHub repos, the curated indexes, all of them — operates under the same implicit promise: here is the universe of available skills, browse freely. What the promise doesn't say is that the universe of available skills includes a lot of work that shouldn't have been published, was published anyway, and now sits in the listing looking exactly like the genuinely useful entries beside it.
This isn't a criticism of catalog operators. The job of an open catalog is to ingest what's out there. Curation is a different job, and the two compete: heavy curation shrinks the catalog and loses entries that some users would have wanted; light curation keeps everything and ships you a lot of noise to filter.
The reason this matters for you, the person about to type an install command, is that the catalog can't make the judgement call for you. Whether a particular skill is worth running inside your session depends on what you're building, what your team's tolerance for half-finished tooling is, and whether you trust the author. A catalog can sort by signals — recency, structure, frontmatter completeness, repo activity — but it can't tell you that the skill you're about to install will, on its third invocation, decide that what your codebase really needs is to be migrated to a different testing framework.
So the burden of the final filter sits with the reader. The catalog gets you to a shortlist; you decide what crosses the threshold into your ~/.claude/skills/ directory. The rest of this piece is a working practitioner's view of where the noise comes from, how to spot it fast, and what the genuine keepers look like once you've trained your eye.
One quick note on tone before we keep going. I'm going to be direct about failure modes. I'm not going to name specific skills or authors, because the failure modes are structural, not personal. The same author who shipped a one-paragraph stub last month might ship a brilliant, narrowly-scoped tool next month. The patterns are what's worth learning. The author roster shifts every week.
When you sample a few hundred random skills from any large open catalog, a few things become obvious very quickly. The median skill is shorter than people assume, often a few paragraphs of description and a fenced shell command. The scope is usually vague: a skill called code-reviewer that turns out to do everything from style nits to architectural feedback, with no guidance on when to invoke it versus when to leave Claude alone. Testing is almost always absent. There is rarely an explicit list of cases where the skill should not fire.
None of this makes the median skill unusable. It does make it unreliable. The unreliable skill is the worst category to have installed, because it loads on every relevant trigger word, behaves inconsistently, and slowly erodes your trust in the whole skill system. Skills that never fire are easy to remove. Skills that fire when you didn't want them, do something nearly-right, and produce output you have to read carefully to evaluate — those are the ones that cost you real time.
The median problem also has a self-reinforcing dynamic. Authors who treat skill publishing as a portfolio activity ship many small skills, because volume looks better than depth on a contributor page. Authors who treat it as craft ship fewer skills, more slowly, and write them carefully. Both end up in the same catalog. The first group is over-represented in raw counts; the second group is over-represented in skills you'd actually want to install.
This is why the brute-force approach of "install everything in a category and see what sticks" works badly. You'll spend hours triaging things that should never have made it onto your machine in the first place. The smarter approach is to start sceptical, raise the bar deliberately, and let the small number of skills that clear it earn their place. A good rule of thumb: if you couldn't explain to a colleague in one sentence what a skill does and when it fires, don't install it yet.
The catalog is doing its job by showing you the full inventory. Your job is to remember that the full inventory is not the same thing as a recommendation.
After enough audits, the failures cluster into a small number of recurring shapes. Five of them cover almost everything I reject.
Over-broad descriptions. The skill's description: frontmatter field is what Claude reads to decide whether to invoke. A description like "helps with code review" will fire on practically any session that touches a pull request, including ones where the user just wanted a quick syntax check. The skills I keep have descriptions that read like clauses in a contract: "use when the user pastes a unified diff and asks for review feedback; do not use for single-file linting or commit-message review." Specificity is a feature, not a flaw.
No anti-trigger discipline. A skill that says only what it does has half the instructions it needs. The other half is what it should not do, what it should defer to a different skill for, and what it should hand back to the user. Skills without an explicit "When NOT to use" section tend to over-fire, and the presence of one is among the most reliable signals of a careful author.
Instructions that fight Claude's defaults. Some skills read as if the author was annoyed at Claude's baseline behaviour and tried to override it through aggressive prompting — "never apologise, never explain, always output JSON, never add commentary." This works for the first few turns and then breaks in interesting ways. The skills I trust extend Claude's defaults rather than fighting them. If a skill needs to suppress Claude's normal explanations, the author should explain why, not just demand it.
Allowed-tools that grant write access without need. A skill that lists Write, Edit, Bash in its allowed-tools: when its actual job is read-only analysis is a yellow flag. The author either didn't think carefully about the minimum required permissions, or wanted the option to mutate things and didn't tell you. Either way, you don't want it loaded by default. The good skills request the smallest tool set that makes the job possible.
Copy-paste forks with no real differentiation. Some skills are visibly cloned from a small number of popular templates with the name swapped and a few words rewritten. They tend to share the same scaffold, the same examples, sometimes the same typos. If a skill reads like it could be any of fifty other skills, it probably is. The catalog shows them all; you don't need to install them all.
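One rough way to spot clones before installing is to compare the text of candidate skills pairwise. The sketch below uses Python's standard `difflib`; the function names and the 0.85 threshold are assumptions chosen for illustration, not a calibrated cutoff.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't mask a clone."""
    return " ".join(text.lower().split())


def near_duplicates(skills: dict[str, str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, ratio) for every pair whose bodies look suspiciously alike.

    `skills` maps a skill name to the text of its SKILL.md.
    The threshold is an illustrative assumption; tune it against real samples.
    """
    pairs = []
    for (name_a, body_a), (name_b, body_b) in combinations(skills.items(), 2):
        ratio = SequenceMatcher(None, normalise(body_a), normalise(body_b)).ratio()
        if ratio >= threshold:
            pairs.append((name_a, name_b, round(ratio, 2)))
    return pairs
```

A high ratio is only a prompt to read both files side by side. Two skills built on the same scaffold can still differ where it matters.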
If you see two or more of these in a single skill, that's usually enough to skip it. One on its own is worth a closer look — sometimes there's a good reason, often there isn't.
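The two-or-more rule can be sketched as a small screening function. This is a heuristic under stated assumptions: it covers only the three failure modes that are easy to check mechanically (over-broad description, missing anti-triggers, excess permissions), the vague-phrase patterns are my own guesses, and `read_only` is a judgement you supply after reading the skill.

```python
import re

WRITE_TOOLS = {"Write", "Edit", "Bash"}
# Hypothetical openers that tend to signal a generic description.
VAGUE_PATTERNS = [r"^helps? with\b", r"^assists? with\b", r"^a skill for\b"]


def audit(description: str, body: str, allowed_tools: list[str], read_only: bool) -> list[str]:
    """Return the names of the failure modes this skill appears to exhibit."""
    flags = []
    desc = description.strip().lower()

    # Over-broad description: generic opener, or no stated trigger condition.
    if any(re.search(p, desc) for p in VAGUE_PATTERNS) or "when" not in desc:
        flags.append("over-broad description")

    # No anti-trigger discipline: nothing says when the skill should stay quiet.
    combined = (description + "\n" + body).lower()
    if not re.search(r"do not use|don't use|when not to use", combined):
        flags.append("no anti-triggers")

    # Write access without need: mutation tools requested for a read-only job.
    if read_only and WRITE_TOOLS & set(allowed_tools):
        flags.append("excess permissions")

    return flags


def verdict(flags: list[str]) -> str:
    """Two or more flags: skip. One: look closer. None: passes this screen."""
    if len(flags) >= 2:
        return "skip"
    return "look closer" if flags else "pass"
```

Passing this screen says nothing about the other two failure modes, which still need a human read.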
Inverting the failure modes gets you most of the way there, but the genuinely strong skills share three positive patterns that are worth naming directly.
Narrow scope, defended explicitly. The best skills do one thing in a domain where that one thing is genuinely hard. A skill that converts OpenAPI specs to typed client code is a good shape. A skill that "helps with backend development" is not. The narrow ones tell you in the first paragraph what they refuse to do — not as a disclaimer, but as a positioning statement. "This is not a general-purpose linter. It will not rewrite your import order. It will not enforce style. It diagnoses a specific class of N+1 query antipattern in Django ORM code, and it tells you the line numbers." A skill that talks like that is usually written by someone who has actually used it under pressure.
Pricing, quota, and external-service honesty. If the skill calls an external API, the good ones tell you upfront: which provider, which endpoint, the rough cost of a typical invocation, what happens when the quota is exhausted, what happens when the provider returns 429. The skills that hide this usually hide it because they didn't think about it. The first time you hit a rate limit on a skill that doesn't gracefully degrade, you'll wish the author had been more explicit.
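For a sense of what graceful degradation on a 429 might look like, here is a minimal sketch. `RateLimited` and `call_provider` are hypothetical stand-ins for whatever client a skill actually uses, and the retry count and delays are illustrative defaults rather than recommendations.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Raised by the hypothetical call_provider function on an HTTP 429."""


def call_with_backoff(
    call_provider: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry on 429 with exponential backoff.

    Returns None instead of raising once attempts run out, so the caller
    can fall back to telling the user the provider is unavailable.
    """
    for attempt in range(max_attempts):
        try:
            return call_provider()
        except RateLimited:
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            sleep(base_delay * 2 ** attempt)
    return None
```

The `sleep` parameter is injectable so the behaviour can be checked without real waiting.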
A worked example or two, in the SKILL.md. Not a marketing screenshot — an actual transcript or a fenced code block showing input and output. Skills that ship with a small examples/ section are taking the trouble to show you what the skill is meant to do, which is a strong indicator that the author has actually tested it. Here's the kind of frontmatter and structure I look for as a positive signal:
---
name: openapi-to-typed-client
description: |
  Use when the user provides an OpenAPI 3.x spec and asks for a typed
  TypeScript client. Do NOT use for client generation from non-OpenAPI
  sources, for runtime validation library generation, or for spec
  authoring/editing tasks.
allowed-tools: [Read, Write]
model: claude-sonnet-4-5
version: 1.4.0
license: MIT
---

That frontmatter alone gives you scope, defaults, anti-triggers, permissions, and version discipline. Pair it with a few hundred words of focused prose and one or two real input/output examples, and you have a skill that earns its place on your machine.
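If you want to pull those fields out programmatically while triaging, a minimal sketch follows. It is deliberately not a YAML parser: it handles only the three shapes in the example above (plain `key: value`, a `key: |` block, and an inline `[a, b]` list), and it joins block lines with spaces instead of preserving newlines. A real tool would more likely use a proper YAML library.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse the leading '---' block of a SKILL.md into a dict (simplified, see caveats above)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict = {}
    block_key = None  # set while collecting the lines of a 'key: |' block
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: the body of the skill starts after this
        if block_key and (line.startswith(" ") or not line.strip()):
            # Indented or blank line: still inside the block, so append it.
            fields[block_key] = (fields[block_key] + " " + line.strip()).strip()
            continue
        block_key = None
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if value == "|":
            block_key, fields[key] = key, ""
        elif value.startswith("[") and value.endswith("]"):
            fields[key] = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        else:
            fields[key] = value
    return fields


SAMPLE = """---
name: openapi-to-typed-client
description: |
  Use when the user provides an OpenAPI 3.x spec and asks for a typed
  TypeScript client. Do NOT use for spec authoring/editing tasks.
allowed-tools: [Read, Write]
version: 1.4.0
---
# Body text starts here
"""

fields = parse_frontmatter(SAMPLE)
```

With the fields in a dict, checks like "does the description contain an anti-trigger" or "is Write requested" become one-line lookups.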
You won't find hundreds of skills like this in any catalog. You'll find tens. The good news is that tens is plenty: most working practitioners use a handful of skills heavily and a long tail occasionally. Find your handful, keep them current, and ignore the rest until you have a specific use case.
Here is the structural tension that any open catalog operator has to navigate, and that you as a reader should be aware of when you browse one.
More entries make a catalog look bigger. Bigger catalogs feel more authoritative, get linked to more often, and rank better in the places people go to find skills. The incentive, at every level, is to admit more rather than fewer.
Fewer entries — or at least, sharper filtering between the keepers and the noise — make a catalog more useful. A reader who lands on a catalog and finds the first ten entries they browse are all worth installing learns to trust the catalog. A reader whose first ten installs include two genuinely useful skills and eight that get uninstalled within a week learns to be sceptical of the whole listing, not just the eight that didn't work out.
These two pressures pull against each other. The honest way to handle them is to publish the full inventory and give the reader the tools to do the second-level filtering themselves. Sortable signals, frontmatter previews, source attribution, last-updated dates, repo-activity indicators, anti-trigger highlights — anything that helps the reader spot the keepers without having to install ten skills to find the one. None of this replaces your judgement, but it shortens the time you spend exercising it.
What you should be wary of, in any catalog (this one included), is the implicit framing that catalog inclusion is itself a recommendation. It isn't. Inclusion means "the catalog's miner found this skill, parsed it, and didn't reject it at admission." It does not mean "the catalog operator vouches for this skill in your specific use case." The catalog is the bookstore; you still have to pick the book.
The corollary is that if a catalog appears to vouch for everything — if every entry has a polished landing page, a high-sounding description, and no honest signals about quality variance — you should trust the catalog less, not more. The catalogs worth using are the ones that admit, somewhere visible, that not every entry is for everyone, and that surface the signals you'd need to tell.
This catalog tries to be in the second group. The median problem still applies. Browse with that in mind, and you'll get more out of it.
Here is the filter I run on every skill before I install it. It takes about 90 seconds and catches the vast majority of skills that wouldn't have earned their place. You don't need a special tool for this; you need the skill's SKILL.md file open in front of you.
Read the description: field, out loud if you can. Can you state, in one sentence, when this skill should fire? If the description is generic ("helps with X"), or so long that you can't summarise it, skip. Good descriptions are surgical.

Check allowed-tools: against the skill's actual job. A skill that reads code and reports findings shouldn't need Write. A skill that runs tests doesn't need Bash with sudo. If the permissions feel wider than the description warrants, ask why, and if there's no answer in the SKILL.md, skip.

Check how the skill installs. Piping curl into bash for a skill install is over the line. Skill installs should be file copies, full stop.

Run this filter and you'll skip eight out of ten skills you would have otherwise installed. The two that pass will be better than the average of the ten you would have installed without it. Over a year, the time saved by not having to uninstall the eight is more than the time spent running the filter.
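The install check is the easiest part of the filter to automate. The sketch below scans a skill's install instructions for a download piped straight into a shell. The regular expression is an assumption about what such lines usually look like and will miss obfuscated variants, so treat a clean result as "nothing obvious", not "safe".

```python
import re

# Matches a download command piped into a shell on one line,
# e.g. "curl -fsSL https://example.com/install.sh | bash" (placeholder URL).
PIPE_TO_SHELL = re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b")


def risky_install_lines(skill_md: str) -> list[str]:
    """Return the lines of a SKILL.md or README that pipe a download into a shell."""
    return [line.strip() for line in skill_md.splitlines() if PIPE_TO_SHELL.search(line)]
```

An empty list means the instructions contain no single-line pipe-to-shell install, which is what you would expect if the install really is just a file copy into ~/.claude/skills/.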
If you want a longer, structured version of the same exercise, the Claude Code skill quality checklist goes through the same material at more depth, with the criteria spelled out as a yes/no walkthrough.
Given everything above, you might reasonably ask: would a walled garden be better? An invitation-only, heavily-curated list of skills vetted by humans before publication?
For some users, yes. For the ecosystem as a whole, no — and the reasons are worth being honest about.
A walled garden's curation bottleneck is also its choke point. Whoever decides what gets admitted shapes what the user base believes a skill is, what good skills look like, and which problem spaces are legitimate. That shaping power is large, and it tends to entrench whatever the curator's first cohort believed was best practice. New idioms, new tooling integrations, and skills that solve problems the curator doesn't personally have get filtered out — not maliciously, just because the curator doesn't recognise them as worth admitting.
Open catalogs make a different trade. They admit too much, the median is mediocre, and the reader has to filter. In exchange, the catalog reflects what the community is actually building, not what one curator's taste says should be built. The interesting new patterns — the skills that turn out to be excellent in retrospect — show up first in open catalogs, often months before any walled garden would have noticed them. The cost is the filtering burden you carry as a reader. The benefit is that you see the full surface of what's possible.
The other defence of the open model is correction speed. When a walled garden makes a bad admission decision, that skill is endorsed for as long as the curator hasn't gotten around to revisiting it. When an open catalog publishes a bad skill, every reader's filter catches it independently, and the bad skill gets quietly ignored within weeks. The error correction is distributed, which means it's faster than any one curator could manage.
The pragmatic stance is to use both. Open catalogs are where you discover. Curated lists — including editorial picks within an open catalog — are where you save time when you have a specific problem and don't want to filter from scratch. Neither is the right answer alone. The combination is what gets you to a skill set that's broad enough to be useful and selective enough to stay reliable.
The takeaway: don't avoid open catalogs because of the median problem. Use them with the filtering muscle that the median problem demands. The skill of reading skill metadata sceptically is, itself, one of the most valuable things you can learn this year.
If you publish skills — or you're thinking about it — here is the version of this piece that applies to you.
Shipping volume hurts you more than it helps. Every additional skill under your name is one more chance for a reader to install something of yours, have a bad experience, and remember the experience as "that author's stuff doesn't really work." The portfolio framing — "I have published thirty skills" — feels like an asset and isn't. What you actually want is a small number of skills that the readers who use them love, recommend to others, and keep installed across the months when their needs change.
Sharper skills earn that loyalty. Narrow the scope until you can defend the description in one sentence. Write the "When NOT to use" section before you write anything else; it forces you to define the boundary. Test the skill in three or four real sessions before you publish, not after. If you can't think of a real session it would have helped you with, don't publish it yet.
The SKILL.md file is the entirety of your reader's impression of you. They will not click through to your GitHub profile, watch your demo video, or read your blog post. They will read the first paragraph of the SKILL.md, scan the frontmatter, and decide. Spend more time on that file than feels reasonable. The amount of polish a skill needs to look credible is higher than it used to be, because readers have seen enough catalog noise to be sceptical by default.
If you're new to writing skills and want a step-by-step on the file format itself, writing a SKILL.md file walks through the structure, the YAML conventions, and the formatting patterns that read as competent rather than amateur. The mechanical part of writing a good skill is fast to learn. The judgement part — knowing when to publish, when to keep iterating, when to delete the draft and walk away — takes longer, and is what separates the authors people remember.
One last thing. Most of the authors whose skills I trust have between two and six published skills, all in adjacent domains, all updated within the last few months. The authors with fifty skills have usually published them and forgotten them. Decide which kind of author you want to be early. It's much easier to grow a small, sharp catalog than to shrink an overgrown one.