How I indexed 69,000 Claude Code skills (and what I learned doing it)

By Adam Lankamer · 2026-05-24 · 8 min read

One month ago I started building an open catalog of Claude Code skills. This week the index reached 69,369 SKILL.md files. This post is the engineering story: what I built, what surprised me, and what's free for anyone to use.

If you've never written a Claude Code skill: it's a Markdown file with YAML frontmatter that gives Anthropic's Claude Code agent specialized behavior. Drop it in ~/.claude/skills/<name>/SKILL.md and Claude can invoke it as a slash command. Think of it like a Vim plugin or a VSCode extension, except the contract is "instructions in English" rather than "code in Lua / TypeScript."
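If you haven't seen one, here is a minimal sketch that writes a skill to that location. It assumes only the two frontmatter fields `name` and `description`. Real skills often add more, and the post mentions fields like `allowed-tools` and `model`. `write_skill` is a hypothetical helper for illustration, not part of Claude Code.

```python
from pathlib import Path


def write_skill(name: str, description: str, body: str,
                root: Path = Path.home() / ".claude" / "skills") -> Path:
    """Write <root>/<name>/SKILL.md: YAML frontmatter, then Markdown instructions."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    # Values are written unquoted, so keep colons out of them (or quote them)
    # or the frontmatter stops being valid YAML.
    text = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{body.strip()}\n"
    )
    path = skill_dir / "SKILL.md"
    path.write_text(text, encoding="utf-8")
    return path
```

Calling `write_skill("commit-message", "Write a conventional commit message for the staged diff", "# Commit message\n\nRead the staged diff and ...")` would create `~/.claude/skills/commit-message/SKILL.md`. The `root` parameter exists only so the sketch can be pointed at a scratch directory.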

The format is brand-new. The official spec doesn't ship a catalog. The awesome-* lists I could find at the time covered maybe 300 hand-picked entries. Meanwhile, GitHub's code search showed thousands of public repos with SKILL.md files in them. The long tail of the ecosystem was completely invisible. That's the gap I set out to close.

The shape of the problem

Here's what I knew going in:

  1. Discovery was broken. A skill author would push their SKILL.md to GitHub and ... nothing. No directory, no aggregator, no search surface. The only way another developer found it was Twitter, Discord, or stumbling onto the repo.
  2. Quality varied wildly. Some skills were 200-line, operator-grade tools with pricing tables, anti-trigger sections, and structured examples. Others were 4-line stubs that read like "TODO: write a skill that does X." Both were indexable, but from the outside you couldn't tell them apart.
  3. The format itself was changing fast. The frontmatter spec gained new fields monthly (allowed-tools, user-invokable, model, metadata.api_base). A SKILL.md that counted as "good" yesterday could be missing a required field tomorrow.
  4. There was no good API surface. If you wanted to build something on top of the skill ecosystem (a tool for evaluating skills, a recommender, an installer), you had to scrape GitHub yourself.
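One way to cope with point 3 is to check strictly for the fields you require and ignore any you don't recognize yet. A minimal sketch of that check follows. It assumes `name` and `description` are the required fields and only understands flat `key: value` lines, so nested keys like `metadata.api_base` would need a real YAML parser such as PyYAML's `yaml.safe_load`. The function names are hypothetical and are not taken from the author's miner.

```python
# Assumption: the minimum set of fields a validator would insist on.
REQUIRED_FIELDS = ("name", "description")


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter fields, Markdown body).

    Only top-level `key: value` lines are parsed; indented (nested) lines
    and comments are skipped, and unknown keys are kept, not rejected.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("no YAML frontmatter: file must start with '---'")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter") from None
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1:])
    return fields, body


def missing_fields(fields: dict[str, str],
                   required: tuple[str, ...] = REQUIRED_FIELDS) -> list[str]:
    """Required keys that are absent or empty."""
    return [k for k in required if not fields.get(k)]
```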

I wanted a catalog that fixed all four. Open data, daily refresh, free API, free dataset. No pay-to-list, no listing fees, no ranking-for-money. The only paid product would be an evaluation layer for end-users (a quality score in the desktop app), never anything skill authors had to opt into. Anti-rent-seeking by construction.

The miner — 24 sources, every night

The catalog is built by a single Python script that runs on a Mac mini in my office at 01:00 local time. It crawls 24 public sources looking for SKILL.md files:

Source | What it discovers
--- | ---
GitHub code search (filename:SKILL.md) | The bulk of the catalog: 101 query variants covering language hints, frontmatter fields, and date-bounded slices to defeat the 1,000-result hard cap
GitHub Topics (topic:claude-code-skills) + 31 variants | Topic-tagged repos
GitHub Gists | Single-file skills posted as gists (most catalogs miss these)
Awesome-list READMEs (32 lists) | Anything the existing curators picked
GitLab, Codeberg | Skills outside GitHub
HuggingFace | Skills uploaded as datasets
Reddit, Hacker News, Bluesky, Mastodon, dev.to, YouTube, Telegram | Mentions in posts and comments (text-blob scan for repo URLs)
Wayback Machine CDX API | Renamed or deleted repos, still discoverable via archive.org
Stargazer graph mining | Once we find one good skill repo, mine who starred it; they often have skills too
Author repo enumeration | When we admit one of an author's skills, scan their other repos
Topic co-occurrence | Topics tagged alongside claude-code-skills get crawled on the next run
VSCode + Open VSX marketplaces | Some extensions ship with SKILL.md companions
Brave Search API | Web-search-anchored discovery
LLM query expansion | Claude generates next week's search queries based on what's been found
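The code-search row mentions date-bounded slices to get past GitHub's cap of 1,000 results per search query. The post doesn't say how the slices are chosen. A minimal sketch of one approach bisects a date range recursively until every slice's result count fits under the cap. `count` is a hypothetical callable standing in for a search request that returns the response's `total_count`.

```python
from datetime import date, timedelta
from typing import Callable

RESULT_CAP = 1000  # GitHub search returns at most 1,000 results per query


def date_slices(start: date, end: date,
                count: Callable[[date, date], int],
                cap: int = RESULT_CAP) -> list[tuple[date, date]]:
    """Split [start, end] (inclusive) into ranges whose result count fits under cap."""
    n = count(start, end)
    if n <= cap or start == end:
        # Small enough to page through fully, or a single day that can't be split.
        return [(start, end)] if n > 0 else []
    mid = start + (end - start) // 2
    return (date_slices(start, mid, count, cap)
            + date_slices(mid + timedelta(days=1), end, count, cap))
```

Each returned `(start, end)` pair becomes one query, so the union of the slices covers the whole range without overlap. The trade-off is one extra count request per split.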

Each source returns candidate repo URLs. The miner fetches the SKILL.md, validates the YAML frontmatter, runs admission scoring (more on this below), categorizes by domain (Engineering / Security / Growth / etc. — 10 categories total), tags across ~100 orthogonal dimensions (language, framework, AI provider, cloud, integration type), and writes a static HTML page at /skills/<slug>/.

The miner is bounded: per-source caps prevent any one source from draining the GitHub API budget; every section runs inside a _safe_section() try-block so a single broken endpoint can't kill the run.

A full run takes about 4 hours. New skills appear on the live catalog the same day they're discovered.

Admission — content signals only, no popularity

This is the part I'm most opinionated about. Ranking can't be bought. The moment a paid signal influences who appears in the catalog (or in what order), the value proposition collapses — nobody pays for "objective evaluation" when it isn't objective.

So the catalog admits skills based on a content score derived from the SKILL.md itself:

The score never weighs stars, forks, install counts, GitHub follower count, or any other popularity signal. A skill written by a developer with 0 GitHub followers and a clear anti-trigger section beats a flashy skill by a 50k-follower influencer that's just frontmatter-and-vibes. That's the bar.
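Below is a hypothetical sketch of a content-only score, built from signals the post mentions: anti-trigger sections, structured examples, and stubs versus substantial skills. The weights and thresholds are invented for illustration and are not the catalog's actual formula. The part that carries over is what's absent: the function takes no stars, forks, or follower counts.

```python
import re

# Hypothetical signals and weights, for illustration only.
SIGNALS = {
    "has_description": 2.0,
    "has_anti_trigger": 3.0,   # a "when NOT to use" section
    "has_examples": 2.0,       # an Examples heading
    "substantial_body": 1.0,   # more than a stub
}


def content_score(fields: dict[str, str], body: str) -> float:
    """Score a skill from its own text only; no popularity inputs exist."""
    hits = {
        "has_description": len(fields.get("description", "")) >= 20,
        "has_anti_trigger": re.search(r"when not to use|do not use", body, re.I) is not None,
        "has_examples": re.search(r"^#+\s*examples?\b", body, re.I | re.M) is not None,
        "substantial_body": len(body.split()) >= 150,
    }
    return float(sum(SIGNALS[k] for k, hit in hits.items() if hit))
```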

For ranking inside the desktop app's Pro tier — a separate evaluation layer — the formula is the same content-only structural score plus frontmatter-completeness, rescaled to [50, 100]. Still no popularity signals.
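The post gives the target range but not the mapping. A minimal sketch assumes plain linear (min-max) scaling over all scored skills, so a raw score s maps to 50 + 50 × (s − s_min) / (s_max − s_min).

```python
def rescale(scores: list[float], lo: float = 50.0, hi: float = 100.0) -> list[float]:
    """Linearly map raw scores onto [lo, hi]: the minimum goes to lo, the maximum to hi."""
    s_min, s_max = min(scores), max(scores)  # raises ValueError on an empty list
    if s_max == s_min:
        # No spread to preserve; put everything at the midpoint (an arbitrary choice).
        return [(lo + hi) / 2 for _ in scores]
    span = s_max - s_min
    return [lo + (hi - lo) * (s - s_min) / span for s in scores]
```

A floor of 50 means even the weakest admitted skill shows a mid-range number. The scaling preserves order, so it doesn't change the ranking.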

This costs me about 30% of what an unconstrained "rank by stars" catalog would surface. I'm OK with that trade.

What surprised me

1. The catalog is dominated by a handful of prolific authors. One contributor has 3,446 admitted skills (yes, really). The top 25 authors account for ~30% of the catalog. There's a Pareto distribution underneath the long tail.
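A concentration figure like "the top 25 authors account for ~30%" is a short computation over the author field. This sketch assumes one author string per admitted skill. That input shape is hypothetical and doesn't describe the public dataset's schema.

```python
from collections import Counter


def top_author_share(authors: list[str], n: int = 25) -> float:
    """Fraction of catalog entries contributed by the n most prolific authors."""
    counts = Counter(authors)
    top = sum(c for _, c in counts.most_common(n))
    return top / len(authors)
```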

2. Sales-category skills score highest on content quality. Counter-intuitive — I expected Engineering or Security to be most polished. Turns out sales-focused skill authors over-index on structure (anti-trigger sections, scope discipline, pricing transparency) because that's their professional habit. Engineering authors more often skip the "when NOT to use" section because they assume it's obvious.

3. Vendor-side adoption is still zero. The catalog has zero skills with author_url pointing at anthropic.com, openai.com, or any other large AI vendor. Every entry is independent. The ecosystem is fully community-driven.

4. The SKILL.md format is leaking sideways. I found skills in repos tagged cline-skills, cursor-rules, aider-skills, and windsurf-rules. The format is becoming a portable agent-skill standard, not just a Claude Code thing. The catalog admits these too: they're SKILL.md files, and which agent loads them is the user's choice.

5. The biggest discovery surface isn't GitHub code search. It's the stargazer graph. When a skill repo hits a few hundred stars, more than 30% of the people who star it have a SKILL.md of their own somewhere in their account. Mining that graph yields skills the code-search queries don't find.
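The stargazer walk can be modeled as a breadth-first search over a graph of repos and users. This sketch uses two hypothetical callables. `stargazers_of` would wrap GitHub's list-stargazers endpoint, and `skill_repos_of` would check a user's repos for a SKILL.md. Both are injected so the traversal logic stands on its own.

```python
from collections import deque
from typing import Callable, Iterable


def mine_stargazers(seeds: Iterable[str],
                    stargazers_of: Callable[[str], Iterable[str]],
                    skill_repos_of: Callable[[str], Iterable[str]],
                    max_repos: int = 10_000) -> set[str]:
    """Breadth-first: skill repo -> its stargazers -> their skill repos -> ..."""
    found = set(seeds)
    seen_users: set[str] = set()
    queue = deque(found)
    # Soft cap: checked between repos, so the result can slightly exceed max_repos.
    while queue and len(found) < max_repos:
        repo = queue.popleft()
        for user in stargazers_of(repo):
            if user in seen_users:
                continue  # each user's repos are listed at most once
            seen_users.add(user)
            for other in skill_repos_of(user):
                if other not in found:
                    found.add(other)
                    queue.append(other)
    return found
```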

What's free

Everything the catalog produces is open:

What I'd change if starting over

A few things I learned the hard way:

  1. Build the public dataset first, the website second. I focused on the consumer-facing site early — should have shipped the open dataset first. Researchers and tool-builders pick up CC BY 4.0 data within days of finding it; consumer-facing UIs take weeks to build word-of-mouth.
  2. Cloudflare Workers + R2 + Netlify together are more reliable than any one of them alone. The site has 64,000+ per-skill HTML pages, which would blow through Netlify's deploy-prep budget at that scale. So the per-skill HTML files live in Cloudflare R2, and a Netlify rewrite serves them from claudskills.com/skills/<slug>/. The API, embed, and badge endpoints are Cloudflare Workers bound to the same domain. The homepage and other static pages are served directly from Netlify. Each layer does what it's best at.
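The split in point 2 amounts to a routing table. This sketch writes that table as a function, to make explicit which layer answers which request. The Worker path prefixes are hypothetical, because the post names the API, embed, and badge endpoints but not their URLs.

```python
# Hypothetical prefixes for the Worker-backed endpoints.
WORKER_PREFIXES = ("/api/", "/embed/", "/badge/")


def serving_layer(path: str) -> str:
    """Which layer answers a request path under the split described above."""
    if path.startswith(WORKER_PREFIXES):
        return "cloudflare-worker"
    if path.startswith("/skills/") and path.rstrip("/") != "/skills":
        # A per-skill page: Netlify rewrites it to the HTML object stored in R2.
        return "r2-via-netlify-rewrite"
    # Homepage and other static pages come straight from the Netlify deploy.
    return "netlify"
```

Because of this split, the Netlify deploy never contains the 64,000+ skill pages. Only the rewrite rule has to know they exist.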
  3. Excluding popularity signals was the hardest decision and the most important one. Every time I evaluate a candidate change to the ranking algorithm, the test is "would skill authors pay to influence this?" If yes, the change doesn't ship. The discipline pays off once you have a Pro subscription product: the pitch is "pay $9/month for the multi-signal Quality Score in the desktop app," and there's nothing for me to defend about why the score is honest. It's honest by construction.

What's next

The next quarter is about distribution — the catalog exists, now developers need to find it. The roadmap:

If you've written a SKILL.md, it's probably already in the catalog; search for your repo name at claudskills.com. If it isn't there yet, the catalog will pick it up within 24 hours of your pushing it to a public GitHub repo. If you want to fast-track it, there's a submit form on the homepage.

If you're a researcher, a tool-builder, or an LLM-pipeline operator who wants to ingest the data: the public dataset refreshes daily, and the API is rate-limit-free for normal use. Build something cool — I'd love to hear about it.

The catalog is at claudskills.com. The dataset is at github.com/claudskills/catalog-public. Comments + questions to [email protected].

ClaudSkills is an independent community catalog. Claude™ is a trademark of Anthropic PBC; ClaudSkills is not affiliated with, endorsed by, or sponsored by Anthropic.