Claude Code Skills·Claude Skills·The open SKILL.md registry for Claude
ClaudSkillsLearn › Skill quality rubric

The Claude Code skill quality rubric — what the scorer actually checks

Published 23 May 2026 · 9 min read · The source is at scripts/quality_score.py in the public site repo — no hidden constants

Every SKILL.md the ClaudSkills miner crawls runs through a six-axis structural rubric that produces a 0-100 score. Skills scoring ≥80 land in the daily_eligible bucket — the top ~12% of the catalog, source pool for the mobile app's daily push, the Skill of the Day picker, and the social drafter. This post walks through the actual axes, with point ranges and one practical authoring tightening per axis.

This rubric is content-derived, not popularity-derived. The Pro Quality Score blends 80% of this structural heuristic with 20% metadata depth. Popularity signals (GitHub stars, install counts, share counts) were permanently dropped from the formula on 2026-05-07 — the reasoning is that popularity is downstream of promotion, not of authoring craft, and including it would penalise new authors on identical work.

The six axes at a glance

AxisMax pointsReads as
Description depth20How much context can Claude derive from the description alone
Vocabulary diversity5Filler-language penalty proxy
Name + body + interface~31Substance vs stub
Anti-trigger discipline16Does the skill name when it should NOT be invoked
Pricing / quota disclosure10API-driven skills surface their cost
Custom frontmatter depth15Goes beyond name+description
Sum97Plus the -5 filler-phrase penalty (see Axis 7)

A baseline skill that does the obvious things — 200-char description, 500-char body, a tools field, an arg-hint — lands around 60. Skills demonstrating scope discipline + pricing transparency + rich frontmatter land 80-95. Stub skills (just name + a vague one-liner) land 15-35. The rubric clusters in the middle deliberately; admission threshold is 50, so everything below that doesn't enter the catalog at all.

Axis 1 — Description depth (0-20)

Description depth 0 → 20 points

Description length (chars)Points
≥ 80020
≥ 40017
≥ 20013
≥ 808
≥ 404
< 400

Tightening: aim for ≥200 chars. The jump from 80-char to 200-char descriptions is 5 points; from 200 to 400 is another 4. Claude reads the description to decide whether to invoke the skill — a 40-char description gives the model almost no signal.

Axis 2 — Vocabulary diversity (0-5)

Vocabulary diversity in description 0 → 5 points

Distinct meaningful words (length > 2)Points
≥ 85
≥ 52
< 50

Tightening: avoid stop-word stuffing. A description like "this skill helps with code" has 5 words but ~2 distinct meaningful ones. Compare "Audits Terraform plans for security misconfigurations and IAM privilege escalations" — same length, 10 distinct meaningful words. The rubric is a proxy for "does the description actually describe the skill."

Axis 3 — Name + body + interface (0-31)

Substance axis 0 → ~31 points

This axis stacks several sub-checks additively. None is individually large, but they compound.

Sub-checkPoints
Skill has a name distinct from description, ≥4 chars+5
Body ≥ 2,000 chars+15
Body ≥ 500 chars (and < 2,000)+8
Body ≥ 100 chars (and < 500)+3
Has allowed-tools declared+5
Has argument-hint declared+3
Has any examples / inputs / outputs section+3

Tightening: the 500 → 2,000-char body jump is +7 points — the single most lucrative edit on this axis. Most stub skills clock around 200-400 chars of body; doubling to 800-1,000 chars (one section of real usage examples) usually crosses the threshold.

Axis 4 — Anti-trigger discipline (0-16)

Anti-trigger discipline 0 → 16 points

This axis is the single highest-leverage edit you can make to a SKILL.md. The scorer counts how many distinct anti-trigger patterns appear anywhere in the body, capped at 4 patterns = 16 points (4 each).

Detected patterns (case-insensitive substring match):

Tightening: add a 50-100 word "When NOT to use this skill" section. The named-section style is the easiest path to 4 anti-trigger pattern matches:

## When NOT to use this skill

Skip this skill if:
- You need a one-off ad-hoc analysis — that's out of scope for this skill.
- You're working with a binary file (PDF, image) — this skill is not for those formats.
- You want speculative refactoring suggestions — use the related code-review skill instead.

That single section typically lands 4 distinct anti-trigger matches = max 16 points.

Axis 5 — Pricing / quota disclosure (0-10)

Pricing / quota disclosure 0 → 10 points

Two detectors stack additively (capped at 10):

Tightening: only applies to skills that wrap an external paid API (OpenAI, Anthropic, Stripe, AWS, etc.). For these, add a 4-line pricing table:

| Provider | Free tier | Paid tier | Rate limit |
|---|---|---|---|
| Foo | 100/day | $5/1k requests | 60 rpm |

Worth 10 points on a skill that would otherwise score 0 on this axis.

Axis 6 — Custom frontmatter depth (0-15)

Custom frontmatter depth 0 → 15 points

The rubric counts distinct frontmatter keys beyond the standard name + description pair. Each extra key adds 1.5 points, capped at 10 extras = 15 points.

Worth-including extras (any of these count):

---
name: my-skill
description: Two-sentence summary.
model: claude-3-5-sonnet
tags: [type:audit, lang:python, cloud:aws]
version: 1.2.0
license: MIT
allowed-tools: ["read", "edit", "bash"]
user-invokable: true
category: security
subcategory: cloud
argument-hint: ""
metadata:
  api_base: https://api.example.com
  rate_limit_rpm: 60
author: jane-doe
repository: https://github.com/jane/my-skill
---

This block has 13 extra keys (caps at 10 = max 15 points). Most SKILL.md files in the catalog have 0-2 extra keys; getting to 5-7 is straightforward, and lands ~8-10 points.

Tightening: the highest-EV adds are tags, allowed-tools, and license — they're useful to downstream consumers anyway, and the scorer treats them equally with the more esoteric fields.

The filler-phrase penalty (−5)

If the body literally contains the phrase "this skill" or "a skill that" or "use this skill" or "this is a skill" anywhere, the rubric subtracts 5 points. The penalty is intentionally crude but catches a common authoring tic: starting every paragraph with self-referential framing instead of content. The fix is to delete the phrase — usually the sentence reads better without it.

How the structural score becomes the Pro Quality Score. The structural score (0-100) blends with the metadata-depth signal (0-100) at a 80/20 weighting to produce the Pro Quality Score also in 0-100. Skills scoring ≥80 on the blended score get daily_eligible: true on the public skills.json. That's the bucket the mobile app's daily push, the Skill of the Day picker, and the social drafter all read from. Today, 7,285 of 69,016 skills clear that threshold (10.6%).

End-to-end worked example

Take a baseline skill with no discipline applied — 80-char description, 300-char body, name + description only in frontmatter, no anti-trigger section, no pricing table, includes the phrase "This skill helps you…":

AxisScoreNotes
Description depth880 chars → +8
Vocabulary diversity2~6 distinct words
Name + body + interface8name diff (5) + body 300 (3)
Anti-trigger0
Pricing/quota0
Frontmatter0
Filler penalty−5"this skill" present
Sum13Below admission threshold (50)

This skill would NOT be admitted to the catalog. Now apply the four cheapest tightenings — bump description to 200 chars, body to 800 chars, add a 4-anti-trigger section, add 5 frontmatter keys (tags, license, allowed-tools, model, version):

AxisScoreNotes
Description depth13200 chars → +13
Vocabulary diversity5≥8 distinct words
Name + body + interface18name (5) + body 800 (8) + tools (5)
Anti-trigger164 distinct patterns matched
Pricing/quota0(N/A for this skill)
Frontmatter85 extras × 1.5
Filler penalty0removed
Sum60Admitted, mid-tier

~30 minutes of editing took the skill from below admission to mid-tier admitted. Two more tightenings (push body to 2,000 chars, add 5 more frontmatter keys) would land it at ~78, the cusp of daily_eligible.

Things the rubric does NOT check

If your skill scores 75-79 today

You're close. The two highest-leverage edits to cross 80:

  1. Add a "When NOT to use this skill" section. Worth up to 16 points alone. If you have a vague 2-line "consider scope" comment, formalising it into a named section with 4 distinct anti-trigger patterns is the single biggest jump available.
  2. Add 2-3 frontmatter fields you don't have. model, version, allowed-tools, license, tags, argument-hint — pick three you can fill in honestly. Each is 1.5 points.

Those two together typically move a 75-79 skill to 85-92. The score recomputes at the next miner cycle.

Methodology

Source: scripts/quality_score.py in the ClaudSkills site repo. Every constant in this article is reproducible from that file. The rubric was tuned against a hand-rated reference set of ~50 skills during the Phase 3.4 sprint (2026-05-07). Spearman correlation against human ground truth: ρ = +0.778 — strong positive. The rubric is intentionally conservative: false positives (a low-quality skill scoring high) are rarer than false negatives (a high-quality skill scoring lower than it should). Add the patterns the rubric wants, and the rubric responds.

Share on X Share on LinkedIn Submit to HN

FAQ

Is the quality score the same for free and Pro users?
No. Free pages display an unrated catalog — no quality score is shown on free skill pages. Pro users see a numeric Quality Score on every skill in the desktop app. Public skills.json carries a daily_eligible: true flag for skills scoring ≥80; the raw numeric score is server-side only.
Does the rubric look at popularity signals?
No. Popularity signals were permanently dropped on 2026-05-07. Pro Quality Score is 100% content-derived: 80% this structural rubric + 20% metadata depth.
How often does the rubric re-evaluate my skill?
Every miner cycle (nightly at 01:00 local, typically finishes by ~05:00). Edits to the upstream SKILL.md reflect within ~24 hours.
Are there secret bonus signals the scorer rewards?
No. The full source is at scripts/quality_score.py in the public site repo. Every constant and threshold is visible.
What if my skill ALREADY scores well but isn't in daily_eligible?
The threshold is quality_pro ≥ 80, applied after structural + metadata are blended. If you're at 75-79, the highest-leverage tightenings are: add an explicit "When NOT to use" section (worth up to 16 points alone), and add 2-3 frontmatter fields you don't have.

Related reading