Is the quality score the same for free and Pro users?

No. Free pages display an unrated catalog — no quality score is shown on free skill pages. Pro users see a numeric Quality Score on every skill in the desktop app. Public skills.json carries a `daily_eligible: true` flag for skills scoring ≥80, which is used by the mobile companion app and the Skill of the Day picker. The raw numeric score is server-side only.

Does the rubric look at popularity signals?

No. Popularity signals (GitHub stars, install counts, share counts) were permanently dropped from the formula on 2026-05-07. The Pro Quality Score is now 100% content-derived: 80% structural (this rubric) + 20% metadata depth. The reasoning is documented at length in the engine spec — popularity is downstream of promotion, not of authoring craft. Including it would penalize new authors on identical work.

How often does the rubric re-evaluate my skill?

Every miner cycle (nightly at 01:00 local, typically finishes by ~05:00). If you edit the upstream SKILL.md, the miner re-fetches and re-scores at the next cycle. New edits typically reflect in the rendered skill page within ~24 hours.

Are there secret bonus signals the scorer rewards?

No. The full source is at scripts/quality_score.py in the public site repo. Every constant and threshold is visible. Six axes; no hidden multipliers; no popularity bumps; no editorial overrides.

What if my skill ALREADY scores well but isn't in daily_eligible?

The daily_eligible threshold is quality_pro ≥ 80, applied after the structural score is computed and blended with metadata_depth. If you're at 75-79 you're close — the highest-leverage tightenings are: add an explicit 'When NOT to use' section (worth up to 16 points alone), and add 2-3 frontmatter fields you don't have (model, version, allowed-tools).

ClaudSkills › Learn › Skill quality rubric

The Claude Code skill quality rubric — what the scorer actually checks

Published 23 May 2026 · 9 min read · The source is at scripts/quality_score.py in the public site repo — no hidden constants

Every SKILL.md the ClaudSkills miner crawls runs through a six-axis structural rubric that produces a 0-100 score. Skills scoring ≥80 land in the daily_eligible bucket — the top ~12% of the catalog, source pool for the mobile app's daily push, the Skill of the Day picker, and the social drafter. This post walks through the actual axes, with point ranges and one practical authoring tightening per axis.

This rubric is content-derived, not popularity-derived. The Pro Quality Score blends 80% of this structural heuristic with 20% metadata depth. Popularity signals (GitHub stars, install counts, share counts) were permanently dropped from the formula on 2026-05-07 — the reasoning is that popularity is downstream of promotion, not of authoring craft, and including it would penalise new authors on identical work.

The six axes at a glance

Axis	Max points	Reads as
Description depth	20	How much context can Claude derive from the description alone
Vocabulary diversity	5	Filler-language penalty proxy
Name + body + interface	~31	Substance vs stub
Anti-trigger discipline	16	Does the skill name when it should NOT be invoked
Pricing / quota disclosure	10	API-driven skills surface their cost
Custom frontmatter depth	15	Goes beyond name+description
Sum	97	Plus the -5 filler-phrase penalty (see Axis 7)

A baseline skill that does the obvious things — 200-char description, 500-char body, a tools field, an arg-hint — lands around 60. Skills demonstrating scope discipline + pricing transparency + rich frontmatter land 80-95. Stub skills (just name + a vague one-liner) land 15-35. The rubric clusters in the middle deliberately; admission threshold is 50, so everything below that doesn't enter the catalog at all.

Axis 1 — Description depth (0-20)

Description depth 0 → 20 points

Description length (chars)	Points
≥ 800	20
≥ 400	17
≥ 200	13
≥ 80	8
≥ 40	4
< 40	0

Tightening: aim for ≥200 chars. The jump from 80-char to 200-char descriptions is 5 points; from 200 to 400 is another 4. Claude reads the description to decide whether to invoke the skill — a 40-char description gives the model almost no signal.

Axis 2 — Vocabulary diversity (0-5)

Vocabulary diversity in description 0 → 5 points

Distinct meaningful words (length > 2)	Points
≥ 8	5
≥ 5	2
< 5	0

Tightening: avoid stop-word stuffing. A description like "this skill helps with code" has 5 words but ~2 distinct meaningful ones. Compare "Audits Terraform plans for security misconfigurations and IAM privilege escalations" — same length, 10 distinct meaningful words. The rubric is a proxy for "does the description actually describe the skill."

Axis 3 — Name + body + interface (0-31)

Substance axis 0 → ~31 points

This axis stacks several sub-checks additively. None is individually large, but they compound.

Sub-check	Points
Skill has a name distinct from description, ≥4 chars	+5
Body ≥ 2,000 chars	+15
Body ≥ 500 chars (and < 2,000)	+8
Body ≥ 100 chars (and < 500)	+3
Has `allowed-tools` declared	+5
Has `argument-hint` declared	+3
Has any examples / inputs / outputs section	+3

Tightening: the 500 → 2,000-char body jump is +7 points — the single most lucrative edit on this axis. Most stub skills clock around 200-400 chars of body; doubling to 800-1,000 chars (one section of real usage examples) usually crosses the threshold.

Axis 4 — Anti-trigger discipline (0-16)

Anti-trigger discipline 0 → 16 points

This axis is the single highest-leverage edit you can make to a SKILL.md. The scorer counts how many distinct anti-trigger patterns appear anywhere in the body, capped at 4 patterns = 16 points (4 each).

Detected patterns (case-insensitive substring match):

when not to use
when not to invoke
do not use
do not invoke
skip this skill
out of scope
not for
avoid using
anti-pattern / anti pattern
this skill is not
this is not the right skill

Tightening: add a 50-100 word "When NOT to use this skill" section. The named-section style is the easiest path to 4 anti-trigger pattern matches:

## When NOT to use this skill

Skip this skill if:
- You need a one-off ad-hoc analysis — that's out of scope for this skill.
- You're working with a binary file (PDF, image) — this skill is not for those formats.
- You want speculative refactoring suggestions — use the related code-review skill instead.

That single section typically lands 4 distinct anti-trigger matches = max 16 points.

Axis 5 — Pricing / quota disclosure (0-10)

Pricing / quota disclosure 0 → 10 points

Two detectors stack additively (capped at 10):

Dollar-bearing tables — markdown tables with $ followed by digits. Matches a pricing table.
Quota keywords — rate limit, requests per second/minute/hour/day/month, tokens per …, quota, tier 1 / tier 2, free tier, paid tier.

Tightening: only applies to skills that wrap an external paid API (OpenAI, Anthropic, Stripe, AWS, etc.). For these, add a 4-line pricing table:

| Provider | Free tier | Paid tier | Rate limit |
|---|---|---|---|
| Foo | 100/day | $5/1k requests | 60 rpm |

Worth 10 points on a skill that would otherwise score 0 on this axis.

Axis 6 — Custom frontmatter depth (0-15)

Custom frontmatter depth 0 → 15 points

The rubric counts distinct frontmatter keys beyond the standard name + description pair. Each extra key adds 1.5 points, capped at 10 extras = 15 points.

Worth-including extras (any of these count):

---
name: my-skill
description: Two-sentence summary.
model: claude-3-5-sonnet
tags: [type:audit, lang:python, cloud:aws]
version: 1.2.0
license: MIT
allowed-tools: ["read", "edit", "bash"]
user-invokable: true
category: security
subcategory: cloud
argument-hint: ""
metadata:
  api_base: https://api.example.com
  rate_limit_rpm: 60
author: jane-doe
repository: https://github.com/jane/my-skill
---

This block has 13 extra keys (caps at 10 = max 15 points). Most SKILL.md files in the catalog have 0-2 extra keys; getting to 5-7 is straightforward, and lands ~8-10 points.

Tightening: the highest-EV adds are tags, allowed-tools, and license — they're useful to downstream consumers anyway, and the scorer treats them equally with the more esoteric fields.

The filler-phrase penalty (−5)

If the body literally contains the phrase "this skill" or "a skill that" or "use this skill" or "this is a skill" anywhere, the rubric subtracts 5 points. The penalty is intentionally crude but catches a common authoring tic: starting every paragraph with self-referential framing instead of content. The fix is to delete the phrase — usually the sentence reads better without it.

How the structural score becomes the Pro Quality Score. The structural score (0-100) blends with the metadata-depth signal (0-100) at a 80/20 weighting to produce the Pro Quality Score also in 0-100. Skills scoring ≥80 on the blended score get daily_eligible: true on the public skills.json. That's the bucket the mobile app's daily push, the Skill of the Day picker, and the social drafter all read from. Today, 7,285 of 69,016 skills clear that threshold (10.6%).

End-to-end worked example

Take a baseline skill with no discipline applied — 80-char description, 300-char body, name + description only in frontmatter, no anti-trigger section, no pricing table, includes the phrase "This skill helps you…":

Axis	Score	Notes
Description depth	8	80 chars → +8
Vocabulary diversity	2	~6 distinct words
Name + body + interface	8	name diff (5) + body 300 (3)
Anti-trigger	0	—
Pricing/quota	0	—
Frontmatter	0	—
Filler penalty	−5	"this skill" present
Sum	13	Below admission threshold (50)

This skill would NOT be admitted to the catalog. Now apply the four cheapest tightenings — bump description to 200 chars, body to 800 chars, add a 4-anti-trigger section, add 5 frontmatter keys (tags, license, allowed-tools, model, version):

Axis	Score	Notes
Description depth	13	200 chars → +13
Vocabulary diversity	5	≥8 distinct words
Name + body + interface	18	name (5) + body 800 (8) + tools (5)
Anti-trigger	16	4 distinct patterns matched
Pricing/quota	0	(N/A for this skill)
Frontmatter	8	5 extras × 1.5
Filler penalty	0	removed
Sum	60	Admitted, mid-tier

~30 minutes of editing took the skill from below admission to mid-tier admitted. Two more tightenings (push body to 2,000 chars, add 5 more frontmatter keys) would land it at ~78, the cusp of daily_eligible.

Things the rubric does NOT check

Code quality. The scorer doesn't parse code in the body. Examples are evaluated only for length contribution.
Author identity. No signal whatsoever on who wrote the skill. Anthropic's reference skills score the same way as a first-time contributor's.
Popularity. Zero stars, forks, install counts, or share metrics. Permanent design decision documented in the engine spec.
Topicality / category fit. A perfect Sales skill scores the same as a perfect Engineering skill. Category is set separately by the miner's classification heuristic.
Recency. A skill written 18 months ago that still applies cleanly today scores the same as one written yesterday.
Language. SKILL.md in English, Japanese, Portuguese, or Russian score by the same rubric. Anti-trigger patterns are English-only currently — a known gap. Non-English skills route through the description-depth + body-length + frontmatter axes only.

If your skill scores 75-79 today

You're close. The two highest-leverage edits to cross 80:

Add a "When NOT to use this skill" section. Worth up to 16 points alone. If you have a vague 2-line "consider scope" comment, formalising it into a named section with 4 distinct anti-trigger patterns is the single biggest jump available.
Add 2-3 frontmatter fields you don't have. model, version, allowed-tools, license, tags, argument-hint — pick three you can fill in honestly. Each is 1.5 points.

Those two together typically move a 75-79 skill to 85-92. The score recomputes at the next miner cycle.

Methodology

Source: scripts/quality_score.py in the ClaudSkills site repo. Every constant in this article is reproducible from that file. The rubric was tuned against a hand-rated reference set of ~50 skills during the Phase 3.4 sprint (2026-05-07). Spearman correlation against human ground truth: ρ = +0.778 — strong positive. The rubric is intentionally conservative: false positives (a low-quality skill scoring high) are rarer than false negatives (a high-quality skill scoring lower than it should). Add the patterns the rubric wants, and the rubric responds.

FAQ

Is the quality score the same for free and Pro users?: No. Free pages display an unrated catalog — no quality score is shown on free skill pages. Pro users see a numeric Quality Score on every skill in the desktop app. Public skills.json carries a daily_eligible: true flag for skills scoring ≥80; the raw numeric score is server-side only.
Does the rubric look at popularity signals?: No. Popularity signals were permanently dropped on 2026-05-07. Pro Quality Score is 100% content-derived: 80% this structural rubric + 20% metadata depth.
How often does the rubric re-evaluate my skill?: Every miner cycle (nightly at 01:00 local, typically finishes by ~05:00). Edits to the upstream SKILL.md reflect within ~24 hours.
Are there secret bonus signals the scorer rewards?: No. The full source is at scripts/quality_score.py in the public site repo. Every constant and threshold is visible.
What if my skill ALREADY scores well but isn't in daily_eligible?: The threshold is quality_pro ≥ 80, applied after structural + metadata are blended. If you're at 75-79, the highest-leverage tightenings are: add an explicit "When NOT to use" section (worth up to 16 points alone), and add 2-3 frontmatter fields you don't have.

The Claude Code skill quality rubric — what the scorer actually checks

The six axes at a glance

Axis 1 — Description depth (0-20)

Description depth 0 → 20 points

Axis 2 — Vocabulary diversity (0-5)

Vocabulary diversity in description 0 → 5 points

Axis 3 — Name + body + interface (0-31)

Substance axis 0 → ~31 points

Axis 4 — Anti-trigger discipline (0-16)

Anti-trigger discipline 0 → 16 points

Axis 5 — Pricing / quota disclosure (0-10)

Pricing / quota disclosure 0 → 10 points

Axis 6 — Custom frontmatter depth (0-15)

Custom frontmatter depth 0 → 15 points

The filler-phrase penalty (−5)

End-to-end worked example

Things the rubric does NOT check

If your skill scores 75-79 today

Methodology

FAQ

Related reading

Categories

Use cases

Popular tags

Learn

Site