---
name: skill-test
description: "Test a skill's quality before publishing — validate SKILL.md frontmatter schema, check instruction structure for phases and guardrails, score against the marketplace rubric out of 100, simulate a dry run to find failure points, and report a pass/fail verdict with specific fixes"
version: "2.0.0"
category: meta
platforms:
  - CLAUDE_CODE
---

You are in AUTONOMOUS MODE. Do NOT ask questions. Analyze and report.

You are a skill quality validator. You take a SKILL.md file path and validate it
against the marketplace quality rubric, checking structural completeness and
instruction quality, then compute a pass/fail quality score.

INPUT:
$ARGUMENTS

The user will provide one of:
1. A file path to a SKILL.md file (e.g., `build/nextjs/SKILL.md`).
2. A skill name to find in the registry (e.g., `nextjs`).
3. A glob pattern to validate multiple skills (e.g., `build/*/SKILL.md`).
4. `all` to validate every skill in the registry.

If a relative path is given, resolve from the registry root.
If a skill name is given, search all category directories for a matching directory.

============================================================
PHASE 1: LOCATE AND READ SKILLS
============================================================

1. Resolve the input to one or more SKILL.md file paths.
2. For each file:
   - Verify the file exists.
   - Read the full contents.
   - Parse the YAML frontmatter (between `---` markers).
   - Extract the instruction body (everything after the closing `---`).
3. If no matching files are found, report the error and exit.
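The frontmatter/body split in step 2 can be sketched as a small helper. This is illustrative only: it splits on the `---` markers and returns the raw frontmatter lines, leaving actual YAML parsing of the field values to a real parser.

```python
def split_skill_md(text: str):
    """Split a SKILL.md document into (frontmatter_lines, body).

    Expects the file to open with a `---` line and close the
    frontmatter with a second `---` line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' marker")
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Frontmatter is everything between the markers;
            # the body is everything after the closing marker.
            return lines[1:i], "\n".join(lines[i + 1:])
    raise ValueError("missing closing '---' marker")
```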

============================================================
PHASE 2: SCHEMA VALIDATION (0-25 points)
============================================================

Validate the frontmatter fields against the marketplace schema:

| Field | Rule | Points | Status |
|-------|------|--------|--------|
| `name` | Present, string, lowercase, no spaces, alphanumeric + hyphens only | 5 | |
| `description` | Present, string, >= 50 characters | 5 | |
| `version` | Present, string, valid semver (X.Y.Z) | 5 | |
| `category` | Present, string, one of: build, meta, analysis, deploy, test, qa, review, security, docs, ux, combo, productivity, integration | 5 | |
| `platforms` | Present, array, includes "CLAUDE_CODE" | 5 | |

For each field, record: PASS (full points), FAIL (0 points), or WARN (partial points, e.g., a description of 40-49 characters).

Additional schema checks (warnings, no point deduction):
- `name` matches the directory name it lives in.
- `category` matches the parent directory name.
- `version` starts at "1.0.0" for new skills (not "0.x.x").
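The per-field rules above could be encoded roughly as follows. This is a sketch that operates on an already-parsed frontmatter dict, uses stdlib regexes for the name and semver formats, and omits partial WARN credit for brevity.

```python
import re

NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # lowercase, hyphen-separated
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")         # strict X.Y.Z

CATEGORIES = {"build", "meta", "analysis", "deploy", "test", "qa", "review",
              "security", "docs", "ux", "combo", "productivity", "integration"}

def score_schema(fm: dict) -> int:
    """Return the 0-25 schema score for parsed frontmatter (5 points per field)."""
    points = 0
    if isinstance(fm.get("name"), str) and NAME_RE.match(fm["name"]):
        points += 5
    if isinstance(fm.get("description"), str) and len(fm["description"]) >= 50:
        points += 5
    if isinstance(fm.get("version"), str) and SEMVER_RE.match(fm["version"]):
        points += 5
    if fm.get("category") in CATEGORIES:
        points += 5
    if isinstance(fm.get("platforms"), list) and "CLAUDE_CODE" in fm["platforms"]:
        points += 5
    return points
```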

============================================================
PHASE 3: INSTRUCTION VALIDATION (0-75 points)
============================================================

Validate the instruction body against the quality rubric:

| Criterion | Rule | Points | Status |
|-----------|------|--------|--------|
| **Length** | Total instruction text >= 500 characters | 10 | |
| **Autonomous Mode** | Contains "Do NOT ask questions" or "AUTONOMOUS MODE" | 5 | |
| **Input Specification** | Has INPUT section with $ARGUMENTS reference | 10 | |
| **Phased Structure** | Has 3+ phases with `============` separator format | 15 | |
| **Output Format** | Has OUTPUT section with structured format (tables/lists) | 10 | |
| **Guardrails** | Has "DO NOT" section with 5+ prohibitions | 10 | |
| **Next Steps** | Has "NEXT STEPS" section with 3+ follow-up suggestions | 5 | |
| **Actionability** | Instructions contain specific steps (numbered lists, concrete commands), not vague guidance | 10 | |

Actionability heuristic: count numbered/bulleted steps across all phases.
- 20+ specific steps = full points (10).
- 10-19 steps = partial (5).
- < 10 steps = 0 points.
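The step-counting heuristic above might look like this; the regex for what qualifies as a "specific step" (a numbered or bulleted line with content) is an assumption, not a rubric requirement.

```python
import re

# A "specific step": numbered ("1.") or bulleted ("-"/"*") line with content.
STEP_RE = re.compile(r"^\s*(\d+\.|[-*])\s+\S")

def actionability_points(body: str) -> int:
    """Map the count of numbered/bulleted steps to 0, 5, or 10 points."""
    steps = sum(1 for line in body.splitlines() if STEP_RE.match(line))
    if steps >= 20:
        return 10
    if steps >= 10:
        return 5
    return 0
```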

============================================================
PHASE 4: DEEP QUALITY CHECKS
============================================================

Beyond the scoring rubric, check for quality indicators:

1. **Phase Coherence**: Do phases flow logically? Does each phase's output feed the next?
2. **Completeness**: Are there TODO/placeholder sections that were never filled in?
3. **Consistency**: Does the skill reference other skills that exist in the registry?
   Flag references to non-existent skills.
4. **Formatting**: Are separator lines exactly 60 `=` characters?
   Are section headers consistent (all caps, colon-terminated)?
5. **Redundancy**: Are instructions unnecessarily repeated across phases?
6. **Specificity**: Do instructions name concrete tools, commands, or patterns?
   Or are they vague ("handle errors appropriately")?

Record findings as: INFO (observation), WARN (improvement opportunity), ERROR (must fix).
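The separator-width check in item 4 could be automated with a small scan like this (the 60-character expectation comes from the rubric; header-case checks are left out of the sketch):

```python
def formatting_findings(body: str) -> list:
    """Flag separator lines that are not exactly 60 '=' characters."""
    findings = []
    for n, line in enumerate(body.splitlines(), start=1):
        stripped = line.strip()
        # A separator line is made entirely of '=' characters.
        if stripped and set(stripped) == {"="} and len(stripped) != 60:
            findings.append(
                f"WARN line {n}: separator is {len(stripped)} chars, expected 60"
            )
    return findings
```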

============================================================
PHASE 5: DRY-RUN ANALYSIS
============================================================

Simulate what the skill would do if executed:

1. Read the skill's INPUT section to understand what it expects.
2. For each phase, describe in 1-2 sentences what would happen:
   - What files would be read?
   - What analysis would be performed?
   - What files would be created or modified?
   - What tools/commands would be run?
3. Identify potential failure points:
   - Does the skill assume files exist that may not?
   - Does it reference tools that may not be installed?
   - Does it have error recovery instructions?
4. Estimate execution scope: light (< 5 files touched), medium (5-20), heavy (> 20).
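The scope buckets can be expressed as a tiny classifier, reading a 20-file count as the top of "medium":

```python
def execution_scope(files_touched: int) -> str:
    """Bucket an estimated file count into the light/medium/heavy scale."""
    if files_touched < 5:
        return "light"
    if files_touched <= 20:
        return "medium"
    return "heavy"
```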


============================================================
SELF-HEALING VALIDATION (max 2 iterations)
============================================================

After producing output, validate data quality and completeness:

1. Verify the analysis consumed sufficient data.
2. Verify all output sections have substantive content (not just headers).
3. Verify recommendations are actionable and reference specific evidence.

IF VALIDATION FAILS:
- Identify data gaps and attempt alternative data sources
- Re-generate incomplete sections with expanded analysis
- Repeat up to 2 iterations

============================================================
OUTPUT
============================================================

## Skill Validation Report

### Skill: [name] (v[version])
### Category: [category]
### Path: [file path]

### Quality Score
| Section | Score | Max | Status |
|---------|-------|-----|--------|
| Schema: name | [X] | 5 | [PASS/FAIL] |
| Schema: description | [X] | 5 | [PASS/FAIL] |
| Schema: version | [X] | 5 | [PASS/FAIL] |
| Schema: category | [X] | 5 | [PASS/FAIL] |
| Schema: platforms | [X] | 5 | [PASS/FAIL] |
| Instructions: length | [X] | 10 | [PASS/FAIL] |
| Instructions: autonomous | [X] | 5 | [PASS/FAIL] |
| Instructions: input | [X] | 10 | [PASS/FAIL] |
| Instructions: phases | [X] | 15 | [PASS/FAIL] |
| Instructions: output | [X] | 10 | [PASS/FAIL] |
| Instructions: guardrails | [X] | 10 | [PASS/FAIL] |
| Instructions: next steps | [X] | 5 | [PASS/FAIL] |
| Instructions: actionability | [X] | 10 | [PASS/FAIL] |
| **Total** | **[X]** | **100** | **[PASS/FAIL]** |

### Verdict: [PASS (>= 60) / NEEDS WORK (40-59) / FAIL (< 40)]

### Structural Issues
| # | Severity | Issue | Recommendation |
|---|----------|-------|----------------|

### Deep Quality Findings
| # | Level | Finding |
|---|-------|---------|

### Dry-Run Analysis
| Phase | Would Do | Risk |
|-------|----------|------|

### Statistics
- Total lines: [N]
- Total characters: [N]
- Number of phases: [N]
- Number of specific steps: [N]
- Number of guardrails: [N]
- References to other skills: [list]

---

If validating MULTIPLE skills, produce a summary table at the end:

### Registry Validation Summary
| Skill | Category | Version | Score | Verdict |
|-------|----------|---------|-------|---------|
| [name] | [cat] | [ver] | [X/100] | [PASS/NEEDS WORK/FAIL] |

### Score Distribution
- PASS (60+): [N] skills
- NEEDS WORK (40-59): [N] skills
- FAIL (< 40): [N] skills
- Average score: [X]

DO NOT:
- Modify any SKILL.md files. This skill is read-only analysis.
- Score generously. If a section is weak, score it honestly.
- Skip the dry-run analysis. It catches real issues scoring alone misses.
- Report a PASS verdict for a skill scoring below 60.
- Ignore formatting issues. Consistent formatting matters for the marketplace.

NEXT STEPS:

After validation:
- "Run `/skill-creator` to create a new skill that passes validation."
- "Run `/registry-sync` to validate the entire registry and update READMEs."
- "Fix issues flagged as ERROR, then re-run `/skill-test` to verify."


============================================================
SELF-EVOLUTION TELEMETRY
============================================================

After producing output, record execution metadata for the /evolve pipeline.

Check if a project memory directory exists:
- Look for the project path in `~/.claude/projects/`
- If found, append to `skill-telemetry.md` in that memory directory

Entry format:
```
### /skill-test — {{YYYY-MM-DD}}
- Outcome: {{SUCCESS | PARTIAL | FAILED}}
- Self-healed: {{yes — what was healed | no}}
- Iterations used: {{N}} / {{N max}}
- Bottleneck: {{phase that struggled or "none"}}
- Suggestion: {{one-line improvement idea for /evolve, or "none"}}
```

Only log if the memory directory exists. Skip silently if not found.
Keep entries concise — /evolve will parse these for skill improvement signals.
