Eval Driven Development

Q: What is a Claude Code skill and how does the Eval Driven Development skill fit in?

A Claude Code skill is a SKILL.md file that lives under ~/.claude/skills/<name>/ and tells the Claude Code CLI agent how to perform a specific task (instructions, prompts, allowed tools). Skills are auto-discovered at session start. Eval Driven Development is one of 67,000+ skills indexed in the open ClaudSkills catalog, classified under the Science & Research category. Learn more at https://claudskills.com/learn/what-is-skill-md/.

Category: Science & Research · Sub-category: math-stats · Last updated: 2026-05-21

Use when reasoning about building language-model-integrated systems by writing evaluations before and alongside the system: the statistical (not binary) nature of LLM evals, the five primitives (dataset, evaluation function, aggregation, iteration loop, regression budget), the judgment-mechanism taxonomy (programmatic, model-graded, human-graded, preference comparison), the difference between system-specific evals and canonical benchmarks (MMLU, HumanEval, BIG-bench, GAIA), how evals drive prompt/model/scaffolding/tooling changes, why Goodhart's Law means higher eval scores are not always improvements, and the offline-eval-vs-production-telemetry distinction. Do NOT use for deterministic unit testing (use testing-strategy), production monitoring (use evaluation or error-tracking), general-software TDD (use testing-strategy), or the construction of individual eval rubrics and task sets (use agent-eval-design — it owns construction; this skill owns the iteration discipline).
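The five primitives named in the description can be illustrated with a small sketch. The SKILL.md itself is not reproduced on this page, so everything below is an assumption made for illustration: the toy arithmetic dataset, the stand-in "system" that is right with a fixed probability, the exact-match judge, the Wilson interval as the aggregation, and the 0.05 regression budget are all hypothetical choices, not the skill's own definitions.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str
    expected: str


# 1. Dataset: a fixed set of cases (toy data, invented for illustration).
DATASET = [Case(f"{a}+{b}", str(a + b)) for a in range(5) for b in range(4)]


# 2. Evaluation function: a programmatic judge returning pass/fail per case.
def exact_match(output: str, case: Case) -> bool:
    return output.strip() == case.expected


# 3. Aggregation: pass rate plus a 95% Wilson interval, because a single
#    run is a sample from a distribution rather than a binary verdict.
def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p = passes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def run_eval(system: Callable[[str], str]) -> tuple[float, tuple[float, float]]:
    passes = sum(exact_match(system(c.prompt), c) for c in DATASET)
    return passes / len(DATASET), wilson_interval(passes, len(DATASET))


# Hypothetical stand-in for an LLM-backed system: answers correctly with
# probability `accuracy`. A real system would call a model here.
def make_fake_system(accuracy: float, seed: int) -> Callable[[str], str]:
    rng = random.Random(seed)

    def system(prompt: str) -> str:
        a, b = prompt.split("+")
        right = int(a) + int(b)
        return str(right if rng.random() < accuracy else right + 1)

    return system


# 5. Regression budget: the largest drop in pass rate a change may cause
#    (0.05 is an arbitrary value chosen for this sketch).
REGRESSION_BUDGET = 0.05


def within_budget(baseline: float, candidate: float) -> bool:
    return baseline - candidate <= REGRESSION_BUDGET


# 4. Iteration loop: evaluate each candidate change against the baseline.
if __name__ == "__main__":
    baseline, _ = run_eval(make_fake_system(0.8, seed=0))
    for name, acc in [("prompt-v2", 0.9), ("prompt-v3", 0.6)]:
        rate, (lo, hi) = run_eval(make_fake_system(acc, seed=1))
        verdict = "keep" if within_budget(baseline, rate) else "reject"
        print(f"{name}: {rate:.2f} [{lo:.2f}, {hi:.2f}] vs {baseline:.2f} -> {verdict}")
```

With only 20 cases the interval is wide, which is the point of reporting it: a small difference in pass rate between two variants may be noise rather than an improvement.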

Security: Grade A. Static scan found no risk patterns.

About this skill (catalog notes)

The Eval Driven Development SKILL.md includes explicit scope boundaries (a 'when not to use' or 'out of scope' section), pricing or quota commentary, and at least one code block. At roughly 1,861 words, it is on the longer end of the catalog distribution.

Source: www.npmjs.com/package/@skill-graph/cli
Original author: jacob-balslev
Indexed lastmod: 2026-05-21
Catalog position: Science & Research · math-stats
Indexed related skills: 10

How Eval Driven Development fits the catalog

Eval Driven Development sits in the Science & Research category under the math-stats sub-topic in the ClaudSkills catalog. There are 10 related skills indexed alongside it; comparing a few before installing usually reveals which fits your workflow best.

These notes are auto-generated from features detected in the SKILL.md file and from this catalog's structure — they aren't part of the source repository.

What this skill does

Eval Driven Development is a community-contributed Claude Code skill in the math-stats sub-category. It ships as a SKILL.md file that Claude Code auto-discovers under ~/.claude/skills/eval-driven-development/ and loads when your prompt matches the skill's trigger.

Who uses this skill

The Eval Driven Development Claude Code skill is built for researchers, data scientists, academics, and analysts working with complex data and scientific literature. It's part of ClaudSkills (also referred to as Claude Skills or Claude Code Skills) — the open community-curated registry of 146,000+ SKILL.md files for Anthropic's Claude Code agent and the wider Claude ecosystem (Claude API, Claude Agent SDK).

How to install

Free

Manual install (2 steps)

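# Create the skill directory, then fetch SKILL.md into it.
# -f makes curl fail on an HTTP error instead of saving the error page.
mkdir -p ~/.claude/skills/eval-driven-development
curl -fL https://claudskills.com/skills/eval-driven-development/SKILL.md \
  -o ~/.claude/skills/eval-driven-development/SKILL.md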

Or just download SKILL.md directly and drop it into ~/.claude/skills/eval-driven-development/. Claude Code auto-discovers it on next session.

Skills live at ~/.claude/skills/eval-driven-development/SKILL.md on macOS/Linux, or %USERPROFILE%\.claude\skills\eval-driven-development\SKILL.md on Windows. See the full install guide for step-by-step instructions.
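To confirm the file landed in the right place on any of these platforms, a short check along the following lines may help. This is a minimal sketch: `skill_path` and `is_installed` are hypothetical helper names, and the check only confirms that a non-empty SKILL.md exists at the expected path, not that Claude Code has loaded it.

```python
from pathlib import Path
from typing import Optional


def skill_path(name: str, home: Optional[Path] = None) -> Path:
    """Return the expected SKILL.md location for a skill directory name."""
    # Path.home() resolves to the user's home directory on macOS/Linux
    # and to %USERPROFILE% on Windows.
    base = home if home is not None else Path.home()
    return base / ".claude" / "skills" / name / "SKILL.md"


def is_installed(name: str, home: Optional[Path] = None) -> bool:
    """True if a non-empty SKILL.md exists at the expected location."""
    path = skill_path(name, home)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    name = "eval-driven-development"
    status = "found" if is_installed(name) else "missing"
    print(f"{skill_path(name)}: {status}")
```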

📱 Install from your phone or desktop Telegram

Open @claudskills_bot on Telegram and tap Open Desktop App; the desktop app then installs this skill for you. You can also share the bot link with a colleague, who gets the same one-tap install.

Pro

One-click install via the desktop app

The ClaudSkills desktop app installs any skill directly into ~/.claude/skills/ with one click — no terminal required. Pro starts at $9/mo or $149 lifetime.

Frequently asked questions

How do I install the Eval Driven Development Claude Code skill?

Install via the ClaudSkills desktop app (one click) or copy SKILL.md from the source repository to ~/.claude/skills/eval-driven-development/SKILL.md and restart Claude Code. Both flows are detailed at claudskills.com/install/.

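What does the Eval Driven Development skill do?

It gives Claude Code guidance for building language-model-integrated systems by writing evaluations before and alongside the system. Per its description, it covers the statistical (not binary) nature of LLM evals, five primitives (dataset, evaluation function, aggregation, iteration loop, regression budget), a taxonomy of judgment mechanisms, system-specific evals versus canonical benchmarks, Goodhart's Law, and the distinction between offline evals and production telemetry.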

Is this skill free to install?

Yes. ClaudSkills is an open registry — every skill keeps its source repository's license, and manual install via copy is free. ClaudSkills Pro ($9/mo, $79/yr, or $149 one-time) adds one-click install via the desktop app and a multi-signal Quality Score.

When should I use the Eval Driven Development skill?

Use Eval Driven Development when your Claude Code task matches the skill's description, which the catalog files under the math-stats sub-category of Science & Research. Claude Code auto-discovers installed skills and invokes the right one based on the task description, so you can ask Claude directly (e.g. "use Eval Driven Development") or describe the task and let Claude pick. Browse related skills at /category/science/.

Attribution & license

Source: https://github.com/jacob-balslev/skill-graph/blob/HEAD/marketplace/skills/eval-driven-development/SKILL.md
Author: jacob-balslev
Website: https://www.npmjs.com/package/@skill-graph/cli

Cite this skill

If you reference this skill in a blog post, paper, or documentation, you can cite it as:

APA

jacob-balslev. (2026). Eval Driven Development [Claude Code skill]. ClaudSkills. https://claudskills.com/skills/eval-driven-development/

BibTeX

@misc{eval-driven-development-2026,
  author    = {jacob-balslev},
  title     = {Eval Driven Development [Claude Code skill]},
  year      = {2026},
  publisher = {ClaudSkills},
  url       = {https://claudskills.com/skills/eval-driven-development/}
}

Embed this skill

Promote, attribute, or link this skill from your own README, blog post, or documentation. All three snippets are free to use, with no sign-up and no API key.

Badge

[![ClaudSkills](https://claudskills.com/badge/eval-driven-development.svg)](https://claudskills.com/skills/eval-driven-development/?utm_source=badge&utm_medium=readme&utm_campaign=skill_badge)

Security scan

Grade A · scanned 2026-07-06 — free static scan against the OWASP Agentic Skills Top 10.

No risk patterns were found in any of the ten OWASP-aligned categories.

✓ Prompt injection
✓ Data exfiltration
✓ Supply chain
✓ Reverse shell
✓ Credentials
✓ Execution
✓ Filesystem
✓ Persistence
✓ Obfuscation
✓ Network

Show this grade on your repo:

[![Security: A](https://img.shields.io/badge/Security-A-2e7d32)](https://claudskills.com/skills/eval-driven-development/#security)
