evaluating-llms-harness

Quality score: 70/100  ·  Category: Science & Research  ·  Sub-category: research-methods
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, Hugging Face, and major labs. Supports Hugging Face, vLLM, and API-based model backends.

What this skill does

evaluating-llms-harness is a production-ready Claude Code skill in the research-methods sub-category. It ships as a single SKILL.md file that Claude Code auto-discovers under ~/.claude/skills/lm-evaluation-harness/ (note that the directory name differs from the skill name) and loads when your prompt matches the skill's trigger.

When to invoke it: when you are benchmarking model quality, comparing models, reporting academic results, or tracking training progress.
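For orientation, here is a minimal sketch of the kind of benchmark run this skill is meant to help with. It assumes the skill targets EleutherAI's lm-evaluation-harness (as the install directory name suggests) and that its `lm_eval` CLI is on your PATH. The model name, task list, and batch size are illustrative assumptions, not values taken from the skill.

```shell
#!/bin/sh
# Illustrative defaults (assumptions, not taken from the skill itself).
MODEL="${MODEL:-EleutherAI/pythia-160m}"
TASKS="${TASKS:-hellaswag,gsm8k}"

# Print the harness command for the current MODEL and TASKS.
lm_eval_cmd() {
  echo "lm_eval --model hf --model_args pretrained=${MODEL} --tasks ${TASKS} --batch_size 8"
}

# Dry run by default; set RUN=1 to execute (requires the lm_eval CLI).
if [ "${RUN:-0}" = "1" ]; then
  lm_eval --model hf --model_args "pretrained=${MODEL}" \
    --tasks "${TASKS}" --batch_size 8
else
  lm_eval_cmd
fi
```

Running the script as-is only prints the command, which makes it easy to review the model and task selection before committing GPU time to a real evaluation.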

Who uses this skill

The evaluating-llms-harness skill is built for researchers, data scientists, academics, and analysts working with complex data and scientific literature. It is part of the open ClaudSkills registry, a community-curated catalog of 15,000+ capabilities you can install for Claude Code — the Claude CLI agent.

How to install

Free

Manual install (2 steps)

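# Step 1: create the skill directory
mkdir -p ~/.claude/skills/lm-evaluation-harness
# Step 2: download SKILL.md (-f fails on HTTP errors instead of saving an error page)
curl -fL https://claudskills.com/skills/lm-evaluation-harness/SKILL.md \
  -o ~/.claude/skills/lm-evaluation-harness/SKILL.md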

Alternatively, download SKILL.md directly and drop it into ~/.claude/skills/lm-evaluation-harness/. Claude Code auto-discovers it at the start of the next session.

Skills live at ~/.claude/skills/lm-evaluation-harness/SKILL.md on macOS/Linux, or %USERPROFILE%\.claude\skills\lm-evaluation-harness\SKILL.md on Windows. See the full install guide for step-by-step instructions.
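To confirm the file landed where Claude Code expects it, a quick check along these lines may help on macOS/Linux. This is a sketch: `check_skill` is a hypothetical helper name, and it only verifies that SKILL.md exists and is non-empty, not that its contents are valid.

```shell
#!/bin/sh
# Install location for this skill on macOS/Linux, as given above.
SKILL_DIR="${HOME}/.claude/skills/lm-evaluation-harness"

# check_skill DIR: report whether DIR/SKILL.md exists and is non-empty.
# (Hypothetical helper for illustration; not part of Claude Code.)
check_skill() {
  if [ -s "$1/SKILL.md" ]; then
    echo "installed: $1/SKILL.md"
  else
    echo "missing: $1/SKILL.md"
    return 1
  fi
}

check_skill "$SKILL_DIR" || true
```

If the check reports the file as missing, re-run the two install steps; if it reports installed but the skill does not load, open SKILL.md and confirm the download is the skill file rather than an HTML error page.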

Pro

One-click install via the desktop app

The ClaudSkills desktop app installs any skill directly into ~/.claude/skills/ with one click — no terminal required. Pro starts at $9/mo or $149 lifetime.

Part of Acreator Store, Adam Lankamer's AI tools.