Two General-category Claude Code skills compared side by side. Use the metadata, sample code, and install commands below to pick the right one for your workflow.
| Name | langchain-eval-harness | llm-router |
|---|---|---|
| Description | Build reproducible evaluation pipelines for LangChain 1.0 chains and LangGraph 1.0 agents — golden datasets, LangSmith evaluate(), ragas RAG metrics, deepeval LLM-as-judge, agent trajectory analysis, and CI gating on… | Selects the optimal LLM model and provider for each task based on complexity, cost budget, and capability requirements. Routes cheap tasks to Haiku/GPT-4o-mini and complex tasks to Sonnet/Opus/o1. Use when deciding… |
| Category | General | General |
| Sub-category | general-misc | ai-tooling |
| Tags | ai:llm type:debug | ai:llm |
| Author | Jeremy Longshore <[email protected]> | curiositech |
| License | MIT | — |
| Install | /add-skill langchain-eval-harness | /add-skill llm-router |
Sample golden-dataset file from langchain-eval-harness (`evals/golden_set/v2026.04.jsonl`):

```jsonl
{"id": "gs-0001", "input": "Refund policy for SKU ABC-42?", "expected": "30 days with receipt", "contexts": ["policy_v3.md"], "tags": ["refund"], "difficulty": "easy", "dataset_version": "2026.04"}
{"id": "gs-0002", "input": "Return policy for opened software?", "expected": "No, opened software is final sale", "contexts": ["policy_v3.md#returns"], "tags": ["refund"], "difficulty": "medium", "dataset_version": "2026.04"}
```
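The two sample rows imply a per-record schema for the golden set. Below is a minimal Python sketch of loading and sanity-checking such a file. It assumes every row carries the same seven keys seen in the samples and that one file holds a single `dataset_version`. The helper name `load_golden_set` is hypothetical and not part of the skill.

```python
import json
import re
from typing import Iterable

# Keys observed in the two sample rows; treating them all as required is an assumption.
REQUIRED_KEYS = {"id", "input", "expected", "contexts", "tags", "difficulty", "dataset_version"}


def load_golden_set(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL golden-set rows and enforce basic reproducibility checks.

    Hypothetical helper: rejects rows with missing keys, duplicate ids,
    or a mix of dataset versions within one file.
    """
    records: list[dict] = []
    seen_ids: set[str] = set()
    versions: set[str] = set()
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        if record["id"] in seen_ids:
            raise ValueError(f"line {lineno}: duplicate id {record['id']!r}")
        seen_ids.add(record["id"])
        versions.add(record["dataset_version"])
        records.append(record)
    if len(versions) > 1:
        raise ValueError(f"mixed dataset versions: {sorted(versions)}")
    return records
```

Iterating an open text file yields one line at a time, so in practice you would pass the file handle directly, e.g. `load_golden_set(open(path, encoding="utf-8"))`. Pinning one version per file keeps an evaluation run tied to an exact, reproducible dataset snapshot.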
Tier-routing flowchart from llm-router:

```mermaid
flowchart TD
    A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini (~$0.001)"]
    A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o (~$0.01)"]
    A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]
    T1 --> Q1{Quality sufficient?}
    Q1 -->|Yes| Done1[Use cheap model]
    Q1 -->|No| T2
    T2 --> Q2{Quality sufficient?}
    Q2 -->|Yes| Done2[Use balanced model]
    Q2 -->|No| T3
```
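The flowchart reads as two steps: a keyword-based choice of starting tier, then escalation one tier at a time whenever the quality check fails. Below is a minimal Python sketch of that loop. The verb lists and model names come straight from the diagram, and the costs are its rough per-call figures, not price quotes. Everything else is an assumption: taking the highest matched tier on mixed signals, defaulting to Tier 2 when no verb matches, and the callbacks `call_model` and `is_good_enough`, which are hypothetical hooks you would supply. The skill's own routing logic may differ.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tier:
    name: str
    models: tuple[str, ...]
    approx_cost_usd: float  # rough per-call figure from the flowchart, not a price quote


TIERS = (
    Tier("tier1", ("Haiku", "GPT-4o-mini"), 0.001),
    Tier("tier2", ("Sonnet", "GPT-4o"), 0.01),
    Tier("tier3", ("Opus", "o1"), 0.10),
)

# Verbs from the flowchart's edge labels, keyed by starting tier index.
TASK_VERBS = {
    0: {"classify", "validate", "format", "extract"},
    1: {"write", "implement", "review", "synthesize"},
    2: {"reason", "architect", "judge", "decompose"},
}


def starting_tier(task: str) -> int:
    """Pick a starting tier by exact verb match (a crude, assumed heuristic)."""
    words = set(re.findall(r"[a-z0-9-]+", task.lower()))
    matched = [idx for idx, verbs in TASK_VERBS.items() if words & verbs]
    # Assumption: on mixed signals take the highest matched tier;
    # with no matching verb, default to the balanced tier (index 1).
    return max(matched) if matched else 1


def route(
    task: str,
    call_model: Callable[[str, str], str],       # hypothetical: (model, task) -> output
    is_good_enough: Callable[[str, str], bool],  # hypothetical: (task, output) -> bool
) -> tuple[str, str, float]:
    """Return (model, output, approx_spend), escalating while quality is insufficient."""
    tier_index = starting_tier(task)
    spent = 0.0
    while True:
        tier = TIERS[tier_index]
        model = tier.models[0]  # first listed model; provider choice is out of scope here
        output = call_model(model, task)
        spent += tier.approx_cost_usd
        # Tier 3 is the top of the ladder: the flowchart has no further escalation.
        if tier_index == len(TIERS) - 1 or is_good_enough(task, output):
            return model, output, spent
        tier_index += 1
```

Starting cheap and escalating only on failure means an easy task pays Tier 1 prices. The trade-off is that a misrouted hard task pays for every cheaper attempt before reaching the tier it needed.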
Both are free to install. If you're unsure, install both: Claude Code skills are isolated by filename and collide only when their trigger phrases overlap, which is rare. The most reliable signal is the SKILL.md body itself, so open both skill pages and read the first paragraph of each.
SKILL.md files, not affiliated with, endorsed by, or sponsored by Anthropic.