Claude Code Skills · Claude Skills · The open SKILL.md registry for Claude

langchain-eval-harness vs llm-router

Two Claude Code skills from the General category, compared on metadata, sample code, and install commands so you can pick the right one for your workflow.

Side-by-side

Name: langchain-eval-harness | llm-router
Description: Build reproducible evaluation pipelines for LangChain 1.0 chains and LangGraph 1.0 agents: golden datasets, LangSmith evaluate(), ragas RAG metrics, deepeval LLM-as-judge, agent trajectory analysis, and CI gating on… | Selects the optimal LLM model and provider for each task based on complexity, cost budget, and capability requirements. Routes cheap tasks to Haiku/GPT-4o-mini and complex tasks to Sonnet/Opus/o1. Use when deciding…
Category: General | General
Sub-category: general-misc | ai-tooling
Tags: ai:llm, type:debug | ai:llm
Author: Jeremy Longshore | curiositech
License: MIT
Install: /add-skill langchain-eval-harness | /add-skill llm-router

Tag overlap

Shared: ai:llm

Only in langchain-eval-harness: type:debug

Only in llm-router: (none)
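The overlap listed above is a plain set comparison. A minimal sketch, assuming each skill's tags are held as a Python set (the `tags` dictionary is a hypothetical structure for illustration, not something the registry exposes):

```python
# Tag sets copied from the comparison above; the dict layout is illustrative only.
tags = {
    "langchain-eval-harness": {"ai:llm", "type:debug"},
    "llm-router": {"ai:llm"},
}

left = tags["langchain-eval-harness"]
right = tags["llm-router"]

shared = sorted(left & right)      # tags both skills carry
only_left = sorted(left - right)   # tags unique to langchain-eval-harness
only_right = sorted(right - left)  # tags unique to llm-router

print("Shared:", shared)
print("Only in langchain-eval-harness:", only_left)
print("Only in llm-router:", only_right)
```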

Sample code from each SKILL.md

langchain-eval-harness

# evals/golden_set/v2026.04.jsonl
{"id": "gs-0001", "input": "Refund policy for SKU ABC-42?", "expected": "30 days with receipt", "contexts": ["policy_v3.md"], "tags": ["refund"], "difficulty": "easy", "dataset_version": "2026.04"}
{"id": "gs-0002", "input": "Return policy for opened software?", "expected": "No, opened software is final sale", "contexts": ["policy_v3.md#returns"], "tags": ["refund"], "difficulty": "medium", "dataset_version": "2026.04"}

llm-router

flowchart TD
  A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini (~$0.001)"]
  A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o (~$0.01)"]
  A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]
  
  T1 --> Q1{Quality sufficient?}
  Q1 -->|Yes| Done1[Use cheap model]
  Q1 -->|No| T2
  
  T2 --> Q2{Quality sufficient?}
  Q2 -->|Yes| Done2[Use balanced model]
  Q2 -->|No| T3
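The flowchart's escalation logic can be sketched in a few lines of Python. The task-type keywords, model names, and rough cost labels come from the chart; the `route` function, the `run_model` and `quality_ok` callbacks, and the data layout are hypothetical names introduced for this sketch, not the skill's actual interface.

```python
# Tiers in escalation order. Keywords and cost labels mirror the flowchart;
# the costs are the chart's rough estimates, not measured prices.
TIERS = [
    {"name": "Tier 1", "models": ["Haiku", "GPT-4o-mini"], "approx_cost": 0.001,
     "task_types": {"classify", "validate", "format", "extract"}},
    {"name": "Tier 2", "models": ["Sonnet", "GPT-4o"], "approx_cost": 0.01,
     "task_types": {"write", "implement", "review", "synthesize"}},
    {"name": "Tier 3", "models": ["Opus", "o1"], "approx_cost": 0.10,
     "task_types": {"reason", "architect", "judge", "decompose"}},
]


def starting_tier(task_type):
    """Return the index of the tier whose keyword set contains task_type."""
    for index, tier in enumerate(TIERS):
        if task_type.lower() in tier["task_types"]:
            return index
    raise ValueError(f"unknown task type: {task_type!r}")


def route(task_type, run_model, quality_ok):
    """Start at the matching tier and escalate while quality is insufficient.

    run_model(tier) and quality_ok(output) are caller-supplied callbacks
    (hypothetical). The chart has no check after Tier 3, so the top tier's
    output is returned as-is.
    """
    index = starting_tier(task_type)
    while True:
        tier = TIERS[index]
        output = run_model(tier)
        if quality_ok(output) or index == len(TIERS) - 1:
            return tier["name"], output
        index += 1


# Usage with stand-in callbacks: pretend Tier 1 output is never good enough.
def fake_run(tier):
    return f"answer from {tier['models'][0]}"


def fake_quality(output):
    return "Haiku" not in output


print(route("classify", fake_run, fake_quality))
print(route("architect", fake_run, fake_quality))
```

Starting at the cheapest plausible tier and escalating only on a failed quality check is what keeps the average cost low; how "quality sufficient" is judged is left to the caller here because the chart does not define it.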

When to choose each

Choose langchain-eval-harness when you need reproducible evaluation pipelines for LangChain 1.0 chains and LangGraph 1.0 agents: golden datasets, LangSmith evaluate(), ragas RAG metrics, deepeval LLM-as-judge, agent trajectory analysis, and CI gating on…

Choose llm-router when you need to select the model and provider for each task based on complexity, cost budget, and capability requirements. It routes cheap tasks to Haiku/GPT-4o-mini and complex tasks to Sonnet/Opus/o1. Use when deciding…

Both are free to install. If you're unsure, install both: each Claude Code skill lives in its own directory, so the two only collide if their trigger phrases overlap, which is rare. The richest signal is the SKILL.md body itself, so open both skill pages and read the first paragraph of each.
