---
name: llm-router
license: Apache-2.0
description: Selects the optimal LLM model and provider for each task based on complexity, cost budget, and capability requirements. Routes cheap tasks to Haiku/GPT-4o-mini and complex tasks to Sonnet/Opus/o1. Use when deciding which model to call, optimizing LLM costs, or building multi-model agent systems. Activate on "which model", "model selection", "route to model", "LLM cost", "model routing", "cheap vs expensive model". NOT for prompt engineering (use prompt-engineer), model fine-tuning, or training custom models.
allowed-tools: Read
argument-hint: '[task-description] [budget: low|medium|high]'
metadata:
  category: AI & Machine Learning
  tags:
    - llm
    - router
    - which-model
    - model-selection
    - route-to-model
  pairs-with:
    - skill: cost-optimizer
      reason: Model routing is the primary mechanism for implementing cost optimization decisions
    - skill: cost-accrual-tracker
      reason: Real-time cost data informs routing decisions to stay within budget constraints
    - skill: prompt-engineer
      reason: Prompt complexity analysis determines which model tier the router should select
category: AI & Machine Learning
tags:
  - llm-routing
  - model-selection
  - cost-optimization
  - latency
  - fallback
---

# LLM Router

Selects the optimal model for each task. Routing is the single biggest cost lever in multi-agent systems: intelligent routing saves 45-85% of spend while maintaining 95%+ of top-model quality.

---

## When to Use

✅ **Use for**:
- Deciding which model to call for a specific task
- Assigning models to DAG nodes in agent workflows
- Optimizing LLM API costs across a system
- Building cascading try-cheap-first patterns

❌ **NOT for**:
- Prompt engineering (use `prompt-engineer`)
- Model fine-tuning or training
- Comparing model architectures (academic research)

---

## Routing Decision Tree

```mermaid
flowchart TD
  A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini (~$0.001)"]
  A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o (~$0.01)"]
  A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]
  
  T1 --> Q1{Quality sufficient?}
  Q1 -->|Yes| Done1[Use cheap model]
  Q1 -->|No| T2
  
  T2 --> Q2{Quality sufficient?}
  Q2 -->|Yes| Done2[Use balanced model]
  Q2 -->|No| T3
```

---

## Tier Assignment Table

| Task Type | Tier | Models | Cost/Call | Why This Tier |
|-----------|------|--------|-----------|---------------|
| Classify input type | 1 | Haiku, GPT-4o-mini | ~$0.001 | Deterministic categorization |
| Validate schema/format | 1 | Haiku, GPT-4o-mini | ~$0.001 | Mechanical checking |
| Format output / template | 1 | Haiku, GPT-4o-mini | ~$0.001 | Structured transformation |
| Extract structured data | 1 | Haiku, GPT-4o-mini | ~$0.001 | Pattern matching |
| Summarize text | 1-2 | Haiku → Sonnet | ~$0.001-0.01 | Short summaries: Haiku; nuanced: Sonnet |
| Write content/docs | 2 | Sonnet, GPT-4o | ~$0.01 | Creative quality matters |
| Implement code | 2 | Sonnet, GPT-4o | ~$0.01 | Correctness + style |
| Review code/diffs | 2 | Sonnet, GPT-4o | ~$0.01 | Needs judgment, not just pattern matching |
| Research synthesis | 2 | Sonnet, GPT-4o | ~$0.01 | Multi-source reasoning |
| Decompose ambiguous problem | 3 | Opus, o1 | ~$0.10 | Requires deep understanding |
| Design architecture | 3 | Opus, o1 | ~$0.10 | Complex system reasoning |
| Judge output quality | 3 | Opus, o1 | ~$0.10 | Meta-reasoning about quality |
| Plan multi-step strategy | 3 | Opus, o1 | ~$0.10 | Long-horizon planning |
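
The table can be read as a two-step lookup: task type to tier, then tier to models and approximate cost. A minimal sketch follows. The task-type keys, the `tier_for` helper, and the tier 2 default for unknown task types are illustrative assumptions, not part of this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int
    models: tuple[str, ...]
    approx_cost_usd: float  # rough per-call figure taken from the table


TIERS = {
    1: Tier(1, ("Haiku", "GPT-4o-mini"), 0.001),
    2: Tier(2, ("Sonnet", "GPT-4o"), 0.01),
    3: Tier(3, ("Opus", "o1"), 0.10),
}

# Hypothetical task-type keys, each mapped to its starting tier.
TASK_TIER = {
    "classify": 1, "validate": 1, "format": 1, "extract": 1,
    "summarize": 1,  # nuanced summaries may need tier 2
    "write": 2, "implement": 2, "review": 2, "synthesize": 2,
    "decompose": 3, "architect": 3, "judge": 3, "plan": 3,
}


def tier_for(task_type: str) -> Tier:
    """Return the starting tier; unknown task types fall back to tier 2 (assumption)."""
    return TIERS[TASK_TIER.get(task_type, 2)]
```

Defaulting unknown work to the balanced tier is one possible design choice; a stricter router could raise an error instead.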

---

## Three Routing Strategies

### Strategy 1: Static Tier Assignment (Start Here)

Assign model by task type at DAG design time. No runtime logic. Gets 60-70% of possible savings.

```yaml
nodes:
  - id: classify
    model: claude-haiku-4-5     # Tier 1: ~$0.001/call
  - id: implement
    model: claude-sonnet-4-5    # Tier 2: ~$0.01/call
  - id: evaluate
    model: claude-opus-4-5      # Tier 3: ~$0.10/call
```

### Strategy 2: Cascading (Try Cheap First)

Try the cheap model first; if quality falls below the threshold, escalate. Adds ~1s latency but saves 50-80% on nodes where the cheap model succeeds.

```
1. Execute with Tier 1 model
2. Quick quality check (also Tier 1; costs ~$0.001)
3. If quality ≥ threshold → done
4. If quality < threshold → re-execute with Tier 2
5. If Tier 2 is still below threshold → re-execute with Tier 3
```

Best for nodes where you're genuinely unsure which tier is needed.
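
The cascade can be sketched as a loop over models ordered cheapest first. This is a sketch under stated assumptions: `cascade`, the `run` and `score` callables, and the 0.8 threshold are hypothetical names and values, and the real model call and quality check are left to the caller.

```python
from typing import Callable, Sequence


def cascade(
    task: str,
    models: Sequence[str],
    run: Callable[[str, str], str],
    score: Callable[[str, str], float],
    threshold: float = 0.8,
) -> tuple[str, str]:
    """Try models cheapest-first; return (model, output) for the first that passes.

    `run(model, task)` executes the task and `score(task, output)` is the quick
    quality check. If no model passes, the last model's output is returned.
    """
    if not models:
        raise ValueError("models must not be empty")
    output = ""
    for model in models:
        output = run(model, task)
        if score(task, output) >= threshold:
            return model, output
    # Nothing met the threshold: fall back to the strongest model's attempt.
    return models[-1], output
```

Passing `run` and `score` as parameters keeps the escalation logic independent of any provider SDK, so it can be exercised with stubs before wiring in real calls.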

### Strategy 3: Adaptive (Learn from History)

Record success/failure per task type per model. Over time, the router learns:
- "Classification nodes always succeed on Haiku" → stay cheap
- "Code review nodes fail on Haiku 40% of the time" → upgrade to Sonnet
- "Architecture nodes succeed on Sonnet 90% of the time" → don't need Opus

Reaches 75-85% savings once ~100 executions have been recorded.
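
One way to record outcomes and act on them is sketched below. The `AdaptiveRouter` class, its 0.9 success cutoff, and its 10-sample minimum are illustrative assumptions; the source does not prescribe a specific learning rule.

```python
from collections import defaultdict
from typing import Sequence


class AdaptiveRouter:
    """Pick the cheapest model whose recorded success rate is high enough."""

    def __init__(self, models: Sequence[str], min_success: float = 0.9,
                 min_samples: int = 10) -> None:
        self.models = list(models)      # ordered cheapest first
        self.min_success = min_success  # assumed acceptance rate
        self.min_samples = min_samples  # skip models with too little data
        # (task_type, model) -> [successes, attempts]
        self.stats: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, model: str, success: bool) -> None:
        entry = self.stats[(task_type, model)]
        entry[0] += int(success)
        entry[1] += 1

    def choose(self, task_type: str, default: str) -> str:
        """Return the cheapest proven model, or `default` if none qualifies yet."""
        for model in self.models:
            successes, attempts = self.stats.get((task_type, model), (0, 0))
            if attempts > 0 and attempts >= self.min_samples \
                    and successes / attempts >= self.min_success:
                return model
        return default
```

A router like this only learns about models that actually get called, so it probably needs to be paired with cascading or occasional exploration to collect data on cheaper tiers.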

---

## Provider Selection

Once model tier is chosen, select the provider:

| Model Class | Provider Options | Selection Criteria |
|------------|-----------------|-------------------|
| Haiku-class | Anthropic, AWS Bedrock | Latency, regional availability |
| Sonnet-class | Anthropic, AWS Bedrock, GCP Vertex | Cost, rate limits |
| Opus-class | Anthropic, AWS Bedrock, GCP Vertex | Rate limits, regional availability |
| GPT-4o-class | OpenAI, Azure OpenAI | Rate limits, compliance |
| Open-source | Ollama (local), Together.ai, Fireworks | Cost ($0 API fees when run locally), latency, GPU availability |
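
Provider selection can be sketched as filter-then-rank: drop providers that fail hard constraints, then pick the best on a soft criterion. The `Provider` fields, the `pick_provider` helper, and the choice of latency as the ranking key are assumptions for illustration; substitute your own measurements and criteria.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Provider:
    name: str
    latency_ms: float       # your own measurement, not a published figure
    available: bool = True  # e.g. region reachable and not rate limited
    compliant: bool = True  # meets your compliance requirements


def pick_provider(candidates: Iterable[Provider],
                  require_compliance: bool = False) -> Provider:
    """Return the lowest-latency provider that passes the hard constraints."""
    eligible = [
        p for p in candidates
        if p.available and (p.compliant or not require_compliance)
    ]
    if not eligible:
        raise LookupError("no eligible provider for this model class")
    return min(eligible, key=lambda p: p.latency_ms)
```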

---

## Cost Impact Example

10-node DAG, "refactor a codebase":

| Strategy | Mix | Cost | Savings |
|----------|-----|------|---------|
| All Opus | 10× $0.10 | $1.00 | — |
| All Sonnet | 10× $0.01 | $0.10 | 90% |
| Static tiers | 4× Haiku + 4× Sonnet + 2× Opus | $0.24 | 76% |
| Cascading | 6× Haiku + 3× Sonnet + 1× Opus | $0.14 | 86% |
| Adaptive (trained) | Dynamic | ~$0.08 | 92% |
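
The static and cascading rows can be reproduced from the per-call costs in the tier table. The sketch below uses hypothetical helper names (`dag_cost`, `savings_vs`) and counts only the final model mix; it ignores the extra cost of quality checks and failed cheap attempts.

```python
# Approximate per-call costs from the tier table (USD).
COST = {"haiku": 0.001, "sonnet": 0.01, "opus": 0.10}


def dag_cost(mix: dict[str, int]) -> float:
    """Total cost of a DAG given how many nodes run on each model."""
    return sum(COST[model] * count for model, count in mix.items())


def savings_vs(baseline: float, cost: float) -> float:
    """Fraction saved relative to the baseline cost."""
    return 1 - cost / baseline


baseline = dag_cost({"opus": 10})                            # 10 x $0.10
static = dag_cost({"haiku": 4, "sonnet": 4, "opus": 2})      # about $0.244
cascading = dag_cost({"haiku": 6, "sonnet": 3, "opus": 1})   # about $0.136

static_savings = savings_vs(baseline, static)        # about 0.756
cascading_savings = savings_vs(baseline, cascading)  # about 0.864
```

Rounded to cents and whole percentages, these match the $0.24 / 76% and $0.14 / 86% rows.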

---

## Anti-Patterns

### Always Use the Best Model
**Wrong**: Route everything to Opus/o1 "for quality."
**Reality**: 60%+ of typical DAG nodes are classification, validation, or formatting, tasks where Haiku performs on par with Opus. Routing them to Opus burns money for no quality gain.

### Always Use the Cheapest Model
**Wrong**: Route everything to Haiku "for cost."
**Reality**: Complex reasoning, architecture design, and quality judgment genuinely need stronger models. Haiku will produce plausible-looking but subtly wrong output on hard tasks.

### Ignoring Latency
**Wrong**: Only optimizing for cost, ignoring that Opus takes 5-10x longer than Haiku.
**Reality**: In a 10-node DAG, model choice affects total execution time as much as cost. Route time-critical paths to faster models.

### No Feedback Loop
**Wrong**: Setting model tiers once and never adjusting.
**Reality**: As models improve (Haiku gets smarter every generation), tasks that needed Sonnet last month may work on Haiku today. Record outcomes and adapt.

---

## Output Contract

This skill produces:
- **Model integration code** with inference pipeline and input/output typing
- **Data preprocessing pipeline** with validation, normalization, and feature extraction
- **Evaluation metrics** with benchmarks and acceptance thresholds
- **Deployment configuration** with model serving setup and resource requirements
- **Monitoring hooks** for model performance tracking and drift detection