---
name: context-loading-protocol
description: Decide which agents and skills to load for a given task. Use at the start of every task to select the minimum viable context load, calculate the token budget, and stay below the 40% utilization ceiling.
role: orchestrator
user-invocable: true
---

# Context Loading Protocol

The token-budget reference (CLAUDE.md baseline, full-load ceiling, per-agent and per-skill costs) lives in the **Baseline Budget** section of `CLAUDE.md`. This skill is the runtime procedure; don't duplicate the table here, because a copy goes stale.

## Constraints

- Never load all agents upfront; load only the primary agent for each phase.
- Keep total context below **40%** of the model's window at all times.
- Load agents on demand when their phase begins, not speculatively.
- Use tool-based file reads (Read); do not paste file contents into the prompt.

## Loading Decision Procedure

### Step 1: Classify the task

| Profile | Description | Example |
|---|---|---|
| **Simple/Single** | One agent, no skills | "Fix this typo", "Write a unit test" |
| **Standard/Single** | One agent + 1–2 skills | "Implement this feature using hexagonal architecture" |
| **Multi-Agent** | 2–3 agents coordinating | "Design and implement a new API endpoint" |
| **Complex/Multi** | 3+ agents + skills | "Build a new bounded context with full test coverage" |
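
The classification can be sketched as a small function. This is only an illustration: `classify_task` is a hypothetical helper, and because the table's "2–3 agents" and "3+ agents" rows overlap at three agents, the tie-break below (three or more agents with at least one skill counts as Complex/Multi) is an assumption, not a rule stated by this protocol.

```python
def classify_task(agent_count: int, skill_count: int) -> str:
    """Map agent and skill counts onto the four profiles in the table above."""
    if agent_count < 1:
        raise ValueError("a task needs at least one agent")
    if agent_count == 1:
        # One agent: the presence of any skill separates Simple from Standard.
        return "Simple/Single" if skill_count == 0 else "Standard/Single"
    # Assumption: 3+ agents *with* skills is Complex/Multi; every other
    # multi-agent combination is treated as Multi-Agent.
    if agent_count >= 3 and skill_count > 0:
        return "Complex/Multi"
    return "Multi-Agent"


print(classify_task(1, 0))  # "Fix this typo" style task
print(classify_task(3, 2))  # bounded-context style task
```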

### Step 2: Select agents

Load the **minimum set**:

1. Identify the **primary agent** (owns the deliverable).
2. Identify **supporting agents** (input or review).
3. Do NOT load agents for downstream validation yet — load them when their phase begins.

Order: primary first, then supporting agents one at a time as their phase begins.

### Step 3: Select skills

For each loaded agent, check its `## Skills` section:

- Only load skills **relevant to the current task** — not all skills the agent references.
- Skills shared by multiple loaded agents only need to be loaded once.
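
A minimal sketch of the two rules above, assuming each loaded agent's `## Skills` section has already been parsed into a list of skill names. The function `select_skills` and the skill names `tdd` and `adr` are hypothetical; only `hexagonal-architecture` appears in this document.

```python
def select_skills(agent_skills: dict[str, list[str]], relevant: set[str]) -> list[str]:
    """Return the task-relevant skills across loaded agents, each listed once."""
    selected: list[str] = []
    seen: set[str] = set()
    for skills in agent_skills.values():  # dicts preserve insertion order
        for skill in skills:
            # Keep a skill only if the task needs it and it is not loaded yet.
            if skill in relevant and skill not in seen:
                seen.add(skill)
                selected.append(skill)
    return selected


loaded = {
    "software-engineer": ["hexagonal-architecture", "tdd"],  # "tdd" is hypothetical
    "architect": ["hexagonal-architecture", "adr"],          # "adr" is hypothetical
}
print(select_skills(loaded, relevant={"hexagonal-architecture", "adr"}))
```

The shared skill is returned once even though both agents reference it, and `tdd` is skipped because the task does not need it.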

### Step 4: Calculate token budget

```
Total = CLAUDE.md baseline
      + conversation history (estimate)
      + agent files (sum selected)
      + skill files (sum selected)
      + expected output (estimate)
```

**Target: total < 40% of the model's context window.** For Claude with a 200K window, that's < 80K tokens. The loaded files (CLAUDE.md, agents, skills) are a small fraction of that; the real budget concern is conversation history and output accumulating over multi-turn tasks.
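
The formula can be worked through in a few lines. This is a sketch under stated assumptions: `ContextLoad` is a hypothetical helper, and every token count in the example is a made-up placeholder, not a figure from the **Baseline Budget** table.

```python
from dataclasses import dataclass, field

CEILING = 0.40  # utilization ceiling from the Constraints section


@dataclass
class ContextLoad:
    baseline: int          # CLAUDE.md baseline
    history: int           # conversation history (estimate)
    expected_output: int   # expected output (estimate)
    agent_files: list[int] = field(default_factory=list)
    skill_files: list[int] = field(default_factory=list)

    def total(self) -> int:
        """Sum the five terms of the budget formula."""
        return (self.baseline + self.history + sum(self.agent_files)
                + sum(self.skill_files) + self.expected_output)

    def within_ceiling(self, window: int) -> bool:
        """True when total utilization is strictly below the ceiling."""
        return self.total() / window < CEILING


# Placeholder numbers for illustration only.
load = ContextLoad(baseline=2_000, history=20_000, expected_output=8_000,
                   agent_files=[1_500], skill_files=[1_200])
print(load.total())                  # 32700
print(load.within_ceiling(200_000))  # True
```

With these placeholder values, growing `history` to 70,000 pushes the total to 82,700 tokens, which is over the 80K limit for a 200K window even though the agent and skill files are unchanged.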

### Step 5: Load via tool-based file reads

```
Read agents/software-engineer.md
Read skills/hexagonal-architecture/SKILL.md
```

Do NOT copy file contents into the system prompt or conversation.

## Loading Profiles

Pre-computed loading sets for common task types.

### Code Implementation
- **Load**: Software Engineer + relevant skill(s)
- **Defer**: QA (load after implementation), Architect (load only if design questions arise)

### Architecture Design
- **Load**: Architect + relevant architecture skill(s)
- **Defer**: Software Engineer (load at implementation), QA (load at validation)

### Bug Fix
- **Load**: Software Engineer only
- **Defer**: QA (load if regression test needed)

### New Feature (full lifecycle)

Three phases, each in a fresh context window with a human review gate between. Each phase's output is a structured progress file in `memory/` that onboards the next phase.

| Phase | Load | Purpose | Output |
|---|---|---|---|
| 1. Research | Orchestrator + sub-agents (exploration) | Understand system, find files, trace data flows | Research progress file |
| 2. Plan | Architect + PM (if needed) + relevant skill(s) | Specify every change: files, snippets, tests | Implementation plan progress file |
| 3. Implement | Software Engineer + QA + skill(s) | Execute the plan; code, tests | Working code + test results |

Key rules:
- Each phase starts with a fresh context window, loading only the previous phase's progress file.
- Human reviews and approves the progress file before the next phase begins.
- Sub-agents primarily provide context isolation — they search, read, and return concise findings.
- If implementation is large, compact mid-phase: update the plan progress file with completed steps and continue in a fresh context.

## Unloading

Tokens can't literally be removed from context, so unloading works as follows:

1. **Phase transitions** — summarize completed phase output into `memory/` and start a new conversation for the next phase.
2. **Within a conversation** — stop referencing the agent/skill; the orchestrator mentally notes it's no longer active. Use the Context Summarization skill to compress stale content.
3. **Multi-turn accumulation** — when conversation history crosses **30%** utilization, trigger summarization before loading additional agents.
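
The 30% trigger in item 3 can be expressed as a simple gate that runs before any additional agent is loaded. A minimal sketch: `should_summarize` is a hypothetical name, and reading "crosses" as strictly greater than 30% is an assumption.

```python
SUMMARIZE_AT = 0.30  # history utilization that triggers summarization


def should_summarize(history_tokens: int, window_tokens: int) -> bool:
    """True when conversation history alone exceeds 30% of the window."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return history_tokens / window_tokens > SUMMARIZE_AT


# Check the gate before loading another agent.
if should_summarize(history_tokens=70_000, window_tokens=200_000):
    print("summarize first, then load the next agent")
```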

## Anti-patterns

- Loading all agents upfront — wastes tokens before any work begins. Load only the primary agent.
- Loading all of an agent's skills — most are irrelevant to the specific request.
- Never unloading — context grows monotonically, and hallucination risk grows with it. Summarize and phase-transition.
- Loading agents "just in case" — adds cost without value. Load on demand when the phase begins.

## Output

Emit the loading plan as a single table: selected agents and skills, their token costs, the estimated total, and the utilization percentage against the 40% ceiling. No narration.
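
One way to render such a table is sketched below. The column layout, the `render_loading_plan` helper, and all token counts are assumptions for illustration; this protocol only specifies which pieces of information the table must contain.

```python
def render_loading_plan(items: list[tuple[str, str, int]],
                        other_tokens: int, window_tokens: int) -> str:
    """Render a Markdown table from (type, name, tokens) rows.

    other_tokens covers the baseline, history, and expected-output estimates.
    """
    total = other_tokens + sum(tokens for _, _, tokens in items)
    pct = 100 * total / window_tokens
    rows = ["| Item | Type | Tokens |", "|---|---|---|"]
    rows += [f"| {name} | {kind} | {tokens:,} |" for kind, name, tokens in items]
    rows.append(f"| Baseline + history + output (estimate) | other | {other_tokens:,} |")
    rows.append(f"| **Estimated total** | | {total:,} |")
    rows.append(f"| **Utilization (ceiling 40%)** | | {pct:.1f}% |")
    return "\n".join(rows)


# Placeholder costs for illustration only.
plan = [("agent", "Software Engineer", 1_500),
        ("skill", "hexagonal-architecture", 1_200)]
print(render_loading_plan(plan, other_tokens=30_300, window_tokens=200_000))
```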
