---
name: forge-multi-agent
description: When to spawn subagents vs do work inline. Patterns - fan-out, sequential, orchestrator/worker, judge/critic, retry-with-fallback, pipeline. Context budget discipline, parallelism caps, failure handling as data, audit logging. Contains worked TypeScript orchestrator with bounded parallelism. Use when designing an agent system that delegates work to other agents.
license: MIT
---

# forge-multi-agent

You are designing a system where one agent decides to call others. Default agent design either does everything in one massive context (until it falls over from token rot) or spawns subagents for every micro-task (until it falls over from coordination overhead). This skill exists to find the middle.

The first principle: **multi-agent is a context-management strategy, not a quality strategy.** Spawning a subagent buys you a fresh context window. It does not, by itself, make the answer better. Use it when context is the constraint, not when capability is.

## Quick reference (the things you must never ship)

1. A subagent spawned for a task that fits in 100 tokens of context.
2. Parallel fan-out without a cap on concurrent subagents (>10).
3. Subagent brief over 2000 tokens (over-briefed).
4. No iteration cap on the orchestrator loop.
5. Subagent failures bubbling as exceptions instead of structured results.
6. Subagents writing to the same file without coordination.
7. No audit log of what each subagent was asked and what it returned.
8. Deep subagent trees (subagents spawning subagents spawning subagents).
9. Treating subagent output as ground truth without verification.
10. Re-doing the same subagent call in one session (no caching).

## Hard rules

### When to spawn

**1. Spawn when the task would consume >25% of the parent's context inline.** Reading 100 files, scanning a long log, exploring an unfamiliar codebase - these belong in a subagent so the parent stays small.

**2. Spawn when the work is independent of the parent's ongoing decisions.** Two queries with no data dependency can run in parallel. A query that depends on the previous step stays inline.

**3. Spawn when the failure mode is "produces a lot of noise."** A subagent returning a 200-token summary is cheaper than the parent doing the same work and absorbing 50KB of tool output along the way.

**4. Do NOT spawn for tasks under 60 seconds of focused work.** The overhead of spinning up + briefing + waiting is real. Quick lookups stay inline.

**5. Do NOT spawn for tasks that need the parent's working memory.** "Continue what we were doing" is not a brief that survives the context boundary. If the subagent needs to know everything the parent knows, the brief becomes the size of the original context and you have gained nothing.

### Patterns

**6. Fan-out.** Parent spawns N parallel subagents with independent tasks, merges results.

```ts
// reference: bounded-parallel fan-out
async function fanOut<T, R>(
  items: T[],
  brief: (item: T) => string,
  spawn: (prompt: string) => Promise<R>,
  opts: { concurrency: number } = { concurrency: 5 },
): Promise<Array<{ item: T; result: R | null; error: unknown }>> {
  const results: Array<{ item: T; result: R | null; error: unknown }> = new Array(items.length);
  let cursor = 0;
  async function worker() {
    while (true) {
      const i = cursor++;
      if (i >= items.length) return;
      try {
        const r = await spawn(brief(items[i]));
        results[i] = { item: items[i], result: r, error: null };
      } catch (err) {
        results[i] = { item: items[i], result: null, error: err };
      }
    }
  }
  await Promise.all(Array.from({ length: opts.concurrency }, () => worker()));
  return results;
}

// use:
const findings = await fanOut(
  ["src/api/", "src/auth/", "src/billing/"],
  (dir) => `Search ${dir} for direct Stripe API calls. Return JSON list of {file, line}.`,
  spawnSubagent,
  { concurrency: 3 },
);
```

**7. Sequential delegation.** Parent spawns subagent A, uses A's output to brief subagent B. Each stage independent in context but dependent in data. Watch chain length; over four stages quality erodes.

**8. Orchestrator/worker.** Parent is a thin coordinator, workers do the work. Best when the parent itself needs almost no domain knowledge - just routes. Orchestrator's prompt is short; each worker has a deep skill loaded.

**9. Judge/critic.** One agent produces, another evaluates. Useful for high-stakes outputs (security review, design critique, fact-checking). The critic should not have written the original - independence is the point.

**10. Retry-with-fallback.** Same task, different prompt or different model on failure. Wrap the subagent call; if output fails validation, re-spawn with adjusted instructions.

**11. Pipeline.** Linear chain where each subagent narrows the output. Search → filter → summarize. Each stage tighter scope, smaller context than the last.

### Context budget

**12. Parent context budget is at most 50% of model window.** Reserve the other half for the orchestrator's own thinking and synthesizing results.

**13. Subagent brief at most 1500 tokens.** Longer means you have not factored correctly. See [`forge-agent-prompt`](../forge-agent-prompt/SKILL.md) for the structure.

**14. Subagent output budget is announced in the brief.** "Report in under 300 words" or "return JSON with exactly these fields." Without a stated budget, subagents produce wide-ranging output that dwarfs the parent.

### Parallelism

**15. Spawn in parallel when there is no data dependency. Sequentially otherwise.**

**16. Cap parallel subagents at 10.** Beyond that you hit rate limits, exhaust the parent's tracking ability, and pay marginal coordination cost that outweighs the parallelism win.

**17. All-or-nothing on a fan-out is a smell.** If 8 of 10 subagents succeed and 2 fail, the parent should use the 8 and decide separately about the 2. Design for partial success.

### Communication

**18. Subagents do not talk to each other directly.** All coordination flows through the parent. Sibling-to-sibling messaging is hard to debug, breaks ordering, pushes complexity into the wrong layer.

**19. Subagent return values are structured, not free-form.** Ask for JSON or a specific markdown shape. Free-form returns force the parent to re-parse.

**20. Parents never blindly trust subagent output for high-stakes decisions.** Verify outputs before acting (see [`forge-subagent-eval`](../forge-subagent-eval/SKILL.md)).

### Failure handling

**21. Subagent timeout has a default.** Without a timeout the parent can hang indefinitely. Set per-task; error if exceeded.

**22. Errors propagate as data, not exceptions.** A subagent that cannot complete returns a structured "could not complete because X" result. The parent decides whether to retry, skip, or fail.

```ts
type SubagentResult<T> =
  | { ok: true; data: T }
  | { ok: false; reason: "timeout" | "validation_failed" | "refused"; detail: string };
```

**23. Audit logs of all subagent spawns.** Tool name, brief size, result size, duration, outcome. Without logs, debugging a misbehaving multi-agent system is impossible.

```ts
logger.info({
  parent_call_id: parentId,
  subagent_role: "code_searcher",
  brief_tokens: estimateTokens(brief),
  result_tokens: estimateTokens(JSON.stringify(result)),
  duration_ms: Math.round(performance.now() - start),
  outcome: result.ok ? "success" : `failure:${result.reason}`,
}, "subagent.call");
```

### Costs

**24. Each subagent costs a full context warm-up.** Real money. Measure it. A system that spawns 50 subagents per user query is 50x more expensive than a single-context solution.

**25. Cache where possible.** If the same subagent task runs against the same input twice in one session, cache. Most multi-agent frameworks have nothing built in; you write the cache yourself.

## Anti-patterns

- **"More agents = better" reflex.** Spawning subagents for every step does not improve quality - it usually degrades it.
- **Subagent that just wraps a single tool call.** Skip the subagent and call the tool from the parent.
- **Deep subagent trees.** Subagents spawning subagents spawning subagents is almost always bad factoring. Flatten.
- **Subagents with overlapping responsibility.** Two subagents both authorized to write the same file race. Assign ownership.

## Common AI-output patterns to reject

| Pattern | Why wrong | Fix |
| --- | --- | --- |
| Spawn subagent for "format this string" | Overhead > task | Inline |
| `Promise.all(items.map(spawn))` with 100 items | Hits rate limits, exhausts memory | Bounded `fanOut` with concurrency 5-10 |
| Brief includes full conversation history | Over-briefed, defeats context savings | Brief is self-contained, no history |
| Subagent that loops forever calling more subagents | No iteration cap | Hard cap on loop, hard cap on depth |
| `try { spawn() } catch (e) { throw e }` | Failure cascades, no partial recovery | Failure as data: `{ ok: false, reason }` |
| Subagent return: "I checked everything and it looks fine" | Free-form, unverifiable | JSON with specific fields |
| No logging of subagent calls | Cannot debug failures | Structured `subagent.call` event per spawn |
| Two subagents both edit same file in parallel | Race condition | Assign file ownership, or serialize |

## Worked example: research-then-write orchestrator

```ts
// orchestrator: gathers facts via parallel subagents, then writes the report
import { fanOut } from "./fanout.js";

const RESEARCH_TARGETS = ["postgres-17", "next-16", "claude-4-7", "tree-sitter"];

type Finding = { topic: string; summary: string; sources: string[] };

async function research(topic: string): Promise<Finding> {
  const brief = `
Find the three most cited facts about "${topic}" relevant to building developer tools.
Return JSON with shape: { "summary": string (under 200 words), "sources": string[] (URLs) }.
Use web_search and read_url tools. Spend at most 8 tool calls. Do not write prose outside JSON.
`.trim();

  const result = await spawnSubagent(brief, {
    model: "claude-sonnet-4-6",
    max_tokens: 4000,
    timeout_ms: 60_000,
  });

  // Verify shape before trusting (see forge-subagent-eval)
  const parsed = z.object({
    summary: z.string().max(2000),
    sources: z.array(z.string().url()).max(10),
  }).safeParse(JSON.parse(result.text));

  if (!parsed.success) {
    throw new Error(`subagent returned malformed output for ${topic}`);
  }
  return { topic, summary: parsed.data.summary, sources: parsed.data.sources };
}

async function orchestrate(): Promise<string> {
  // Parallel research with bounded concurrency
  const findings = await fanOut(
    RESEARCH_TARGETS,
    (topic) => topic,
    research,
    { concurrency: 4 },
  );

  // Use successful findings; skip failures
  const usable = findings.filter((f) => f.result !== null).map((f) => f.result!);
  if (usable.length === 0) throw new Error("all research subagents failed");

  // Write the report using only the gathered context (sequential, not parallel)
  return await spawnSubagent(`
Write a 600-word briefing for a developer who builds AI tools.
Use only the facts below; cite each by its source.

${usable.map((f) => `### ${f.topic}\n${f.summary}\nSources: ${f.sources.join(", ")}`).join("\n\n")}
`, { model: "claude-sonnet-4-6", max_tokens: 2000 });
}
```

What this demonstrates: parallel fan-out with concurrency cap of 4 (rule 16); per-subagent timeout (rule 21); shape validation on returns (rule 19); partial success allowed (rule 17); orchestrator stays small (rule 12); only synthesizes, does no research itself.

## Workflow

When designing a multi-agent system:

1. **List the actual work.** Write it down as a flat list of tasks.
2. **Group by dependency.** What can run in parallel? What must wait?
3. **Estimate context per task.** Under 25% of model window stays inline by default.
4. **Choose a pattern.** Fan-out, sequential, orchestrator/worker, judge/critic, pipeline.
5. **Define brief and return shape for each role.** Brief under 1500 tokens, output budget announced.
6. **Add eval / verification on outputs that influence further action.** See [`forge-subagent-eval`](../forge-subagent-eval/SKILL.md).
7. **Test end-to-end with the real model.** Multi-agent systems behave differently in practice than they look on paper.

## Verification

Multi-agent design is structural; verification is by review. Manual checklist:

- [ ] Every subagent has a stated output budget in its brief.
- [ ] No brief exceeds ~1500 tokens.
- [ ] Parallel fan-outs capped at 10 concurrent.
- [ ] Failures handled as data, not fatal.
- [ ] Audit log captures every spawn (task, duration, outcome).
- [ ] Subagents do not spawn their own subagents (or, if they do, depth limit is explicit).

## When to skip this skill

- Single-context agents (no delegation).
- Tightly coupled multi-step prompts that fit in one window. Just chain in one context.
- Background batch jobs where latency does not matter - sequential single-context is simpler.

## Related skills

- [`forge-agent-prompt`](../forge-agent-prompt/SKILL.md) - how to write the subagent brief.
- [`forge-subagent-eval`](../forge-subagent-eval/SKILL.md) - verifying subagent output before trusting it.
- [`forge-tool-use`](../../llm/forge-tool-use/SKILL.md) - tool calling within each subagent.
- [`forge-prompt-engineering`](../../llm/forge-prompt-engineering/SKILL.md) - prompt design for the subagent.
