---
name: agent-architecture-patterns
description: Choose the right agent topology — single-loop ReAct, plan-and-execute, supervisor / multi-agent, graph-based — and the trade-offs of each. Use when designing a new agent, untangling a chaotic one, or deciding whether you need agents at all.
source: self
date_added: 2026-05-01
---

# Agent Architecture Patterns

Most "agent" failures come from picking the wrong topology, not the wrong model. Start with the simplest pattern that can plausibly work.

## When to Use

- Building any LLM feature that calls tools in a loop.
- "Our agent is unpredictable / slow / expensive" — usually a topology problem.
- Multiple specialized capabilities (search, code, browse) need to coordinate.
- Choosing between a framework (LangGraph, Pydantic AI, CrewAI, AutoGen) and a thin custom loop.

## Step 0: Do You Need an Agent At All?

Before reaching for an agent loop, exhaust the cheaper options:

| Pattern | Use when |
| --- | --- |
| Single LLM call, structured output | Task fits in one prompt, no external lookups |
| Pipeline (deterministic chain) | Steps are fixed and ordered |
| RAG + single call | Knowledge task with a known retrieval shape |
| Router + specialist call | Several known task types, one model picks the route |
| **Agent loop** | Steps depend on intermediate results; the path is data-dependent |

Agents are the slowest, most expensive, and least predictable option in this table. Use them only when the task genuinely needs dynamic planning.
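
As an illustration of one of the cheaper options, the router + specialist row can be sketched as below. This is a minimal sketch, not a prescribed implementation: `classify` is a keyword stub standing in for a single LLM call, and the route labels and handlers are hypothetical.

```python
from typing import Callable

def classify(request: str) -> str:
    """Stand-in for one LLM call that returns a route label (assumption)."""
    text = request.lower()
    if "refund" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "support"
    return "general"

# Hypothetical specialists: each would be its own prompt or model in practice.
HANDLERS: dict[str, Callable[[str], str]] = {
    "billing": lambda r: f"[billing specialist] {r}",
    "support": lambda r: f"[support specialist] {r}",
    "general": lambda r: f"[general assistant] {r}",
}

def route(request: str) -> str:
    """One routing decision, then one specialist call. No loop."""
    label = classify(request)
    handler = HANDLERS.get(label, HANDLERS["general"])  # unknown labels fall back
    return handler(request)
```

The path has exactly two calls regardless of input, which is what makes this cheaper and more predictable than an agent loop.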

## Pattern Catalog

### 1. Single-Loop ReAct

```
loop:
  think → choose tool → call tool → observe → maybe answer
```

Pros: simple, debuggable, low overhead.
Cons: poor at long-horizon planning; can spiral on hard problems.
Good for: tool-using chat, IDE assistants, focused tasks with < 10 tool calls.
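
The loop above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: `fake_policy` is a scripted stand-in for the model, the `TOOLS` registry and the `"name:arg"` payload format are hypothetical, and `max_steps` is a simple step budget.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> callable taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def fake_policy(history: list[tuple[str, str]]) -> tuple[str, str]:
    """Stand-in for the LLM: returns ("tool", "name:arg") or ("answer", text)."""
    if not history:
        return ("tool", "add:2+3")
    return ("answer", f"The result is {history[-1][1]}")

def react(policy, max_steps: int = 10) -> str:
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        kind, payload = policy(history)          # think and choose
        if kind == "answer":
            return payload
        name, _, arg = payload.partition(":")
        tool = TOOLS.get(name)
        # Unknown tools become an observation instead of crashing the loop.
        observation = tool(arg) if tool else f"unknown tool: {name}"
        history.append((payload, observation))   # observe
    return "stopped: step budget exhausted"
```

Calling `react(fake_policy)` runs one tool call and then answers; a policy that never answers is cut off by the step budget.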

### 2. Plan-and-Execute

```
plan = LLM.plan(task)
results = [execute(step) for step in plan]
answer = LLM.summarize(results)
```

Pros: predictable cost; the plan is auditable; independent steps are easy to parallelize.
Cons: bad plans cascade; replanning logic is fiddly.
Good for: research, multi-document analysis, structured workflows.
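
A runnable sketch of the same shape, including the parallel execution mentioned under Pros. The three functions `make_plan`, `execute`, and `summarize` are hypothetical stubs standing in for LLM or tool calls; the sketch assumes the planned steps are independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor

def make_plan(task: str) -> list[str]:
    """Stand-in for the planning LLM call: one step per comma-separated topic."""
    return [f"look up {topic.strip()}" for topic in task.split(",")]

def execute(step: str) -> str:
    """Stand-in for a tool call or worker LLM call."""
    return f"done: {step}"

def summarize(results: list[str]) -> str:
    """Stand-in for the final synthesis LLM call."""
    return "; ".join(results)

def plan_and_execute(task: str) -> str:
    plan = make_plan(task)  # the plan is a plain list, so it can be logged and audited
    # Steps are assumed independent; pool.map returns results in plan order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(execute, plan))
    return summarize(results)
```

If steps depend on earlier results, the parallel `map` would need to be replaced by sequential execution or a dependency-aware scheduler.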

### 3. Supervisor / Worker (Multi-Agent)

A supervisor LLM delegates subtasks to specialist agents, each with its own tool set and, optionally, its own model.

Pros: separation of concerns; specialists can use cheaper or fine-tuned models.
Cons: high token overhead (every handoff carries context); coordination bugs; debugging is hard.
Good for: clearly distinct domains (e.g., a finance agent + a writing agent), enterprise platforms with many teams.
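
A minimal sketch of the delegation step, assuming two hypothetical specialists. `supervisor_pick` is a keyword stub standing in for the supervisor LLM's decision, and each specialist's `run` stands in for that worker's own agent loop.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Specialist:
    name: str
    tools: tuple[str, ...]        # the only tools this worker may use
    run: Callable[[str], str]     # stand-in for the worker's own agent loop

# Hypothetical specialists with deliberately non-overlapping tool sets.
SPECIALISTS = {
    "finance": Specialist("finance", ("ledger_lookup",), lambda t: f"finance result for: {t}"),
    "writing": Specialist("writing", ("style_check",), lambda t: f"draft for: {t}"),
}

def supervisor_pick(subtask: str) -> str:
    """Stand-in for the supervisor LLM's delegation decision (assumption)."""
    return "finance" if "revenue" in subtask.lower() else "writing"

def supervise(subtasks: list[str]) -> list[tuple[str, str]]:
    results = []
    for subtask in subtasks:
        worker = SPECIALISTS[supervisor_pick(subtask)]
        # Only the subtask text crosses the boundary, not the full history.
        results.append((worker.name, worker.run(subtask)))
    return results
```

Passing only the subtask to each worker is one way to limit the handoff token overhead noted under Cons.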

### 4. Graph / State Machine (e.g. LangGraph)

Nodes = steps; edges = conditional transitions; state object passed through.

Pros: explicit control flow; easy to add retries, guards, human-in-the-loop nodes; testable.
Cons: more upfront design; over-engineered for simple tasks.
Good for: production agents that must be observable, recoverable, and bounded.
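
The node/edge/state idea can be sketched without any framework. The code below is plain Python and is not LangGraph's API; node names, the state keys, and the simulated flaky `fetch` step are all illustrative assumptions.

```python
from typing import Callable

State = dict  # plain dict for brevity; a typed state object is preferable in practice

def fetch(state: State) -> str:
    state["attempts"] += 1
    # Simulated flaky step: succeeds on the second attempt.
    state["data"] = "payload" if state["attempts"] >= 2 else None
    return "check"

def check(state: State) -> str:
    # Conditional edge: answer, retry, or give up.
    if state["data"] is not None:
        return "answer"
    return "fetch" if state["attempts"] < state["max_attempts"] else "fail"

def answer(state: State) -> str:
    state["result"] = f"answer built from {state['data']}"
    return "END"

def fail(state: State) -> str:
    state["result"] = "gave up"
    return "END"

NODES: dict[str, Callable[[State], str]] = {
    "fetch": fetch, "check": check, "answer": answer, "fail": fail,
}

def run_graph(state: State, start: str = "fetch", max_transitions: int = 20) -> State:
    node = start
    for _ in range(max_transitions):
        if node == "END":
            return state
        state.setdefault("path", []).append(node)  # trace of visited nodes
        node = NODES[node](state)                  # each node returns the next node
    raise RuntimeError("transition budget exhausted")
```

Because each node is an ordinary function of the state, retries and guards are just edges, and each node can be unit-tested in isolation.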

### 5. Reflection / Critique Loop

Worker produces output → critic LLM scores/critiques → worker revises → repeat with cap.

Pros: improves quality on writing, code, and plans; reflection can let a smaller model approach a larger model's quality on some tasks.
Cons: at least doubles cost; quality can plateau or degrade after 1-2 cycles.
Good for: long-form generation, code that must compile/pass tests, reasoning-heavy tasks.
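
A minimal sketch of the capped loop, with an early stop when the critic's score stops improving. The function signatures, the `(score, feedback)` critic contract, and the `target` threshold are assumptions for illustration; `demo_critic` is a toy scorer, not a real critic.

```python
def reflect(task, draft_fn, critic_fn, revise_fn, max_rounds=2, target=0.9):
    """Draft, critique, revise; stop at the cap, at the target, or on no improvement."""
    best = draft_fn(task)
    best_score, feedback = critic_fn(best)
    for _ in range(max_rounds):
        if best_score >= target:
            break
        candidate = revise_fn(best, feedback)
        score, new_feedback = critic_fn(candidate)
        if score <= best_score:
            break  # plateau or regression: keep the earlier draft
        best, best_score, feedback = candidate, score, new_feedback
    return best, best_score

def demo_critic(text: str):
    """Toy critic: rewards each '+' marker; stands in for a critic LLM call."""
    return min(1.0, 0.5 * text.count("+")), "add more detail"

result = reflect("t", lambda t: "v0", demo_critic, lambda text, fb: text + "+")
```

Keeping the best-scoring draft rather than the latest one guards against the degradation noted under Cons.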

### 6. Tool-Free Reasoning Then Action

Model thinks in a constrained "scratchpad" first, then takes one action; no free tool-loop.

Pros: cheap, predictable, hard to misbehave.
Cons: no exploration.
Good for: classification, extraction, "decide and do" use cases.

## Choosing a Pattern

| Signal | Pattern |
| --- | --- |
| Task is one-shot | Single call (not an agent) |
| Steps known in advance | Pipeline |
| Path depends on intermediate results, ≤ ~10 steps | ReAct loop |
| Research / multi-source synthesis | Plan-and-execute |
| Clearly distinct skill sets | Supervisor / multi-agent |
| Production-grade reliability + audit | Graph / state machine |
| Quality > latency on writing/code | Reflection wrapper |

When in doubt: start with ReAct, add a graph when control flow gets gnarly, add a critic only when evals show it helps.

## Common Components

Every non-trivial agent needs:

- **State object**: task, working memory, history, budgets, last error.
- **Budget enforcement**: see `agent-loop-budgeting`.
- **Tool registry**: see `tool-use-design`.
- **Memory tiers**: short-term (in-context), working (scratchpad), long-term (vector store / DB).
- **Trace logging**: every step with inputs, outputs, tokens, latency.
- **Resumability**: state can be persisted and resumed (checkpointing).
- **Human-in-the-loop hooks**: explicit gates for confirmation / approval.
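
One possible shape for the state object, sketched as a dataclass that round-trips through JSON so a run can be checkpointed and resumed. Field names and the step-count budget are illustrative assumptions; real budgets are covered in `agent-loop-budgeting`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class AgentState:
    task: str
    scratchpad: str = ""                               # working memory
    history: list[dict] = field(default_factory=list)  # summarized observations
    steps_used: int = 0
    max_steps: int = 10                                # simple step budget
    last_error: Optional[str] = None

    def within_budget(self) -> bool:
        return self.steps_used < self.max_steps

    def to_json(self) -> str:
        """Serialize for checkpointing; all fields are JSON-safe by construction."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "AgentState":
        """Rebuild the state from a checkpoint to resume a run."""
        return cls(**json.loads(raw))
```

Restricting fields to JSON-safe types is the design choice that makes resumability cheap: a checkpoint is a single string.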

## Memory Strategy

- Don't keep raw tool outputs in history; keep summarized observations + a small evidence pointer.
- A working scratchpad that the agent rewrites each turn keeps context size flat.
- Long-term memory: write **after** task success only, with explicit keys and TTLs. Avoid "remember everything" patterns: retrieval quality degrades as stale and irrelevant entries accumulate.
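
The first and third points can be sketched as follows. This is a sketch under stated assumptions: `RAW_STORE` is an in-memory stand-in for blob storage, truncation stands in for a real summarization step, and `LongTermMemory` is a hypothetical dict-backed store rather than a vector database.

```python
import hashlib
import time
from typing import Optional

RAW_STORE: dict[str, str] = {}  # stand-in for blob storage holding full tool outputs

def compact_observation(tool: str, raw: str, max_chars: int = 80) -> dict:
    """Keep a short summary in history plus a pointer to the raw output."""
    key = hashlib.sha256(raw.encode()).hexdigest()[:12]
    RAW_STORE[key] = raw
    # Truncation stands in for a real summarization step.
    return {"tool": tool, "summary": raw[:max_chars], "evidence": key}

class LongTermMemory:
    def __init__(self) -> None:
        self._items: dict[str, tuple[str, float]] = {}  # key -> (value, expiry time)

    def write(self, key: str, value: str, ttl_s: float,
              task_succeeded: bool, now: Optional[float] = None) -> bool:
        if not task_succeeded:
            return False  # only persist what a successful task produced
        now = time.time() if now is None else now
        self._items[key] = (value, now + ttl_s)
        return True

    def read(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        item = self._items.get(key)
        if item is None or item[1] <= now:
            self._items.pop(key, None)  # expired entries are dropped on read
            return None
        return item[0]
```

The `now` parameter exists only to make expiry testable without sleeping.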

## Multi-Agent Caveats

- Agents talking to agents multiply cost roughly by the number of agents (N) and latency by the delegation depth.
- Specialists need narrower tools and prompts than the supervisor — otherwise they duplicate work.
- Inter-agent messages should be structured (schema), not free-form.
- Cap delegation depth (typically 1-2). Recursive delegation almost always indicates a missing tool.
- A single graph with branches usually beats multi-agent for the same problem; reach for true multi-agent only when teams/owners differ.
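
Structured messages and a depth cap can be enforced in the message type itself. The sketch below is illustrative: the `Handoff` fields, the convention that the supervisor's first handoff has depth 1, and `MAX_DEPTH = 2` are assumptions.

```python
from dataclasses import dataclass

MAX_DEPTH = 2  # assumed cap, in line with the "typically 1-2" guidance above

@dataclass(frozen=True)
class Handoff:
    sender: str
    recipient: str
    task: str
    depth: int = 1  # the supervisor's first handoff counts as depth 1

    def __post_init__(self) -> None:
        # Validation runs on every construction, including delegated handoffs.
        if not self.task.strip():
            raise ValueError("task must be non-empty")
        if self.depth > MAX_DEPTH:
            raise ValueError(f"delegation depth {self.depth} exceeds cap {MAX_DEPTH}")

    def delegate(self, recipient: str, task: str) -> "Handoff":
        """Create a child handoff one level deeper; raises once past the cap."""
        return Handoff(self.recipient, recipient, task, self.depth + 1)
```

For example, `Handoff("supervisor", "research", "find sources").delegate("browser", "open page")` is allowed at depth 2, while a further `delegate` call raises `ValueError`.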

## Frameworks

- **LangGraph** — graph/state-machine model; good for production, explicit, mature ecosystem.
- **Pydantic AI** — typed, Pythonic, lightweight; good for structured tool agents.
- **CrewAI / AutoGen** — multi-agent first; good for prototypes and research demos.
- **DSPy** — programmatic prompting + auto-optimization; good when you have evals and want them to drive prompts.
- **Custom thin loop** — often the right answer for a single ReAct agent; avoids framework lock-in.

Pick by asking how much of the framework you will actually use. As a rule of thumb, if it is under 30%, go custom.

## Evaluation

- End-to-end success rate on a task suite (the only metric users care about).
- Steps per task (lower is better, given equal success).
- Cost per successful task (the metric to optimize).
- Recovery rate from injected failures (tool error, bad arg, empty result).
- Replay traces from prod against new versions; gate on regression.
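
A sketch of how these metrics might be computed from run records. The `Run` fields are hypothetical, and cost per successful task is taken here to mean total spend (including failed runs) divided by the number of successes, which is one reasonable reading rather than a fixed definition. `passes_gate` shows a simple regression gate with an assumed tolerance.

```python
from dataclasses import dataclass

@dataclass
class Run:
    success: bool
    steps: int
    cost_usd: float

def summarize_runs(runs: list[Run]) -> dict:
    successes = [r for r in runs if r.success]
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "success_rate": len(successes) / len(runs) if runs else 0.0,
        "avg_steps_on_success": (
            sum(r.steps for r in successes) / len(successes) if successes else None
        ),
        # Failed runs still cost money, so total cost is divided by successes only.
        "cost_per_success": total_cost / len(successes) if successes else float("inf"),
    }

def passes_gate(baseline: dict, candidate: dict, max_drop: float = 0.02) -> bool:
    """Block a release if success rate drops by more than max_drop."""
    return candidate["success_rate"] >= baseline["success_rate"] - max_drop

runs = [Run(True, 4, 0.10), Run(False, 10, 0.30), Run(True, 6, 0.20)]
metrics = summarize_runs(runs)
```

Charging failed runs to the successful ones is what makes this metric penalize agents that burn budget on tasks they never finish.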

## Anti-Patterns

- "Agent" wrapper around a task that's really a pipeline.
- Multi-agent for problems a single graph would solve at 1/N cost.
- No budget; agent can loop forever on pathological inputs.
- Free-form messages between agents — non-deterministic and hard to debug.
- Reflection without a stop condition; quality oscillates.
- Letting the agent decide its own model — costs explode.
- Framework cargo-cult: importing LangGraph for a 30-line ReAct loop.

## Quick Wins Checklist

- [ ] Considered and rejected non-agent patterns first
- [ ] Topology matches task class (table above)
- [ ] Budgets enforced (`agent-loop-budgeting`)
- [ ] State object is explicit and serializable (resumable)
- [ ] Tool outputs summarized before re-entering context
- [ ] Inter-agent messages are structured if multi-agent
- [ ] End-to-end eval suite with cost-per-success metric
- [ ] Trace logs replayable against new versions

## References

- "ReAct: Synergizing Reasoning and Acting in Language Models" — Yao et al., 2023
- "Reflexion" — Shinn et al., 2023
- LangGraph, Pydantic AI, CrewAI, AutoGen, DSPy docs
- Anthropic "Building Effective Agents" (2024)
- Related skills: `tool-use-design`, `agent-loop-budgeting`, `mcp-server-design`, `llm-eval-harness`, `prompt-injection-defense`, `llm-engineering`
