---
name: agent-memory-design
description: >
  Guide the design of memory systems for AI agents -- short-term conversation
  memory, long-term persistent memory, working memory for multi-step tasks,
  semantic recall with embeddings, and memory integration with tools and MCP.
  Use when the user is building agents that need to remember past interactions,
  maintain state across sessions, implement semantic search over conversation
  history, or integrate with external tools via MCP. Also use when the user
  mentions "agent memory", "conversation history", "semantic recall",
  "persistent memory", "MCP tools", "tool integration", or asks how to make
  their agent remember things or connect to external services.
---

# Agent Memory Design

Memory transforms a stateless LLM into a persistent agent that learns and adapts. Without memory, every conversation starts from zero. These patterns cover the different memory types and how to implement them.

## The Agent Loop

At its core, every agent follows the same loop:

```
1. Receive user input
2. Combine with: system prompt + memory + tools
3. Call the LLM
4. LLM decides: respond to user OR call a tool
5. If tool call: execute tool, feed result back to LLM (go to 3)
6. If response: return to user (go to 1)
```

Memory is what persists between iterations of this loop.
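
The loop above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Message`, `ModelOutput`, `Model`, and `Tool` types and the `runTurn` function are hypothetical names, and a stub stands in for a real LLM so the control flow is visible. Real model and tool calls would be async.

```typescript
// Hypothetical shapes; a real SDK defines its own message and tool-call types.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type ModelOutput =
  | { kind: "respond"; text: string }
  | { kind: "tool"; name: string; input: string };
type Model = (messages: Message[]) => ModelOutput;
type Tool = (input: string) => string;

// Runs steps 2-6 of the loop for one user turn.
// Kept synchronous for brevity; real model and tool calls are async.
function runTurn(
  model: Model,
  tools: Record<string, Tool>,
  transcript: Message[],
  userInput: string,
  maxSteps = 5,
): string {
  transcript.push({ role: "user", content: userInput });
  for (let step = 0; step < maxSteps; step++) {
    const output = model(transcript);
    if (output.kind === "respond") {
      transcript.push({ role: "assistant", content: output.text });
      return output.text;
    }
    const tool = tools[output.name];
    // Feed unknown-tool errors back to the model instead of throwing.
    const result = tool ? tool(output.input) : `Error: unknown tool "${output.name}"`;
    transcript.push({ role: "tool", content: result });
  }
  return "Stopped: step limit reached.";
}

// Stub model: calls the clock tool once, then answers with the tool result.
const stubModel: Model = (messages) => {
  const last = messages[messages.length - 1];
  return last?.role === "tool"
    ? { kind: "respond", text: `It is ${last.content}.` }
    : { kind: "tool", name: "clock", input: "" };
};

const transcript: Message[] = [{ role: "system", content: "You are a helpful agent." }];
const reply = runTurn(stubModel, { clock: () => "12:00" }, transcript, "What time is it?");
```

The `maxSteps` cap is a design choice in this sketch: without it, a model that keeps requesting tools would loop forever.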

## Three Types of Agent Memory

### 1. Short-Term Memory (Conversation History)

The message thread within a single session. This is what most people think of as "chat history."

- **Implementation**: Store messages in an array, pass to each LLM call
- **Challenge**: Context window fills up. You need a strategy for long conversations.
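- **Solutions**: Sliding window (keep last N messages), summarization (replace older messages with a short summary), or compression (shorten or drop low-value content such as verbose tool output)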
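
As a rough sketch of the sliding-window option, the function below keeps all system messages and the last N other messages. The `ChatMessage` type and `slidingWindow` name are illustrative assumptions, and the window is counted in messages; a production version would more likely budget by tokens.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Keeps every system message plus the last `maxRecent` non-system messages.
function slidingWindow(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // Guard against slice(-0), which would return the whole array.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}

const thread: ChatMessage[] = [
  { role: "system", content: "You are a travel assistant." },
  { role: "user", content: "I want to visit Japan." },
  { role: "assistant", content: "When are you planning to go?" },
  { role: "user", content: "In April." },
  { role: "assistant", content: "April is cherry blossom season." },
  { role: "user", content: "What should I pack?" },
];

// Pass `trimmed` instead of `thread` to the LLM call.
const trimmed = slidingWindow(thread, 3);
```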

### 2. Long-Term Memory (Persistent Across Sessions)

Information that persists between conversations. The agent "remembers" the user.

- **Implementation**: Store facts/preferences in a database, retrieve and inject into context
- **What to store**: User preferences, past decisions, learned facts, relationship history
- **Challenge**: Deciding what's worth remembering vs. what's noise
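
One way to picture the store-then-inject pattern is below. The `FactStore` class is a hypothetical in-memory stand-in for a database table, and the key names are made up for the example; the point is only the shape of the flow (write facts, render them into context).

```typescript
// Hypothetical in-memory stand-in for a database keyed by user and fact name.
class FactStore {
  private facts = new Map<string, Map<string, string>>();

  // Upsert: a newer value for the same key replaces the old one.
  remember(userId: string, key: string, value: string): void {
    const userFacts = this.facts.get(userId) ?? new Map<string, string>();
    userFacts.set(key, value);
    this.facts.set(userId, userFacts);
  }

  // Renders the user's facts as a block to append to the system prompt.
  toContext(userId: string): string {
    const userFacts = this.facts.get(userId);
    if (!userFacts || userFacts.size === 0) return "";
    const lines = Array.from(userFacts, ([key, value]) => `- ${key}: ${value}`);
    return `Known facts about this user:\n${lines.join("\n")}`;
  }
}

const store = new FactStore();
store.remember("user-1", "preferred_language", "TypeScript");
store.remember("user-1", "timezone", "UTC+1");
store.remember("user-1", "timezone", "UTC+2"); // overwrites the earlier value

const memoryBlock = store.toContext("user-1");
```

Keying facts by name means a corrected preference overwrites the stale one instead of accumulating contradictory entries.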

### 3. Working Memory (Active Task State)

Scratchpad for the current multi-step task. Think of it as the agent's "notepad."

- **Implementation**: Structured state object updated at each step
- **What to store**: Current task progress, intermediate results, active plan
- **Challenge**: Must be updated explicitly; the agent doesn't maintain it automatically
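
A structured state object for working memory might look like the sketch below. The `WorkingMemory` shape and the `completeStep` and `nextStep` helpers are assumptions chosen for illustration; the agent code calls them explicitly after each step, since nothing updates the state on its own.

```typescript
type StepStatus = "pending" | "done" | "failed";

interface WorkingMemory {
  goal: string;
  plan: { description: string; status: StepStatus }[];
  results: Record<string, string>;
}

// Returns a new state with one step marked done and its result recorded.
function completeStep(state: WorkingMemory, index: number, result: string): WorkingMemory {
  const plan = state.plan.map((step, i) =>
    i === index ? { ...step, status: "done" as const } : step,
  );
  return { ...state, plan, results: { ...state.results, [`step_${index}`]: result } };
}

// First step that still needs work, or undefined when the task is finished.
function nextStep(state: WorkingMemory): string | undefined {
  return state.plan.find((step) => step.status === "pending")?.description;
}

let task: WorkingMemory = {
  goal: "Book a team offsite",
  plan: [
    { description: "Collect available dates", status: "pending" },
    { description: "Shortlist venues", status: "pending" },
  ],
  results: {},
};

// Explicit update after the first step finishes.
task = completeStep(task, 0, "May 12-14 works for everyone");
```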

## Semantic Recall

Semantic recall retrieves memories by meaning rather than by exact match: it finds *conceptually similar* past interactions, even when they share few keywords with the current input.

### How It Works

1. **Embed conversations**: Convert messages to vector embeddings
2. **Store in vector DB**: Index embeddings for similarity search
3. **Query at runtime**: Convert current input to embedding, find similar past interactions
4. **Inject into context**: Add relevant past interactions to the agent's context
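
The four steps can be demonstrated with a tiny in-memory index. Everything here is a stand-in: `toyEmbed` is a keyword-count function over a fixed vocabulary (an assumption for the demo, not a real embedding model), and `MemoryIndex` replaces a vector database. It shows the mechanics of embed, store, and query by cosine similarity, not real semantic matching.

```typescript
type MemoryRecord = { text: string; vector: number[] };

// Toy embedding (demo assumption): counts occurrences of a fixed vocabulary.
// A real system would call an embedding model here.
const VOCAB = ["refund", "invoice", "password", "login", "shipping"];
function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity; assumes both vectors have the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA * normB);
  return denom === 0 ? 0 : dot / denom;
}

class MemoryIndex {
  private records: MemoryRecord[] = [];

  // Steps 1-2: embed and store.
  add(text: string): void {
    this.records.push({ text, vector: toyEmbed(text) });
  }

  // Step 3: embed the query and rank stored records by similarity.
  query(text: string, topK: number): { text: string; score: number }[] {
    const queryVector = toyEmbed(text);
    return this.records
      .map((r) => ({ text: r.text, score: cosineSimilarity(queryVector, r.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}

const index = new MemoryIndex();
index.add("User asked how to reset their password after a failed login");
index.add("User requested a refund for a duplicate invoice");
index.add("User asked about shipping times to Canada");

// Step 4 would join the returned texts into the agent's context.
const hits = index.query("I can't login, I forgot my password", 2);
```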

### Configuration

- **topK**: How many similar results to retrieve (start with 5-10, tune from there)
- **Threshold**: Minimum similarity score to include (prevents irrelevant matches)
- **Recency weighting**: Prefer recent memories over old ones
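
The three settings can be combined in a single ranking step, sketched below. The `Hit` and `RecallConfig` shapes are hypothetical, and exponential decay with a half-life is just one plausible way to implement recency weighting; the numbers are placeholders to tune.

```typescript
type Hit = { text: string; similarity: number; ageDays: number };

interface RecallConfig {
  topK: number;         // maximum number of memories to return
  threshold: number;    // minimum raw similarity to keep a hit
  halfLifeDays: number; // age at which a memory's weight halves (assumed decay model)
}

function rankHits(hits: Hit[], config: RecallConfig): Hit[] {
  // Recency-weighted score: similarity decays by half every `halfLifeDays`.
  const weight = (h: Hit) => h.similarity * Math.pow(0.5, h.ageDays / config.halfLifeDays);
  return hits
    .filter((h) => h.similarity >= config.threshold) // drop weak matches first
    .sort((a, b) => weight(b) - weight(a))
    .slice(0, config.topK);
}

const candidates: Hit[] = [
  { text: "Prefers email over phone", similarity: 0.9, ageDays: 60 },
  { text: "Asked to be called Sam", similarity: 0.8, ageDays: 0 },
  { text: "Mentioned the weather", similarity: 0.3, ageDays: 1 },
];

const ranked = rankHits(candidates, { topK: 2, threshold: 0.5, halfLifeDays: 30 });
```

In this example the 60-day-old memory has the highest raw similarity but ranks second after decay, and the weak weather match is removed by the threshold before ranking.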

### Example Pattern

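```typescript
// `embed`, `vectorDB`, `generateText`, and `model` are placeholders for your
// embedding function, vector store client, LLM call, and model handle.

// On each message: embed the input and fetch similar past interactions.
const embedding = await embed(userMessage);
const similar = await vectorDB.query(embedding, { topK: 10 });
const relevantContext = similar
  .filter(m => m.score >= 0.75) // assumes matches expose a similarity `score`; tune the cutoff
  .map(m => m.text)
  .join("\n");

// Include in LLM call:
const response = await generateText({
  model,
  system: `${systemPrompt}\n\nRelevant past interactions:\n${relevantContext}`,
  messages: currentConversation,
});

// Store for future recall, after querying so the message does not match itself:
await vectorDB.upsert(userMessage, embedding);
```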

## Tools and MCP Integration

Tools are how agents interact with the outside world. MCP (Model Context Protocol) standardizes this.

### Tool Design Principles

- **Clear names and descriptions**: The LLM uses these to decide when to call a tool
- **Focused scope**: Each tool does one thing well
- **Helpful errors**: Return error messages the agent can act on
- **Idempotent when possible**: Safe to retry on failure
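
These principles can be seen together in one small tool definition. The `ToolDefinition` and `ToolResult` shapes are assumptions for this sketch (frameworks define their own), and the order data is made up. Note the single-purpose scope, the description that says when to call it, and the error messages that tell the agent what to do next.

```typescript
type ToolResult = { ok: true; data: string } | { ok: false; error: string };

interface ToolDefinition {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => ToolResult;
}

// Hypothetical order data standing in for a real backend.
const ORDERS = new Map<string, string>([
  ["A-1001", "shipped"],
  ["A-1002", "processing"],
]);

const getOrderStatus: ToolDefinition = {
  name: "get_order_status",
  description:
    "Look up the current status of one order by its ID (format: A-1234). " +
    "Use when the user asks where an order is. Read-only, safe to retry.",
  execute: (args) => {
    const orderId = args["orderId"];
    if (typeof orderId !== "string") {
      // Actionable error: says what is missing and shows a valid example.
      return { ok: false, error: 'Missing "orderId". Pass a string such as "A-1001".' };
    }
    const status = ORDERS.get(orderId);
    if (status === undefined) {
      return { ok: false, error: `No order with ID "${orderId}". Ask the user to confirm the ID.` };
    }
    return { ok: true, data: `Order ${orderId} is ${status}.` };
  },
};

const found = getOrderStatus.execute({ orderId: "A-1001" });
const missing = getOrderStatus.execute({ orderId: "Z-9" });
```

Because the tool only reads data, calling it twice has the same effect as calling it once, which is what makes a retry safe.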

### MCP Overview

MCP is the "USB-C port" for AI -- a standard protocol for connecting agents to tools.

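- **Servers** expose sets of tools and communicate with clients over a transport such as stdio (local processes) or HTTP (remote services)
- **Clients** (embedded in the agent or host application) list the available tools and request execution on the model's behalf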
- **Benefit**: Build a tool once, use it with any MCP-compatible agent
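
To make the client/server exchange concrete, the sketch below mimics the JSON-RPC 2.0 message shapes MCP uses for listing and calling tools (`tools/list` and `tools/call`). It is a toy in-process handler written for illustration: there is no transport, no initialization handshake, and the `handle` function and `add` tool are hypothetical. Use an MCP SDK or framework for anything real.

```typescript
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
};
type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
};

// What the server advertises: name, description, and a JSON Schema for the input.
const toolList = [
  {
    name: "add",
    description: "Add two numbers.",
    inputSchema: {
      type: "object",
      properties: { a: { type: "number" }, b: { type: "number" } },
      required: ["a", "b"],
    },
  },
];

function textResult(id: number, text: string, isError: boolean): JsonRpcResponse {
  return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text }], isError } };
}

// Toy dispatcher standing in for an MCP server.
function handle(request: JsonRpcRequest): JsonRpcResponse {
  const { id, method, params } = request;
  if (method === "tools/list") {
    return { jsonrpc: "2.0", id, result: { tools: toolList } };
  }
  if (method === "tools/call") {
    if (params?.["name"] !== "add") {
      return { jsonrpc: "2.0", id, error: { code: -32602, message: "Unknown tool" } };
    }
    const args = (params?.["arguments"] ?? {}) as { a?: unknown; b?: unknown };
    if (typeof args.a !== "number" || typeof args.b !== "number") {
      // Tool-level failures go inside the result so the model can read them.
      return textResult(id, "a and b must be numbers", true);
    }
    return textResult(id, String(args.a + args.b), false);
  }
  return { jsonrpc: "2.0", id, error: { code: -32601, message: `Method not found: ${method}` } };
}

// Client side: discover tools, then call one.
const listResponse = handle({ jsonrpc: "2.0", id: 1, method: "tools/list" });
const callResponse = handle({
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: { name: "add", arguments: { a: 2, b: 3 } },
});
```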

### When to Use MCP

- Your roadmap involves many third-party integrations (calendar, chat, email, web)
- You want tools to be reusable across different agents
- You're building a tool that other agents should be able to use

### Tool Ecosystem Categories

| Category | Examples | Use Case |
|----------|----------|----------|
| Web scraping | Browserbase, Stagehand, Playwright | Extracting web data |
| Search | Exa, Tavily, Brave Search | Finding information |
| Integrations | Composio, Pipedream | Connecting to SaaS apps |
| Code execution | E2B, Daytona | Running generated code |
| Communication | Email, Slack, calendars | Agent-to-human communication |

### Integration Platform Decision

| Need | Use |
|------|-----|
| Developer-friendly, moderate price | Composio, Pipedream, Apify |
| Enterprise, deep integrations | Specialized vendors |
| Custom domain tools | Build your own MCP server |

## Decision Framework

| Situation | Approach |
|-----------|------------|
| Single conversation context | Short-term (message history) |
| User preferences across sessions | Long-term (persistent store) |
| Multi-step task in progress | Working memory (scratchpad) |
| "What did we discuss about X?" | Semantic recall (vector search) |
| Connecting to external services | MCP tools |
| Building reusable tools | MCP server |

## Gotchas

- Not every conversation is worth memorizing. Be selective about what goes to long-term memory.
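- Semantic recall topK and threshold need tuning. A topK that is too low misses relevant info and one that is too high adds noise; the threshold works the other way around (too low adds noise, too high drops relevant matches).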
- Agent memory is separate from RAG. Memory is about past interactions; RAG is about external knowledge.
- MCP is powerful but the ecosystem is young. Use a framework (Mastra, AI SDK) rather than implementing the spec yourself.
- Tool descriptions are prompts. A bad description means the agent won't call the tool when it should.
- Security through obscurity doesn't work with agents -- they're more diligent than humans at exploring accessible data. Enforce access control on the tools and data themselves.

For detailed implementation examples, see `references/memory-implementation.md`.