---
name: memory-and-state
description: Use to maintain persistent context across agent sessions — decision logs, architecture memory, and constraint tracking.
---

# Memory and State

Use this skill to prevent agents from forgetting prior decisions and repeating mistakes.

## Three memory layers

### 1. Decision log (`DECISIONS.md`)

A version-controlled file at the repo root. Records architectural and behavioral decisions.

Format:

```markdown
## YYYY-MM-DD: [Decision title]
- **Context**: Why this decision was needed
- **Decision**: What was decided
- **Alternatives considered**: What was rejected and why
- **Constraints introduced**: What future work must respect
```

Rules:
- Read `DECISIONS.md` at the start of every planning or implementation task. This is not optional.
- Append to it when making a decision that future work depends on.
- Never silently contradict an existing decision — either follow it or propose a reversal with user approval.
- When appending, verify the new entry does not conflict with existing entries.

### 2. Architecture memory (`ARCHITECTURE.md`)

A version-controlled file describing the codebase structure.

Should contain:
- Module map (directory → purpose)
- Key interfaces and their contracts
- Data flow overview
- External service dependencies
- Known technical debt

Rules:
- Read before working on unfamiliar modules.
- Update when structural changes are made (new modules, moved files, changed boundaries).

### 3. Session-scoped working memory

For in-progress tasks that span multiple agent interactions:

- Use session notes or scratch files to track:
  - What has been done so far
  - What remains
  - Errors encountered and how they were resolved
  - Open questions

This prevents re-discovery and reduces repeated mistakes within a session.

## Session Resume Protocol

When returning to a repository **after a break** and the project boundaries are already confirmed, use this protocol instead of `on-project-start`.

### First start vs. resume

| Situation | Action |
|-----------|--------|
| First entry to a new repository | Use `on-project-start` skill |
| Returning after a break, boundaries already confirmed | Use this protocol |
| Resuming a mid-task after interruption | This protocol + read session notes |

### Layered loading order

Load context in this sequence, stopping as soon as you have enough to proceed:

**L0 — Stable anchors** (always load, target ≤ 500 tokens)
- `project/project-manifest.md` — confirmed project scope and constraints
- The most recent 3 entries in `DECISIONS.md` — recent cross-cutting decisions

**L1 — Last session state** (load if a summary exists, target ≤ 1,500 tokens)
- Last task completion summary (from session notes or docs)
- If no summary exists: the 5 most recent `DECISIONS.md` entries + module map section of `ARCHITECTURE.md` only (not the full file)

**L2 — Task-specific context** (on demand, ≤ 1,000 tokens per pull)
- Load only once the current task is defined
- Apply selective retrieval — keyword match against affected modules + recency window (see Triage-driven selective retrieval)
- Do not pre-load L2; wait until the scope of the task is clear

### Loading stops when

You can confidently answer all three:
- "What was I doing last time?"
- "What constraints currently apply?"
- "What is the next concrete step?"

If the token budget for a tier is reached before the question is answered, apply selective retrieval for remaining gaps rather than reading in full.

### Resume confirmation output

After loading, produce a brief confirmation before starting work:

```text
Session resumed:
- Last completed: [brief summary]
- Active constraints: [1–3 bullets from DECISIONS.md]
- Pending work: [from session notes, if any]
- Ready to: [next step]
```

## Conversation memory tiering

Within a single conversation or agent session, manage history in three tiers to prevent context window exhaustion while preserving relevant information.

### Tier 1 — Short-term (raw recent turns)

Keep the most recent **3–5 turns** of conversation as raw, unmodified content. This preserves full fidelity for the active working context.

- Adjust the window size based on turn complexity: if turns are long (>500 tokens each), reduce to 2–3 turns.
- This tier occupies Layer 4 (volatile context) in the prompt cache loading order.

### Tier 2 — Mid-term (compressed summaries)

When turns age out of the short-term window, compress them using the canonical template in `docs/agent-templates.md` → Compaction summary template.

Rules:
- Produce a summary when the short-term window shifts (i.e., every time a turn exits the window).
- For batch efficiency, summarize in groups of 3–5 turns rather than one at a time.
- Store summaries in session memory. If a summary already exists, update it iteratively: preserve still-relevant context, add new progress, and move completed items out of pending state.
- Do not duplicate or redefine the compaction summary format here; `docs/agent-templates.md` is the single source of truth.
- This tier integrates with the existing context compaction protocol in `docs/operating-rules.md`.

### Tier 3 — Long-term (persistent retrieval)

For knowledge that persists beyond a single session, use the existing persistent stores:

| Store | Content | Retrieval |
|-------|---------|-----------|
| `DECISIONS.md` | Architectural and behavioral decisions | Read at task start |
| `ARCHITECTURE.md` | Module map, interfaces, data flow | Read when working on unfamiliar modules |
| Repo memory files | Reusable patterns, component-level notes | Search by module name or task type keywords |
| `DECISIONS_ARCHIVE.md` | Inactive past decisions | Search only for legacy module work |

### RAG-augmented retrieval [OPTIONAL]

For teams with vector database or embedding infrastructure, long-term memory can be augmented with semantic retrieval (RAG). This replaces full-file reads with targeted retrieval, reducing token consumption for large codebases where `DECISIONS.md` or `ARCHITECTURE.md` alone exceed the 3,000-token Tier 3 budget.

**This section is entirely optional.** If no vector store is available, use the file-based selective retrieval from the Triage-driven selective retrieval section below. The playbook functions identically without RAG.

#### Indexing targets

Index the following sources as embeddings:

| Source | Refresh trigger | Priority |
|--------|----------------|----------|
| `DECISIONS.md` | After every Record step | High |
| `ARCHITECTURE.md` | After structural changes | High |
| Session summaries (Tier 2) | After each summary generation | Medium |
| Task completion summaries | After every Summarize step | Medium |
| Trace files (`.agent-trace/`) | After every Trace step | Low |

#### Index refresh triggers

- **After every Record step** — re-index `DECISIONS.md` when new entries are appended.
- **On session start** — if the index age exceeds a configurable threshold (default: 24 hours or 5 new entries since last index), perform a full re-index.
- **On demand** — when a query returns zero relevant results, trigger a re-index and retry once.

#### Query strategy

1. **Compose query** — embed the current task description + file paths being modified.
2. **Retrieve top-K** — return the K most relevant entries (default K=5).
3. **Verify relevance** — skim the retrieved entries for genuine relevance; discard false positives.
4. **Merge into context** — inject the relevant entries into the agent's context window as Tier 3 content.

#### Token budget impact

RAG results **replace** the 3,000-token Tier 3 budget — they do not add to it. The total conversation memory target (8,500 tokens) remains unchanged:

| Tier | Without RAG | With RAG |
|------|------------|----------|
| Tier 3 (long-term) | Full-file reads, ≤ 3,000 tokens | Top-K retrieval, ≤ 3,000 tokens |
| Total budget | ≤ 8,500 tokens | ≤ 8,500 tokens (unchanged) |

#### Fallback

If the vector store is unavailable at runtime, follow the **Retrieval degradation chain** (see below) starting at Level 1:

1. Fall back to the **Triage-driven selective retrieval** procedure (Level 1: keyword search + recency window).
2. If selective retrieval is also not applicable (fewer than 30 entries), fall back to standard full-file reads (Level 3).
3. Log the fallback in the task summary: `**Memory retrieval**: Level N — [method name]`.

### Token budget guideline

| Tier | Target budget | Enforcement |
|------|--------------|-------------|
| Short-term (raw turns) | ≤ 4,000 tokens | Trim oldest turn when exceeded |
| Mid-term (summary) | ≤ 1,500 tokens | Regenerate summary with tighter compression |
| Long-term (persistent reads) | ≤ 3,000 tokens per task | Use selective read strategy (see below) |
| **Total conversation memory** | **≤ 8,500 tokens** | Roughly 6–7% of a 128K context window |

## Context anchor protocol

For any task spanning more than one step or more than one file, maintain a context anchor using the canonical template in `docs/agent-templates.md` → Context anchor template.

Do not duplicate or redefine the template here; treat `docs/agent-templates.md` as the single source of truth for the anchor format.

Update this anchor before each major step. This prevents drift by forcing the agent to re-read the plan and current state.

## Contradiction detection

Before making any decision, check `DECISIONS.md` for conflicts:

1. Read the full decision log
2. Compare each existing entry against the proposed change
3. If a conflict exists, state: the existing decision (date + title), the proposed change, why they conflict, and options (follow existing or reverse with justification)

STOP and wait for user decision. Do not resolve contradictions autonomously.

## When to write memory

> **Policy check**: If `prompt-budget.yml` → `decision_log.policy: example_only`, replace every `Append to DECISIONS.md` action in the table below with `Record in task summary or trace`. Do not write to `DECISIONS.md` unless the user explicitly requests it, or you are updating the template's own decision-format.

| Event | Action |
|-------|--------|
| Architectural decision made | Append to `DECISIONS.md` |
| New module or structural change | Update `ARCHITECTURE.md` |
| Constraint discovered during work | Add to `project/project-manifest.md` |
| Error pattern found | Note in session memory to avoid repeating |
| Task partially complete | Write progress to session notes |
| Technology or library introduced | Append to `DECISIONS.md` |
| Schema or contract changed | Append to `DECISIONS.md` |
| Tradeoff made | Append to `DECISIONS.md` |

## When to read memory

| Event | Action |
|-------|--------|
| Starting any implementation | Read `DECISIONS.md` and `ARCHITECTURE.md` |
| Encountering unfamiliar module | Read `ARCHITECTURE.md` |
| Making a decision | Check `DECISIONS.md` for prior related decisions |
| Resuming interrupted work | Read session notes |
| Starting a long task | Produce a context anchor |
| Starting a Small task | Query categorized memory for similar past patterns (see below) |

## Categorized memory structure

Memory is organized along two dimensions: **scope** (how broadly it applies) and **category** (what kind of information it contains).

### Scope levels

Tag every persistent memory entry with a scope to control when it is loaded:

| Scope | Meaning | When loaded |
|-------|---------|-------------|
| `global` | Applies across all projects or sessions | Always (L0) |
| `project` | Specific to this repository | On session resume (L1) |
| `component:<path>` | Specific to one module or directory | On demand when working on that component (L2) |

Use the narrowest scope that is accurate. Scope drives the loading order in the Session Resume Protocol — `global` loads first, `component` loads last and only on demand.

### Category × Scope matrix

| Category | Scope | Primary store | Query when |
|----------|-------|---------------|------------|
| Architectural decisions, global conventions | `project` | `DECISIONS.md`, `ARCHITECTURE.md` (task summary/handoff/trace when `decision_log.policy: example_only`) | Starting any task, making architectural choices |
| Component patterns, quirks, module constraints | `component:<path>` | Module READMEs, session/repo memory files | Working on a specific or unfamiliar module |
| Recurring fix patterns, validated approaches | `project` or `global` | Session/repo memory files | Starting a Small task (search by task-type keywords) |
| User preferences, cross-repo conventions | `global` | Personal memory files | Any task |

### Tagging memory files

Include a `scope` field in frontmatter when writing persistent memory files:

```yaml
---
name: <memory name>
description: <one-line description — used to decide relevance at session start>
type: feedback | project | reference | user
scope: global | project | component:<relative-path>
---
```

**Small task retrieval**: Before implementing a Small task, search session/repo memory for the affected module or task-type keywords. If a matching pattern exists, follow it. If not, proceed normally and capture the pattern in the task completion summary.

## Memory lifecycle management

Persistent memory files grow over time. Without active lifecycle management, they consume excessive tokens on every read and eventually exceed context windows.

### Decision archive (cold storage)

#### When to archive

- `DECISIONS.md` exceeds **50 entries** or **30 KB**, OR
- Any memory health indicator (see below) reaches the "needs attention" threshold, OR
- During periodic maintenance review (quarterly for low-volume projects; more frequently when thresholds are hit)

#### Archive procedure

1. Review each entry in `DECISIONS.md`
2. For each entry, check: **are the constraints introduced still actively referenced by current code or docs?**
3. If the constraints are **no longer active** (the code has moved on, the pattern was replaced, etc.), move the entry to `DECISIONS_ARCHIVE.md`
4. If the constraints are **still enforced**, keep the entry in `DECISIONS.md`

Never archive based on date alone — a 2-year-old decision with active constraints stays in `DECISIONS.md`.

#### Archive file format

`DECISIONS_ARCHIVE.md` uses the same entry format as `DECISIONS.md`, with one addition: append `- **Archived on**: YYYY-MM-DD — [reason, e.g., "replaced by decision X"]` to each entry.

#### Safety checks before archiving

- [ ] Archived entry's constraints are verifiably no longer enforced
- [ ] No current code depends on the archived pattern
- [ ] Archive entry includes the reason
- [ ] `DECISIONS.md` retains all entries with active constraints

### Selective read strategy

Agents should not read the full archive on every task. Use a tiered approach. If `DECISIONS_ARCHIVE.md` does not exist yet, treat archive searches as returning no matches and skip any archive-read step that would otherwise require the file.

| Situation | What to read |
|-----------|-------------|
| Normal task | `DECISIONS.md` only (active constraints) |
| Task involves legacy module or old migration | `DECISIONS.md` + search `DECISIONS_ARCHIVE.md` for module name, if the archive file exists |
| Contradiction detection finds no match in active | Search `DECISIONS_ARCHIVE.md` before concluding "no prior decision", if the archive file exists; otherwise treat as no archived match |
| Periodic maintenance review | Read both files in full if `DECISIONS_ARCHIVE.md` exists; otherwise read `DECISIONS.md` only |

### Triage-driven selective retrieval [OPTIONAL]

When `DECISIONS.md` grows large (over **30 entries** or **20 KB**), reading it in full on every task wastes token budget. Use triage results to load only relevant decisions.

#### Activation threshold

- Below 30 entries / 20 KB: read `DECISIONS.md` in full (current behavior, no change needed).
- At or above threshold: switch to selective retrieval using the procedure below.

#### Selective retrieval procedure

1. **Extract affected modules** — from the triage output, collect the file paths classified as affected. Derive module/directory names (e.g., `src/api/`, `src/services/user`).
2. **Keyword search** — scan `DECISIONS.md` for entries whose title or constraint text contains any of the module keywords.
3. **Recency window** — always load the **most recent 5 entries** regardless of keyword match, to catch recent cross-cutting decisions.
4. **Title scan for contradiction detection** — for entries that did not match keywords or recency, read only the `## YYYY-MM-DD: [title]` header lines. If any title suggests relevance to the current task, load that full entry. This step also serves as Level 2 in the **Retrieval degradation chain** when keyword search returns zero matches.
5. **Fallback** — if keyword search returns zero matches (excluding recency window), fall back to reading `DECISIONS.md` in full. Zero matches likely means the keywords were too narrow.

#### Limitations

- Cross-cutting decisions (e.g., "all APIs use REST") may not contain module-specific keywords. The recency window and title scan partially mitigate this, but cannot guarantee full coverage.
- Agent tools vary in search capability. If the tool cannot search within a file by keyword, fall back to full read.
- This procedure is an optimization, not a hard requirement. Agents may always choose to read `DECISIONS.md` in full if token budget allows.

#### Relevance scoring formula [OPTIONAL]

When selective retrieval returns more entries than the token budget allows, use a weighted relevance score to rank and select entries:

```text
score = keyword_weight + recency_weight

keyword_weight = 1.0 if entry matches affected module keywords, else 0.0
recency_weight = 2^(-age_days / half_life)
```

- `age_days` = days since the decision entry was created.
- `half_life` = 90 days (default; override in `prompt-budget.yml` under `trimming.relevance_half_life_days`).
- Entries with `scope: global` or tagged `evergreen: true` in their frontmatter always receive `recency_weight = 1.0` (no decay).
- **Active-constraint exemption**: entries whose constraints are still actively referenced by current code or docs also receive `recency_weight = 1.0`, regardless of age. This prevents the formula from de-prioritizing old but still-enforced decisions. Use the same "actively referenced" test as the archive procedure.

Select the top-N entries by score until the Tier 3 token budget (3,000 tokens) is filled.

**When to use**: Only when `DECISIONS.md` exceeds the selective retrieval threshold (30 entries / 20 KB) AND the keyword search returns more matches than the budget allows. Below threshold or with few matches, this formula adds no value.

**Interaction with archive rules**: This formula does NOT replace the archive procedure. Archiving removes entries whose constraints are no longer active. Relevance scoring ranks active entries for retrieval priority. They are complementary — scoring never causes an entry to be archived, and archiving never depends on score.

### Session memory hygiene

Session-scoped memory (scratch notes, in-progress tracking) should not accumulate without bound.

#### Promotion rule

After task completion, promote session memory to repo-level memory **only if**:

- The pattern was reused 2+ times in different tasks, OR
- The feedback loop mini retrospective flagged it as "most useful"

All other session notes are disposable after the task completion summary is produced.

#### Cleanup cadence

- At the end of each task: review session notes, promote or discard
- At the end of each week (or every 10 tasks): purge session memory that was not promoted

### Memory health indicators

Track during feedback loop quality signal reviews:

| Indicator | Healthy | Needs attention |
|-----------|---------|-----------------|
| `DECISIONS.md` entry count | ≤ 50 | > 50 without recent archive |
| `DECISIONS_ARCHIVE.md` exists | Yes, once any archiving has occurred | No, with 50+ active decisions |
| Session memory files | ≤ 5 active | > 10 without cleanup |
| Stale constraint references | 0 | Any archived constraint still referenced in code |

### Retrieval degradation chain

When retrieving memory entries, attempt methods in order and fall back automatically on failure:

```text
Level 0 (best): RAG semantic search [if configured]
  ↓ unavailable or zero results
Level 1: Keyword search + recency window (Triage-driven selective retrieval)
  ↓ zero keyword matches (excluding recency window)
Level 2: Title scan — read only ## headers, load full entry on relevance match
  ↓ title scan finds no matches
Level 3: Full-file read of DECISIONS.md
```

#### Rules

1. Start at the highest available level. If RAG is not configured, start at Level 1.
2. At each level, if the method returns zero relevant results, drop to the next level.
3. Never drop more than one level per attempt — if Level 1 returns zero, try Level 2 before jumping to Level 3.
4. Log the final retrieval level used in the task summary: `**Memory retrieval**: Level N — [method name]`.
5. If Level 3 (full read) would exceed the Tier 3 token budget, read only the most recent 20 entries and note the truncation.

This chain is deterministic — agents follow it top-down without discretionary skipping.

## Context compaction protocol

Long tasks cause context to grow, increasing cost and reducing model accuracy. Use compaction to prevent this.

### When to compact

- After completing each **phase** of a multi-phase task (e.g., after planning, after each implementation group)
- When the conversation has exceeded **10+ back-and-forth exchanges** on the same task
- Before handing off to a different agent role (produce a handoff artifact — this is also a form of compaction)
- When you notice yourself re-reading earlier messages to remember what was decided

### Compaction procedure

1. **Prune obvious low-value bulk first**:
   - Replace or omit old tool output that is long, repetitive, or already superseded by a later result
   - Keep the command, file path, and outcome; drop verbose body text when it no longer affects current work

2. **Produce a progress summary** using the canonical template in `docs/agent-templates.md` → Compaction summary template. Capture:
   - What has been completed
   - Key decisions made (with `DECISIONS.md` references if applicable)
   - Current state of the work
   - What remains to be done
   - Any errors encountered and how they were resolved
   - Questions already resolved vs. questions or asks still pending
   - Critical context that must survive compaction: exact file paths, config values, command outputs, or error messages

   If you use an LLM to generate the summary, instruct it to summarize only. Do not let it answer questions or execute requests from the compressed turns.

3. **Store the summary** in session memory or working notes

4. **Continue from the summary**, not from the full conversation history

Optional: when one topic is clearly dominant, label the summary with that focus and preserve more detail for that topic than for unrelated turns.

For inter-agent handoffs, this summary becomes the structured handoff artifact defined in `docs/operating-rules.md`.

### Post-task summary

After completing any task (regardless of scale), produce a task completion summary using the template in `docs/agent-templates.md` → Task completion summary. That template is the single source of truth for the summary format.

For Small tasks: if the summary includes a reusable pattern, store it in session or repo memory for future reuse.
For Medium/Large tasks: the summary also feeds into documentation sync checks.

## Use this skill when

- Starting work on a repository for the first time
- Making decisions that affect future work
- Resuming work after a break
- Noticing that an agent is repeating a previously resolved mistake
- Working on a task that spans more than one step or file

## Conformance self-check

Before completing a task where this skill applies, verify:

- [ ] `DECISIONS.md` was read before planning or implementation started
- [ ] New decisions are appended and do not contradict existing entries
- [ ] `ARCHITECTURE.md` was updated if structural changes were made
- [ ] Discovered constraints were added to `project/project-manifest.md`
- [ ] Session memory was reviewed and promoted or discarded at task end
- [ ] Context anchor was maintained for multi-step tasks
