---
name: pageindex-rag
description: Replace semantic/vector RAG with PageIndex — a vectorless, reasoning-based retrieval system. Use when migrating from embeddings/vector stores to LLM-powered hierarchical indexing, or when explaining how to build RAG without similarity search.
when_to_use: |
  - User asks how to replace vector RAG or eliminate vector stores
  - User wants to build RAG without embeddings
  - User mentions chunking problems or wants better retrieval accuracy
  - User asks about LLM-based document indexing or tree-based retrieval
user-invocable: true
disable-model-invocation: false
arguments:
  - language
  - framework
---

# PageIndex RAG Migration Guide

## Overview

PageIndex replaces the entire embed → store → similarity-search pipeline with three LLM-driven steps:

1. **Parse** (offline): an LLM splits each document into a semantic tree structure
2. **Index** (offline): an LLM summarizes every node bottom-up
3. **Retrieve** (query time): an LLM navigates the tree top-down using the summaries

**Eliminates:** Vector stores, embedding models, similarity search, chunking libraries

**Replaces with:** Hierarchical tree parsing + recursive summarization + reasoning-based navigation

---

## Critical Rules

1. **Parse and index run ONCE offline per document** — never at query time
2. **Always persist the indexed tree** to JSON/disk after building
3. **Tree depth is configurable** — tune `max_depth` to document size (see tuning table)
4. **Only leaf nodes hold raw content** — all other nodes hold only title + LLM summary
5. **Retrieval is O(max_depth) LLM calls** — 2-4 cheap calls per query
6. **No vector math anywhere** — pure LLM reasoning
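
Rules 2 and 4 can be illustrated with a minimal persistence sketch. It assumes a simplified `PageNode` dataclass, and the helper names `to_dict` and `from_dict` are hypothetical, as is the temp-file-then-rename write; the reference implementation may serialize differently.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PageNode:
    title: str
    content: str = ""   # raw text, leaves only
    summary: str = ""   # present on every node
    depth: int = 0
    children: List["PageNode"] = field(default_factory=list)


def to_dict(node: PageNode) -> Dict[str, Any]:
    """Convert a tree to plain dicts; no parent pointers, so no cycles."""
    return {
        "title": node.title,
        "content": node.content,
        "summary": node.summary,
        "depth": node.depth,
        "children": [to_dict(c) for c in node.children],
    }


def from_dict(data: Dict[str, Any]) -> PageNode:
    return PageNode(
        title=data["title"],
        content=data.get("content", ""),
        summary=data.get("summary", ""),
        depth=data.get("depth", 0),
        children=[from_dict(c) for c in data.get("children", [])],
    )


def save(root: PageNode, path: str) -> None:
    """Write to a temp file, then rename, so readers never see a partial index."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(to_dict(root), fh, ensure_ascii=False, indent=2)
    os.replace(tmp_path, path)


def load(path: str) -> PageNode:
    with open(path, encoding="utf-8") as fh:
        return from_dict(json.load(fh))
```

Because the file is plain JSON, an index can be inspected or diffed with ordinary tools, which helps when debugging poor summaries.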

---

## Architecture

### Data Structure: PageNode

```$0
interface PageNode {
  title: string;
  content: string;    // raw text (ONLY at leaves)
  summary: string;    // LLM-generated (ALL nodes)
  depth: number;      // 0=root, 1=section, 2+=subsection
  children: PageNode[];
  parent?: PageNode;  // optional, not serialized
}
```
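
For Python projects, the same shape could be expressed as a dataclass. This is a sketch rather than the reference implementation: the `is_leaf` and `add_child` helpers are hypothetical conveniences, and `parent` is excluded from `repr` and comparison so the back-reference cannot cause infinite recursion.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PageNode:
    title: str
    content: str = ""   # raw text, populated only at leaves
    summary: str = ""   # LLM-generated, populated on every node
    depth: int = 0      # 0 = root, 1 = section, 2+ = subsection
    children: List[PageNode] = field(default_factory=list)
    # Back-reference for convenience; skipped in repr/compare and never serialized.
    parent: Optional[PageNode] = field(default=None, repr=False, compare=False)

    def is_leaf(self) -> bool:
        return not self.children

    def add_child(self, child: PageNode) -> PageNode:
        """Attach a child and keep its depth and parent consistent."""
        child.parent = self
        child.depth = self.depth + 1
        self.children.append(child)
        return child
```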

### Pipeline

```
INGEST (once, offline):
  raw_text
    → _segment(text)           # LLM splits into titled sections
    → _parse_recursive(...)    # recursively splits large sections
    → build_summaries(root)    # LLM summarizes bottom-up
    → save(root, doc_id)       # persist as JSON

QUERY (every request, cheap):
  query
    → load(doc_id)             # deserialize from JSON
    → retrieve(query, root)    # LLM navigates tree, returns leaf content
    → [your LLM generation]    # unchanged from existing RAG
```
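
The bottom-up summarization step in the ingest pipeline is a post-order tree walk. The sketch below assumes a simplified `PageNode` and takes the summarizer as a parameter so it runs without an API key; `fake_summarize` is a stand-in for the real LLM call, and the extra `summarize` argument is not part of the `build_summaries(root)` signature used elsewhere in this guide.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PageNode:
    title: str
    content: str = ""
    summary: str = ""
    children: List["PageNode"] = field(default_factory=list)


def build_summaries(node: PageNode, summarize: Callable[[str, str], str]) -> int:
    """Post-order walk: summarize children first, then the node itself.

    Returns the number of summarize() calls made (one per node).
    """
    calls = 0
    for child in node.children:
        calls += build_summaries(child, summarize)
    if node.children:
        # Internal node: summarize from the child summaries, not from raw text.
        source = "\n".join(f"{c.title}: {c.summary}" for c in node.children)
    else:
        # Leaf: summarize the raw content.
        source = node.content
    node.summary = summarize(node.title, source)
    return calls + 1


def fake_summarize(title: str, text: str) -> str:
    """Stand-in for an LLM call; truncates instead of summarizing."""
    return text[:40]
```

Summarizing internal nodes from child summaries keeps each call's input small regardless of document size, at the cost of compounding any errors in the leaf summaries.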

---

## Migration Steps

### 1. Find All Vector RAG Code

Locate where your codebase:
- Splits text into chunks (RecursiveCharacterTextSplitter, TokenTextSplitter, etc.)
- Creates embeddings (OpenAIEmbeddings, sentence-transformers, embed_documents)
- Writes to vector store (add_texts, upsert, add_documents)
- Queries vector store (similarity_search, query, as_retriever)

### 2. Replace Ingestion

**OLD:**
```$0
chunks = text_splitter.split_text(raw_text)
embeddings = embedding_model.embed_documents(chunks)
vector_store.add_texts(chunks, embeddings)
```

**NEW:**
```$0
root = parse_document(raw_text, max_depth=3)
build_summaries(root)
save(root, f"{doc_id}.json")
```

### 3. Replace Retrieval

**OLD:**
```$0
results = vector_store.similarity_search(query, k=5)
context = "\n".join([r.page_content for r in results])
```

**NEW:**
```$0
root = load(f"{doc_id}.json")
context = retrieve(query, root)
```
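
What `retrieve()` does internally can be sketched as a loop that descends one level per decision. This is an illustration under assumptions, not the reference code: the `choose` callback is hypothetical and stands in for an LLM prompt that lists the children's titles and summaries and asks for a single index, and `keyword_choose` is a toy replacement so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PageNode:
    title: str
    content: str = ""
    summary: str = ""
    children: List["PageNode"] = field(default_factory=list)


def retrieve(query: str, root: PageNode, choose: Callable[[str, List[PageNode]], int]) -> str:
    """Walk from the root to a leaf, asking `choose` for one child index per level."""
    node = root
    while node.children:
        index = choose(query, node.children)
        if not 0 <= index < len(node.children):
            index = 0  # fall back to the first child on an unusable answer
        node = node.children[index]
    return node.content


def keyword_choose(query: str, options: List[PageNode]) -> int:
    """Toy stand-in for the LLM: pick the child sharing the most words with the query."""
    words = set(query.lower().split())
    scores = [len(words & set(f"{o.title} {o.summary}".lower().split())) for o in options]
    return scores.index(max(scores))
```

The number of `choose` calls equals the depth of the leaf reached, which is why query cost scales with `max_depth` rather than with document length.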

### 4. Keep Generation Unchanged

The `context` string returned by `retrieve()` is a drop-in replacement for concatenated chunks.

### 5. Remove Dead Dependencies

- Vector store clients (chromadb, pinecone, weaviate, qdrant)
- Embedding model imports
- Chunking utilities

---

## Tuning Parameters

| Parameter | Default | Effect |
|-----------|---------|--------|
| `max_depth` | 2 | Tree depth; increase for larger documents |
| `SUBSECTION_THRESHOLD` | 300 words | Split threshold for recursive parsing |
| `LEAF_MIN_WORDS` | 50 words | Never split below this size |
| `_segment() cap` | 8000 chars | Input limit for segmentation LLM call |
| `_summarize() cap` | 3000 chars | Input limit for summarization LLM call |
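
How `max_depth`, `SUBSECTION_THRESHOLD`, and `LEAF_MIN_WORDS` might interact during recursive parsing is sketched below. The exact split rule in the reference implementation may differ; here a section is split only if it is above the threshold, the depth budget is not exhausted, and every resulting piece meets the leaf minimum. The `segment` parameter and the `halve` function are hypothetical stand-ins for the LLM-based `_segment()`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

SUBSECTION_THRESHOLD = 300  # words; longer sections are candidates for splitting
LEAF_MIN_WORDS = 50         # words; never create pieces smaller than this


@dataclass
class PageNode:
    title: str
    content: str = ""
    depth: int = 0
    children: List["PageNode"] = field(default_factory=list)


Segmenter = Callable[[str], List[Tuple[str, str]]]  # text -> [(title, body), ...]


def parse_recursive(title: str, text: str, depth: int, max_depth: int, segment: Segmenter) -> PageNode:
    node = PageNode(title=title, depth=depth)
    words = len(text.split())
    if depth < max_depth and words > SUBSECTION_THRESHOLD:
        parts = segment(text)
        # Accept the split only if it yields 2+ parts and none is too small.
        if len(parts) > 1 and all(len(body.split()) >= LEAF_MIN_WORDS for _, body in parts):
            node.children = [
                parse_recursive(t, body, depth + 1, max_depth, segment) for t, body in parts
            ]
            return node
    node.content = text  # leaf: keep the raw text
    return node


def halve(text: str) -> List[Tuple[str, str]]:
    """Stand-in for the LLM segmenter: splits the text into two halves."""
    words = text.split()
    mid = len(words) // 2
    return [("Part 1", " ".join(words[:mid])), ("Part 2", " ".join(words[mid:]))]
```

With these rules, raising `SUBSECTION_THRESHOLD` or lowering `max_depth` both reduce the number of nodes, and therefore the number of LLM calls at ingest.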

### Document Size → max_depth Guide

| Document Size | Recommended `max_depth` |
|---------------|-------------------------|
| < 2,000 words | 2 |
| 2,000–10,000 words | 3 |
| 10,000–50,000 words | 4 |
| > 50,000 words | 4+ with chunked root ingestion |
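
The table can be encoded as a small helper. The multi-document pattern later in this guide calls `tune_to_size(doc)` without defining it; the version below is a hypothetical text-based variant that counts whitespace-separated words and treats the boundary values (2,000 and 10,000) as belonging to the larger bucket.

```python
def tune_to_size(text: str) -> int:
    """Map a document's word count to the max_depth suggested in the table above."""
    words = len(text.split())
    if words < 2_000:
        return 2
    if words < 10_000:
        return 3
    # 10,000+ words; documents over 50,000 words also need chunked root ingestion
    return 4
```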

---

## Implementation

The full Python and TypeScript implementations cover all six components:
- `_segment()` — LLM-based text splitting
- `_parse_recursive()` — recursive tree building
- `parse_document()` — entry point
- `build_summaries()` — bottom-up summarization
- `save()/load()` — JSON persistence
- `retrieve()` — query-time navigation

Reference: https://github.com/vixhal-baraiya/pageindex-rag

---

## Common Patterns

### Pattern: Multi-document system

```$0
# Ingest (offline)
for doc in documents:
    # tune_to_size is a helper you supply; see the max_depth guide above
    root = parse_document(doc.text, max_depth=tune_to_size(doc))
    build_summaries(root)
    save(root, f"indexes/{doc.id}.json")

# Query: relevant_doc_ids must come from your own routing or metadata filter
roots = [load(f"indexes/{doc_id}.json") for doc_id in relevant_doc_ids]
contexts = [retrieve(query, root) for root in roots]
final_context = "\n\n".join(contexts)
```

### Pattern: In-memory caching

```$0
import os

cache = {}  # doc_id -> root PageNode

def get_or_build(doc_id, raw_text):
    if doc_id in cache:
        return cache[doc_id]
    
    path = f"indexes/{doc_id}.json"
    if os.path.exists(path):
        root = load(path)
    else:
        root = parse_document(raw_text)
        build_summaries(root)
        save(root, path)
    
    cache[doc_id] = root
    return root
```

### Pattern: Invalidation on update

```$0
def update_document(doc_id, new_text):
    # Rebuild first so a failed parse leaves the old index in place
    root = parse_document(new_text)
    build_summaries(root)
    save(root, f"indexes/{doc_id}.json")  # overwrites the stale file
    cache[doc_id] = root                  # refresh the in-memory copy
```

---

## Key Differences from Vector RAG

| Aspect | Vector RAG | PageIndex |
|--------|-----------|----------|
| Indexing | Embed chunks → store vectors | Parse tree → summarize nodes |
| Storage | Vector database | JSON files |
| Retrieval | Cosine similarity | LLM reasoning |
| Context selection | Top-K nearest neighbors | Single best leaf via tree walk |
| Cost at ingest | Embedding API calls | 2N LLM calls for N leaves |
| Cost at query | Embedding + DB query | 2-4 cheap LLM calls (max_tokens=5) |
| Latency | ~50-200ms | ~500-1000ms (2-4 serial LLM calls) |
| Accuracy | Depends on embedding quality | Depends on summary quality |

---

## Troubleshooting

### "Tree is too shallow, retrieval misses content"
→ Increase `max_depth` (e.g., 2 → 3 for longer docs)

### "Indexing is too expensive (too many LLM calls)"
→ Decrease `max_depth` or increase `SUBSECTION_THRESHOLD`

### "Retrieval picks wrong section"
→ Improve node summaries by adjusting the `_summarize()` prompt

### "Skill does not load when expected"
→ Check that `description` in the frontmatter tells Claude when to use this skill

### "Tree is too deep/complex"
→ Decrease `max_depth`  
→ Increase `SUBSECTION_THRESHOLD` to reduce splitting

---

## Installation

Save this SKILL.md to:

| Tool | Path |
|------|------|
| Claude Code | `~/.claude/skills/pageindex-rag/SKILL.md` (personal)<br>`.claude/skills/pageindex-rag/SKILL.md` (project) |
| Kilo Code | `~/.kilo/rules/pageindex-rag.md` or `.kilo/rules/pageindex-rag.md` |
| Cursor | `.cursor/rules/pageindex-rag.mdc` |
| Cline | `.clinerules/pageindex-rag.md` |
| Roo Code | `.roo/rules-code/pageindex-rag.md` |

Invoke with `/pageindex-rag` or let Claude load it automatically when discussing RAG migration.

---

## Arguments

This skill accepts optional arguments for language-specific examples:

```
/pageindex-rag python fastapi
/pageindex-rag typescript express
/pageindex-rag go
```

When invoked with arguments, examples are tailored to $0 (language) and $1 (framework, if provided).