Extract text, tables, metadata, and images from 91+ document formats (PDF, Office, images, HTML, email, archives, and academic formats) using Kreuzberg.
Expose one document-extraction surface to MCP-compatible agents so they can normalize PDFs, Office files, images, HTML, and other mixed inputs before downstream review or indexing.
Generate fully typed polyglot language bindings for Rust libraries using Alef. Use when configuring alef.toml, running alef CLI commands, writing e2e test fixtures, debugging…
Plugin architecture, registration, and trait patterns
Document extraction pipeline architecture and patterns
REST API server and MCP protocol integration
Chunking, embeddings, and RAG pipeline integration