---
title: "Extract structured text, metadata, tables, and images from mixed documents through an MCP server with Kreuzberg"
description: "Expose one document-extraction surface to MCP-compatible agents so they can normalize PDFs, Office files, images, HTML, and other mixed inputs before downstream review or indexing."
verification: "listed"
source: "https://github.com/kreuzberg-dev/kreuzberg"
author: "kreuzberg-dev"
publisher_type: "organization"
category:
  - "Data Extraction & Transformation"
framework:
  - "MCP"
tool_ecosystem:
  github_repo: "kreuzberg-dev/kreuzberg"
  github_stars: 7630
---

# Extract structured text, metadata, tables, and images from mixed documents through an MCP server with Kreuzberg

Expose one document-extraction surface to MCP-compatible agents so they can normalize PDFs, Office files, images, HTML, and other mixed inputs before downstream review or indexing.

## Prerequisites

- A Kreuzberg installation or container image
- Document files to process
- An MCP-compatible client
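A quick preflight check can confirm the local prerequisites before you wire anything into an agent. The sketch below is only illustrative: the helper `check_prerequisites` is hypothetical, the executable name `kreuzberg` and the file `report.pdf` are assumptions, and it does not apply if you run the container image instead of a local install.

```python
import shutil
from pathlib import Path


def check_prerequisites(executable: str, documents: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(executable) is None:
        problems.append(f"executable not found on PATH: {executable}")
    # Each input document must exist as a regular file.
    for doc in documents:
        if not Path(doc).is_file():
            problems.append(f"document not found: {doc}")
    return problems


if __name__ == "__main__":
    # ASSUMPTION: "kreuzberg" is the executable name and "report.pdf" is a
    # sample input; adjust both to match your install and your files.
    for problem in check_prerequisites("kreuzberg", ["report.pdf"]):
        print(problem)
```

An empty result only means the executable and files were found; it says nothing about whether the MCP client is configured.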

## Installation

Choose whichever fits your setup:

1. Copy this skill folder into your local skills directory.
2. Clone the repo and symlink or copy the skill into your agent workspace.
3. Add the repo as a git submodule if you manage shared skills centrally.
4. Install it through your internal provisioning or packaging workflow.
5. Download the folder directly from GitHub and place it in your skills collection.

Install command or upstream instructions:

Follow the upstream installation guide for the CLI or container, then run Kreuzberg in its documented MCP server mode and attach that server to your MCP-compatible client before sending mixed document inputs for extraction.
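Many MCP clients register a local server through a JSON entry under an `mcpServers` key. As a minimal sketch of the attach step, the hypothetical helper `build_mcp_server_entry` below generates such an entry. The command `kreuzberg` and the argument `mcp` are assumptions, not confirmed here; take the actual launch command from the upstream README, and check your client's documentation for the config file it reads.

```python
import json


def build_mcp_server_entry(name: str, command: str, args: list[str]) -> dict:
    """Build a config fragment in the `mcpServers` layout used by several MCP clients."""
    return {"mcpServers": {name: {"command": command, "args": list(args)}}}


if __name__ == "__main__":
    # ASSUMPTION: the command "kreuzberg" and the argument "mcp" are
    # placeholders; replace them with the launch command documented upstream.
    config = build_mcp_server_entry("kreuzberg", "kreuzberg", ["mcp"])
    print(json.dumps(config, indent=2))
```

Merge the printed fragment into your client's existing configuration rather than overwriting the file, so other registered servers are preserved.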

## Documentation

- https://github.com/kreuzberg-dev/kreuzberg#readme

## Source

- [Agent Skill Exchange](https://agentskillexchange.com/skills/extract-structured-text-metadata-tables-and-images-from-mixed-documents-through-an-mcp-server-with-kreuzberg/)