---
title: "Unstructured Document Partitioning and ETL Library for LLM Pipelines"
description: "Unstructured is an open-source library for ingesting and partitioning PDFs, HTML, Office documents, emails, and other unstructured inputs into structured elements and metadata. It is commonly used as a preprocessing layer for RAG, search, extraction, and downstream AI pipelines."
verification: "security_reviewed"
source: "https://github.com/Unstructured-IO/unstructured"
author: "Unstructured"
publisher_type: "Company"
category:
  - "Data Extraction & Transformation"
framework:
  - "Multi-Framework"
tool_ecosystem:
  github_repo: "unstructured-io/unstructured"
  github_stars: 14390
---

# Unstructured Document Partitioning and ETL Library for LLM Pipelines

Unstructured is an open-source library for ingesting and partitioning PDFs, HTML, Office documents, emails, and other unstructured inputs into structured elements and metadata. It is commonly used as a preprocessing layer for RAG, search, extraction, and downstream AI pipelines.

## Prerequisites

Python 3.11+

## Installation

Choose whichever fits your setup:

1. Copy this skill folder into your local skills directory.
2. Clone the repo and symlink or copy the skill into your agent workspace.
3. Add the repo as a git submodule if you manage shared skills centrally.
4. Install it through your internal provisioning or packaging workflow.
5. Download the folder directly from GitHub and place it in your skills collection.

Then install the library from PyPI. The `all-docs` extra pulls in the dependencies for every supported document type:

```
pip install "unstructured[all-docs]"
```
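
Once the package is installed, a quick way to confirm that partitioning works is to run a file through the library and inspect the elements it returns. The following is a minimal sketch, not an official recipe: it assumes the `partition` function from `unstructured.partition.auto` and the `category` and `text` attributes on the returned elements. The helper names `partition_file` and `to_records` and the path `example.pdf` are hypothetical, introduced only for illustration.

```python
def partition_file(path):
    """Partition one document into a list of Unstructured elements."""
    # Imported lazily so to_records() can be used and tested on its own.
    from unstructured.partition.auto import partition

    # partition() detects the file type and dispatches to the matching parser.
    return partition(filename=path)


def to_records(elements):
    """Flatten elements into plain dicts suitable for indexing or logging."""
    return [{"type": el.category, "text": el.text} for el in elements]


# Example usage (assumes a local file named example.pdf exists):
# for record in to_records(partition_file("example.pdf")):
#     print(record["type"], "|", record["text"][:80])
```

Keeping the conversion step separate from the partitioning call makes it easy to swap in a different output shape (for example, including element metadata) without touching the ingestion code.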

## Documentation

- https://docs.unstructured.io/

## Source

- [Agent Skill Exchange](https://agentskillexchange.com/skills/unstructured-document-partitioning-etl-library-llm-pipelines/)
