---
title: "PaddleOCR Multilingual Document OCR and Structured Data Toolkit"
description: "PaddleOCR is a powerful, lightweight OCR toolkit developed by Baidu that converts documents and images into structured, AI-friendly data like JSON and Markdown. It supports 100+ languages with industry-leading accuracy, bridging the gap between images/PDFs and LLMs."
verification: "security_reviewed"
source: "https://github.com/PaddlePaddle/PaddleOCR"
category:
  - "Data Extraction & Transformation"
framework:
  - "Multi-Framework"
tool_ecosystem:
  github_repo: "paddlepaddle/paddleocr"
  github_stars: 73714
---

# PaddleOCR Multilingual Document OCR and Structured Data Toolkit

PaddleOCR is a lightweight, open-source OCR toolkit from Baidu's PaddlePaddle project. It converts scanned documents, images, and PDFs into structured, AI-friendly output such as JSON and Markdown. It recognizes text in 100+ languages, which makes content locked in images and PDFs usable as input for LLMs.
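The image-to-structured-data flow described above can be sketched in Python. This is a minimal sketch that assumes the PaddleOCR 3.x Python API (`PaddleOCR(...).predict()`, `PPStructureV3().predict()`, and the `save_to_json` / `save_to_markdown` result methods), so verify these against your installed version. The record shape read by `record_to_text` (parallel `rec_texts` / `rec_scores` lists, optionally nested under `"res"`) is also an assumption. The helper names `ocr_image_to_json`, `document_to_markdown`, `load_record`, and `record_to_text`, as well as the `min_score` cutoff, are hypothetical and for illustration only.

```python
"""Sketch: turn PaddleOCR output into LLM-ready text (assumes PaddleOCR 3.x)."""
import json
from typing import Any, Dict


def ocr_image_to_json(image_path: str, out_dir: str = "output", lang: str = "en") -> None:
    """Run text detection + recognition and save each result as JSON."""
    # Lazy import: the pure helpers below work without PaddleOCR installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(lang=lang)
    for res in ocr.predict(image_path):
        res.save_to_json(out_dir)


def document_to_markdown(path: str, out_dir: str = "output") -> None:
    """Parse page layout with PP-StructureV3 and save the result as Markdown."""
    from paddleocr import PPStructureV3  # lazy import, as above

    pipeline = PPStructureV3()
    for res in pipeline.predict(path):
        res.save_to_markdown(save_path=out_dir)


def load_record(json_path: str) -> Dict[str, Any]:
    """Read one saved OCR result file."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def record_to_text(record: Dict[str, Any], min_score: float = 0.5) -> str:
    """Join recognized lines into blank-line-separated paragraphs for a prompt.

    Assumed shape: parallel "rec_texts" / "rec_scores" lists, possibly nested
    under a top-level "res" key. Lines scoring below min_score, and lines that
    are empty after stripping whitespace, are dropped.
    """
    data = record.get("res", record)
    texts = data.get("rec_texts", [])
    scores = data.get("rec_scores", [1.0] * len(texts))
    kept = [t.strip() for t, s in zip(texts, scores) if s >= min_score and t.strip()]
    return "\n\n".join(kept)
```

After `ocr_image_to_json` has written its files, pass each one through `load_record` and then `record_to_text`. The output file names can vary between PaddleOCR versions, so list `out_dir` rather than hard-coding them. Importing PaddleOCR inside each function keeps the post-processing helpers importable, and testable, on machines without the PaddlePaddle backend.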

## Installation

Choose whichever fits your setup:

1. Copy this skill folder into your local skills directory.
2. Clone the repo and symlink or copy the skill into your agent workspace.
3. Add the repo as a git submodule if you manage shared skills centrally.
4. Install it through your internal provisioning or packaging workflow.
5. Download the folder directly from GitHub and place it in your skills collection.

## Source

- [Agent Skill Exchange](https://agentskillexchange.com/skills/paddleocr-multilingual-document-ocr-toolkit/)