---
title: "Generate LLM fine-tuning, RAG, and eval datasets from source material with easy-dataset"
description: "Turn raw documents into structured fine-tuning, RAG, and evaluation datasets when the real job is dataset preparation, not generic document parsing."
verification: "listed"
source: "https://github.com/ConardLi/easy-dataset"
author: "ConardLi"
publisher_type: "GitHub repository"
category:
  - "Data Extraction & Transformation"
framework:
  - "Multi-Framework"
tool_ecosystem:
  github_repo: "ConardLi/easy-dataset"
  github_stars: 14000
---

# Generate LLM fine-tuning, RAG, and eval datasets from source material with easy-dataset

Turn raw documents into structured fine-tuning, RAG, and evaluation datasets when the real job is dataset preparation, not generic document parsing.

## Prerequisites

- The easy-dataset application.
- Source documents in a supported format, such as PDF, Markdown, DOCX, TXT, or EPUB.
- An operator or agent responsible for preparing the datasets.

## Installation

Choose whichever fits your setup:

1. Copy this skill folder into your local skills directory.
2. Clone the repo and symlink or copy the skill into your agent workspace.
3. Add the repo as a git submodule if you manage shared skills centrally.
4. Install it through your internal provisioning or packaging workflow.
5. Download the folder directly from GitHub and place it in your skills collection.

Upstream instructions:

Install or run easy-dataset from the upstream GitHub project, then load your source documents and use its dataset-building flows to generate fine-tuning, RAG, or evaluation datasets.
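
As a rough illustration of what a dataset-building flow of this kind produces, the sketch below splits source text into chunks and serializes question-answer pairs as JSONL in the Alpaca-style `instruction`/`input`/`output` layout often used for fine-tuning. It is not easy-dataset's code or API: the function names, the `max_chars` limit, and the hand-written question and answer are all assumptions made for illustration. In a real flow, an LLM would typically generate the pairs from each chunk.

```python
import json


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Chunks stay within max_chars where possible; a single paragraph
    longer than max_chars is kept whole as its own chunk.
    (Hypothetical helper, not part of easy-dataset.)
    """
    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would overflow, so close the chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def to_alpaca_record(question: str, answer: str, context: str = "") -> dict[str, str]:
    """Build one Alpaca-style training record (assumed target layout)."""
    return {"instruction": question, "input": context, "output": answer}


def to_jsonl(records: list[dict[str, str]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    source = "Chunking splits documents.\n\nEach chunk yields questions."
    chunks = split_into_chunks(source, max_chars=30)
    # Placeholder pair; a model would normally write these per chunk.
    records = [
        to_alpaca_record("What does chunking do?", "It splits documents.", chunks[0])
    ]
    print(to_jsonl(records))
```

The same records could instead be written in a chat-style layout or kept as chunk-question pairs for RAG evaluation; the JSONL step is the only part tied to the chosen output format.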

## Documentation

- https://github.com/ConardLi/easy-dataset#readme

## Source

- [Agent Skill Exchange](https://agentskillexchange.com/skills/generate-llm-fine-tuning-rag-and-eval-datasets-from-source-material-with-easy-dataset/)
