---
name: pdf-parse-with-llm
description: Parse PDFs (including scanned/image PDFs) using the `llm` CLI with an LLM attachment workflow; extract structured data and/or text when native PDF text is unreliable.
compatibility: opencode
metadata:
  input: pdf
  tool: llm
  default_model: gemini/gemini-2.5-pro
---

## What I do
- When a PDF needs parsing, especially a scanned PDF or one where text extraction is unreliable, I use the `llm` CLI with the PDF attached.
- I craft prompts to extract:
  - clean plain text
  - structured JSON (tables, fields, invoices, forms)
  - specific answers with citations to page numbers (when feasible)
- I iterate: first get a high-level extraction, then refine prompts for missing fields, tables, or ambiguous sections.

## When to use me
Use this skill when:
- You encounter a `.pdf` that is likely scanned (image-based) or has broken/non-native text.
- You need reliable extraction into text/JSON for downstream processing.
- You need to interpret tables, forms, or multi-column layouts.

Do **not** use this skill when:
- You already have high-quality extracted text and only need summarization (unless verification against the PDF is required).

## Required tool
The `llm` utility is available for use.
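
Availability can be checked up front instead of assumed. The following is a minimal sketch; `require_tool` is a hypothetical helper name and not part of `llm`.

```bash
# Succeed if the named command is on PATH; otherwise print a message and fail.
require_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required tool: $1" >&2
    return 1
  fi
}

# Intended use before any extraction:
#   require_tool llm || exit 1
```

If the default Gemini model does not appear in the output of `llm models`, the likely cause (an assumption about the local setup) is that the `llm-gemini` plugin or its API key is missing; these are usually set up with `llm install llm-gemini` and `llm keys set gemini`.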

## Command template (required)
Use this exact structure:

```bash
llm -m {{llm-id}} "{{prompt}}" -a "{{path-to-pdf}}"
```

### Attachment guidance
- Attach at most five files per request, and batch files this way only when they are images or small PDFs.
- For large files or long documents, attach one file per request.
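
The five-file limit can be enforced in a small wrapper. This is a sketch under assumptions: `llm_with_attachments`, `LLM_BIN`, and `LLM_MODEL` are hypothetical names introduced here, and the wrapper relies on `llm` accepting a repeated `-a` option, one per attachment.

```bash
LLM_BIN="${LLM_BIN:-llm}"                        # hypothetical override hook
LLM_MODEL="${LLM_MODEL:-gemini/gemini-2.5-pro}"  # hypothetical variable name

# llm_with_attachments PROMPT FILE...
# Sends one request with every FILE attached; refuses more than five files.
llm_with_attachments() {
  prompt=$1
  shift
  if [ "$#" -gt 5 ]; then
    echo "too many attachments ($#): send larger batches one file per request" >&2
    return 1
  fi
  # Rewrite the argument list from "f1 f2 ..." to "-a f1 -a f2 ...".
  # The loop list is expanded once, so appending and shifting inside is safe.
  for f in "$@"; do
    set -- "$@" -a "$f"
    shift
  done
  "$LLM_BIN" -m "$LLM_MODEL" "$prompt" "$@"
}
```

For example, `llm_with_attachments "Transcribe these receipts." r1.png r2.png r3.pdf` sends a single request with three attachments.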

### Default model
Unless instructed otherwise, use `gemini/gemini-2.5-pro` by default.
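
One way to apply this default while still honoring an explicit instruction is a wrapper with an optional third argument. A minimal sketch; `pdf_llm`, `LLM_BIN`, and `DEFAULT_MODEL` are hypothetical names, not part of `llm`.

```bash
LLM_BIN="${LLM_BIN:-llm}"              # hypothetical override hook
DEFAULT_MODEL="gemini/gemini-2.5-pro"

# pdf_llm PROMPT PDF [MODEL]
# Fills the command template; MODEL falls back to the default when omitted.
pdf_llm() {
  "$LLM_BIN" -m "${3:-$DEFAULT_MODEL}" "$1" -a "$2"
}
```

`pdf_llm "Return plain text." "PATH/TO/FILE.pdf"` uses the default model, and a third argument overrides it for that call only.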

## Prompting patterns
### 1) Fast full-text extraction
Use when you just need readable text.

Prompt example:
- "Extract all readable text from this PDF. Preserve headings and paragraph breaks. If the PDF is scanned, perform OCR. Return plain text."

Command:
```bash
llm -m gemini/gemini-2.5-pro "Extract all readable text from this PDF. Preserve headings and paragraph breaks. If the PDF is scanned, perform OCR. Return plain text." -a "PATH/TO/FILE.pdf"
```

### 2) Structured JSON extraction (preferred for automation)
Use when you need output that follows a fixed schema.

Prompt example:
- "Extract the following fields as JSON: invoice_number, invoice_date, vendor_name, vendor_address, customer_name, customer_address, line_items[{description, quantity, unit_price, amount}], subtotal, tax, total. If a field is missing, use null. Also include page_number for each extracted top-level field if possible."

Command:
```bash
llm -m gemini/gemini-2.5-pro "Extract the following fields as JSON: invoice_number, invoice_date, vendor_name, vendor_address, customer_name, customer_address, line_items[{description, quantity, unit_price, amount}], subtotal, tax, total. If a field is missing, use null. Also include page_number for each extracted top-level field if possible." -a "PATH/TO/FILE.pdf"
```
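
Models sometimes wrap JSON answers in Markdown code fences, which breaks downstream parsers. A small filter can remove the fence lines before the output is saved. This is a sketch; `strip_fences` is a hypothetical helper name.

```bash
# Delete every line that begins with a Markdown code fence (three backticks),
# leaving the lines in between untouched.
strip_fences() {
  sed '/^`\{3\}/d'
}
```

Piping the command above through `strip_fences > invoice.json` yields a file that can then be checked with any JSON parser, for example `jq . invoice.json` if `jq` happens to be installed.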

### 3) Table extraction
Prompt example:
- "Locate the table(s) in this PDF and return them as JSON arrays. Preserve column headers. If multiple tables exist, return an array of tables with {title, page_number, rows}."
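
The matching command follows the same template as the earlier patterns. It is wrapped here as a function so the prompt lives in one place; `extract_tables`, `TABLE_PROMPT`, `LLM_BIN`, and `LLM_MODEL` are hypothetical names used for illustration.

```bash
LLM_BIN="${LLM_BIN:-llm}"                        # hypothetical override hook
LLM_MODEL="${LLM_MODEL:-gemini/gemini-2.5-pro}"
TABLE_PROMPT='Locate the table(s) in this PDF and return them as JSON arrays. Preserve column headers. If multiple tables exist, return an array of tables with {title, page_number, rows}.'

# extract_tables PDF
# Prints the model's table JSON on stdout.
extract_tables() {
  "$LLM_BIN" -m "$LLM_MODEL" "$TABLE_PROMPT" -a "$1"
}
```

Typical use would be `extract_tables "PATH/TO/FILE.pdf" > tables.json`.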

### 4) Targeted Q&A with page grounding
Prompt example:
- "Answer: What is the contract termination notice period? Quote the exact clause and provide the page number."
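
For repeated questions against the same document, the grounding instruction can be appended automatically. A minimal sketch, assuming the question is passed as a single shell argument; `ask_pdf`, `LLM_BIN`, and `LLM_MODEL` are hypothetical names.

```bash
LLM_BIN="${LLM_BIN:-llm}"                        # hypothetical override hook
LLM_MODEL="${LLM_MODEL:-gemini/gemini-2.5-pro}"

# ask_pdf QUESTION PDF
# Asks one question and requests a verbatim quote plus a page number.
ask_pdf() {
  "$LLM_BIN" -m "$LLM_MODEL" \
    "Answer: $1 Quote the exact clause and provide the page number." \
    -a "$2"
}
```

For example: `ask_pdf "What is the contract termination notice period?" "PATH/TO/FILE.pdf"`. Page numbers reported by the model should still be spot-checked against the PDF.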

## Workflow guidance
1. Identify the user’s objective (full text vs structured fields vs tables vs Q&A).
2. Run an initial extraction prompt.
3. If the output is incomplete or garbled:
   - narrow the prompt to specific pages or sections
   - direct the model to one region at a time (e.g., “focus on the header area of each page”)
   - extract tables separately from narrative text
4. Return results in the format the user asked for (plain text / JSON / CSV-like text), and note uncertainties explicitly.
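
Steps 2 and 3 can be sketched as a two-pass function. The assumptions are significant: an empty response is the only "incomplete" signal this sketch detects (garbled output still needs a human or a stricter check), the caller supplies the page range, and `extract_two_pass`, `LLM_BIN`, and `LLM_MODEL` are hypothetical names.

```bash
LLM_BIN="${LLM_BIN:-llm}"                        # hypothetical override hook
LLM_MODEL="${LLM_MODEL:-gemini/gemini-2.5-pro}"

# extract_two_pass PDF PAGES
# Pass 1 asks for the whole document. Pass 2 runs only if pass 1 returned
# nothing, and narrows the prompt to the given page range (e.g. "1-3").
extract_two_pass() {
  result=$("$LLM_BIN" -m "$LLM_MODEL" \
    "Extract all readable text from this PDF. Return plain text." -a "$1")
  if [ -z "$result" ]; then
    result=$("$LLM_BIN" -m "$LLM_MODEL" \
      "Extract all readable text from pages $2 of this PDF only. If they are scanned, perform OCR. Return plain text." \
      -a "$1")
  fi
  printf '%s\n' "$result"
}
```

A call such as `extract_two_pass "PATH/TO/FILE.pdf" "1-3"` prints whichever pass produced text.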
