---
name: geo-fix-llmstxt
description: Generate llms.txt and llms-full.txt files for a website to improve AI discoverability. Use when the user asks to create llms.txt, generate llms.txt, fix llms.txt, make site AI-readable, or mentions llms.txt generation.
version: 1.0.0
---

# geo-fix-llmstxt Skill

You generate specification-compliant `llms.txt` and `llms-full.txt` files that help AI systems understand and cite a website's content. The output follows the [llmstxt.org](https://llmstxt.org/) proposed standard.

Refer to `references/llmstxt-spec.md` in this skill's directory for the full specification reference.

---

## Phase 1: Discovery

### 1.1 Validate Input

Extract the target URL from the user's input. Normalize it:
- Add `https://` if no protocol specified
- Remove trailing slashes
- Extract the base domain

### 1.2 Check Existing llms.txt

Fetch these URLs to check if llms.txt already exists:

```
{url}/llms.txt
{url}/.well-known/llms.txt
```

If found:
- Parse and analyze the existing file
- Identify gaps (missing sections, broken links, incomplete descriptions)
- Proceed to Phase 4 (Improvement Mode) instead of generating from scratch

If not found:
- Proceed to Phase 2 (Full Generation)

### 1.3 Fetch Homepage

Fetch the homepage to extract:
- Site name (from `<title>`, `<meta property="og:site_name">`, or `<h1>`)
- Site description (from `<meta name="description">` or `<meta property="og:description">`)
- Primary navigation links
- Footer links
- Logo alt text

### 1.4 Fetch Sitemap

Try these locations in order:
1. `{url}/sitemap.xml`
2. `{url}/sitemap_index.xml`
3. Parse `{url}/robots.txt` for `Sitemap:` directive

From the sitemap, build a categorized page inventory:
- Documentation / Help pages
- Blog / Content pages
- Product / Service pages
- API reference pages
- About / Team pages
- Legal pages (privacy, terms)
- Contact page

### 1.5 Fetch Key Pages

Fetch up to 15 key pages from the inventory to extract:
- Page title
- Meta description
- H1 heading
- First paragraph (for content summary)
- Content type (article, product, docs, etc.)

**Rate limiting**: Wait 1 second between requests to the same domain.

---

## Phase 2: Content Analysis

### 2.1 Identify Site Identity

From the collected data, determine:

| Field | Source Priority |
|-------|---------------|
| Site name | og:site_name > title tag > H1 > domain |
| Summary | meta description > og:description > first paragraph |
| Primary purpose | Navigation structure + content analysis |
| Key topics | H1/H2 headings across pages, meta keywords |

### 2.2 Categorize Pages

Group pages into llms.txt sections. Use these default categories, but adapt based on actual site structure:

| Category | H2 Section Name | Content Types |
|----------|----------------|---------------|
| Documentation | `## Docs` | Help articles, guides, tutorials, API docs |
| Blog / Articles | `## Blog` | Blog posts, news, case studies |
| Products / Services | `## Products` or `## Services` | Product pages, pricing, features |
| API | `## API` | API reference, endpoints, SDKs |
| Company | `## About` | About, team, careers, press |
| Legal | `## Legal` | Privacy policy, terms, cookies |

**Rules:**
- Only include categories with 2+ pages (unless critical like Docs or API)
- Order sections by importance to AI understanding
- Merge small categories into a logical parent

### 2.3 Write Page Descriptions

For each page entry, write a concise description (under 100 characters) that:
- Explains what the page covers (not just its title)
- Uses factual, specific language
- Avoids marketing fluff
- Includes key entities or topics

Good: `Core REST API endpoints for user management and authentication`
Bad: `Our amazing API documentation`

### 2.4 Determine Optional Content

Mark sections as `## Optional` if they are:
- Legal pages (privacy, terms)
- Older blog posts (>12 months)
- Supplementary content not critical for understanding the site

---

## Phase 3: Generate Files

### 3.1 Generate llms.txt

Create the file following this structure strictly:

```markdown
# {Site Name}

> {One-paragraph summary: what the site/company does, who it serves, key offerings. 2-4 sentences. Factual and specific.}

{Optional additional context paragraph: technology stack, industry, scale, notable achievements. Only if genuinely useful for AI understanding.}

## Docs
- [{Page Title}]({URL}): {Concise description}
- [{Page Title}]({URL}): {Concise description}

## API
- [{Page Title}]({URL}): {Concise description}

## Blog
- [{Page Title}]({URL}): {Concise description}

## About
- [{Page Title}]({URL}): {Concise description}

## Optional
- [{Page Title}]({URL}): {Concise description}
```

**Format rules:**
- H1: Site name only (required)
- Blockquote: Summary paragraph (strongly recommended)
- H2: Section headers for link groups
- Links: `- [Title](URL): Description` format
- No H3 or deeper headings
- No images or HTML
- Pure Markdown only

### 3.2 Generate llms-full.txt

Create an expanded version that includes actual page content:

```markdown
# {Site Name}

> {Same summary as llms.txt}

{Same additional context as llms.txt}

## Docs

### {Page Title}
{URL}

{Full page content converted to clean Markdown: headings, paragraphs, lists, code blocks. Strip navigation, footers, ads, sidebars. Keep only main content.}

---

### {Page Title}
{URL}

{Full page content...}

---

## Blog

### {Article Title}
{URL}

{Full article content...}
```

**Content cleaning rules:**
- Strip all navigation, headers, footers, sidebars
- Remove ads, cookie banners, promotional CTAs
- Preserve headings, lists, tables, code blocks
- Convert relative URLs to absolute
- Keep author bylines and publication dates
- Maximum 50 pages in llms-full.txt (prioritize by importance)

### 3.3 Write Files

Create two files in the current working directory:
- `llms.txt`
- `llms-full.txt`

---

## Phase 4: Improvement Mode

If an existing llms.txt was found in Phase 1.2, analyze and improve it:

### 4.1 Validate Structure

Check against the spec:
- [ ] Has H1 with site name
- [ ] Has blockquote summary
- [ ] H2 sections with link lists
- [ ] Links use `[Title](URL): Description` format
- [ ] No broken links (fetch each to verify)
- [ ] No H3+ headings (spec violation)
- [ ] Pure Markdown (no HTML)

### 4.2 Content Gap Analysis

Compare existing llms.txt against the site's actual content:
- Missing important pages (docs, API, key products)
- Outdated links (404s, redirects)
- Missing descriptions on links
- Categories that should be added
- Summary that could be more specific

### 4.3 Generate Improved Version

Create `llms.txt.improved` with:
- All fixes applied
- New pages added
- Descriptions enhanced
- Structure optimized

Print a diff summary showing what changed and why.

---

## Output Summary

After generating, print:

```
llms.txt generated for {domain}

Files created:
  llms.txt          — {line_count} lines, {section_count} sections, {link_count} links
  llms-full.txt     — {line_count} lines, {page_count} pages included

Sections:
  {section_name}: {link_count} links
  {section_name}: {link_count} links
  ...

Installation:
  Place both files at your domain root:
  - https://{domain}/llms.txt
  - https://{domain}/llms-full.txt

  Or at the well-known path:
  - https://{domain}/.well-known/llms.txt

  Add to robots.txt (optional):
  Sitemap: https://{domain}/llms.txt
```

---

## Quality Gates

1. **Link limit**: Maximum 100 links in llms.txt, 50 pages in llms-full.txt
2. **Description length**: Each link description under 100 characters
3. **Summary length**: Blockquote summary 2-4 sentences
4. **No broken links**: Verify all URLs return 200
5. **Rate limiting**: 1 second between requests to the same domain
6. **Timeout**: 30 seconds per URL fetch
7. **Respect robots.txt**: Do not fetch pages blocked by robots.txt
