---
name: book-library-analyzer
version: "1.1"
description: >
  Analyze a folder of books and audiobooks: scan it, identify every book (author + title),
  enrich each with web data (year, genre, tags, short description, Goodreads rating),
  save a reusable catalog file, and help the user pick what to read or listen to next.
  TRIGGER IMMEDIATELY when the user points at a folder of books or audiobooks and wants
  any of: a list of what's in it, ratings, descriptions, a catalog, or a recommendation —
  e.g. "analyze my books folder", "what books do I have in D:\Books", "scan my audiobooks",
  "make a catalog of my library", "help me pick something to read/listen to from my collection",
  "what's worth reading in this folder". Also trigger when the user asks to choose a book
  from a folder even if they don't say "analyze".
allowed-tools:
  - WebSearch
  - Bash
  - Write
  - Glob
  - Read
# allowed-tools is advisory: lists tools this skill needs; Claude will still request
# user approval if the session's permission mode requires it.
---

# Book Library Analyzer

Turn a messy folder of ebooks and audiobooks into a clean, enriched catalog — then act as a
librarian and help the user choose their next book.

**Language rule:** Respond and write the catalog in the language the user is communicating in.
Book titles stay in their original language, with a translation in parentheses when helpful.

---

## Step 0 — Get the folder and check for an existing catalog

If the user already gave a path, use it. Otherwise ask for the path to the folder with books
or audiobooks. Verify the folder exists before proceeding; if it doesn't, show what you tried
and ask for a correction — don't guess sibling paths silently.

**Before scanning**, check whether `library-catalog.md` already exists in the folder (or at
the `--output` path, if one was given). If it does, tell the user the catalog date and book
count from its header, then ask:

> "Found an existing catalog from [date] with N books. What would you like to do?"
> 1. **Reuse** — jump straight to recommendations (Step 5)
> 2. **Update** — add only new files, keep existing entries intact
> 3. **Rescan** — rebuild catalog from scratch

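If the user chooses **Update**, proceed to Step 1 and note the set of relative file paths
already in the catalog; you'll use this in Step 2 to skip already-catalogued books. If they
choose **Rescan**, proceed to Step 1 and ignore the existing catalog's contents.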
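
A minimal sketch of reading the header and the already-catalogued paths, assuming the existing
catalog follows the Step 4 structure. The function name `read_catalog` and both regexes are
illustrative; they are not part of the bundled script.

```python
import re

# Matches the header line from the Step 4 template:
#   _Generated <date> · N books (X ebooks, Y audiobooks) ..._
HEADER_RE = re.compile(r"^_Generated (?P<date>.+?) · (?P<count>\d+) books?\b")
# Matches a **Format:** line that ends with a backticked relative path.
PATH_RE = re.compile(r"^\*\*Format:\*\*.*`([^`]+)`\s*$")


def read_catalog(text):
    """Return (date, book_count, catalogued_paths), or None if no header is found."""
    date = count = None
    paths = set()
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header and date is None:
            date, count = header.group("date"), int(header.group("count"))
            continue
        path = PATH_RE.match(line)
        if path:
            paths.add(path.group(1))
    if date is None:
        return None
    return date, count, paths
```

A `None` result means the file does not look like a catalog this skill wrote, so treat it as
absent rather than guessing a date or count.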

---

## Step 1 — Scan the folder

Locate the bundled scanner script. It ships in the `scripts/` subfolder alongside this
SKILL.md. Use Glob to find it:

```
**/book-library-analyzer/scripts/scan_library.py
```

Search from the home directory. Take the first match. If Glob returns nothing, the plugin
was not installed correctly — tell the user and fall back to manual directory listing.

Once located, run the scanner (stdlib-only Python, no dependencies to install):

```
python "<script-path>" "<folder>"
```

(On Windows, `python` may not be on PATH — try `py` before giving up.)

The script walks the tree and prints JSON: one entry per book candidate with the
filename-based author/title guess, embedded metadata when available (EPUB/FB2/PDF metadata,
MP3 ID3 tags, FLAC/OGG Vorbis comments, MP4 tags), format, size, and file count (a folder
of MP3s is grouped as a single audiobook).

If Python is unavailable or the script fails, fall back to listing the directory yourself
(Glob/PowerShell) and parsing names manually — the scan must not be a hard blocker.
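
If Python itself works but the bundled script fails, the manual fallback can be sketched as
below. This is an illustration only: the extension set, the separator list, and the output keys
(`author`, `title`, `path`, `format`) are assumptions, not the scanner's actual JSON schema, and
it does not group a folder of MP3s into one audiobook.

```python
from pathlib import Path

# Assumed set of book/audiobook extensions; adjust to what the folder contains.
BOOK_EXTENSIONS = {".epub", ".fb2", ".pdf", ".mobi", ".mp3", ".m4b", ".flac", ".ogg"}
# Common "Author <sep> Title" separators (hyphen, em dash, en dash, underscore form).
SEPARATORS = (" - ", " \u2014 ", " \u2013 ", "_-_")


def guess_from_name(path):
    """Split 'Author - Title.ext' into a rough author/title guess."""
    stem = Path(path).stem
    for sep in SEPARATORS:
        if sep in stem:
            author, title = stem.split(sep, 1)
            return {"author": author.strip(), "title": title.strip()}
    # No separator: keep the whole name as the title and leave the author open.
    return {"author": None, "title": stem.strip()}


def list_candidates(folder):
    """Walk the folder and return one guess per file with a known extension."""
    root = Path(folder)
    candidates = []
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.suffix.lower() in BOOK_EXTENSIONS:
            guess = guess_from_name(p.name)
            guess["path"] = p.relative_to(root).as_posix()
            guess["format"] = p.suffix.lower().lstrip(".")
            candidates.append(guess)
    return candidates
```

The guesses still need the reconciliation pass in Step 2; a filename split cannot tell which
side is the author.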

---

## Step 2 — Build the book list

From the scan output, produce the definitive list of books:

**Identify:** For each entry reconcile the filename guess with embedded metadata —
metadata usually wins for author/title, but watch for junk metadata (e.g. "Unknown Artist",
scene-release tags, the narrator listed as the artist on audiobooks). Use your own knowledge
of literature: if a guess looks like a real well-known book with the fields swapped
("Мастер и Маргарита - Булгаков"), fix the order.

Mark anything you couldn't confidently identify as `unidentified` and show it to the user in
the next message so they can correct it — one wrong identification poisons everything
downstream.

**Detect duplicates:** Group entries that share the same Author + Title but differ only in
format (e.g. the same book in `.epub` and `.fb2`). Present them as a single book with
multiple formats listed. Don't count them separately in totals.
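
One way to sketch the duplicate grouping, assuming each entry is a dict with the hypothetical
keys `author`, `title`, `format`, and `path` (the scanner's real field names may differ):

```python
import unicodedata
from collections import defaultdict


def norm(text):
    """Casefold, drop punctuation, and collapse whitespace for comparison."""
    text = unicodedata.normalize("NFKC", text or "").casefold()
    return " ".join("".join(ch if ch.isalnum() else " " for ch in text).split())


def group_duplicates(entries):
    """Merge entries with the same normalized author + title into one book."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(norm(entry["author"]), norm(entry["title"]))].append(entry)
    books = []
    for same in groups.values():
        first = same[0]
        books.append({
            "author": first["author"],
            "title": first["title"],
            "formats": sorted({e["format"] for e in same}),
            "paths": [e["path"] for e in same],
        })
    return books
```

Normalizing before comparing catches case and punctuation differences; translated or
abbreviated titles still need a manual check.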

**Detect series:** Look for patterns that indicate a book belongs to a series:

- Explicit numbering: "Book 1", "Part 2", "Vol. 3", "Том 1", "#4", "Episode 5"
- Known series suffixes in the title or filename

When detected, extract the series name and number, and mark the entry with
`"series": "Series Name"` and `"series_num": N`. You'll use this in Step 4 to group the
catalog.
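
The explicit-numbering patterns can be matched with a regex along these lines. This is a rough
sketch: `detect_series_number` is an illustrative name, it only extracts the number, and naming
the series itself still relies on your knowledge of the books.

```python
import re

# Covers "Book 1", "Part 2", "Vol. 3", "Volume 3", "Episode 5", "Том 1", and "#4".
SERIES_NUM_RE = re.compile(
    r"(?:\b(?:book|part|vol(?:ume)?\.?|episode|том)\s*|#)(\d+)\b",
    re.IGNORECASE,
)


def detect_series_number(text):
    """Return the series number found in a title or filename, or None."""
    match = SERIES_NUM_RE.search(text)
    return int(match.group(1)) if match else None
```

A bare number in a title ("Fahrenheit 451") is deliberately not treated as a series marker.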

**Update mode only:** Skip files whose relative path already appears in the existing catalog.
Only new files go through Steps 3–4.
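
The update-mode filter amounts to a set difference on relative paths. A small sketch, assuming
each scan entry carries its relative path under a hypothetical `path` key:

```python
def new_entries(scan_entries, catalogued_paths):
    """Keep only scan entries whose relative path is not in the catalog yet."""
    # Compare with forward slashes so Windows and POSIX spellings match.
    known = {p.replace("\\", "/") for p in catalogued_paths}
    return [e for e in scan_entries if e["path"].replace("\\", "/") not in known]
```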

Show the user the raw list (Author — Title, format) **before** doing web research, with a
note like "searching the web for details now". This gives them a chance to spot
misidentifications early, and they see progress instead of silence.

---

## Step 3 — Enrich from the web

For each identified book, use WebSearch to find:

- **Year** of first publication
- **Genre** (1–2 main genres)
- **Tags / themes** (3–6 short tags: e.g. "dystopia", "coming-of-age", "unreliable narrator")
- **Short description** — 2–3 sentences, spoiler-free, in the user's language
- **Goodreads rating** — the average rating (e.g. 4.12) and ideally the ratings count.
  Goodreads is the preferred rating source. If a book genuinely isn't on Goodreads
  or has too few ratings to be meaningful, try OpenLibrary or LibraryThing as fallback —
  mark the source explicitly (e.g. "4.1 (OpenLibrary)"). Do not invent a number; a real
  "not found" is more useful than a fake 4.2.

Search efficiently: one search per book is usually enough
(`"<title>" <author> goodreads rating`), and a Goodreads result snippet typically contains
the rating, year, and genre in one shot. Don't fabricate from memory alone — your training
data may predate editions or confuse similarly-titled books; verify against search results.
Where you already know a fact with confidence (a classic's year and genre), a single
confirming search is enough.
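
The rating rule (Goodreads preferred, source marked on fallback, never an invented number) can
be sketched as a small formatter. `format_rating` and its parameters are illustrative names;
the two output shapes follow the catalog template in Step 4.

```python
def format_rating(avg, count=None, source="Goodreads", compact=False):
    """Render a rating for the catalog; None means nothing real was found."""
    if avg is None:
        return "not found"
    text = str(avg)  # keep the precision the source reported
    if count is not None:
        if compact:
            # Summary-table form, e.g. "4.12 (38k)".
            text += f" ({count // 1000}k)" if count >= 1000 else f" ({count})"
        else:
            # Detail form, e.g. "4.12 (38,412 ratings)".
            text += f" ({count:,} ratings)"
    if source != "Goodreads":
        # Fallback sources are always named explicitly.
        text += f" ({source})"
    return text
```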

**Large libraries:** if there are more than ~30 books, tell the user the count and ask whether
to enrich everything or start with a subset (e.g. a genre, an author, or unread-looking ones).
Don't silently burn half an hour of searches on a 300-book folder.

---

## Step 4 — Write the catalog

**Catalog path:** By default, save `library-catalog.md` into the scanned folder. If the user
passed `--output <path>`, save there instead. If the folder is not writable (network share,
read-only drive), notify the user and ask for an alternative save location before writing.
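
A hedged sketch of the path choice and the writability check. It assumes `--output` names a
file rather than a directory, and the helper names are illustrative. Probing with a throwaway
temp file is used here because permission flags alone can be misleading on network shares.

```python
import tempfile
from pathlib import Path


def catalog_path(folder, output=None):
    """Default to <folder>/library-catalog.md; an explicit output path wins."""
    return Path(output) if output else Path(folder) / "library-catalog.md"


def is_writable(directory):
    """Probe by creating a temp file in the directory; it is removed automatically."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False
```

Check `is_writable(catalog_path(folder).parent)` before writing, and ask the user for another
location when it returns `False`.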

**Update mode:** Merge new entries into the existing catalog rather than overwriting it.
Update the header (date, total count). Keep the sort order consistent.

**Structure:**

```markdown
# Library Catalog — <folder name>
_Generated <date> · N books (X ebooks, Y audiobooks) · updated <date if update mode>_

## Summary

| # | Author | Title | Year | Series | Genre | Goodreads | Format |
|---|--------|-------|------|--------|-------|-----------|--------|
| 1 | ...    | ...   | ...  | Book 2 | ...   | 4.12 (38k)| epub   |

## Books

### <Series Name> (series)

#### 1. <Author> — <Title> (<year>)
**Genre:** ... · **Tags:** tag1, tag2, tag3 · **Goodreads:** 4.12 (38,412 ratings)
**Format:** epub + fb2 · 2.1 MB · `relative/path/to/file`

<2–3 sentence description>

### Standalone Books

#### <Author> — <Title> (<year>)
...

## Unidentified

- `path/to/weird_file.mp3` — best guess: ...
```

Group books in series under a named series heading, ordered by series number. All other
books go under "Standalone Books". Omit the Unidentified section if everything was
identified. For duplicates, list all formats on one **Format:** line.
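
The grouping rule can be sketched as follows. The `series` and `series_num` keys come from
Step 2; `author` and `title` are assumed key names, and sorting standalone books by author then
title is one reasonable choice, not something this skill mandates.

```python
from collections import defaultdict


def split_for_catalog(books):
    """Return (series_name -> books ordered by number, standalone books)."""
    series = defaultdict(list)
    standalone = []
    for book in books:
        if book.get("series"):
            series[book["series"]].append(book)
        else:
            standalone.append(book)
    ordered = {
        name: sorted(items, key=lambda b: b.get("series_num") or 0)
        for name, items in sorted(series.items())
    }
    # Assumed ordering for standalone books: author, then title.
    standalone.sort(key=lambda b: (b["author"], b["title"]))
    return ordered, standalone
```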

After writing the file, give the user a short summary in chat: book count, a couple of
highlights (highest-rated, oddest find), and where the catalog was saved.

---

## Step 5 — Help the user choose

This is the payoff step — don't skip it and don't just dump the table again. Ask **one** short
question about what they're in the mood for (genre / heavy vs. light / short vs. long / fiction
vs. non-fiction — pick the axis that best splits *their actual collection*). Then recommend
**2–3 books** from the catalog, each with one sentence on *why this one, for them, now* —
grounded in the rating, tags, and description you found. If the user already stated a
preference earlier ("something light for a trip"), skip the question and recommend directly.
