---
name: generate-photo-captions
description: Write benefit-focused photo captions for a short-term rental listing. Triggers on "write photo captions for [property]", "caption these photos", "photo captions for [property]", "captions for the [property] photos", "write captions for my Airbnb photos", "describe these listing photos", "caption everything in this Drive folder", "photos are in this Drive folder", or any request to produce captions for STR listing photos. Also triggers on "and rename them", "rename the files too", "suggest filenames for the photos" - these turn on the optional Suggested Filename column in the xlsx. Accepts a Google Drive folder URL/ID (recommended for real work), actual photo files dropped in chat (vision), or text descriptions. NOT for writing full listing copy from scratch (use the str-listings plugin's update-listing skill for that).
---

# Generate STR listing photo captions

Produce benefit-focused, voice-matched captions for each photo in a short-term rental listing. Output is a numbered xlsx with Photo #, Filename, Area, Description, and Caption columns - ready to paste into OwnerRez, Airbnb, or any channel that supports bulk caption upload.

## When to use this skill

Use this skill when the user wants captions for **existing listing photos** of a vacation rental. The user might say:

- "Write photo captions for Cosmic Cowboy"
- "I just shot new photos at [property] - caption them"
- "Caption these [#] photos for the Airbnb listing"
- "Photo descriptions for the [property] listing"

Do NOT use this skill for:
- Full listing copy from scratch (use `str-listings:update-listing` for refreshes; for brand-new copy, the user can request it explicitly).
- Caption rewrites where the original captions already exist and the user just wants polish - handle that conversationally without invoking the full workflow.
- Non-STR contexts (real-estate sale listings, hotel sites, retail) - the voice rules are STR-specific.

## Workflow

Follow these steps in order. Each numbered step is a checkpoint with the user - don't skip ahead.

### Step 1: Identify the property + load the voice profile

Confirm which property the captions are for. Then look for a matching profile in `${CLAUDE_PLUGIN_ROOT}/skills/generate-photo-captions/references/profiles/`. Filenames are kebab-case slugs of the property name (e.g., `cosmic-cowboy.md`, `riverside-retreat.md`).

- **If a profile exists:** read it. Acknowledge in chat which profile you loaded ("Loading the Cosmic Cowboy voice profile: rustic-modern luxe, hero amenities are the cowboy pool and Solo Stove fire pit, signature phrase patterns include ..."). This grounds the user that you have the right context.
- **If no profile exists:** offer to capture one now by invoking the `add-voice-profile` skill. Don't proceed without a profile - captions written without one drift toward generic "luxury vacation rental" mush.
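
The slug-to-filename mapping above can be sketched as follows. This is a minimal illustration, assuming a slug is just the lowercased alphanumeric words of the property name joined with hyphens; `property_slug` and `profile_filename` are hypothetical helper names, not part of the plugin.

```python
import re


def property_slug(name: str) -> str:
    """Lowercase the name and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


def profile_filename(name: str) -> str:
    """Return the profile filename expected under references/profiles/."""
    return f"{property_slug(name)}.md"


print(profile_filename("Cosmic Cowboy"))      # cosmic-cowboy.md
print(profile_filename("Riverside Retreat"))  # riverside-retreat.md
```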

### Step 2: Read the caption-craft rules

Always read `${CLAUDE_PLUGIN_ROOT}/skills/generate-photo-captions/references/caption-craft.md` before writing any caption. This is the universal rules file: tone, length, structure, do/don't list. The voice profile from Step 1 layers on top of these rules - voice profile wins on stylistic conflicts.

### Step 3: Confirm input mode (one question at a time)

Ask **one** question, wait for the answer, then ask the next. Do not combine these into a single message - the user prefers focused, sequenced questions.

**First message - input mode:**

> How are you providing the photos?
>
> 1. **Google Drive folder URL or ID** - I'll enumerate the folder and process every image in it. (Recommended for real work; see Step 4a.)
> 2. **Photo files in chat** - drop the photos directly; I'll use vision on each.
> 3. **Text descriptions** - paste lines like `Photo N: [what's in the photo]`.
> 4. **Mix** - combine the above however makes sense for the batch.

Wait for the answer.

**Second message - cadence:**

> How do you want to work?
>
> 1. **Batched** - 5-10 at a time; you review captions, then continue. Better for tone calibration on the first run for a property.
> 2. **All-at-once** - process everything, get one complete xlsx. Faster. (Default when you say "automatically process the photos" or anything similar.)

Wait for the answer.

**Third message - rename toggle:**

> One more: want me to also generate descriptive **suggested filenames** for each photo (e.g., `03-backyard-cowboy-pool.jpg`)? Default off - I leave Drive filenames as they are. Turn on if you want to bulk-rename in Drive afterward.
>
> 1. **Off** (default) - xlsx has 5 columns, original filenames stay.
> 2. **On** - I add a `Suggested Filename` column to the xlsx; team can paste-rename in Drive after review.

Wait for the answer before continuing.

The default for this toggle is OFF unless the user's original trigger explicitly asked for renames (phrases like "and rename them", "rename the files too", "rename based on what's in the photo").

### Step 4a: Drive folder ingest (only if the input mode is Google Drive)

If the user chose the Drive folder mode in Step 3, do the following BEFORE Step 4:

1. **Extract the folder ID from the URL.** If the user pasted a share URL like `https://drive.google.com/drive/folders/<ID>?usp=sharing`, grab everything between `/folders/` and the next `/` or `?`. If the user pasted only an ID, use it as-is.

2. **List images in the folder via the Google Drive MCP.** Call `mcp__e9508f4b-...__search_files` (or whichever search tool the Drive MCP exposes in this session) with a query that filters on the folder as parent and image MIME types. Concretely, build a Drive query string like:

   ```
   '<FOLDER_ID>' in parents and (mimeType = 'image/jpeg' or mimeType = 'image/png' or mimeType = 'image/heic' or mimeType = 'image/heif' or mimeType = 'image/webp' or mimeType = 'image/tiff') and trashed = false
   ```

   Capture each result's `id`, `name`, and `mimeType`. Sort by `name` ascending for a deterministic order.

3. **Report the count + first 5 filenames to the user and wait for "proceed".** Example:

   > Found 24 images in the folder. First few: `IMG_0001.jpg`, `IMG_0002.jpg`, `backyard_wide.jpg`, ... Ready to process all 24? (yes / change cadence / stop)

4. **Cadence for big folders.** If > 30 images and the user picked batched, work in batches of 8 with review checkpoints. If > 30 images and the user picked all-at-once, give one confirmation prompt ("I'll process all N and deliver the xlsx at the end") and proceed without per-batch checkpoints. Honor the cadence choice; the all-at-once path is what most users actually want when they say "automatically process the photos."

5. **For each image, in order:**
   - The Google Drive MCP does NOT pipe image bytes directly to vision. `read_file_content` returns `{}` for images. `download_file_content` returns a base64-encoded image inside a JSON envelope that almost always overflows the tool-result size cap, in which case the harness saves the JSON to a temp file in `~/.claude/projects/.../tool-results/mcp-...-download_file_content-<ts>.txt`. You have to decode locally before vision can see the image.
   - **Working pipeline (validated 2026-05-18 v0.1.1):**
     1. Call `download_file_content` on the Drive file ID. Expect an overflow-to-disk message containing the temp file path.
     2. Run a small Python helper via Bash that opens the temp JSON (`json.load`), base64-decodes the `content` field, and writes the bytes to `<cwd>/temp-photos/photo_<N>.jpg` where N is the sort index. Batch-decode all downloads in one Python invocation when possible.
     3. Use the `Read` tool on the local `.jpg`. Claude Code surfaces it to vision as an attached image.
     4. Write the **Description** field from what you see.
     5. Apply caption-craft + voice profile rules to write the **Caption**.
   - **Batching pattern that worked:** parallel-call `download_file_content` in groups of 8 from a single message; the temp files are saved with unique timestamps. Then one Python script globs all the temp files, JSON-parses, sorts by the leading number in the Drive `name`, and writes sequentially-numbered local jpgs. Then `Read` each local jpg.
   - The **Filename** column gets the Drive `name` (e.g., `1-web-or-mls-1231-2.jpg`). The **Photo #** is the position in the sorted list.
   - Clean up `<cwd>/temp-photos/` after the xlsx is built.

6. **On any Drive MCP error** (rate limit, permission denied, file not found): pause the batch, surface the error to the user with the offending filename, and ask whether to retry, skip, or stop.
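
The folder-ID extraction, query string, and local decode from the steps above can be sketched in Python. This is a sketch under stated assumptions, not the helper used in the validated run: the function names are hypothetical, and the base64 string is assumed to sit in a top-level `content` key of the saved JSON.

```python
import base64
import json
import re
from pathlib import Path

# MIME types taken from the query string in step 2.
IMAGE_MIME_TYPES = (
    "image/jpeg", "image/png", "image/heic",
    "image/heif", "image/webp", "image/tiff",
)


def extract_folder_id(url_or_id: str) -> str:
    """Return the text between '/folders/' and the next '/' or '?'.

    A bare ID (no '/folders/' segment) is returned unchanged.
    """
    match = re.search(r"/folders/([^/?]+)", url_or_id)
    return match.group(1) if match else url_or_id.strip()


def build_image_query(folder_id: str) -> str:
    """Build the Drive query that lists non-trashed images in a folder."""
    mime_clause = " or ".join(f"mimeType = '{m}'" for m in IMAGE_MIME_TYPES)
    return f"'{folder_id}' in parents and ({mime_clause}) and trashed = false"


def decode_download(temp_json: Path, out_path: Path) -> int:
    """Decode one saved download result to a local image file.

    Returns the number of bytes written.
    """
    with temp_json.open(encoding="utf-8") as fh:
        payload = json.load(fh)
    # Assumption: the base64 payload lives at payload["content"].
    image_bytes = base64.b64decode(payload["content"])
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(image_bytes)
    return len(image_bytes)
```

For example, `decode_download(temp_file, Path("temp-photos/photo_1.jpg"))` would write the first photo in sort order, where `temp_file` is whatever path the overflow-to-disk message reported.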

### Step 4: For each photo, produce the row

Each row of the eventual xlsx needs five fields, plus a sixth when the rename toggle is ON. Build them as you go:

- **Photo #** - sequential. Use the order the user provided for text descriptions, the upload order (or filename sort) for photo files in chat, and the sorted-by-name order from Step 4a for Drive folders.
- **Filename** - populate from the Drive file name in Drive mode, from the uploaded file name in chat-upload mode; blank in text-only mode.
- **Area** - one of: `Exterior`, `Living & Game Room`, `Kitchen & Dining`, `Bedroom 1` ... `Bedroom N`, `Bathroom`, `Backyard`, `Other`. Match the convention used in the `str-listings` plugin's template so captions stay consistent with the listing copy. If unclear, guess and flag for the user to confirm.
- **Description** - what's literally in the photo. When the user provided text, use it verbatim. When you used vision (chat-upload or Drive mode), write the description yourself in plain language ("King bed with wood headboard, two reading lamps, French doors to balcony"). The user audits this column before approving the caption.
- **Caption** - the benefit-focused, voice-matched, 1-3 sentence caption. Apply `caption-craft.md` rules + the property's voice profile.
- **Suggested Filename** (only when the rename toggle from Step 3 is ON) - a kebab-case descriptive filename, sortable by photo number. Format: `<NN>-<area-slug>-<2-to-4-key-nouns>.<ext>`. Examples: `03-backyard-cowboy-pool.jpg`, `12-primary-bedroom-king-balcony.jpg`, `21-game-room-pool-table.jpg`. Rules:
  - Lead with the zero-padded photo number for sort stability.
  - Lowercase, hyphens between words, no spaces or special chars.
  - Use the area name as the second segment, kebab'd (`bedroom-1`, `game-room`, `kitchen-dining`).
  - Pick 2-4 of the most distinctive nouns from the description. Skip generic words ("the", "with", "and", "of").
  - Keep the whole filename under 60 chars when possible. Reuse the original file extension.
  - For Cosmic Cowboy and Dragonfly Escape, prefer the property's branded terms ("cowboy-pool", not "stock-tank-pool"; "solo-stove", "propane-fire-pit").
  - Do NOT include the property name in the filename (it's redundant - the Drive folder already names the property).
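
The filename rules above can be sketched as a small helper. This is an illustration only: choosing the 2-4 distinctive nouns is left to the caller, `suggested_filename` and `kebab` are hypothetical names, and lowercasing the reused extension is an assumption about how "lowercase" applies to it.

```python
import re
from pathlib import Path

# Generic words named in the rules above.
GENERIC_WORDS = {"the", "with", "and", "of"}


def kebab(text: str) -> str:
    """Lowercase and hyphenate, dropping spaces and special characters."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def suggested_filename(photo_number, area, key_nouns, original_name, max_len=60):
    """Build '<NN>-<area-slug>-<key-nouns>.<ext>' from already-chosen nouns."""
    ext = Path(original_name).suffix.lower()  # reuse the original extension
    area_slug = kebab(area)
    nouns = [kebab(n) for n in key_nouns if n.lower() not in GENERIC_WORDS][:4]
    while True:
        name = "-".join([f"{photo_number:02d}", area_slug, *nouns]) + ext
        # Trim trailing nouns to fit the soft length cap, but keep at least two.
        if len(name) <= max_len or len(nouns) <= 2:
            return name
        nouns.pop()


print(suggested_filename(3, "Backyard", ["cowboy", "pool"], "IMG_0003.JPG"))
# 03-backyard-cowboy-pool.jpg
```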

### Step 5: Present captions in chat for review

After each batch (or after the full set if all-at-once), show the captions as a numbered list in chat so the user can review and request adjustments. Common adjustment requests:

- "Make #3 more playful" / "less salesy"
- "Rewrite #5 - the bed is queen not king"
- "Shorten the backyard ones, they're all too long"
- "Lean harder into the cowboy pool angle on the pool photos"

Apply edits and re-show. Don't move to the spreadsheet build until the user signs off.

### Step 6: Run the consistency check

Before building the xlsx, scan the full caption set for:

- **No repeated openings** - if three captions all start with "Step into...", rewrite two of them.
- **No tropes from the avoid list** - the profile and `caption-craft.md` both have these. Search literally.
- **Hero amenities surface** - every property has 2-4 hero amenities in its profile. They should appear in captions for the relevant photos (e.g., a backyard wide shot for Cosmic Cowboy should namedrop the cowboy pool).
- **Area tags are consistent** - bedroom numbering matches across multiple bedroom photos; "Backyard" vs "Patio" used consistently.
- **Length sanity** - check each caption's sentence count and character count. The profile may specify a max length; the default is 3 sentences / ~280 characters.

If anything fails, fix it before moving on.
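
The mechanical parts of this check (repeated openings, literal avoid-list matches, length) can be sketched as below. It is a rough aid, not a replacement for reading the set: the two-word opening window and the punctuation-based sentence split are assumptions, and `check_captions` is a hypothetical helper name. Hero-amenity coverage and area-tag consistency still need judgment.

```python
import re
from collections import Counter


def opening(caption: str, words: int = 2) -> str:
    """Return the first few words of a caption, lowercased."""
    return " ".join(re.findall(r"[a-z']+", caption.lower())[:words])


def check_captions(captions, avoid_phrases, max_sentences=3, max_chars=280):
    """Return (photo_number, issue) pairs for captions that need a rewrite."""
    issues = []
    opening_counts = Counter(opening(c) for c in captions)
    for number, caption in enumerate(captions, start=1):
        if opening_counts[opening(caption)] > 1:
            issues.append((number, "repeated opening"))
        lowered = caption.lower()
        for phrase in avoid_phrases:
            # Literal, case-insensitive search against the avoid list.
            if phrase.lower() in lowered:
                issues.append((number, f"avoid-list phrase: {phrase}"))
        # Rough sentence count: split on runs of ., !, or ?
        sentences = [s for s in re.split(r"[.!?]+", caption) if s.strip()]
        if len(sentences) > max_sentences or len(caption) > max_chars:
            issues.append((number, "too long"))
    return issues
```

Pass the avoid list from the profile and `caption-craft.md` as `avoid_phrases`, and override `max_sentences` / `max_chars` when the profile specifies its own limits.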

### Step 7: Build the team-handoff xlsx

Generate a JSON config matching the schema documented in the file header of `${CLAUDE_PLUGIN_ROOT}/scripts/build_sheet.js`, then run:

```bash
node "${CLAUDE_PLUGIN_ROOT}/scripts/build_sheet.js" /path/to/config.json /path/to/output.xlsx
```

Save the `.xlsx` to the current working directory as `<Property Name> - Photo Captions.xlsx`. The build script validates the output (size > 1 KB, valid ZIP signature) and exits non-zero if anything is off, so a successful run means the file is good.
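
If an independent sanity check is ever useful, the two conditions named above (size over 1 KB, ZIP signature) can be approximated in Python. This is not the build script's own code; `looks_like_valid_xlsx` is a hypothetical helper, and it relies only on the fact that an xlsx is a ZIP archive whose first bytes are `PK\x03\x04`.

```python
from pathlib import Path

# Local-file-header signature at the start of a non-empty ZIP archive.
ZIP_SIGNATURE = b"PK\x03\x04"


def looks_like_valid_xlsx(path: Path, min_bytes: int = 1024) -> bool:
    """Check that the file exists, exceeds min_bytes, and starts like a ZIP."""
    if not path.is_file() or path.stat().st_size <= min_bytes:
        return False
    with path.open("rb") as fh:
        return fh.read(4) == ZIP_SIGNATURE
```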

## Output expectations

Every run produces:

1. Captions presented in chat for review (Step 5).
2. `<Property Name> - Photo Captions.xlsx` saved to the current working directory with columns Photo # / Filename / Area / Description / Caption. If the rename toggle (Step 3) is ON, a sixth column `Suggested Filename` is appended.

The Filename column is populated when the input mode is Google Drive (Drive file names) or photo files in chat (uploaded file names). It is blank when the input was text descriptions only.

The Suggested Filename column is only present when the user opted in via the Step 3 rename toggle. The team uses these suggestions to bulk-rename in Drive after reviewing the xlsx; the plugin itself does not rename Drive files (the Google Drive MCP exposes no rename / update operation as of v0.1.2).

## When the profile is missing or thin

If `add-voice-profile` was just used to capture a new property's profile, the first batch is also a calibration pass. Be more conservative with creative liberties on the first run, present captions for review more often, and update the profile (`profiles/<slug>.md`) at the end of the session with any tone clarifications the user made along the way.

## See also

- `references/caption-craft.md` - universal caption rules (length, structure, sensory language, avoid list)
- `references/voice-profile-template.md` - the schema for property voice profiles
- `references/profiles/cosmic-cowboy.md` - worked example profile, used as the seeded reference
- `references/profiles/dragonfly-escape.md` - lake-adjacent multi-gen group house profile; the lake-accuracy banned list is a load-bearing example of property-specific guardrails
- `references/example.md` - worked end-to-end example showing the full skill flow
- `${CLAUDE_PLUGIN_ROOT}/scripts/build_sheet.js` - parameterized xlsx builder (JSON config schema documented in the file header)
