---
name: image-explore
description: "Brainstorm multiple visual directions for a blog image, generate them in parallel, build a comparison page, and optionally publish as a shareable link (Surge.sh or gist)."
argument-hint: " [--count N] [--variants N] [--style 'override'] [--aspect 'W:H']"
allowed-tools: Bash, Read, Write, Glob, Grep, AskUserQuestion, WebFetch, Agent
---

# Image Explore - Visual Direction Brainstorming

Generate multiple distinct visual directions for a blog image, render them all in parallel, build a comparison page, and optionally publish as a shareable link for feedback.

## Arguments

Parse the user's input for:

- **Target**: A file path (e.g., `_d/ai-native-manager.md`) or a freeform topic (e.g., "chaos of AI adoption")
- **`--count N`**: Number of directions to brainstorm and generate (default: 5, max: 8)
- **`--variants N`**: Number of minor variants per direction (default: 1, max: 3). Each variant tweaks the scene (different angle, lighting, composition) while keeping the same concept and shirt text.
- **`--style 'description'`**: Override the default illustration style (passed through to gen-image)
- **`--aspect 'W:H'`**: Aspect ratio (default: 3:4). Valid: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`
- **`--ref 'path'`**: Override reference image (default: raccoon canonical ref)

## Workflow

### Phase 1: Analyze Content

If the target is a file path:

1. Read the file
2. Identify the **hook** — what's the one idea a reader should remember?
3. Note key metaphors, section themes, emotional arc
4. Check for existing images (look for `imagefeature`, `local_image`, `blob_image` includes)

If the target is a freeform topic:

1. Use it directly as the creative brief
2. Skip to Phase 2

### Phase 2: Brainstorm Directions

This is the creative core. Generate `--count` **distinct visual directions**.
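
As a rough illustration of the limits listed under Arguments, the sketch below parses an argument string and clamps `--count` and `--variants` to their documented maxima before any directions are generated. The helper name `parse_explore_args`, the choice to clamp rather than reject out-of-range values, and the lower bound of 1 are assumptions, not part of the skill.

```python
import argparse
import shlex

# Aspect ratios listed as valid in the Arguments section.
VALID_ASPECTS = {
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}


def parse_explore_args(raw: str) -> argparse.Namespace:
    """Hypothetical helper: parse the skill's argument string.

    Clamping to the documented maxima is an assumption; the skill
    itself does not say how out-of-range values are handled.
    """
    parser = argparse.ArgumentParser(prog="image-explore")
    parser.add_argument("target", nargs="+")  # file path or freeform topic
    parser.add_argument("--count", type=int, default=5)
    parser.add_argument("--variants", type=int, default=1)
    parser.add_argument("--style", default=None)
    parser.add_argument("--aspect", default="3:4")
    parser.add_argument("--ref", default=None)

    args = parser.parse_args(shlex.split(raw))
    args.target = " ".join(args.target)

    # Documented limits: count max 8, variants max 3.
    args.count = max(1, min(args.count, 8))
    args.variants = max(1, min(args.variants, 3))

    if args.aspect not in VALID_ASPECTS:
        raise ValueError(f"unsupported aspect ratio: {args.aspect}")
    return args
```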
Each direction must have:

- **Name**: 2-4 evocative words (e.g., "Circus Ringmaster", "Surfing the Wave")
- **Section**: Which part of the post it maps to (or "standalone")
- **Scene**: One-sentence description of the image
- **Vibe**: What feeling it evokes (e.g., "controlled chaos", "quiet confidence")
- **Shirt text**: For raccoon style, what the shirt reads (max 8 chars)

**Directions must be meaningfully different.** Vary across these axes:

- Literal vs. metaphorical
- Action vs. stillness
- Humor vs. gravitas
- Individual vs. group scene
- Indoor vs. outdoor / grounded vs. fantastical

Avoid generating 5 variations of the same idea. If the post has one dominant metaphor, use it for at most 2 directions and find fresh angles for the rest.

Present directions as a table:

| # | Name             | Section       | Scene                                    | Vibe           | Shirt   |
| --- | ---------------- | ------------- | ---------------------------------------- | -------------- | ------- |
| A | Mission Control  | Year of Chaos | Raccoon at NASA console, screens on fire | "This is fine" | SHIP IT |
| B | Surfing the Wave | AI Adoption   | Raccoon surfing tidal wave of AI debris  | Riding chaos   | SHIP IT |

Confirm with user via `AskUserQuestion` before generating. User may add, remove, or modify directions.

**If `--variants` > 1:** After user approves the directions, craft variant scenes for each. Each variant keeps the same concept, shirt text, and vibe, but varies the specific scene description (different angle, setting detail, composition, or lighting). Do NOT present variant scenes for approval — just generate them.

### Phase 3: Generate Images in Parallel

1. Resolve the script path once:

   ```bash
   CHOP_ROOT="$(cd "$(dirname "$(readlink -f ~/.claude/skills/image-explore/SKILL.md)")" && git rev-parse --show-toplevel)"
   GEN="$CHOP_ROOT/skills/image-explore/generate.py"
   ```

2. Write a `directions.json` file with all directions (used by both Phase 3 and Phase 4).

   **Without variants** (1 entry per direction):

   ```json
   [
     {
       "name": "Mission Control",
       "section": "Year of Chaos",
       "vibe": "This is fine",
       "shirt": "SHIP IT",
       "scene": "Raccoon at NASA console, screens showing fire",
       "output": "mission-control.webp"
     }
   ]
   ```

   **With variants** (multiple entries per direction, grouped by `group` field):

   ```json
   [
     {
       "name": "Mission Control v1",
       "group": "Mission Control",
       "section": "Year of Chaos",
       "vibe": "This is fine",
       "shirt": "SHIP IT",
       "scene": "Raccoon at NASA console, screens showing fire, dramatic front view",
       "output": "mission-control-v1.webp"
     },
     {
       "name": "Mission Control v2",
       "group": "Mission Control",
       "section": "Year of Chaos",
       "vibe": "This is fine",
       "shirt": "SHIP IT",
       "scene": "Raccoon at NASA console seen from side, leaning back in chair sipping tea",
       "output": "mission-control-v2.webp"
     }
   ]
   ```

   The `group` field enables `build-page.py` to group variants under a shared heading. Output filenames follow the pattern `{slug}-v{N}.webp` when using variants.

   **Scene-first prompt ordering** (`"scene_first": true`):

   By default, prompts are assembled as: character style → "large & prominent 40%" → scene. This works well for character-focused images but fights wide-field compositions where characters should be small elements in a larger scene.

   Set `"scene_first": true` on any direction where the **scene composition matters more than character prominence**. This reorders the prompt to: scene → character style → shirt text, and drops the "40% of image" instruction. Use it for:

   - Bird's-eye/aerial views of fields, landscapes, maps
   - Group scenes where many characters are small
   - Any composition where the environment dominates

   ```json
   {
     "name": "Overhead Field",
     "scene": "Aerial drone shot of a soccer field with raccoons scattered across it...",
     "shirt": "NEXT",
     "output": "overhead-field.webp",
     "scene_first": true
   }
   ```

3.
   **Generate all images in parallel** with a single command:

   ```bash
   uv run "$GEN" batch directions.json
   ```

   Pass `--aspect`, `--ref`, or `--style` if overriding defaults. The script handles env loading, prompt assembly, ref image resolution, and parallel execution via thread pool (secrets never leak into command strings).

4. After batch completes, the directions JSON is automatically augmented with `_prompt` and `_duration_s` fields for each entry (used by the comparison page for debug info).

### Phase 3b: Verify & Retry

After generation, **verify each image actually matches its scene description** before showing to the user. This catches cases where Gemini ignores complex scene descriptions (e.g., split-screens, multiple characters, specific compositions).

1. **Launch background sub-agents in parallel** (one per image) to verify each result. Each agent should:
   - `Read` the generated image file (Claude has vision)
   - Compare what it sees against the **scene** description
   - Check for:
     - **Scene composition**: Does the layout match? (e.g., split-screen actually split, multiple characters present)
     - **Key elements**: Are the described elements visible? (e.g., dust cloud, trajectory arc, binoculars)
     - **Shirt text**: Is it readable and roughly correct?
   - Return a verdict: **pass** (scene clearly rendered) or **fail** with a short explanation of what's wrong

2. Collect results from all agents. This runs concurrently and doesn't block other work.

3. **Write verification results back to the directions JSON** — for each entry, add:
   - `_verification`: `"pass"` or `"fail"`
   - `_verification_reason`: short explanation (e.g., "Solo raccoon portrait, no soccer field or group scene")

   These fields are rendered in the comparison page's collapsible debug details.

4.
   For any **failures**, retry up to 2 times:
   - Strengthen the scene description (be more explicit, add emphasis)
   - Re-run `generate.py` for just the failed entries (write a temporary batch JSON)
   - Launch verification agents again for the retried images
   - Update `_verification` and `_verification_reason` with the retry result

5. After retries, report any still-failing images to the user rather than silently including bad results

   **What counts as a failure:**

   - Single character when scene calls for a group (or vice versa)
   - Missing the core concept entirely (e.g., "split-screen" rendered as single scene)
   - Wrong setting (indoor when outdoor was specified)

   **What does NOT count as a failure:**

   - Shirt text slightly wrong (Gemini often struggles with exact text)
   - Style differences from the reference
   - Minor composition differences (angle, lighting)

6. Show all verified images to the user with the `Read` tool.

### Phase 4: Build Comparison Page

Build and serve the comparison page (reuses the same `directions.json` — `build-page.py` reads `name`/`section`/`vibe`/`shirt` and accepts either `image` or `output` for the file path):

```bash
uv run "$CHOP_ROOT/skills/image-explore/build-page.py" \
  --title "Image Explore: Topic Name" \
  --dir docs/image-explore-topic/ \
  --images-dir images/ \
  directions.json
```

Options:

- `--images-dir PATH`: Where to find generated images (default: current directory). Useful when images were written to a different directory than where you run the command (e.g., `images/`).
- `--no-serve`: Skip starting the HTTP server.

This creates the showboat doc, converts images, generates HTML via pandoc, and starts a local HTTP server. It prints the Tailscale URL.

When `directions.json` contains entries with a `group` field, the page groups variants under shared direction headings with sub-headers for each variant.

### Phase 5: Publish (Ask First)

Ask the user: "Want to publish this as a shareable link?" and offer two options:

1.
   **Surge.sh** (Recommended) — Full HTML/CSS/JS support, lightbox clicking works
2. **GitHub Gist** — Simpler but gisthost may block inline JS (lightbox won't work)

#### Option A: Surge.sh

```bash
# Prepare deploy directory (random path to avoid accidental overwrites)
SURGE_DIR=$(mktemp -d /tmp/surge-XXXXXXXX)
cp /demo.html "$SURGE_DIR/index.html"
cp /*.png "$SURGE_DIR/"

# Deploy (pick a descriptive subdomain)
surge "$SURGE_DIR" .surge.sh
```

#### Option B: GitHub Gist

Uses the **gist-image** skill technique (create gist, clone, push binary files via git) plus gisthost-specific URL rewriting. The helper script automates this:

```bash
uv run "$CHOP_ROOT/skills/image-explore/publish-gist.py" demo.html --title "Description"
```

This handles: gist creation, image conversion to JPEG, URL rewriting, git push. It prints the gisthost URL.

Note: gisthost may block inline `