---
name: generate-podcast
description: Render a spoken-word MP3 podcast from wiki pages, single-host by default or two-voice dialogue. Piper TTS default (local, free); falls back to ElevenLabs / OpenAI TTS when their API keys are present. Used by /generate podcast. Not user-invocable directly; go through /generate.
user-invocable: false
allowed-tools: Bash(which *) Bash(brew *) Bash(git *) Bash(mkdir *) Bash(date *) Bash(cat *) Bash(sed *) Bash(grep *) Bash(awk *) Bash(ffmpeg *) Bash(piper *) Bash(curl *) Bash(npm *) Bash(pnpm *) Bash(npx *) Read Write Glob Grep
content-pipeline:
  - pipeline:audio
  - platform:agnostic
  - role:adapter
---

# Generate Podcast

Produce a 3–10 minute MP3 explainer from wiki pages. The LLM writes a spoken-word narrative, TTS renders each line, ffmpeg concatenates into a single MP3. Artifact-first: output lands in `vaults//artifacts/podcast/`.

## Usage (via /generate router)

```
/generate podcast [--vault ] [--length short|medium|long] [--two-voice] [--voice ]
```

- `--length`: `short` (~3 min), `medium` (~6 min, default), `long` (~10 min).
- `--two-voice`: dialogue between two hosts instead of a monologue.
- `--voice`: override the default Piper voice. Ignored when the ElevenLabs / OpenAI fallback kicks in.

Same topic resolution as sibling handlers: reuses `.claude/skills/generate/lib/select-pages.sh`.

## Pipeline

```
wiki pages → LLM script writer → script.md → TTS per line → ffmpeg concat → podcast.mp3
```

Keep the `.script.md` alongside the MP3: it's diffable, re-renderable, and the honest primary artifact.

## Step 1: Dependency Check

```bash
HAS_FFMPEG=0; HAS_PIPER=0
which ffmpeg >/dev/null 2>&1 && HAS_FFMPEG=1
which piper >/dev/null 2>&1 && HAS_PIPER=1

if [ "$HAS_FFMPEG" = "0" ]; then
  echo "ffmpeg missing. Installing via Homebrew…"
  brew install ffmpeg
fi

# Piper is optional if ELEVENLABS_API_KEY or OPENAI_API_KEY is set.
if [ "$HAS_PIPER" = "0" ] && [ -z "$ELEVENLABS_API_KEY" ] && [ -z "$OPENAI_API_KEY" ]; then
  echo "Piper not found and no cloud TTS key present."
  echo "Installing Piper (local, free, robotic-but-serviceable)…"
  brew install piper-tts 2>/dev/null || {
    echo "Homebrew install failed. See https://github.com/rhasspy/piper for manual install."
    exit 1
  }
fi
```

## Step 2: Resolve Vault + Topic

```bash
# mapfile requires bash 4 or newer
mapfile -t PAGES < <(.claude/skills/generate/lib/select-pages.sh "$VAULT_DIR" "$TOPIC")
```

An exit code of 1 from the helper means no pages matched; surface its message verbatim.

## Step 3: Compute Source Hash

```bash
HASH=$(.claude/skills/generate/lib/source-hash.sh "${PAGES[@]}")
```

## Step 4: Write the Script

The invoking LLM reads the selected pages and writes a narrative **script.md**. Two shapes are supported:

### Single-host monologue (default)

```md
# Podcast: {{topic}}

_Length target: {{length}} (~{{minutes}} min)._

[HOST]: Welcome. Today we're talking about {{topic}}. Here's why that matters…

[HOST]: First, the basics. According to {{cite: wiki/concepts/attention.md}}, attention is…

[HOST]: …
```

### Two-voice dialogue (`--two-voice`)

```md
# Podcast: {{topic}}

[A]: Alright, let's get into {{topic}}.

[B]: Why this, why now?

[A]: Because {{cite: wiki/concepts/rag.md}}…

[B]: Huh. I thought…

[A]: Right, but here's the nuance…
```

**Script-writing rules the LLM follows:**

- Spoken-word, not read-aloud-bullets. Full sentences with natural cadence.
- Cite wiki pages inline with `{{cite: path}}`, which is preprocessed to `*pagename*` before TTS sees it.
- Length target: ~150 words per minute. 3 min → ~450 words; 6 min → ~900; 10 min → ~1500.
- No "as we mentioned earlier" crutch unless the script actually mentioned it.
- End with a short spoken list of sources; TTS handles it fine.

Templates live at `.claude/skills/generate-podcast/templates/{single-host,two-voice}.md` and give the LLM a starting shape.
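The length targets and the citation preprocessing described in the rules above can be sketched in shell. This is a minimal sketch, not the skill's actual implementation: the function names `word_budget` and `strip_cites` are hypothetical, and the `sed` pattern assumes every cited path ends in `.md` and that the filename stem is an acceptable spoken name.

```bash
# Hypothetical helpers; names and behavior are assumptions for illustration.

# Map a --length value to a word budget at ~150 words per minute.
word_budget() {
  local minutes
  case "$1" in
    short)  minutes=3 ;;
    medium) minutes=6 ;;
    long)   minutes=10 ;;
    *) echo "unknown length: $1" >&2; return 1 ;;
  esac
  echo $((minutes * 150))
}

# Rewrite {{cite: some/path/name.md}} to *name* so TTS never sees the raw path.
# Reads text on stdin and writes the rewritten text to stdout.
strip_cites() {
  sed -E 's#\{\{cite: *([^}]*/)?([^}/]+)\.md\}\}#*\2*#g'
}
```

A line could then be piped through `strip_cites` before it reaches the TTS backend, and `word_budget medium` would give the LLM its ~900-word target.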
## Step 5: TTS Backend Selection

Priority order:

| Priority | Backend | Trigger | Cost | Quality |
|---------:|---------|---------|------|---------|
| 1 | ElevenLabs | `ELEVENLABS_API_KEY` set | ~$0.30 per 1k chars | Studio-grade |
| 2 | OpenAI TTS | `OPENAI_API_KEY` set | ~$0.015 per 1k chars | Very good |
| 3 | Piper (local) | always available once installed | free | Robotic but clean |

```bash
if [ -n "$ELEVENLABS_API_KEY" ]; then
  TTS_BACKEND="elevenlabs"
elif [ -n "$OPENAI_API_KEY" ]; then
  TTS_BACKEND="openai"
else
  TTS_BACKEND="piper"
fi
```

### Voice selection

- **Piper**: uses `en_US-lessac-medium` for [HOST] / [A]; `en_GB-alan-medium` for [B]. Override with `--voice`.
- **OpenAI**: `alloy` for HOST/A, `onyx` for B.
- **ElevenLabs**: premade voices only, because the free-tier API blocks library voices (Rachel/Adam) with HTTP 402. Default: `Alice` (voice id `Xb7hH8MSUJpSbSDYk0k2`) for HOST/A, `Eric` (voice id `cjVigY5qzO86Huf0OWal`) for B. Other safe premade options: `Sarah`, `Brian`, `Bill`. Override with `ELEVENLABS_VOICE_A` / `ELEVENLABS_VOICE_B` env vars (pass voice IDs, not names).

## Step 6: Render Each Line

Walk the script and split it by `[HOST]` / `[A]` / `[B]` tags. For each line:

```bash
# Piper example
# zero-pad the index so the glob in Step 7 sorts lines in playback order
echo "$LINE_TEXT" | piper \
  --model "$VOICE_MODEL" \
  --output_file "/tmp/podcast_$(printf '%04d' "$i").wav"
```

Replace `{{cite: path}}` with the page's title (or filename stem) before TTS, so the listener hears "as *attention* explains", not the raw path.

Insert 250 ms of silence between lines, and a longer 600 ms of silence when the speaker changes in two-voice mode.

## Step 7: Concatenate with ffmpeg

```bash
# build a concat list (truncate first so a stale list from an earlier run is not reused)
: > /tmp/podcast_list.txt
for w in /tmp/podcast_*.wav; do echo "file '$w'" >> /tmp/podcast_list.txt; done

# render MP3
ffmpeg -f concat -safe 0 -i /tmp/podcast_list.txt \
  -codec:a libmp3lame -qscale:a 2 \
  "$VAULT_DIR/artifacts/podcast/-.mp3"
```

VBR quality 2 (`-qscale:a 2`) suits voice: larger files aren't audibly better, and smaller ones are noticeably worse.
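Step 6 calls for a 250 ms pause between lines and a 600 ms pause on a speaker change, while the concat list in Step 7 contains only the rendered lines. One way to interleave the pauses is sketched below. Treat it as an assumption-heavy sketch: `build_concat_list`, the `SPEAKER:file` argument format, and the silence file names are all hypothetical, and the silence clips would need the same sample rate and channel layout as the TTS output for the concat demuxer to join them cleanly.

```bash
# Hypothetical helper: emit an ffmpeg concat list with pauses between lines.
# Usage: build_concat_list SHORT_GAP_WAV LONG_GAP_WAV SPEAKER:FILE [SPEAKER:FILE ...]
# Entries must be given in playback order.
build_concat_list() {
  local short_gap="$1" long_gap="$2"
  shift 2
  local prev="" entry speaker file
  for entry in "$@"; do
    speaker="${entry%%:*}"   # text before the first colon
    file="${entry#*:}"       # text after the first colon
    if [ -n "$prev" ]; then
      # Longer pause when the speaker changes, shorter one otherwise.
      if [ "$speaker" != "$prev" ]; then
        echo "file '$long_gap'"
      else
        echo "file '$short_gap'"
      fi
    fi
    echo "file '$file'"
    prev="$speaker"
  done
}
```

The silence clips themselves could be generated once with ffmpeg's `anullsrc` source, for example `ffmpeg -f lavfi -i anullsrc=r=22050:cl=mono -t 0.25 /tmp/silence_250.wav`; the 22050 Hz mono setting is an assumption about the Piper voice models and should be matched to whatever the chosen backend actually emits.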
## Step 8: Version Detection

Before writing the sidecar, check for an existing artifact of the same type and topic:

```bash
ARTIFACT_TYPE="podcast"
EXISTING=$(ls "$VAULT_DIR/artifacts/$ARTIFACT_TYPE/"*"$TOPIC_SLUG"*.meta.yaml 2>/dev/null | sort | tail -1)

if [ -n "$EXISTING" ]; then
  PREV_VERSION=$(grep '^version:' "$EXISTING" | awk '{print $2}')
  PREV_VERSION=${PREV_VERSION:-1}
  VERSION=$((PREV_VERSION + 1))
  PREV_SLUG=$(basename "$EXISTING" .meta.yaml)
else
  VERSION=1
  PREV_SLUG=""
fi
```

The old artifact stays in place: not deleted, not overwritten. Multiple files of the same type and topic form the version history. The portal discovers and displays these automatically.

Small fixes (typo corrections, for example) should update the file in place without incrementing the version; use judgement based on whether the content meaningfully changed.

## Step 9: Write the Sidecar

```bash
META="${MP3_OUT%.mp3}.meta.yaml"
cat > "$META" <<EOF
# …
generated-from:
$(for p in "${PAGES[@]}"; do echo "  - $p"; done)
source-hash: $HASH
version: $VERSION
change-note: ""
replaces: "$PREV_SLUG"
EOF
```

## Step 10: Commit to Vault Repo

```bash
cd "$VAULT_DIR"
git add "artifacts/podcast/-."{mp3,script.md,meta.yaml} 2>/dev/null
git diff --cached --quiet || git commit -m "🎙 podcast: generate ($(date +%Y-%m-%d))"
```

## Step 11: Report to User

```
✅ Podcast generated
Topic:
Format:
TTS:
Length: (~ min)
Pages in:
Source hash:
Script: vaults//artifacts/podcast/-.script.md
MP3: vaults//artifacts/podcast/-.mp3
Sidecar: vaults//artifacts/podcast/-.meta.yaml
Listen: open
```

## Known Limitations (Phase 2C)

- **Piper voices sound robotic.** Great for draft listens; less great for sharing. Users with API keys get an automatic upgrade to OpenAI / ElevenLabs.
- **No music / intro stingers.** Pure voice. Phase 2C scope.
- **No chaptering.** ID3 chapters would be nice. Deferred.
- **Cost warning.** For long podcasts with ElevenLabs, print the projected cost **before** rendering and ask for confirmation.
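The cost warning could be implemented as a pre-render check along these lines. This is a rough sketch under stated assumptions: `projected_cost` and `confirm_cost` are hypothetical names, the rate is the approximate ~$0.30 per 1k characters from the backend table, and the character count is taken from the script file as-is (speaker tags and markup included), so it slightly overestimates what the API would bill.

```bash
# Hypothetical helpers for the ElevenLabs cost warning.

# Print the projected cost in dollars for CHARS characters at RATE dollars per 1k chars.
projected_cost() {
  awk -v chars="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", chars / 1000 * rate }'
}

# Show the projected cost for a script file and ask before rendering.
# Returns non-zero unless the user answers "y".
confirm_cost() {
  local script_file="$1" chars cost answer
  chars=$(wc -m < "$script_file" | tr -d ' ')   # character count, padding stripped
  cost=$(projected_cost "$chars" 0.30)          # assumed rate from the table in Step 5
  printf 'ElevenLabs render: ~%s chars, projected cost ~$%s. Continue? [y/N] ' "$chars" "$cost"
  read -r answer
  [ "$answer" = "y" ]
}
```

With this shape, the render step could be guarded by something like `confirm_cost "$SCRIPT_FILE" || exit 0` when `TTS_BACKEND` is `elevenlabs`; `SCRIPT_FILE` is likewise a placeholder name.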
## See Also

- `.claude/skills/generate/SKILL.md`: router that dispatches here.
- `.claude/skills/generate/lib/select-pages.sh`: shared topic resolution.
- `.claude/skills/generate-video/SKILL.md`: chains this handler for voiceover.
- `sites/docs/src/content/docs/reference/artifacts.md`: sidecar schema.