---
name: strict-storyboard-schema
description: Enforce a strict per-scene storyboard schema for production
use_when: Converting scripts into scene plans for prompting and generation
avoid_when: Early brainstorming where a rigid scene schema is premature
---

# Strict Storyboard Schema

Turn script sections into schema-valid scene cards for generation. The storyboard is the bridge between creative writing and production — every field maps to a downstream tool call.

## When to use

Load this skill after the script and consistency bible are approved. Convert script sections into scene cards before writing image/video prompts.

Skip during early brainstorming where a rigid scene schema is premature.

## Core philosophy

1. **Every scene is a production unit.** The `objective` explains why it exists; `action_beats` define what happens; `audio_mode` determines which tools run.
2. **Audio mode is locked at storyboard approval.** Changing it later invalidates prompts and generation decisions. Get it right here.
3. **Timing must be contiguous.** No gaps, no overlaps. Scene N `end_sec` = Scene N+1 `start_sec`.
4. **Narration text is sanitized.** Single language, no emoji, no non-speech unicode. The LLM is responsible for cleaning; the TTS tool will reject violations.
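
A minimal sketch of how the sanitization rule could be pre-checked before a scene reaches the TTS tool. The helper name `find_narration_violations` is hypothetical, and treating non-ASCII symbol and control/format characters as "non-speech" is an assumption; the TTS tool's actual rejection criteria are not specified here, and single-language detection is not covered.

```python
import unicodedata


def find_narration_violations(text: str) -> list[str]:
    """Return characters that look like emoji or non-speech unicode.

    Assumption: printable ASCII is always allowed; any other character is
    flagged when its Unicode general category is a symbol (S*) or a
    control/format/unassigned character (C*).
    """
    violations = []
    for ch in text:
        if ch in " \n\t" or (ch.isascii() and ch.isprintable()):
            continue
        category = unicodedata.category(ch)
        if category.startswith("S") or category.startswith("C"):
            violations.append(ch)
    return violations
```

An empty result means the text passed this rough check; accented letters and ordinary punctuation are not flagged.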

## Required per-scene fields

Every scene object must contain:

| Field | Type | Description |
|-------|------|-------------|
| `objective` | string | One sentence — why this scene exists in the narrative arc |
| `shot_type` | string | Camera framing: close-up, medium, wide, over-shoulder, etc. |
| `visual_intent` | string | Primary visual result to communicate at a glance |
| `action_beats` | string[] | Ordered list of visible actions in this scene (non-empty) |
| `narration_dialogue` | string | Spoken line from the script — raw, for reference |
| `timing` | object | `{start_sec: int, end_sec: int}` — contiguous, ordered |
| `transition` | string | Explicit hand-off into next scene: hard cut, match cut, fade, dissolve |
| `audio_mode` | string | One of: `tts`, `ltx_audio` |
| `bgm_mode` | string | One of: `ltx_audio`, `generated_bgm`, `generated_song` |
| `subtitle_text` | string \| null | Subtitle line(s) for this scene; required when `audio_mode="tts"`, `null` when no subtitle intended |
| `narration_text` | string \| null | Sanitized voiceover text for TTS scenes; `null` for other modes |
| `narration_word_count` | int | Word count of narration_text (planning aid) |
| `narration_estimated_sec` | float | Estimated spoken duration: word_count / 2.5 (planning aid) |

## Conditional fields

| Condition | Required Fields |
|-----------|----------------|
| `audio_mode="tts"` | `narration_text` (non-empty, sanitized), `subtitle_text` (non-empty) |
| `bgm_mode="generated_bgm"` | `music_tags` (non-empty) |
| `bgm_mode="generated_song"` | `music_tags` (non-empty), `music_lyrics` (non-empty, structure-tagged) |

### BGM-specific fields

- **`music_tags`**: Genre, mood, instrumentation, and vocal description for AceStep. Example: `"lo-fi hip hop, chill, piano and vinyl crackle"`
- **`music_lyrics`**: Structure-tagged lyric text. Use `[Verse]`, `[Chorus]`, `[Bridge]`, `[Instrumental]` tags. Required for `generated_song`; `null` for `generated_bgm`.
- **`bpm`**: Integer tempo override (30-300). Do not embed tempo in `music_tags`; use this field.
- **`keyscale`**: Key override string, e.g. `"C Major"`, `"Am"`.
- **`timesignature`**: Time signature override, e.g. `"4"`, `"3/4"`, `"6/8"`.
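
The conditional requirements above can be sketched as a small check. This is an illustration only; the function name `missing_conditional_fields` is hypothetical, and treating whitespace-only strings as empty is an assumption.

```python
def missing_conditional_fields(scene: dict) -> list[str]:
    """List conditional fields that the scene's modes require but that are empty."""
    required = []
    if scene.get("audio_mode") == "tts":
        required += ["narration_text", "subtitle_text"]
    if scene.get("bgm_mode") in ("generated_bgm", "generated_song"):
        required.append("music_tags")
    if scene.get("bgm_mode") == "generated_song":
        required.append("music_lyrics")
    # Assumption: None, "" and whitespace-only strings all count as empty.
    return [name for name in required if not str(scene.get(name) or "").strip()]
```

For example, a `generated_song` scene that carries `music_tags` but no lyrics yields `["music_lyrics"]`.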

## Narration timing rule

Estimate the spoken duration of `narration_text` before finalising scene timing:

- `spoken_seconds ≈ word_count(narration_text) / 2.5` (2.5 words per second, i.e. ~150 WPM, a standard narration pace).
- The scene duration (`end_sec - start_sec`) must be ≥ `spoken_seconds + 0.5` s; the extra 0.5 s is breathing room at the end.

If the script section is too long for the target scene duration, either:

1. Shorten `narration_text` to fit within the scene duration, or
2. Extend the scene duration and cascade-adjust subsequent scenes.

Never squeeze a long narration into a short scene; it sounds rushed.

Record `narration_word_count` and `narration_estimated_sec` on each scene, e.g. `"narration_word_count": 28, "narration_estimated_sec": 11.2`. These are planning aids and are not rendered in the final video.
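
The arithmetic can be sketched as follows. The function names are hypothetical, and counting words by splitting on whitespace is an assumption (so a hyphenated term such as "bottom-up" counts as one word).

```python
import math

WORDS_PER_SEC = 2.5        # ~150 WPM narration pace
BREATHING_ROOM_SEC = 0.5   # pause left at the end of the scene


def estimate_spoken_sec(narration_text: str) -> float:
    """Estimated spoken duration: whitespace-separated word count / 2.5."""
    return len(narration_text.split()) / WORDS_PER_SEC


def min_scene_duration_sec(narration_text: str) -> int:
    """Smallest integer scene duration that fits the narration plus breathing room."""
    return math.ceil(estimate_spoken_sec(narration_text) + BREATHING_ROOM_SEC)
```

A 28-word narration is estimated at 11.2 s, so it needs a scene of at least 12 s once the 0.5 s of breathing room is rounded up to whole seconds.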

## Validation rules

1. All required fields must be present in every scene.
2. `action_beats` must be a non-empty list.
3. `timing.start_sec` and `timing.end_sec` must be integers.
4. `end_sec - start_sec ≥ narration_estimated_sec + 0.5` for TTS scenes.
5. Scene timing must be contiguous and ordered: Scene N `start_sec` must equal Scene N-1 `end_sec` (no gaps, no overlaps).
6. `audio_mode` must be one of: `tts`, `ltx_audio`. Use `ltx_audio` for non-narrated scenes.
7. `bgm_mode` must be one of: `ltx_audio`, `generated_bgm`, `generated_song`. Use `ltx_audio` when no generated BGM is needed.
8. `narration_text` must be non-null, non-empty when `audio_mode` is `tts`; must be `null` otherwise.
9. `narration_text` must be single-language, no emoji, no non-speech unicode.
10. `subtitle_text` must be non-null, non-empty when `audio_mode` is `tts`; `null` only for scenes where no subtitle is intended.
11. Each line of `subtitle_text` must be ≤ `basic_info.subtitle_style.max_line_width_chars` characters.
12. `music_tags` must be non-empty when `bgm_mode` is `generated_bgm` or `generated_song`.
13. `music_lyrics` must be non-null when `bgm_mode` is `generated_song`; must be `null` for `generated_bgm`.
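
A minimal sketch of how the mode and timing rules might be checked in code. It covers only the integer-timing, narration-fit, and allowed-mode rules, plus the contiguity requirement from the core philosophy (each scene starts exactly where the previous one ends). The function name and error wording are hypothetical.

```python
AUDIO_MODES = {"tts", "ltx_audio"}
BGM_MODES = {"ltx_audio", "generated_bgm", "generated_song"}
BREATHING_ROOM_SEC = 0.5


def check_modes_and_timing(scenes: list[dict]) -> list[str]:
    """Return human-readable errors; an empty list means these checks passed."""
    errors = []
    prev_end = None
    for i, scene in enumerate(scenes, start=1):
        if scene.get("audio_mode") not in AUDIO_MODES:
            errors.append(f"scene {i}: audio_mode must be one of {sorted(AUDIO_MODES)}")
        if scene.get("bgm_mode") not in BGM_MODES:
            errors.append(f"scene {i}: bgm_mode must be one of {sorted(BGM_MODES)}")

        timing = scene.get("timing") or {}
        start, end = timing.get("start_sec"), timing.get("end_sec")
        if type(start) is not int or type(end) is not int:
            errors.append(f"scene {i}: timing.start_sec and timing.end_sec must be integers")
            prev_end = None  # contiguity cannot be checked against a broken scene
            continue

        if prev_end is not None and start != prev_end:
            errors.append(f"scene {i}: start_sec {start} must equal previous end_sec {prev_end}")

        if scene.get("audio_mode") == "tts":
            needed = (scene.get("narration_estimated_sec") or 0.0) + BREATHING_ROOM_SEC
            if end - start < needed:
                errors.append(f"scene {i}: duration {end - start}s is shorter than {needed:.1f}s")
        prev_end = end
    return errors
```

Collecting every error in one pass, rather than stopping at the first, lets the whole storyboard be corrected in a single revision.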

## Top-level output schema

```json
{
  "scenes": [
    {
      "objective": "Advance 'Hook' while grabbing attention in first 3 seconds",
      "shot_type": "close-up",
      "visual_intent": "Make the contradiction visually legible in one glance",
      "action_beats": [
        "Screen fills with red error text",
        "Cursor appears at bottom of stack trace"
      ],
      "narration_dialogue": "When your code breaks, stop reading from the top. The answer is at the bottom.",
      "timing": {
        "start_sec": 0,
        "end_sec": 5
      },
      "transition": "match cut",
      "audio_mode": "tts",
      "bgm_mode": "ltx_audio",
      "subtitle_text": "Stop reading from the top.\nThe answer is at the bottom.",
      "narration_text": "Stop reading from the top. The answer is at the bottom.",
      "narration_word_count": 11,
      "narration_estimated_sec": 4.4
    }
  ]
}
```

## Before writing the storyboard

Read `<project_dir>/.muse/basic_info.json` and extract:
- `scene_duration_sec` → use as the default duration for every scene's timing block
- `target_duration_sec` → use to calculate total scene count (= ceil(target / scene_duration_sec))
- `subtitle_style.max_line_width_chars` → hard cap per subtitle line; every `subtitle_text` line must stay within this limit
- `aspect_ratio`, `resolution` → include in the storyboard header; reference them in shot composition notes (e.g. "vertical 9:16 frame")

Do not invent durations — use .muse/basic_info.json values.
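
A rough sketch of deriving the planning values from the parsed `basic_info.json`. The function names and the sample numbers (8 s scenes, 30 s target, 42-character lines) are hypothetical; only the key names come from the list above.

```python
import math


def plan_from_basic_info(basic_info: dict) -> dict:
    """Pull the planning values the storyboard needs out of basic_info."""
    scene_sec = basic_info["scene_duration_sec"]
    target_sec = basic_info["target_duration_sec"]
    return {
        "default_scene_sec": scene_sec,
        "scene_count": math.ceil(target_sec / scene_sec),
        "max_line_width_chars": basic_info["subtitle_style"]["max_line_width_chars"],
    }


def overlong_subtitle_lines(subtitle_text: str, max_chars: int) -> list[str]:
    """Return the subtitle lines that exceed the per-line character cap."""
    return [line for line in subtitle_text.split("\n") if len(line) > max_chars]


# Hypothetical values; in practice, parse them from basic_info.json with json.load.
example_info = {
    "scene_duration_sec": 8,
    "target_duration_sec": 30,
    "subtitle_style": {"max_line_width_chars": 42},
}
plan = plan_from_basic_info(example_info)
```

With these sample values, `plan["scene_count"]` is 4, because 30 / 8 rounds up.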

## Workflow

1. Read the approved script (`<project_dir>/script.md`).
2. Read the production setup (`<project_dir>/.muse/basic_info.json`) for scene duration, total duration, subtitle style, and framing notes.
3. Read the consistency bible (`<project_dir>/.muse/consistency.json`) for character and environment anchors.
4. Break the script into scenes — one per section for short-form, 1-2 per chapter for long-form.
5. For each scene:
   - Write `objective` from the script section's narrative purpose.
   - Set `shot_type` and `visual_intent` using consistency anchors.
   - List `action_beats` as visible, concrete actions.
   - Copy `narration_dialogue` from script narration lines.
   - Set `timing` to cover the section duration (contiguous with neighbors).
   - Choose `transition` based on narrative hand-off.
   - Set `audio_mode` (`tts` or `ltx_audio`) and `narration_text` (sanitized, single-language).
   - Set `bgm_mode` (`ltx_audio`, `generated_bgm`, or `generated_song`).
   - Set `subtitle_text`: required for `tts` scenes, within `max_line_width_chars` per line.
   - Compute `narration_word_count` and `narration_estimated_sec` (word_count / 2.5).
   - If `bgm_mode` is `generated_bgm` or `generated_song`, add `music_tags`; for `generated_song` also add structure-tagged `music_lyrics` (keep `music_lyrics` `null` for `generated_bgm`).
6. Validate all scenes against the 13 rules above.
7. Save to `<project_dir>/.muse/storyboard.json`.
8. Write a companion `storyboard.md` (see below).
9. Call `request_approval(stage='storyboard', summary=<one-line description>, artifact_paths=['<project_dir>/.muse/storyboard.json', '<project_dir>/storyboard.md'])`.
10. Ask: "Does the storyboard cover all the beats you had in mind? Any scenes to add, remove, or reorder?"

## Worked example

*Project:* 30-second TikTok about debugging (continuing from script example)

```json
{
  "scenes": [
    {
      "objective": "Hook: Grab attention by contradicting the instinct to read errors top-down",
      "shot_type": "close-up",
      "visual_intent": "Wall of red error text — the universal 'something broke' moment",
      "action_beats": [
        "Screen fills with red error text",
        "Cursor hovers at top, then flips to bottom"
      ],
      "narration_dialogue": "When your code breaks, stop reading from the top. The answer is at the bottom.",
      "timing": {
        "start_sec": 0,
        "end_sec": 5
      },
      "transition": "match cut",
      "audio_mode": "tts",
      "bgm_mode": "ltx_audio",
      "subtitle_text": "Stop reading from the top.\nThe answer is at the bottom.",
      "narration_text": "Stop reading from the top. The answer is at the bottom.",
      "narration_word_count": 11,
      "narration_estimated_sec": 4.4
    },
    {
      "objective": "Conflict: Show why top-down reading wastes time and misses the root cause",
      "shot_type": "medium",
      "visual_intent": "Split screen — wasted time scrolling vs direct bottom-up approach",
      "action_beats": [
        "Split screen appears — left side scrolls from top with timer",
        "Right side jumps to bottom line immediately",
        "Left side timer keeps running, right side highlights root cause"
      ],
      "narration_dialogue": "Stack traces show the symptom first and the cause last. Most beginners fix the wrong line and wonder why the error comes back.",
      "timing": {
        "start_sec": 5,
        "end_sec": 17
      },
      "transition": "match cut",
      "audio_mode": "tts",
      "bgm_mode": "ltx_audio",
      "subtitle_text": "Stack traces show symptom first,\ncause last.",
      "narration_text": "Stack traces show the symptom first and the cause last. Most beginners fix the wrong line and wonder why the error comes back.",
      "narration_word_count": 23,
      "narration_estimated_sec": 9.2
    },
    {
      "objective": "Payoff: Demonstrate the bottom-up technique with a concrete result",
      "shot_type": "close-up",
      "visual_intent": "Cursor at bottom line — root cause highlighted, problem solved",
      "action_beats": [
        "Cursor highlights the bottom-most function call",
        "Error text fades, green checkmark appears"
      ],
      "narration_dialogue": "Read from the bottom up. The last line is the root cause. Everything above it is just the domino effect.",
      "timing": {
        "start_sec": 17,
        "end_sec": 25
      },
      "transition": "hard cut",
      "audio_mode": "tts",
      "bgm_mode": "ltx_audio",
      "subtitle_text": "The last line is the root cause.\nEverything above it is the domino effect.",
      "narration_text": "Read from the bottom up. The last line is the root cause. Everything above is the domino effect.",
      "narration_word_count": 18,
      "narration_estimated_sec": 7.2
    },
    {
      "objective": "CTA: Direct the viewer to try the technique next time",
      "shot_type": "medium",
      "visual_intent": "Clean closing frame with takeaway message",
      "action_beats": [
        "Text overlay appears: 'Bottom-up debugging'",
        "Fade to end card"
      ],
      "narration_dialogue": "Next time an error pops up, try the bottom-up approach. You'll cut your debug time in half.",
      "timing": {
        "start_sec": 25,
        "end_sec": 30
      },
      "transition": "hard cut",
      "audio_mode": "tts",
      "bgm_mode": "ltx_audio",
      "subtitle_text": "Try bottom-up debugging.\nCut your debug time in half.",
      "narration_text": "Next time, try bottom-up debugging. Cut your debug time in half.",
      "narration_word_count": 11,
      "narration_estimated_sec": 4.4
    }
  ]
}
```

## Rhythm and continuity guidance

- Vary shot type and pacing to avoid monotony.
- Keep each narration line concise (~20 words max) so its subtitle stays readable within `max_line_width_chars` per line.
- Ensure each scene objective connects directly to script progression.
- Use transitions intentionally — match cut for flow, hard cut for emphasis, fade for chapter breaks.

## Anti-patterns

- Scene cards missing timing or transition details.
- Action beats that are abstract instead of visually testable.
- Narration that does not match visible action in the same scene.
- Setting `narration_text` to a multi-language string or leaving emoji or non-speech unicode in it.
- Setting `narration_text` to non-null when `audio_mode` is not `tts`.
- Setting `subtitle_text` to non-null when `audio_mode` is not `tts` and no subtitle is intended.
- Exceeding `basic_info.subtitle_style.max_line_width_chars` per line in `subtitle_text`.
- Using `audio_mode="music"` or `audio_mode="song"` — these are removed; use `bgm_mode="generated_bgm"` or `"generated_song"` instead.
- Missing `music_tags` when `bgm_mode` is `generated_bgm` or `generated_song`.
- Missing `music_lyrics` when `bgm_mode` is `generated_song`.
- Putting BPM/key/time signature in `music_tags` instead of using dedicated metadata fields.
- Deciding audio mode or BGM mode after video prompts are already written — both must be locked at storyboard approval.

## Companion markdown format: storyboard.md

Write the markdown in this format:

```markdown
# Storyboard: [project title]

## Scene 01 — [Scene name]
**Duration:** Xs–Ys (Z sec)
**Audio mode:** tts / ltx_audio
**BGM mode:** ltx_audio / generated_bgm / generated_song
**Subtitle:** "[subtitle_text]"
**Action:** [visual description]
**Narration:** "[script line]"
...
```
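
One possible way to render a scene card in this format, as a sketch under stated assumptions. The scene name is not a schema field, so the caller supplies it. Joining `action_beats` with "; " for the Action line, using `narration_dialogue` for the Narration line, and flattening subtitle newlines to " / " are all assumptions. The function name `render_scene_md` is hypothetical.

```python
def render_scene_md(index: int, name: str, scene: dict) -> str:
    """Render one scene card as a storyboard.md section."""
    start = scene["timing"]["start_sec"]
    end = scene["timing"]["end_sec"]
    # Assumption: multi-line subtitles are flattened; a null subtitle renders empty.
    subtitle = (scene.get("subtitle_text") or "").replace("\n", " / ")
    action = "; ".join(scene["action_beats"])
    narration = scene["narration_dialogue"]
    lines = [
        # \u2014 and \u2013 are the dash characters used in the template above.
        f"## Scene {index:02d} \u2014 {name}",
        f"**Duration:** {start}s\u2013{end}s ({end - start} sec)",
        f"**Audio mode:** {scene['audio_mode']}",
        f"**BGM mode:** {scene['bgm_mode']}",
        f'**Subtitle:** "{subtitle}"',
        f"**Action:** {action}",
        f'**Narration:** "{narration}"',
    ]
    return "\n".join(lines)
```

Calling it once per scene and joining the results with blank lines under the `# Storyboard:` title gives the full companion file.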

## Next step

After storyboard approval, the pipeline moves to image prompts (load `image-prompt-package` skill), then video prompts (load `video-prompt-package` skill), then media generation (load `media-generation` skill).