--- name: graphic-overlays description: Package an existing talking-head / interview / podcast video by layering timed, designed GRAPHIC OVERLAY cards onto the playing video — titles, lower-thirds, data callouts, quotes, side panels, picture-in-picture — synced to the transcript. The source video plays in full; the agent designs and writes each card's HTML in conversation, then renders to MP4 via hyperframes. Use when the user asks for graphic overlays, on-screen graphics / lower-thirds / data callouts / kinetic titles on a video, "package / dress up my video", "add overlay cards / graphic cards", or AI-composed graphic packaging of an existing video. NOT for plain subtitles (→ embedded-captions) or building a video from scratch (→ the creation workflows); when unsure overlays-vs-captions, see /hyperframes-read-first. --- # Graphic Overlays Graphic Overlays takes a local video that **plays in full** and layers a sequence of timed, designed **graphic cards** onto it — titles, lower-thirds, data callouts, quotes, side panels, picture-in-picture — synced to what's being said. The agent designs the cards (timing + content) and **writes each card's HTML directly in the conversation**, then assembles a single composition HTML and renders it to MP4 via `hyperframes`. There is no fixed archetype list and no prescribed card structure — the overlays emerge from what the transcript actually says. > **Confirm the route before you build.** This skill packages an **existing talking-head clip** with **designed graphic cards** (titles, lower-thirds, data callouts, quotes, side panels, PiP). If the user wants **plain captions / subtitles** (the spoken words as text) → `/embedded-captions`; a **single short unnarrated** element (one logo sting / lower-third) → `/motion-graphics`. **The clip plays untouched** — re-timing, recoloring, reframing, reordering, or audio is NLE editing and **out of scope**. Building from a URL / topic / PR → the creation workflows. Unsure overlays-vs-captions? **Read `/hyperframes-read-first` first.** > **Graphic-packaging sibling of `embedded-captions`.** Captions add the _spoken words_ > as a readable subtitle; this adds _designed graphics_ on top of the playing video. > Plain subtitles → `embedded-captions`. Build a video from scratch → the creation > workflows (`product-launch-video` / `faceless-explainer` / …). Inspectable intermediate files in the work directory: - `metadata.json` — duration / width / height / fps - `audio.mp3` — extracted audio - `transcript.json` — a flat **word array** `[{ text, start, end }, …]` (Whisper; no `segments`, no `words` wrapper) - `storyboard.json` — lightweight card outline (the agent's plan) - `public/cards/card-XX.html` — one HTML fragment per card - `public/index.html` — final assembled composition - `output.mp4` — rendered video ## CLI Resolution ```bash # hyperframes — transcription (local Whisper) + rendering the assembled HTML to MP4 npx hyperframes --help ``` This skill runs entirely on the **hyperframes** CLI plus system `ffmpeg` / `ffprobe`. Transcription is local **Whisper** via `hyperframes transcribe` — no third-party service, API key, or rate-limited proxy. ## Workflow ### 1. Check Environment ```bash npx hyperframes doctor # ffmpeg, headless browser, render deps # confirm bundled assets: ls "/assets/fonts" "/assets/vendor/gsap.min.js" ``` Required: - `ffmpeg` / `ffprobe` (system) - `/assets/fonts/*.woff2`, `/assets/vendor/gsap.min.js` (bundled inside this skill, staged to work dir in Step 9) Transcription needs no key — `hyperframes transcribe` runs Whisper locally (Step 4). Strongly recommended on macOS for `hyperframes render`: ```bash export PRODUCER_BROWSER_GPU_MODE=hardware ``` ### 2. Create a Work Directory All artifacts live under `videos//` — the same convention as the other video workflows (`product-launch-video` / `faceless-explainer` / `pr-to-video`). Keep the cwd at the workspace root; everything below writes under this one subdirectory. ```bash VIDEO_PATH="/absolute/path/input.mp4" WORK_DIR="videos/$(basename "$VIDEO_PATH" | sed 's/\.[^.]*$//')" mkdir -p "$WORK_DIR" ``` ### 3. Extract Audio and Metadata ```bash # metadata — duration / width / height / fps ffprobe -v error -select_streams v:0 \ -show_entries stream=width,height,r_frame_rate \ -show_entries format=duration -of json "$VIDEO_PATH" > "$WORK_DIR/metadata.json" # audio ffmpeg -y -i "$VIDEO_PATH" -vn -acodec libmp3lame -q:a 2 "$WORK_DIR/audio.mp3" ``` Outputs: `metadata.json` (read `width`/`height`/`duration`; fps = the `r_frame_rate` fraction evaluated, e.g. `30000/1001 → 29.97`) + `audio.mp3`. ### 4. Transcribe ```bash npx hyperframes transcribe "$WORK_DIR/audio.mp3" -d "$WORK_DIR" --json --model small.en ``` Local **Whisper** — no API key, no proxy, no rate limit. Writes a word-level `transcript.json` into the work dir (word `text` + `start` / `end` timestamps). Read it for the word / sentence timings that drive card timing in Step 6; group words into sentences yourself at punctuation / pauses if you need segment-level chunks. **Clamp to media duration.** Whisper can return the final word's `end` a hair past the actual clip length — clamp every card `endSec` and `composition.durationSeconds` to the `metadata.json` duration, or the render will show a black tail past the video. ### 5. Correct Transcript `transcript.json` is a **flat array of word objects** — `[{ "text": "...", "start": s, "end": s }, …]` (no `segments` array, no `words` wrapper; the per-word key is **`text`**). Read it and fix obvious ASR errors: - Homophones, product names, technical terms, punctuation - Edit a word's `text` in place; **preserve its `start` / `end`** timestamps - There is no pre-grouped `segments` array — **group words into sentences yourself** (split at terminal punctuation / pauses) when you need segment-level chunks for card timing ### 6. Draft a Lightweight Storyboard (in chat) **No CLI involved.** Read `transcript.json` + `metadata.json` and design cards directly. `storyboard.json` is an agent-internal planning artifact — no CLI command consumes it; it exists so you can think clearly about timing and content before writing each card's HTML. Keep the shape consistent with the example below so the same outline can drive the composition you author in Step 9: ```json { "schemaVersion": 3, "composition": { "fps": 30, "width": 1080, "height": 1920, "durationSeconds": 121.2, "layout": "portrait", "themeId": "noir", "seed": 42 }, "videoTrack": { "sourcePath": "input-video.mp4", "startSec": 0, "endSec": 121.2, "bounds": { "x": 0, "y": 0, "width": 1080, "height": 1920 } }, "subtitles": { "enabled": false }, "cards": [ { "id": "card-01", "intent": "Hook with the speaker's anxious midnight question", "startSec": 0.5, "endSec": 13.0, "accentIndex": 0, "zone": "fullscreen", "contentHints": { "kicker": "AN HONEST QUESTION", "title": "The soul-searching question at 11 PM", "detail": "Client's 60-second voice message: 'If the RMB appreciates, does that mean my USD policy is a terrible loss?'" } } ] } ``` **Required Card fields:** | field | type | purpose | | ----------------------- | ------------------------------------------ | ----------------------------------------------------------------------------------------------------- | | `id` | string | stable id used in card HTML & GSAP selectors | | `intent` | string | natural-language description; fed to card synthesis | | `startSec` / `endSec` | number | times in seconds (endSec > startSec) | | `accentIndex` | 0 \| 1 \| 2 \| 3 \| 4 | which of the 5 theme accent colors this card pulls | | `zone` | enum (see below) | where on the canvas the card lives | | `contentHints` | object | free-form bag; agent puts kicker/title/detail/data/quote here | | `archetype` (optional) | string | free-form label you may attach to remember a card's pattern; absent = free-form, which is the default | | `transition` (optional) | enum: `cut` \| `fade` \| `slide` \| `wipe` | declarative card-to-card transition | **Five `zone` values:** | zone | resolved bounds | when to use | | ----------------- | ---------------------------------------------- | --------------------------------------- | | `fullscreen` | covers whole canvas | hero moments, big numbers, mantras | | `whiteboard-area` | inset 40px margin (or 45% of portrait height) | dense data / annotated content | | `lower-third` | bottom 30% band | annotation over visible video | | `side-panel` | right 42% (landscape) or bottom 40% (portrait) | data side, video other side | | `video-overlay` | full canvas, expects mostly-transparent card | annotation overlays on full-bleed video | When you assemble the composition in Step 9, resolve each card's `zone` into pixel bounds on the card-host wrapper following the table above. Video bounds are set **once** at composition level (`videoTrack.bounds`); to make video appear to "move between cards", author GSAP tweens against `#video-wrap` in the composition's ` ``` #### GSAP Statement Cheat Sheet Compile each `data-anim` attribute into a GSAP statement. Times are **absolute seconds** = card.startSec + data-anim-at, quantized to 1/fps. Selector is `.card[data-card-id="X"] #elementId`. | data-anim | GSAP statement template | | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `fade-in` | `tl.fromTo(SEL, { opacity: 0 }, { opacity: 1, duration: D, ease: 'power2.out' }, T);` | | `fade-out` | `tl.to(SEL, { opacity: 0, duration: D, ease: 'power2.in' }, T);` | | `slide-in` (from=left, dist=80) | `tl.fromTo(SEL, { opacity: 0, x: -80 }, { opacity: 1, x: 0, duration: D, ease: 'power2.out' }, T);` | | `kinetic-chars` (pop) | `tl.from(SEL + ' .char', { opacity: 0, y: 8, scale: 0.8, duration: D, ease: 'power2.out', stagger: S }, T);` | | `count-up` | `(function(){const o={v:FROM};tl.to(o,{v:TO,duration:D,ease:'power2.out',onUpdate:function(){const el=document.querySelector(SEL);if(el)el.textContent=__fmt(o.v,'FMT');}},T);})();` | | `draw-path` | `(function(){const el=document.querySelector(SEL);if(el){const L=el.getTotalLength();tl.set(SEL,{strokeDasharray:L,strokeDashoffset:L},T);tl.to(SEL,{strokeDashoffset:0,duration:D,ease:'power2.inOut'},T);}})();` | | `grow-x` (target-w=W) | `tl.fromTo(SEL, { width: 0 }, { width: W, duration: D, ease: 'power2.out' }, T);` | | `grow-y` (target-h=H) | `tl.fromTo(SEL, { height: 0 }, { height: H, duration: D, ease: 'power2.out' }, T);` | | `scale-pop` | `tl.fromTo(SEL, { opacity: 0, scale: 0.6 }, { opacity: 1, scale: 1, duration: D, ease: 'back.out(1.6)' }, T);` | | `mask-reveal` (direction=left) | `tl.fromTo(SEL, { clipPath: 'inset(0 100% 0 0)' }, { clipPath: 'inset(0 0 0 0)', duration: D, ease: 'power2.inOut' }, T);` | Quantize: `T = Math.round(absSec * fps) / fps`. At 30fps the smallest step is `1/30 ≈ 0.0333s`; rounding to 4 decimals (`.toFixed(4)`) is fine inside the JS literal. #### Video Framing Reference (per `layout` value) The selector for the video container is `#video-wrap`. Animate its bounds between cards using `tl.to('#video-wrap', { ...bounds }, T)`. Initial bounds should be set inline on the element to match card-01's layout. Pick a transition duration of 0.5–0.7s with `ease: 'power2.inOut'`. **Decorative frames** (`clean` / `hairline` / `polaroid`) sit as a **sibling** of `#video-wrap` and follow it through layout transitions. See [`references/frames/`](references/frames/) for each frame's placement HTML, suggested CSS, and which layouts it pairs with. Quick rule: `overlay` layout suppresses decorative frames (the full-bleed video clashes with chrome); PiP layouts already have their own pill treatment (border-radius + white ring + shadow), so add a decorative frame only on top of `split` / `stack`. **GSAP target lookup table** for `#video-wrap` per composition layout (landscape 1920×1080 — for portrait & 4:5 see `references/layouts/*.html` which list all three ratios): | composition layout | typical card.zone | `#video-wrap` GSAP target | extra css class | | ------------------------------------ | ----------------- | ------------------------------------------------------------------------- | ------------------------------------------ | | `split` | `side-panel` | `{ left: 960, top: 0, width: 960, height: 1080 }` | — | | `stack` | `lower-third` | `{ left: 14, top: 14, width: 1892, height: 548 }` (top 52%) | — | | `pip` (bottom-right) | `fullscreen` | `{ left: 1480, top: 760, width: 400, height: 300 }` | `pip-pill` (border-radius + ring + shadow) | | `pip` (top-left) | `fullscreen` | `{ left: 40, top: 40, width: 400, height: 300 }` | `pip-pill` | | `overlay` (video full-bleed) | `video-overlay` | `{ left: 0, top: 0, width: 1920, height: 1080 }` (no change from default) | — | | **hide video** (pure-graphic moment) | `fullscreen` | `{ opacity: 0 }` (or move off-canvas) | — | To toggle the pip-pill chrome (border-radius + white ring + drop shadow) when entering or leaving a pip moment: ```js // Enter pip — add chrome tl.set("#video-wrap", { className: "video-wrapper pip-pill" }, T); tl.to( "#video-wrap", { left: 1480, top: 760, width: 400, height: 300, duration: 0.6, ease: "power2.inOut" }, T, ); // Leave pip — back to clean full-bleed tl.set("#video-wrap", { className: "video-wrapper" }, T_NEXT); tl.to( "#video-wrap", { left: 0, top: 0, width: 1920, height: 1080, duration: 0.6, ease: "power2.inOut" }, T_NEXT, ); ``` **Card-host bounds match the zone**. Resolve the card's `zone` into pixel bounds using the table at the top of Step 6, then write those into the card-host's inline `style="left:Xpx;top:Ypx;width:Wpx; height:Hpx;..."`. For `video-overlay` zone (overlay recipe), the card-host fills the full canvas — your CSS inside `.card .root` decides where the actual visible card sits. #### HyperFrames Layout / Animation QA Rules - Build each card's static hero frame first: the moment where the card is fully visible and readable. - Confirm video, cards, subtitles/captions, and diagrams do not unintentionally overlap. - Confirm hidden video areas are clipped by the frame and not visible outside intended bounds. - Register one paused master timeline as `window.__timelines["graphic-overlays"]`. - Build timelines synchronously at page load; no `async`, `setTimeout`, Promises, or media `play()` calls. - Do not use `Math.random()` or `Date.now()` in render paths. - Do not use `repeat: -1`; calculate finite repeats from the video duration. - Prefer GSAP transforms and opacity (`x`, `y`, `scale`, `rotation`, `opacity`) over layout properties (`top`, `left`, `width`, `height`) for motion. - Animate wrappers such as `#video-wrap`, not the video element dimensions directly. - Avoid animating the same property on the same element from multiple timelines at the same time. - Use `data-track-index`, not `data-layer`; use `data-duration`, not `data-end`. - Every timed element (`card-host`, sub-composition, etc.) MUST include `class="clip"` alongside its own classes — e.g. `class="card-host clip"`. The HyperFrames runtime uses `.clip` to gate visibility to the `data-start … data-start+data-duration` window. Without it the element is visible for the whole video (lint: `timed_element_missing_clip_class`). - For body / global `font-family`, list **concrete font names** (`'Inter', 'Caveat', …`) — not a CSS variable like `var(--font-family)`. The HyperFrames font resolver doesn't expand CSS vars during static analysis (lint: `font_family_without_font_face`). Cards may still use `var(--font-family)` internally since their `@font-face` declarations are loaded. ### 10. Render to MP4 ```bash cd "$WORK_DIR" PRODUCER_BROWSER_GPU_MODE=hardware npx hyperframes render public \ -o output.mp4 \ --fps 30 ``` `hyperframes render ` reads `/index.html` and produces the MP4. The flag `PRODUCER_BROWSER_GPU_MODE=hardware` (or `--browser-gpu`) is strongly recommended on macOS — software-only Chrome rendering times out on most laptops. For a sanity check before the full render, capture a single frame at a specific timestamp: ```bash npx hyperframes snapshot public --at 5 # → public/snapshots/frame-00-at-5s.png (a single --at ignores --out) ``` ### 11. Report Results Tell the user: - Work directory path - `storyboard.json` (the card outline you designed) - `public/cards/*.html` (one HTML per card) - `public/index.html` (the assembled composition) - `output.mp4` (the final video) - ASR provider used - Card count + how you chose them (in 1 sentence) - Any missing keys or quality caveats **Optional live preview (on request only).** The clip plays unchanged inside `public/index.html` with the overlays on top, so it previews faithfully. **Don't open it during the run.** When the user asks, start a long-lived server **after** render and report the URL: ```bash (cd "$WORK_DIR/public" && npx hyperframes preview) # or `npx hyperframes play` for a shareable link ``` Do not delete the work directory unless the user asks.