---
name: video-input
description: Analyzes and understands video content by extracting frames, transcription, and context. Creates comprehensive summaries for discussion and reference.
---

# Video Content Understanding

This skill enables deep understanding of video content through frame extraction, transcription analysis, and content matching.

## When to Use This Skill

- User provides a video file for analysis
- Need to understand video content for discussion or reference
- Extracting key information from educational or tutorial videos
- Analyzing presentation or conference recordings
- Creating summaries of video content
- Matching visual content with spoken words

## What This Skill Does

1. **Frame Extraction**: Converts video into individual frames using `ffmpeg`
2. **Audio Transcription**: Extracts spoken content using `whisper` or similar tools
3. **Content Matching**: Aligns frames with corresponding transcription segments
4. **Summary Generation**: Creates comprehensive markdown documentation of findings
5. **Context Preservation**: Maintains understanding for ongoing discussions about the video

## How to Use

### Basic Video Analysis

```
Analyze this video: [path/to/video.mp4]
```

```
What's in this video? [attached video file]
```

### Content Extraction

```
Extract the main points from this tutorial video
```

```
Give me a summary of what's covered in this presentation
```

### Detailed Analysis

```
Break down this video frame by frame with transcription
```

## Technical Process

### Working Directory Structure

All video analysis is performed in a dedicated directory to avoid conflicts between multiple Claude instances:

```
${CLAUDE_PROJECT_DIR}/.video-input/analysis_{timestamp}/
├── video_file.mp4           # Copy of the original video
├── transcription.srt        # Extracted transcription/subtitles
├── frame_transcription_map.txt  # Frame-to-transcription correlation
├── metadata.txt             # Video metadata (duration, resolution, format)
├── frames/                  # Extracted video frames
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── ...
├── analysis.md              # Comprehensive analysis summary
└── .completed               # Marker file indicating successful completion
```

**Directory Path**: `${CLAUDE_PROJECT_DIR}/.video-input/analysis_{timestamp}/`
- `CLAUDE_PROJECT_DIR`: Environment variable pointing to project root (falls back to current working directory if not set)
- `{timestamp}`: ISO format timestamp (e.g., `20251205_143022`) for unique isolation
- `.video-input`: Hidden directory to keep project root clean

**Note**: The script uses the current working directory (where you run the command from) if `CLAUDE_PROJECT_DIR` is not set. Always run the script from your project root to ensure the analysis is created in the correct location.

### Processing Steps

**IMPORTANT**: Always use the `analyze-video.sh` script for video analysis. This script ensures all steps are executed properly without skipping any crucial parts.

#### Prerequisites

**⚠️ IMPORTANT**: Before running the script, navigate to your project's root directory where you want the analysis to be created. The script will create a `.video-input/analysis_{timestamp}/` directory in your current working directory.

```bash
# Navigate to your project root first
cd /path/to/your/project
```

The analysis output will be created in `$(pwd)/.video-input/analysis_{timestamp}/` relative to where you run the script.

#### Running the Script

Run the script using its absolute path:

```bash
# Basic usage (default: 1 fps, base model)
~/.claude/skills/video-input/analyze-video.sh /path/to/video.mp4

# Custom frame rate (2 frames per second)
~/.claude/skills/video-input/analyze-video.sh /path/to/video.mp4 2

# Custom Whisper model for better accuracy
~/.claude/skills/video-input/analyze-video.sh /path/to/video.mp4 1 medium

# All options
~/.claude/skills/video-input/analyze-video.sh /path/to/video.mp4 [fps] [whisper_model]
```

#### What the Script Does

1. Validates dependencies (ffmpeg, ffprobe, whisper)
2. Creates isolated timestamped workspace
3. Copies video file to workspace
4. Extracts video metadata (duration, resolution, format)
5. Extracts frames at specified FPS
6. Transcribes audio using Whisper
7. **Matches frames with transcription timestamps**
8. Generates comprehensive analysis.md with integrated timeline
9. Creates completion marker (.completed file)

Each step includes verification to ensure no steps are skipped and all files are created successfully.

## Example Output

**User**: "Analyze this tutorial video about Python functions"

**Processing**:
```
Creating isolated workspace...
✓ Created: .video-input/analysis_20251205_143022/

Copying video file...
✓ Copied to workspace

Extracting frames from video...
✓ Extracted 450 frames (15 min @ 1 fps)
✓ Saved to: .video-input/analysis_20251205_143022/frames/

Transcribing audio...
✓ Transcription complete (3,500 words)
✓ Saved to: .video-input/analysis_20251205_143022/transcription.srt

Matching frames with transcription...
✓ Aligned 450 frames with transcript segments

Generating summary...
✓ Created: .video-input/analysis_20251205_143022/analysis.md
```

**Generated Summary File**:
```markdown
# Video Analysis: Python Functions Tutorial

**Duration**: 15:32
**Resolution**: 1920x1080
**Key Topics**: Functions, Parameters, Return Values, Decorators

## Content Overview

### Frame-Transcription Timeline

| Frame | Time | Transcription |
|-------|------|---------------|
| frame_0001.png | 00:00:00 | Welcome to this tutorial about Python functions. Today we'll cover... |
| frame_0005.png | 00:00:04 | First, let's understand what a function is and why we use them... |
| frame_0010.png | 00:00:09 | As you can see on the screen, a function is defined with the def keywo... |
| frame_0015.png | 00:00:14 | Let's look at some examples of how to use parameters in your functions... |

For complete frame-by-frame matching, see `frame_transcription_map.txt`.

## Key Takeaways
- Functions improve code reusability
- Use descriptive names for clarity
- Decorators add functionality without modifying code

## Full Transcription
[Complete transcription text...]
```

## Dependencies

This skill requires:
- `ffmpeg`: For video frame extraction and metadata
- `ffprobe`: For video metadata extraction (usually comes with ffmpeg)
- `whisper`: For audio transcription (or alternative like `speech_recognition`)
- Sufficient disk space for temporary frames

### Script Features

The `analyze-video.sh` script provides:
- **Error Handling**: Exits immediately if any step fails (`set -euo pipefail`)
- **Dependency Checking**: Verifies all required tools are installed
- **File Verification**: Confirms each output file is created successfully
- **Progress Tracking**: Shows colored output for each step
- **Completion Marker**: Creates `.completed` file when analysis finishes
- **Detailed Logging**: Provides clear feedback at each stage
- **Frame-Transcription Matching**: Automatically correlates frames with spoken content

### How Matching Works

The script automatically matches video frames with transcription timestamps using the following algorithm:

**Frame Timestamp Calculation:**
```
frame_timestamp_seconds = (frame_number - 1) / FPS
```

For example, at 1 FPS:
- `frame_0001.png` = 0 seconds
- `frame_0045.png` = 44 seconds
- `frame_0100.png` = 99 seconds

**Transcription Matching:**
1. Parse the SRT file to extract timestamp ranges and text
2. For each frame, calculate its timestamp based on frame number and FPS
3. Find the SRT segment where the frame timestamp falls within the start/end range
4. Create a correlation map in `frame_transcription_map.txt`
5. Generate a preview timeline table in `analysis.md`

**Performance Optimization:**
- Matches every 5th frame by default to keep output manageable
- Full matching data is stored in `frame_transcription_map.txt`
- Summary table (first 20 matches) is included in `analysis.md`

**Error Handling:**
- Silent videos skip matching gracefully
- Missing SRT segments are marked as "[No transcription at this timestamp]"
- Malformed SRT entries are handled without crashing

## Tips

- Process shorter videos first to test the workflow
- Adjust frame extraction rate based on content type (1 fps for lectures, higher for action)
- Use GPU-accelerated whisper for faster transcription
- Clean up temporary frames after analysis to save space
- Reference the generated markdown file for all future discussions about the video

## Common Use Cases

- **Education**: Understanding tutorial and course content
- **Research**: Analyzing recorded presentations or lectures
- **Content Review**: Summarizing long-form video content
- **Documentation**: Creating written records of video information
- **Accessibility**: Converting video content to text format
- **Discussion Support**: Having context for detailed video conversations

## Important Notes

⚠️ **Workspace Isolation**
- Each analysis creates a unique timestamped directory
- Prevents conflicts when multiple Claude instances run simultaneously
- All analysis artifacts are contained in their own folder
- Working directory: `.video-input/analysis_{timestamp}/` in your current directory (run from project root)

⚠️ **Performance Considerations**
- Large videos may take significant time to process
- Frame extraction can consume substantial disk space
- Transcription accuracy depends on audio quality and language clarity

💡 **Best Practices**
- **Always run the script from your project root directory** - this ensures the analysis is created in the right location
- Ensure video files are accessible and not corrupted
- Use appropriate transcription models (larger models = better accuracy)
- Each analysis is stored in `.video-input/analysis_{timestamp}/` relative to where you run the script
- Clean up old analysis directories periodically to save disk space
- Reference the `analysis.md` file for all future discussions about the video
- The `.video-input` directory is gitignored by default (hidden directory)