AI Multimodal

Quality score: 85/100  ·  Category: Content  ·  Sub-category: audio-podcast
ai:gemini
Multimodal AI processing via the Google Gemini API (2M-token context).

Capabilities: audio (transcription, summarization, music analysis; 9.5 hr max), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, YouTube URLs, temporal analysis; 6 hr max), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing).

Actions: transcribe, analyze, extract, caption, detect, segment, generate from media.

Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen.

Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

What this skill does

AI Multimodal is a well-rated Claude Code skill (quality score 85/100) in the audio-podcast sub-category. It ships as a SKILL.md file that Claude Code auto-discovers under ~/.claude/skills/ai-multimodal/ and loads when your prompt matches the skill's trigger.

When to invoke it: when transcribing audio or video, analyzing images or screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, or implementing multimodal AI features.

Who uses this skill

The AI Multimodal skill is built for content creators, marketers, copywriters, SEO professionals, and editorial teams. It is part of the open ClaudSkills registry, a community-curated catalog of 15,000+ capabilities you can install for Claude Code — the Claude CLI agent.

How to install

Free

Manual install (2 steps)

mkdir -p ~/.claude/skills/ai-multimodal
curl -L https://claudskills.com/skills/ai-multimodal/SKILL.md \
  -o ~/.claude/skills/ai-multimodal/SKILL.md

Alternatively, download SKILL.md directly and place it in ~/.claude/skills/ai-multimodal/. Claude Code auto-discovers it at the start of the next session.

Skills live at ~/.claude/skills/ai-multimodal/SKILL.md on macOS/Linux, or %USERPROFILE%\.claude\skills\ai-multimodal\SKILL.md on Windows. See the full install guide for step-by-step instructions.
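
To confirm the file landed where Claude Code looks for it, a small check can help. The following is a minimal sketch, not part of the skill or the registry: the function names skill_path and skill_installed are hypothetical, and it covers only the macOS/Linux path given above.

```shell
# Hypothetical helpers for illustration; not shipped with ClaudSkills or Claude Code.

# Print the location where a skill's SKILL.md is expected (macOS/Linux layout).
skill_path() {
  printf '%s\n' "${HOME}/.claude/skills/$1/SKILL.md"
}

# Succeed only if the SKILL.md for the named skill exists and is non-empty.
skill_installed() {
  [ -s "$(skill_path "$1")" ]
}
```

After pasting these into a shell, running skill_installed ai-multimodal && echo "installed" prints "installed" only when the file is present and non-empty. An empty file, which a failed download can leave behind, is treated as not installed.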

Pro

One-click install via the desktop app

The ClaudSkills desktop app installs any skill directly into ~/.claude/skills/ with one click — no terminal required. Pro starts at $9/mo or $149 lifetime.
