Audio Podcast

audio-studio

Audio studio: mixing, mastering, pro export. Use when: podcast, audiobook, jingle, mixing, mastering, audio editing, voiceover + music.

mastering-engineer

Guides audio mastering for streaming platforms including loudness optimization and tonal balance. Use when the user has approved tracks and wants to master audio files.

mix-engineer

Polishes raw Suno audio by processing per-stem WAVs (vocals, backing_vocals, drums, bass, guitar, keyboard, strings, brass, woodwinds, percussion, synth, other) with targeted…

ops-ar

A&R any record like a dance-pop label owner + master producer. Single track, batch, or full Gmail-inbox demo sweep — runs the audio-ar analysis stack (BPM/key/loudness/structure,…

release-director

Coordinates album release including QA, distribution prep, and platform uploads. Use when mastering and album art are complete and the user is ready to release.

sheet-music-publisher

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

transcribe-audio-local

Local transcription of audio files without sending them to the cloud. Use when the user asks to transcribe a recording, transcribe audio, write up meeting notes, convert…

ai-multimodal

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object…

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding,…

apple-productivity

Access macOS productivity apps (Calendar, Contacts, Mail, Messages, Reminders, Voice Memos). Use when user asks about calendar events, contacts, emails, iMessages, reminders, or…

audio-design

Game audio design patterns for creating sound effects and UI audio. Use when designing sounds for games, writing AI audio prompts (ElevenLabs, etc.), creating feedback sounds, or…

audio-mix-maker

Mix a music / audio track onto an existing video via ffmpeg. Three modes: replace (drop original audio), overlay (mix both audible), duck (sidechain-compressor lowers music when…

automating-voice-memos

Automates Apple Voice Memos (Mac Catalyst, no dictionary) via JXA using filesystem/SQLite access and System Events UI scripting.

bio-motif-search

Find patterns, motifs, and subsequences in biological sequences using Biopython. Use when searching for transcription factor binding sites, regulatory elements, or any se — from…

bio-transcription-translation

Transcribe DNA to RNA and translate to protein using Biopython. Use when converting between DNA, RNA, and protein sequences, finding ORFs, or using alternative codon tabl — from…

gemini-audio

Guide for implementing Google Gemini API audio capabilities - analyze audio with transcription, summarization, and understanding (up to 9.5 hours), plus generate speech with…

music-prompt

Write prompts for 10+ frontier AI music generators (Suno v5.5, Udio v4, Google Lyria 3 Pro, ElevenLabs Music, Stable Audio 2.5, MusicGen, Tencent SongGeneration, Sonauto v2,…

transcribe-maker

Transcribe audio / video to SRT / WebVTT / JSON / plain text via OpenAI Whisper. Auto-detects language or accepts --lang ISO-639-1 hint. ~$0.006/min.

aside

End-to-end aside session processing — transcribe, align memo + transcript, distill into a structured vault note via Enzyme.

asr-transcribe-to-text

Transcribes audio and video files to text using Qwen3-ASR. Supports two modes — local MLX inference on macOS Apple Silicon (no API key, 15-27x realtime) and remote API via…

ayeeeen-dev

Build features for Ayeeeen (عين), an AI-powered accessibility app for blind users. Use when: implementing screens, adding ML features, creating audio-first UX flows, handling…

conference-transcribe

Transcribe a multi-talk conference livestream or long YouTube video into separate per-talk transcripts.

do-voice-recording

Turn text into a spoken-audio file (OGG/Opus) from any project or directory. Kokoro local voice with OpenAI tts-1 cloud fallback.

download-video

Download videos from social media URLs (X/Twitter, YouTube, Instagram, TikTok, etc.) using yt-dlp. Use when saving a video locally, extracting content for transcription, or…

elevenlabs-transcribe

Transcribes audio/video files using ElevenLabs Scribe v2 API. Use when transcribing audio files, generating transcripts, or converting speech to text.

heygen-avatar

Create a persistent HeyGen avatar — a reusable face + voice identity for the agent, the user, or any named character — powered by HeyGen Avatar V technology.

join-meeting

AgentCall (agentcall.dev) — Join a video meeting (Google Meet, Teams, Zoom) as an AI bot with voice and visual presence.

resumen-reuniones

Processes meeting transcriptions and calendar data to generate a structured summary with key points, agreements, attendance table, and Word document deliverable.

transcribe-video

Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe.

wiki-extract-podcast

Extract podcast episodes from RSS and transcribe via Whisper or AssemblyAI into raw/. Use for podcast URLs or episode links.

wiki-extract-youtube

Extract YouTube transcripts, captions, and metadata into raw/. Uses yt-dlp or pytube. Supports audio transcription when no captions exist.

audio-loop

Use this skill whenever a user has an audio file (.wav/.mp3/.flac/etc.) that needs to loop as background on a website or web page — hero ambience, landing-page atmosphere,…

2000s-visualization-expert

Expert in 2000s-era music visualization (Milkdrop, AVS, Geiss) and modern WebGL implementations. Specializes in Butterchurn integration, Web Audio API AnalyserNode FFT data, GLSL…

differential-region-analysis

The differential-region-analysis pipeline identifies genomic regions exhibiting significant differences in signal intensity between experimental conditions using a count-based…

TF-differential-binding

The TF-differential-binding pipeline performs differential transcription factor (TF) binding analysis from ChIP-seq datasets (TF peaks) using the DiffBind package in R.

9router-stt

Speech-to-text via 9Router /v1/audio/transcriptions using OpenAI Whisper / Groq / Gemini / Deepgram / AssemblyAI / NVIDIA / HuggingFace models.

abridge-core-workflow-a

Implement Abridge ambient clinical documentation capture-to-note pipeline. Use when building the primary encounter workflow: audio capture, real-time transcription, AI note…

ace-step

Generate, inpaint, and outpaint music with ACE Step on RunComfy via the `runcomfy` CLI. ACE Step is StepFun-AI's open-weights music foundation model — tag-driven composit — from…

add-voice-transcription

Add voice message transcription to Deus using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.

agent-face

Gives an AI agent a talking, animated, lip-syncing voice face. You speak, it transcribes (in-browser Whisper), a pluggable brain replies, the reply is spoken back and drives a…

ai-audio-generation

AI audio generation for agents through Image Skill's zero-setup hosted creative runtime. Use when a prompt should become music, sound, or audio without provider credentials,…

ai-music

Generate AI music on RunComfy via the `runcomfy` CLI — a smart router across the music-model catalog.

ai-music

Generate AI music on RunComfy via the `runcomfy` CLI. Routes across the music-model catalog to ElevenLabs AI Music Generation (premium 44.1 kHz stereo vocal tracks, 5 s–5 min) and…

ai-music

Generate AI music on RunComfy via the `runcomfy` CLI. Routes to ElevenLabs Music (premium vocal, $0.0083/s) or ACE Step / 1.5 (open-weights, $0.0002–0.0003/s, multilingual), plus…

analyze-transcription

Use when you need to determine whether an Instagram reel video requires frame-by-frame extraction based on its transcription text.

Analyze videos with frame extraction and audio context in Claude Code

Give Claude Code a video perception layer that extracts frames, transcribes audio, and lets Claude answer questions about local videos or YouTube URLs.

voice-ai-engine-development

Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-pro — from…

apple-ml

Apple on-device machine learning framework reference. Core ML / Create ML / Vision / Natural Language / Speech. MLModel, MLModelConfiguration, MLMultiArray, MLComputeUnits, MLImageClassifier,…

arboreto

Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3).

archive-search

Essential pre-search guide — load BEFORE calling search_transcribed or search_metadata. Use whenever the user wants to search, find, look up, or discover documents, people,…

assemblyai-audio-intelligence-agent

Extract structured intelligence from audio using the AssemblyAI API with sentiment analysis, entity detection, topic modeling, and auto-chapter generation.

assemblyai-common-errors

Diagnose and fix AssemblyAI common errors and exceptions. Use when encountering AssemblyAI errors, debugging failed transcriptions, or troubleshooting streaming and LeMUR issues.

assemblyai-core-workflow-a

Execute AssemblyAI primary workflow: async transcription with audio intelligence. Use when transcribing audio/video files, enabling speaker diarization, sentiment analysis, entity…

assemblyai-core-workflow-b

Execute AssemblyAI streaming transcription and LeMUR workflows. Use when implementing real-time speech-to-text, live captions, voice agents, or LLM-powered audio analysis with…

assemblyai-cost-tuning

Optimize AssemblyAI costs through model selection, feature budgeting, and usage monitoring. Use when analyzing AssemblyAI billing, reducing transcription costs, or implementing…
