Transcribe audio or video through the PopiArt runtime baseline for local-first speech-to-text. Use this when the user wants a PopiArt-managed STT path for transcripts, captions,…
Convert text to speech through the PopiArt runtime baseline for multi-model TTS. Use this when the user wants one general text-to-speech entry point without handling upstream…
pyannote.audio is an open-source Python toolkit for speaker diarization built on PyTorch. It provides state-of-the-art pretrained models and pipelines for speech activity…
pydub is a Python library that provides a simple, high-level interface for manipulating audio files. It supports slicing, concatenation, volume adjustment, crossfading, format…
Complete SDK for controlling the Reachy Mini robot: head movement, antennas, camera, audio, and motion recording/playback.
Real-time audio playback patterns for macOS Apple Silicon. TRIGGERS - audio jitter, tts choppy, sounddevice, afplay jitter, audio architecture, playback glitch, GIL contention…
Query the user's screen recordings, audio, UI elements, and usage analytics via the local Screenpipe REST API at localhost:3030.
Reference skill for Zoom AI Services Scribe. Use after routing to a transcription workflow when handling uploaded or stored media, Build-platform JWT auth, fast mode…
Use Speaches when an agent stack expects OpenAI-style audio endpoints but you want a self-hosted speech backend for transcription, translation, and text-to-speech instead of a…
Music and audio file analysis agent. Produces reproducible Python/Shell analysis pipelines (librosa, pyloudnorm, essentia, madmom, mutagen, ffprobe) for BPM/key/time-signature…
Songwriting craft, AI music generation prompts (Suno focus), parody/adaptation techniques, phonetic tricks, and lessons learned. These are tools and ideas, not rules.
Transcribe speech to text using the Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing pre-recorded audio files, configuring…
Transcribe audio to text with Whisper models via the inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large.
Spleeter is Deezer's open-source audio source separation library with pretrained models. It can split audio into 2, 4, or 5 stems (vocals, drums, bass, piano, accompaniment) and…
Implement speech-to-text voice input in Blazor applications using the Syncfusion SpeechToText component. ALWAYS use this when users need voice input, speech recognition, audio…
Use when analyzing transcription factor (TF) regulatory networks using the Dorothea database. Input a gene list, identify the regulating transcription factors, generate a TF-Target network…
Transcribes video audio using WhisperX, preserving original timestamps. Creates JSON transcript with word-level timing. Use when you need to generate audio transcripts for videos.
Get transcripts from any YouTube video — for summarization, research, translation, quoting, or content analysis.
Automate audio/video transcription, meeting notes, subtitle generation, and content processing.
Transform podcast transcripts into multiple content assets—blog posts, social snippets, newsletters, and SEO-optimized landing pages—using systematic repurposing workflows.
Explicit-entry skill for Gemini TTS audio. Invoked deliberately via the /tts-duet command (generation) and /tts-duet-setup (configuration); not auto-triggered.
Atomic reference for @panda-video-generator/tts-node: pnpm tts, cli.ts, processNarrationFile — Edge-TTS narration → audio.mp3 + audio.vtt; env vars TTS_*, EDGE_TTS_*, ffmpeg.
Multi-engine text-to-speech skill. Supports Qwen3-TTS local voice cloning, VoiceCraft online TTS, and OpenAI TTS.
Diagnose and fix TwinMind common errors and exceptions. Use when encountering transcription errors, debugging failed requests, or troubleshooting integration issues.
Execute TwinMind primary workflow: Meeting transcription and summary generation. Use when implementing meeting capture, building transcription features, or automating meeting…
Create your first TwinMind meeting transcription and AI summary. Use when starting with TwinMind, testing your setup, or learning basic transcription and summary patterns.
Incident response for TwinMind failures: transcription not starting, audio not captured, sync failures, and calendar disconnect.
Install and configure TwinMind Chrome extension, mobile app, and API access. Use when setting up TwinMind for meeting transcription, configuring calendar integration, or…
Set up local development workflow with TwinMind API integration. Use when building applications that integrate TwinMind transcription, testing API calls locally, or developing…
Monitor TwinMind transcription quality, meeting coverage, action item extraction rates, and memory vault health.
Optimize TwinMind transcription accuracy and speed with Ear-3 model configuration, audio quality tuning, and caching strategies.
Handle TwinMind meeting events including transcription completion, action item extraction, and calendar sync notifications.
Fetch Evolutionary Conservation scores (phyloP, phastCons) and Transcription Factor Binding Sites (TFBS) from the UCSC Genome Browser.
Queries the UniBind database for experimentally validated transcription factor (TF) binding sites. Use when retrieving direct TF-DNA interaction datasets, downloading binding site…
Use when the user wants local voice transcription instead of OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now.
Async music / audio-track generation via Venice. Covers the /audio/quote + /audio/queue + /audio/retrieve + /audio/complete lifecycle, lyrics vs instrumental, voice selection,…
Generate speech from text via POST /audio/speech. Covers TTS models (Kokoro, Qwen 3, xAI, Inworld, Chatterbox, Orpheus, ElevenLabs Turbo, MiniMax, Gemini Flash), voices per…
Transcribe audio files to text via POST /audio/transcriptions. Covers supported models (Parakeet, Whisper, Wizper, Scribe, xAI STT), supported formats…
vLLM non-chat inference surfaces — text embeddings (`/v1/embeddings`, `/v2/embed`), reranking/scoring (`/rerank`, `/score`), speech-to-text (`/v1/audio/transcriptions`,…
Use when the user has recorded vocals and wants a processing chain set up. Examples - "set up my vocal chain", "process this vocal", "make the vocal sit in the mix", "give me a…
AI voice agent for handling incoming calls, appointment scheduling, lead qualification, and 24/7 customer service without human intervention.
Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription,…
Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-provider…
Full voice-to-voice interaction: transcribe user speech, process the request, and respond with synthesized speech.
Sync, transcribe, and intelligently organize voice memos, audio/video files, and video URLs.
Transform verbose voice input into structured, token-efficient Claude prompts. Use when cleaning up voice memos, dictation output, or speech-to-text transcriptions that contain…
Prepare audio-optimized content for text-to-speech rendering. Generate recording scripts, pronunciation guides, and pacing-marked text for podcasts, video voiceovers, and audio…
Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns.
Perform offline speech recognition across 20+ languages with Vosk. Provides compact models, low-latency streaming transcription, and bindings for Python, Node.js, Java, C#, and…
Generate text-to-video with Wan 2.7 (Wan-AI's flagship motion model) on RunComfy. Documents Wan 2.7's strengths (multi-reference conditioning, audio-driven lip-sync via…
Configure WaveCap LLM-based transcription correction. Use when the user wants to enable/disable LLM correction, change models, tune prompts, or optimize correction quality on…
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification.
Processes audio files from an S3 bucket using Whisper large-v3, splitting recordings into 30-second chunks with ffmpeg before transcription.
Runs OpenAI Whisper models locally via whisper.cpp with GGML quantized weights for CPU-efficient transcription.
Streams audio from PulseAudio or ALSA devices into whisper.cpp for real-time speech-to-text with word-level timestamps.
Enhances OpenAI Whisper transcription output with speaker diarization using a pyannote.audio pipeline and SpeechBrain embeddings.
Use when the user wants to transcribe, caption, subtitle, batch process, or convert speech to text from local audio/video files using faster-whisper.
Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews;…
WhisperX extends OpenAI Whisper with batched inference for 70x realtime transcription, phoneme-based word-level timestamp alignment via wav2vec2, voice activity detection, and…
Thin orchestrator for the end-to-end video localization pipeline. Routes to the four focused sub-skills — /wjs-transcribing-audio, /wjs-translating-subtitles, /wjs-dubbing-video,…