Elna Company MPN encoding patterns, suffix decoding, and handler guidance. Use when working with Elna audio-grade aluminum electrolytic capacitors and supercapacitors.
faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2 that transcribes up to 4x faster than the original openai/whisper implementation at the same accuracy, while using less memory.
faster-whisper is SYSTRAN’s high-performance reimplementation of OpenAI Whisper on top of CTranslate2.
Definitive reference for FFmpeg and ASS/SSA animation timing units, optimal durations, and best practices.
Normalizes audio loudness to broadcast standards using FFmpeg loudnorm filter with EBU R128 two-pass analysis.
Complete audio encoding and normalization system. PROACTIVELY activate for: (1) Audio codec selection (AAC, MP3, Opus, FLAC), (2) Loudness normalization (EBU R128, loudnorm), (3)…
Transcodes and processes audio files using the FFmpeg CLI and libavcodec library. Supports batch format conversion, loudness normalization via EBU R128, and metadata extraction…
Complete subtitle and caption system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20).
Analyze Field Labs coaching transcription data, calculate session metrics, and generate daily summaries.
Migrate to Fireflies.ai from other meeting transcription platforms or legacy recording systems. Use when switching from Otter.ai, Rev, or custom transcription to Fireflies, or…
Detect transcription factor binding sites through footprinting analysis in ATAC-seq data using TOBIAS.
One-time bootstrap for Kokoro TTS engine, Telegram bot, and BotFather setup. TRIGGERS - setup tts, install kokoro, botfather, bootstrap tts-tg-sync, configure telegram bot, full…
Expert-level Google Cloud CLI (gcloud) skill for managing GCP resources. Use when working with "gcloud commands", "cloud run deploy", "alloydb", "cloud sql", "workload identity…
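Usage guide for the Gemini TTS command-line tool, covering single-sentence and batch text-to-speech, listing voices, merging WAV files, stdout output, API key configuration, caching, and concurrency. Use when the user asks about gemini-tts, the Gemini TTS CLI, list-voices, merge, GEMINI_API_KEY, text-to-speech, or related parameters.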
Invoke Google Gemini for video understanding and analysis using the Python google-genai SDK. Supports gemini-3-pro-preview and gemini-2.5-flash for video analysis, transcription,…
Converts the scripts in narration-scripts.json to MP3 files with edge-tts, measures their playback duration with mutagen, and updates durations.json. When to use: called by TTSAgent when generating the narration audio file for each slide.
Create audio-reactive GLSL visualizers for Bice-Box. Provides templates, audio uniforms (iRMSOutput, iRMSInput, iAudioTexture), coordinate patterns, and common shader functions.
Use when configuring audio bus hierarchies in Godot, setting up AudioStreamPlayer pooling for performance, implementing 3D spatial audio, adding audio effect chains like reverb…
Troubleshoot common Granola errors — audio capture failures, transcription issues, calendar sync problems, and integration errors. Platform-specific fixes for macOS and Windows.
Incident response procedures for Granola meeting capture failures and outages. Use when meetings aren't recording, transcription fails mid-meeting, integrations stop syncing, or…
Optimize Granola transcription accuracy, note quality, and processing speed. Use when improving transcription quality, reducing processing time, optimizing templates for better AI…
Execute Groq secondary workflows: audio transcription (Whisper), vision, text-to-speech, and batch model evaluation.
Use when the user asks about audio in Higgsfield videos, needs to add dialogue or lip-sync, wants sound effects or ambient sound in generated video, asks about music or BGM in…
Howler.js is a JavaScript audio library for the modern web that defaults to the Web Audio API with an HTML5 Audio fallback.
Create video compositions, animations, title cards, overlays, captions, voiceovers, audio-reactive visuals, and scene transitions in HyperFrames HTML.
Asset preprocessing for HyperFrames compositions — text-to-speech narration (Kokoro), audio/video transcription (Whisper), and background removal for transparent overlays (u2net).
Improve a lecture transcript: confidence-aware LLM correction (ASR-conf x LLM-conf decision matrix) of typos, proper names, and book titles, plus topic-based paragraph restructuring in one…
Insanely Fast Whisper is a CLI tool that transcribes audio at extreme speeds using OpenAI Whisper models with Hugging Face Transformers, Flash Attention 2, and batched inference.
Interview management, transcription workflows, and source note-taking for journalists. Use when preparing for interviews, managing recordings, transcribing audio/video, organizing…
Expert knowledge for iOS audio processing, pitch detection algorithms (HPS, YIN, FFT), DSP implementation, and AudioKit integration.
Transcribe speech using the International Phonetic Alphabet (IPA) and analyze sound systems, including phonotactics and phonological rules.
Access JASPAR database for transcription factor binding profiles (matrices), collections, and species via REST API.
Creates and bootstraps Knowledge Graph projects from video transcripts. Extracts entities (people, organizations, concepts) and relationships into searchable graphs.
Generate high-quality text-to-speech audio using Kokoro, a neural TTS model running locally on Apple Silicon via MLX.
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (wi — from…
Insert a media object on the intended slide, optionally configure click behavior, and verify the requested result before leaving the slide.
librosa is a Python library for audio and music analysis. It provides tools for feature extraction, spectral analysis, beat tracking, onset detection, and audio visualization,…
Zoom Meeting SDK for Linux - C++ headless meeting bots with raw audio/video access, transcription, recording, and AI integration for server-side automation
Universal LLM API client for 142+ providers with native bindings for 11 languages. Use when writing code that calls LLM APIs via liter-llm in Python, TypeScript, Rust, Go, Java,…
Monitors live audio streams from RTMP, HLS, or Icecast sources using FFmpeg stream capture and real-time chunked transcription via Deepgram's streaming API or Whisper.cpp.
Build self-hosted speech-to-text APIs using Hugging Face models (Whisper, Wav2Vec2) and create LiveKit voice agent plugins.
Generate local text-to-speech in Brazilian Portuguese with Piper or Kokoro. Use when the user wants offline TTS, pt-BR read-aloud, fast audio generation, natural narration, or…
Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and…
Use when the user is finishing a track and wants to check it's ready to send to a mastering engineer or for self-mastering.
Content analysis for video and audio — YouTube, TikTok, podcasts, audio files. Transcription-first pipeline (captions API, user transcript, or Whisper opt-in).
Patterns for building multimodal AI applications that combine text, images, audio, and video. Covers vision APIs, audio transcription, and unified pipelines.
PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen).
Given a local video or video URL, downloads the media if needed, extracts slide frames and key moments, transcribes the audio, and writes a Markdown timeline that interleaves…
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification.
Tools, patterns, and utilities for generating professional music with realistic instrument sounds. Write custom compositions using music21 or learn from existing MIDI files.
Optimize and format prompts specifically for AI music generation platforms like Suno and Udio, including platform-specific syntax and tag optimization
Speech-to-text via OmniRoute using OpenAI /v1/audio/transcriptions format with auto-fallback across Whisper, AssemblyAI, Deepgram, Azure STT.
OpenAI API integration for building AI-powered applications. Use when working with OpenAI's Chat Completions API, Python SDK (openai), TypeScript SDK (openai), tool use/function…
API-based speech-to-text transcription through OpenAI. No local model downloads, no GPU, no Python ML stack — just an API key and a shell script.
Acerbic, fact-based analysis of OpenClaudio release stability. Use before every 'openclaw update' to avoid regressions or log leaks.
Production pipeline for interactive and generative visual art using p5.js. Creates browser-based sketches, generative art, data visualizations, interactive experiences, 3D scenes,…
Pedalboard is a Python library built by Spotify for working with audio: reading, writing, rendering, and adding studio-quality effects.
Sound systems of human language -- phoneme inventories, the International Phonetic Alphabet, articulatory and acoustic phonetics, phonological rules, suprasegmental features…
Run fast, high-quality neural text-to-speech locally with Piper. Supports 20+ languages with compact ONNX voice models, no cloud API required, and produces natural-sounding speech…
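The two-pass EBU R128 workflow named in the FFmpeg loudnorm entry can be sketched in Python as follows. This is a minimal sketch, not any listed skill's own implementation. It assumes the first ffmpeg pass runs `loudnorm` with `print_format=json`, that the measurement block is the last flat brace-delimited object on stderr, and that -23 LUFS / -1 dBTP / 7 LU are acceptable targets; the helper names and the 48 kHz output rate are hypothetical choices for illustration.

```python
import json
import re

# Illustrative EBU R128 targets (assumption): integrated loudness, true peak, loudness range.
TARGET_I = -23.0
TARGET_TP = -1.0
TARGET_LRA = 7.0


def first_pass_filter(i=TARGET_I, tp=TARGET_TP, lra=TARGET_LRA):
    """Analysis pass: loudnorm measures the input and prints the results as JSON."""
    return f"loudnorm=I={i}:TP={tp}:LRA={lra}:print_format=json"


def parse_measurements(stderr_text):
    """Return the last flat JSON object found in ffmpeg's stderr output."""
    blocks = re.findall(r"\{[^{}]*\}", stderr_text)
    if not blocks:
        raise ValueError("no loudnorm JSON block found in ffmpeg output")
    return json.loads(blocks[-1])


def second_pass_filter(m, i=TARGET_I, tp=TARGET_TP, lra=TARGET_LRA):
    """Normalization pass: feed the first-pass measurements back into loudnorm."""
    return (
        f"loudnorm=I={i}:TP={tp}:LRA={lra}"
        f":measured_I={m['input_i']}"
        f":measured_TP={m['input_tp']}"
        f":measured_LRA={m['input_lra']}"
        f":measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}"
        ":linear=true:print_format=summary"
    )


def analysis_command(src):
    """ffmpeg argv for pass 1; audio is decoded and discarded via the null muxer."""
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", first_pass_filter(), "-f", "null", "-"]


def normalize_command(src, dst, measurements):
    """ffmpeg argv for pass 2; pins the output sample rate (48 kHz is an assumption)."""
    return [
        "ffmpeg", "-hide_banner", "-i", src,
        "-af", second_pass_filter(measurements),
        "-ar", "48000", dst,
    ]
```

In practice, run `analysis_command(...)` with `subprocess.run(..., capture_output=True, text=True)`, pass its `stderr` to `parse_measurements`, then run `normalize_command(...)`. Feeding measured values back with `linear=true` lets loudnorm apply a single gain change instead of dynamic adjustment when the targets are reachable.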