---
name: parakeet
description: >
  Local speech-to-text via Handy app (push-to-talk) and NeMo CLI scripts.
  Parakeet V3: 25 languages, auto-detection, ~30x realtime on M4 Max, 6% WER.
  This skill should be used when transcribing audio files or dictating voice input.
argument-hint: [audio-file | "dictate" | "check"]
tools: [Bash, Read, Write]
---

# Parakeet Dictation Skill

Local speech-to-text powered by NVIDIA Parakeet TDT 0.6B V3 (~600MB model, 100% offline).

## Two Modes

### 1. Handy App (Primary — Push-to-Talk into Any Text Field)

[Handy](https://handy.computer/) is a free, open-source Tauri app (Rust + React) providing
push-to-talk dictation with Parakeet V3 built in. Inference via
[transcribe-rs](https://github.com/cjpais/transcribe-rs) (ONNX Runtime, int8 quantized).

```bash
brew install --cask handy
```

- **Default hotkey**: ⌥Space (Option-Space) on macOS, Ctrl-Space on Windows/Linux
- **Modes**: Push-to-talk (hold) or toggle (press to start/stop)
- Select **Parakeet V3** in Settings → Models (auto-downloads ~478MB)
- Grant microphone + accessibility permissions
- Includes VAD (Silero), model management UI
- **Additional models**: Whisper (Small/Medium/Turbo/Large), Moonshine, SenseVoice
- Models stored at `~/Library/Application Support/com.pais.handy/models/`

### 2. CLI Scripts (Claude Code File Transcription & Terminal Dictation)

CLI scripts remain for headless/terminal use within Claude Code. These use NeMo/PyTorch.

## Performance

| System | Speed | Engine |
|--------|-------|--------|
| **Handy (M4 Max)** | ~30x realtime | transcribe-rs / ONNX int8 |
| **Handy (Zen 3)** | ~20x realtime | transcribe-rs / ONNX int8 |
| **Handy (Skylake i5)** | ~5x realtime | transcribe-rs / ONNX int8 |
| **NeMo CLI (MPS)** | Varies | NeMo / PyTorch |

- **Accuracy**: 6.05% WER (Word Error Rate)
- **Languages**: 25 European languages with automatic detection (no prompting)
- **Privacy**: 100% local processing, no cloud API
- **License**: CC BY 4.0 (model), MIT (Handy app)

## Commands

### Transcribe Audio File

```bash
/parakeet path/to/audio.wav
/parakeet ~/recordings/interview.mp3
/parakeet meeting.m4a
```

Supported formats: `.wav`, `.mp3`, `.m4a`, `.flac`, `.ogg`, `.aac`

### Live Dictation (Terminal)

```bash
/parakeet
/parakeet dictate
```

Record from microphone until Enter is pressed, then transcribe.

### Check Installation

```bash
/parakeet check
```

Verify Parakeet is properly installed and model can load.

## Setup

### Handy (Push-to-Talk UI)

```bash
brew install --cask handy
```

Launch from Applications, select Parakeet V3 model, configure hotkey.

### CLI Scripts (Prerequisites)

1. **Parakeet Dictate repo** at `~/Programming/parakeet-dictate/` with Python venv
2. **Install dependencies**:
   ```bash
   cd ~/Programming/parakeet-dictate
   uv venv && uv pip install -r requirements.txt
   ```
3. **(Optional) Set custom path**: `export PARAKEET_HOME=/path/to/parakeet-dictate`

## Implementation

When this skill is invoked:

1. **For audio files**: Run the transcription script
   ```bash
   cd ~/.claude/skills/parakeet/scripts && \
   ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python transcribe.py "<filepath>"
   ```

2. **For live dictation**: Run the dictation script
   ```bash
   cd ~/.claude/skills/parakeet/scripts && \
   ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python dictate.py
   ```

3. **For checking setup**: Run the check script
   ```bash
   cd ~/.claude/skills/parakeet/scripts && \
   ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python check_setup.py
   ```

## 3. Server Mode (OpenAI-Compatible STT for Takopi)

An OpenAI-compatible transcription server for integration with Takopi and other
API clients. Exposes `POST /v1/audio/transcriptions` wrapping the NeMo CLI.

```bash
uv run --with fastapi,uvicorn,python-multipart \
  ~/.claude/skills/parakeet/scripts/parakeet_server.py --port 8384
```

Takopi config (`~/.takopi/takopi.toml`):
```toml
voice_transcription_base_url = "http://localhost:8384/v1"
voice_transcription_api_key = "local"
voice_transcription_model = "parakeet-tdt-0.6b"
```

Endpoints: `/v1/audio/transcriptions` (POST, multipart), `/v1/models` (GET),
`/health` (GET). Supports `response_format`: `json`, `text`, `verbose_json`.
Accepts `.wav`, `.mp3`, `.m4a`, `.flac`, `.ogg`, `.aac`, `.oga`, `.webm`.
50 MB upload limit. 120s transcription timeout.

## Model Caches

| System | Cache Location | Size | Engine |
|--------|---------------|------|--------|
| **Handy** | `~/Library/Application Support/com.pais.handy/models/` | ~478MB | transcribe-rs (ONNX int8) |
| **NeMo CLI** | `~/.cache/nemo/` | ~1.2GB | NeMo / PyTorch |

Model caches are separate. Handy's Parakeet V3 int8 model structure:
```
parakeet-tdt-0.6b-v3-int8/
├── encoder-model.int8.onnx
├── decoder_joint-model.int8.onnx
├── nemo128.onnx (audio preprocessor)
└── vocab.txt
```

## Troubleshooting

### "No module named nemo"
Use the Parakeet virtual environment. Scripts automatically use the correct Python.

### "MPS not available"
Apple Silicon Metal acceleration requires PyTorch 2.0+. Falls back to CPU automatically.

### "Permission denied: microphone"
Grant microphone access in System Preferences → Privacy & Security → Microphone.

### Model download slow
The Parakeet model downloads on first use (~478MB for Handy, ~1.2GB for NeMo). Subsequent runs use cache.

## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `PARAKEET_HOME` | `~/Programming/parakeet-dictate` | Parakeet Dictate installation path |

## Dependencies

**Handy**: `brew install --cask handy` (standalone, no other deps)

**CLI scripts** require:
- **Parakeet Dictate repo** at `$PARAKEET_HOME` (default: `~/Programming/parakeet-dictate`)
- **Python virtual environment** at `$PARAKEET_HOME/.venv`
- **NeMo toolkit** with ASR support (`nemo_toolkit[asr]>=2.0.0`)
- **PyTorch 2.0+** (for MPS/CUDA acceleration)
- **soundfile** and **sounddevice** for audio handling
