Pocket TTS - MLX Weights
MLX-converted weights for Kyutai's Pocket TTS model. Optimized for Apple Silicon inference via MLX (Python) and MLX-Swift.
Weight Variants
| Directory | Size | Description | RTF (M2 Air) |
|---|---|---|---|
| bf16/ | 224 MB | bfloat16 baseline | ~3x realtime |
| int8/ | 148 MB | 8-bit quantized FlowLM, bf16 Mimi | ~5x realtime |
| int4/ | 107 MB | 4-bit quantized FlowLM, bf16 Mimi | ~6x realtime |
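As a rough guide to choosing between the variants, the figures in the table can be turned into a small helper. This is only a sketch: `VARIANTS`, `pick_variant`, and `estimate_synthesis_seconds` are hypothetical names, and it assumes "Nx realtime" means roughly N seconds of audio generated per second of wall-clock time on an M2 Air.

```python
# Figures copied from the table above; the speed multipliers are approximate.
VARIANTS = {
    "bf16": {"size_mb": 224, "speedup": 3.0},
    "int8": {"size_mb": 148, "speedup": 5.0},
    "int4": {"size_mb": 107, "speedup": 6.0},
}


def pick_variant(max_size_mb: float) -> str:
    """Return the largest (least quantized) variant that fits the size budget."""
    fitting = [
        (info["size_mb"], name)
        for name, info in VARIANTS.items()
        if info["size_mb"] <= max_size_mb
    ]
    if not fitting:
        raise ValueError(f"no variant fits within {max_size_mb} MB")
    return max(fitting)[1]


def estimate_synthesis_seconds(variant: str, audio_seconds: float) -> float:
    """Estimate wall-clock time, ASSUMING 'Nx realtime' means N s of audio per second."""
    return audio_seconds / VARIANTS[variant]["speedup"]
```

For example, `pick_variant(150)` selects `int8`, and under the stated assumption a minute of speech with `int4` would take about ten seconds.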
All variants include FlowLM + Mimi decoder in a single unified mlx_model.safetensors file.
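To check what a unified file contains before loading it, the safetensors header can be read with the Python standard library alone. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); `read_safetensors_header` and `summarize` are hypothetical helper names, and no tensor names from this repository are assumed.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # The optional "__metadata__" entry holds free-form strings, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(path):
    """Print each tensor's name, dtype, and shape."""
    for name, info in sorted(read_safetensors_header(path).items()):
        print(f"{name}: {info['dtype']} {info['shape']}")


# Example (path taken from the layout described above):
# summarize("bf16/mlx_model.safetensors")
```

In MLX itself, `mlx.core.load` accepts a `.safetensors` path and returns a dictionary of arrays, so the header reader is mainly useful for a quick look at dtypes and shapes.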
Voice Embeddings
8 pre-extracted voice embeddings from the Kyutai release:
- voice/alba.safetensors
- voice/azelma.safetensors
- voice/cosette.safetensors
- voice/eponine.safetensors
- voice/fantine.safetensors
- voice/javert.safetensors
- voice/jean.safetensors
- voice/marius.safetensors
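A voice name can be mapped to its embedding file with a small lookup. This is a minimal sketch: `VOICES` and `voice_path` are hypothetical names, and `root` is assumed to be the directory where this repository was downloaded.

```python
from pathlib import Path

# The eight voices shipped under voice/ in this repository.
VOICES = (
    "alba", "azelma", "cosette", "eponine",
    "fantine", "javert", "jean", "marius",
)


def voice_path(name: str, root: str = ".") -> Path:
    """Return the path to a voice embedding, rejecting unknown voice names."""
    key = name.strip().lower()
    if key not in VOICES:
        raise ValueError(
            f"unknown voice {name!r}; expected one of: {', '.join(VOICES)}"
        )
    return Path(root) / "voice" / f"{key}.safetensors"
```

Validating the name up front gives a clearer error than a missing-file failure later in the pipeline.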
Full source code will be available in the GitHub repo (coming soon).
Source
Converted from the official kyutai/pocket-tts weights.