Pocket TTS - MLX Weights

MLX-converted weights for Kyutai's Pocket TTS model. Optimized for Apple Silicon inference via MLX (Python) and MLX-Swift.
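
To fetch the weights programmatically, a minimal sketch using huggingface_hub is shown below; the repo id is a placeholder for this repository, and the allow_patterns filter simply selects one weight variant plus the voice files.

```python
# Minimal download sketch (the repo id is a placeholder -- replace it with
# this repository's actual id on the Hub).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="<org>/pocket-tts-mlx",        # hypothetical repo id
    allow_patterns=["int8/*", "voice/*"],  # one weight variant + voice embeddings
)
print("Downloaded to:", local_dir)
```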

Weight Variants

| Directory | Size  | Description                       | RTF (M2 Air)  |
|-----------|-------|-----------------------------------|---------------|
| bf16/     | 224MB | bfloat16 baseline                 | ~3x realtime  |
| int8/     | 148MB | 8-bit quantized FlowLM, bf16 Mimi | ~5x realtime  |
| int4/     | 107MB | 4-bit quantized FlowLM, bf16 Mimi | ~6x realtime  |

All variants include FlowLM + Mimi decoder in a single unified mlx_model.safetensors file.
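
As a quick sanity check, the unified file can be opened directly with mlx.core: mx.load reads a .safetensors file into a dict of arrays, so you can confirm the FlowLM and Mimi tensors are present. The path below assumes the directory layout from the table above.

```python
# Sketch: inspect the unified checkpoint with MLX.
import mlx.core as mx

weights = mx.load("bf16/mlx_model.safetensors")  # or int8/, int4/
print(f"{len(weights)} tensors")
for name, tensor in list(weights.items())[:5]:   # peek at the first few entries
    print(name, tensor.shape, tensor.dtype)
```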

Voice Embeddings

8 pre-extracted voice embeddings from the Kyutai release (see the loading sketch after this list):

  • voice/alba.safetensors
  • voice/azelma.safetensors
  • voice/cosette.safetensors
  • voice/eponine.safetensors
  • voice/fantine.safetensors
  • voice/javert.safetensors
  • voice/jean.safetensors
  • voice/marius.safetensors
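
Each voice file is a small safetensors archive and can be loaded the same way as the main checkpoint; the tensor names inside are whatever the conversion wrote, so the sketch below just iterates over them rather than assuming a specific key.

```python
# Sketch: load a pre-extracted voice embedding.
import mlx.core as mx

voice = mx.load("voice/alba.safetensors")
for name, emb in voice.items():
    print(name, emb.shape, emb.dtype)
```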

Full source code will be published in a GitHub repo (coming soon).

Source

Converted from the official kyutai/pocket-tts weights.
