Kazakh-VoxCPM-LoRA
Overview
This repository hosts a LoRA (Low-Rank Adaptation) model specifically optimized for the Kazakh language, built upon the VoxCPM 1.5 architecture. This research project aims to bridge the gap in high-quality Kazakh speech synthesis, offering a solution that excels in both standard TTS and Zero-shot Voice Cloning while retaining the base model's proficiency in Chinese and English.
Performance Highlights
- Native Phoneme Mastery: Precise handling of the Kazakh-specific letters: ә, ғ, қ, ң, ө, ұ, ү, һ, і.
- Superior Prosody: Achieved a loss/stop of 0.003-0.005, ensuring natural pauses and rhythmic accuracy in long-form text.
- Advanced Cloning: Supports high-fidelity voice cloning from as little as 3 seconds of reference audio.
- Seamless Tri-lingualism: Integrated support for code-switching across Kazakh, English, and Chinese.
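Because several of these letters have Latin look-alikes (for example Cyrillic і versus Latin i), it can help to confirm that input text really contains the Kazakh code points before synthesis. The following is a minimal sketch of such a check; the helper name `kazakh_letter_counts` is hypothetical and is not part of this repository or of VoxCPM.

```python
# The nine Kazakh-specific Cyrillic letters (lowercase), written as escapes
# so that look-alike Latin characters cannot slip in: ә ғ қ ң ө ұ ү һ і
KAZAKH_SPECIFIC = set("\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456")


def kazakh_letter_counts(text: str) -> dict[str, int]:
    """Count each Kazakh-specific letter in `text`, case-insensitively."""
    counts: dict[str, int] = {}
    for ch in text.lower():
        if ch in KAZAKH_SPECIFIC:
            counts[ch] = counts.get(ch, 0) + 1
    return counts


if __name__ == "__main__":
    # "Сәлем, Қазақстан!" contains one ә and two қ (one uppercase).
    print(kazakh_letter_counts("Сәлем, Қазақстан!"))
    # A Latin "i" is not counted as the Cyrillic "і".
    print(kazakh_letter_counts("i"))
```

An empty result for text that should be Kazakh is a hint that the input was transliterated or mis-encoded somewhere upstream.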
Training Specifications
- Base Model: openbmb/VoxCPM1.5
- Dataset: 66.1 hours of high-quality Kazakh speech (Source: issai/KazakhTTS).
- Parameters: Step: 4160 | Epoch: 1.84 | Rank: 32 | Alpha: 16.
- Final Metrics: loss/diff: ~0.644 | loss/stop: ~0.004.
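As a rough aid for reading these numbers, the sketch below derives a few quantities from the stated parameters. It assumes the common LoRA convention in which the low-rank update is scaled by alpha / rank (whether the training code used here follows that convention is an assumption), and it assumes the epoch count is proportional to the step count. The 1024 x 1024 layer size is purely hypothetical and is not taken from VoxCPM.

```python
# Values stated in the training specifications.
steps = 4160
epochs = 1.84
rank = 32
alpha = 16

# Conventional LoRA scaling factor applied to the low-rank update (assumed convention).
scaling = alpha / rank  # 0.5

# Approximate optimizer steps per pass over the dataset.
steps_per_epoch = steps / epochs  # about 2260.9


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Hypothetical square projection, for illustration only.
d = 1024
full_params = d * d                         # 1,048,576
lora_params = lora_param_count(d, d, rank)  # 65,536

print(f"scaling = {scaling}")
print(f"steps per epoch = {steps_per_epoch:.1f}")
print(f"LoRA / full parameters = {lora_params / full_params:.4f}")  # 0.0625
```

Under these assumptions the adapter update is applied at half strength, and a rank-32 pair on such a layer trains about 6% of the parameters a full fine-tune of that layer would.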
Implementation Guide
This model supports dynamic hot-swapping. You can enable Kazakh support by setting lora_enabled to True.
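To illustrate what hot-swapping means in principle, here is a toy sketch of a linear layer whose low-rank update can be switched on and off without touching the base weights. Only the flag name `lora_enabled` comes from the text above; `ToyLoRALinear` and `matvec` are hypothetical, and this is not the VoxCPM API. For the actual loading and inference calls, consult the project's own scripts.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class ToyLoRALinear:
    """Toy linear layer with a LoRA update that can be toggled at runtime."""

    def __init__(self, weight, lora_a, lora_b, rank, alpha):
        self.weight = weight          # base weights (d_out x d_in), never modified
        self.lora_a = lora_a          # down-projection (rank x d_in)
        self.lora_b = lora_b          # up-projection (d_out x rank)
        self.scaling = alpha / rank   # conventional LoRA scaling (assumed)
        self.lora_enabled = False     # adapter is off by default

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.lora_enabled:
            # Add the scaled low-rank update B(Ax) on top of the base output.
            delta = matvec(self.lora_b, matvec(self.lora_a, x))
            y = [yi + self.scaling * di for yi, di in zip(y, delta)]
        return y


layer = ToyLoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 0.0], [0.0, 1.0]],
    lora_b=[[2.0, 0.0], [0.0, 4.0]],
    rank=2,
    alpha=1,
)
x = [1.0, 1.0]

base_out = layer.forward(x)      # adapter off: [1.0, 1.0]
layer.lora_enabled = True
adapted_out = layer.forward(x)   # adapter on:  [2.0, 3.0]
layer.lora_enabled = False
restored_out = layer.forward(x)  # back to base behaviour: [1.0, 1.0]
print(base_out, adapted_out, restored_out)
```

Because the base weights are never overwritten, disabling the flag recovers the original behaviour exactly, which is what makes switching between the base model and the Kazakh adapter cheap.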
For a complete interactive web application and detailed inference scripts, please refer to our GitHub repository: voxcpm-kazakh-tts
This web application supports:
- Interactive Synthesis: Real-time Kazakh TTS.
- Voice Cloning: Custom voice synthesis using your own reference audio.
- Easy Deployment: Ready to run via Gradio.
License & Acknowledgements
This model is released under the Apache License 2.0. Special thanks to the ISSAI team for providing the KazakhTTS dataset.