Kazakh-VoxCPM-LoRA

🇰🇿 Overview

This repository hosts a LoRA (Low-Rank Adaptation) adapter for Kazakh speech synthesis, built on the VoxCPM 1.5 architecture. The project aims to close the gap in high-quality Kazakh speech synthesis: it supports both standard TTS and zero-shot voice cloning while retaining the base model's proficiency in Chinese and English.

🚀 Performance Highlights

Native Phoneme Mastery: Accurate pronunciation of the sounds written with the Kazakh-specific letters ә, ғ, қ, ң, ө, ұ, ү, һ, і.
Superior Prosody: Training reached a loss/stop of 0.003-0.005, which is consistent with natural pauses and accurate rhythm in long-form text.
Advanced Cloning: Supports high-fidelity voice cloning from as little as 3 seconds of reference audio.
Seamless Tri-lingualism: Integrated support for code-switching across Kazakh, English, and Chinese.
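
Since the cloning figure above is stated in seconds of reference audio, it can help to check a clip's length before using it. The following is a minimal sketch using only the Python standard library. It assumes the reference clip is a PCM WAV file; the helper names (`wav_duration_seconds`, `is_long_enough`, `make_silent_wav`) and the 16 kHz sample rate of the stand-in clip are illustrative choices, not part of this model or of VoxCPM.

```python
import io
import wave

# Taken from the "as little as 3 seconds" figure above.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(wav_file) -> float:
    """Return the duration of a PCM WAV file (path or file-like object) in seconds."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(wav_file, min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether a reference clip meets the minimum duration."""
    return wav_duration_seconds(wav_file) >= min_seconds


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (a stand-in clip for this demo)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)  # arbitrary rate, assumed for illustration only
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for seconds in (2.0, 3.0, 4.5):
        clip = make_silent_wav(seconds)
        print(f"{seconds:.1f} s clip long enough: {is_long_enough(clip)}")
```

In practice you would pass the path of your own reference recording to `is_long_enough` instead of a generated clip. Longer and cleaner references may still give better results than the minimum.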

📊 Training Specifications

Base Model: openbmb/VoxCPM1.5
Dataset: 66.1 hours of high-quality Kazakh speech (Source: issai/KazakhTTS).
Parameters: Steps: 4160 | Epochs: 1.84 | LoRA rank: 32 | LoRA alpha: 16.
Final Metrics: loss/diff: ~0.644 | loss/stop: ~0.004.
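
A few derived quantities follow from the figures above by simple arithmetic. The sketch below assumes the standard LoRA convention, in which the low-rank update is scaled by alpha / rank, and assumes that one epoch covers the full 66.1 hours. Neither assumption is confirmed by the training logs, so treat the results as rough estimates.

```python
# Values copied from the training specifications above.
steps = 4160
epochs = 1.84
rank = 32
alpha = 16
dataset_hours = 66.1

# Standard LoRA scales the low-rank update by alpha / rank (assumed convention).
lora_scaling = alpha / rank

# Approximate optimizer steps per full pass over the dataset.
steps_per_epoch = steps / epochs

# Approximate hours of audio seen in training, assuming each epoch covers all 66.1 hours.
audio_hours_seen = dataset_hours * epochs

print(f"LoRA scaling (alpha / rank): {lora_scaling}")
print(f"Steps per epoch: {steps_per_epoch:.0f}")
print(f"Audio hours seen: {audio_hours_seen:.1f}")
```

Under these assumptions the adapter's update is applied at half strength (scaling 0.5), and the run corresponds to roughly 2261 steps per epoch.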

🛠️ Implementation Guide

The LoRA weights can be hot-swapped at runtime: set lora_enabled to True to enable Kazakh support, or to False to fall back to the base model.
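
To illustrate what such a toggle does conceptually, here is a toy sketch of a single linear layer with a switchable low-rank update. It is written in pure Python and is not the VoxCPM API: the class `ToyLoRALinear`, its attributes, and the tiny weights are hypothetical, and only the flag name `lora_enabled` is borrowed from the text above. For the real inference calls, see the GitHub repository mentioned in this guide.

```python
class ToyLoRALinear:
    """Toy linear layer y = W x with an optional update (alpha / r) * B (A x)."""

    def __init__(self, weight, lora_a, lora_b, alpha):
        self.weight = weight                # frozen base weight, shape (out, in)
        self.lora_a = lora_a                # down-projection A, shape (r, in)
        self.lora_b = lora_b                # up-projection B, shape (out, r)
        self.scaling = alpha / len(lora_a)  # alpha / rank
        self.lora_enabled = False           # hypothetical flag, named after the text above

    @staticmethod
    def _matvec(matrix, vector):
        """Multiply a matrix (list of rows) by a vector."""
        return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

    def forward(self, x):
        base = self._matvec(self.weight, x)
        if not self.lora_enabled:
            return base                     # base model behaviour, adapter bypassed
        delta = self._matvec(self.lora_b, self._matvec(self.lora_a, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]


layer = ToyLoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],                    # rank 1 to keep the numbers readable
    lora_b=[[2.0], [4.0]],
    alpha=0.5,
)
x = [1.0, 3.0]

base_out = layer.forward(x)                 # adapter off
layer.lora_enabled = True
lora_out = layer.forward(x)                 # adapter on
layer.lora_enabled = False
restored_out = layer.forward(x)             # adapter off again

print(base_out, lora_out, restored_out)
```

Because the base weight is never modified, switching the flag off restores the original output exactly. This is what makes it possible to swap an adapter in and out without reloading the base model.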

For a complete interactive web application and detailed inference scripts, please refer to our GitHub repository: 👉 voxcpm-kazakh-tts

This web application supports:

Interactive Synthesis: Real-time Kazakh TTS.
Voice Cloning: Custom voice synthesis using your own reference audio.
Easy Deployment: Ready to run via Gradio.

⚖️ License & Acknowledgements

This model is released under the Apache License 2.0. Special thanks to the ISSAI team for providing the KazakhTTS dataset.

Model tree for ErnarBahat/VoxCPM-KazakhTTS-Lora: openbmb/MiniCPM4-0.5B (base model) → openbmb/VoxCPM1.5 (finetuned) → this model.