Kazakh-VoxCPM-LoRA

🇰🇿 Overview

This repository hosts a LoRA (Low-Rank Adaptation) adapter for the Kazakh language, built on the VoxCPM 1.5 architecture. The project aims to close the gap in high-quality Kazakh speech synthesis: it targets both standard TTS and zero-shot voice cloning while retaining the base model's proficiency in Chinese and English.

🚀 Performance Highlights

  • Native Phoneme Mastery: Precise handling of the Kazakh-specific letters ә, ғ, қ, ң, ө, ұ, ү, һ, і.
  • Superior Prosody: The stop-prediction loss (loss/stop) converged to 0.003-0.005, which is consistent with natural pauses and accurate rhythm in long-form text.
  • Advanced Cloning: Supports high-fidelity voice cloning from as little as 3 seconds of reference audio.
  • Seamless Tri-lingualism: Integrated support for code-switching across Kazakh, English, and Chinese.
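
When evaluating the phoneme claim above, it helps to confirm that a test sentence actually contains the Kazakh-specific letters. The following is a minimal sketch using only the Python standard library; the helper name `kazakh_letter_counts` and the sample sentence are illustrative assumptions and are not part of this repository.

```python
import unicodedata

# The nine Kazakh-specific Cyrillic letters (lowercase), written as escapes
# to avoid confusion with look-alike Latin characters:
# ә ғ қ ң ө ұ ү һ і
KAZAKH_SPECIFIC = set("\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456")


def kazakh_letter_counts(text: str) -> dict[str, int]:
    """Count how often each Kazakh-specific letter occurs in `text`."""
    # NFC normalization and lowercasing make the comparison robust to
    # capital letters and differently composed input.
    normalized = unicodedata.normalize("NFC", text).lower()
    counts: dict[str, int] = {}
    for ch in normalized:
        if ch in KAZAKH_SPECIFIC:
            counts[ch] = counts.get(ch, 0) + 1
    return counts


def missing_letters(text: str) -> set[str]:
    """Return the Kazakh-specific letters that `text` does not exercise."""
    return KAZAKH_SPECIFIC - set(kazakh_letter_counts(text))


# Hypothetical test sentence ("Kazakhstan"), chosen only for illustration.
sample = "\u049a\u0430\u0437\u0430\u049b\u0441\u0442\u0430\u043d"
print(kazakh_letter_counts(sample))
print(sorted(missing_letters(sample)))
```

A sentence set that leaves `missing_letters` empty covers all nine letters at least once, which makes listening tests more informative.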

📊 Training Specifications

  • Base Model: openbmb/VoxCPM1.5
  • Dataset: 66.1 hours of high-quality Kazakh speech (Source: issai/KazakhTTS).
  • Training Parameters: Steps: 4160 | Epochs: 1.84 | LoRA Rank: 32 | LoRA Alpha: 16.
  • Final Metrics: loss/diff: ~0.644 | loss/stop: ~0.004.
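
The numbers above imply a few derived quantities. The sketch below works them out, assuming the conventional LoRA scaling factor of alpha / rank; whether the VoxCPM training code applies exactly this convention is an assumption, not something stated in this card.

```python
# Values taken from the training specifications above.
steps = 4160
epochs = 1.84
rank = 32
alpha = 16
dataset_hours = 66.1

# Conventional LoRA scaling: the low-rank update is multiplied by alpha / rank.
# (Assumed convention; not confirmed for this training run.)
lora_scale = alpha / rank

# Optimizer steps needed for one full pass over the dataset.
steps_per_epoch = steps / epochs

# Total hours of audio the adapter saw during training (with repetition).
audio_hours_seen = dataset_hours * epochs

print(f"LoRA scale:        {lora_scale}")
print(f"Steps per epoch:   {steps_per_epoch:.0f}")
print(f"Audio hours seen:  {audio_hours_seen:.1f}")
```

Under that assumption the adapter update is applied at half strength, and training covered just under two passes over the 66.1-hour dataset.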

🛠️ Implementation Guide

The LoRA adapter supports dynamic hot-swapping at runtime. Set lora_enabled to True to enable Kazakh support; with it set to False, the base model is used without the adapter.
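
To illustrate what such a toggle does conceptually, here is a minimal, dependency-free sketch of a linear layer with a switchable low-rank update. It is not the VoxCPM API: the class `LoRALinear` and its fields are hypothetical, only the flag name `lora_enabled` comes from the text above, and the alpha / rank scaling is the conventional LoRA formulation assumed for illustration.

```python
from dataclasses import dataclass

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix `m` (rows x cols) by vector `v` (cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRALinear:
    """Hypothetical linear layer: y = W x + (alpha / rank) * B (A x)."""
    weight: Matrix   # frozen base weight, shape (out, in)
    lora_a: Matrix   # down-projection, shape (rank, in)
    lora_b: Matrix   # up-projection, shape (out, rank)
    alpha: float
    lora_enabled: bool = False

    @property
    def scale(self) -> float:
        # Rank is the number of rows in the down-projection matrix.
        return self.alpha / len(self.lora_a)

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.weight, x)
        if self.lora_enabled:
            # The base weight is never modified, so the adapter can be
            # switched on and off without reloading the model.
            delta = matvec(self.lora_b, matvec(self.lora_a, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


# Toy rank-1 example with made-up weights.
layer = LoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[2.0], [0.0]],
    alpha=0.5,
)
x = [3.0, 4.0]

base_out = layer.forward(x)       # adapter off: base behaviour
layer.lora_enabled = True
adapted_out = layer.forward(x)    # adapter on: base + scaled low-rank update
layer.lora_enabled = False
restored_out = layer.forward(x)   # adapter off again: base behaviour restored

print(base_out, adapted_out, restored_out)
```

Because the update is added at inference time rather than merged into the base weights, disabling the flag restores the original Chinese and English behaviour exactly. For the actual loading and inference calls, refer to the scripts in the GitHub repository mentioned below.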

For a complete interactive web application and detailed inference scripts, see the GitHub repository: 👉 voxcpm-kazakh-tts

This web application supports:

  • Interactive Synthesis: Real-time Kazakh TTS.
  • Voice Cloning: Custom voice synthesis using your own reference audio.
  • Easy Deployment: Ready to run via Gradio.

⚖️ License & Acknowledgements

This model is released under the Apache License 2.0. Special thanks to the ISSAI team for providing the KazakhTTS dataset.

