Finetune Guide

#3
by rastegar - opened

Hi there
Is there a finetuning guide or example code with an example dataset?

Since this is based on StyleTTS, it is quite straightforward to finetune. I assume you want to do something like expand language coverage. You happen to be talking to the right guy. I fully reverse-engineered Kokoro, which is also based on StyleTTS, and expanded it to speak 72 languages. The only issue is that you would need additional training runs with this model to get more phonemes into it, so that the vocoder is actually able to formulate the phonemes you want. After that you can do things like cross-phoneme mapping and post-hoc tuning and training. Let me know if you have any questions and I can guide you.
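To make the "cross-phoneme mapping" idea a bit more concrete, here is a rough sketch of one way it could look. This is not the Kokoro or KittenTTS code: the symbol list, the fallback table, and the `map_phonemes` helper are all made up for illustration. The assumption is that the model only knows a fixed set of phoneme symbols, and that anything outside that set gets substituted with a hand-picked nearby symbol until the model has been trained on the new phoneme directly.

```python
# Hypothetical symbol inventory; a real model's symbol table is much larger
# and would be read from its own config, not hard-coded like this.
KNOWN_SYMBOLS = ["_", " ", "a", "e", "i", "o", "u", "k", "s", "t", "x"]
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(KNOWN_SYMBOLS)}

# Hand-written fallback table (an assumption for illustration): each phoneme
# the model has never seen points at an in-vocabulary phoneme that sounds close.
FALLBACK = {"ɑ": "a", "ɛ": "e", "q": "k", "χ": "x"}


def map_phonemes(phonemes, symbol_to_id=SYMBOL_TO_ID, fallback=FALLBACK):
    """Turn a phoneme sequence into token ids, substituting unknown phonemes.

    Returns (ids, unmapped), where `unmapped` lists the phonemes that had
    neither a direct entry nor a usable fallback. Those are the ones that
    would need additional training before the model can produce them.
    """
    ids = []
    unmapped = []
    for phoneme in phonemes:
        if phoneme in symbol_to_id:
            ids.append(symbol_to_id[phoneme])
        elif fallback.get(phoneme) in symbol_to_id:
            ids.append(symbol_to_id[fallback[phoneme]])
        else:
            unmapped.append(phoneme)
    return ids, unmapped


if __name__ == "__main__":
    ids, unmapped = map_phonemes(["q", "ɑ", "t", "ʒ"])
    print(ids, unmapped)
```

Anything that ends up in `unmapped` is a hint about which phonemes the extra training runs would have to cover first.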

@comey113 wow, this is the first time I’ve heard something like this. How were you able to reverse engineer Kokoro and train it in other languages? That’s a big achievement. Could you share more details? I have around four to five hundred thousand hours of data and I want to try training experiments with smaller models.

Kitten ML org

@rastegar we will likely release the ability to get custom models with custom voices after our next launch. Right now the highest priority is to launch the next model next month. We don't have the bandwidth to release training code right now, unfortunately :( Maybe in May or so this will change.
