|
|
--- |
|
|
base_model: |
|
|
- aoi-ot/VibeVoice-Large |
|
|
tags: |
|
|
- text-to-speech |
|
|
- tts |
|
|
- lora |
|
|
- sft |
|
|
- full-finetune |
|
|
- vibevoice |
|
|
language: |
|
|
- hu |
|
|
--- |
|
|
# VibeVoice_7B_Hun_v2 |
|
|
This is my newest fine-tuned VibeVoice 7B (Large) model, tailored to the Hungarian language. |
|
|
I trained a LoRA for the LLM module, performed a full fine-tune of the diffusion head modules, and merged both into the base model. |
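For readers unfamiliar with what "merging" a LoRA means, the following is a minimal sketch of the standard LoRA merge rule, `W_merged = W + (alpha / r) * B @ A`, written in plain Python. It is not the trainer's actual code: the function names, matrix sizes, and the `alpha` and `r` values are hypothetical and chosen only for illustration.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into the base weight: W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_out, d_in, r, alpha = 4, 6, 2, 4.0  # hypothetical sizes, for illustration only

W = rand_matrix(d_out, d_in)  # frozen base weight
A = rand_matrix(r, d_in)      # LoRA down-projection
B = rand_matrix(d_out, r)     # LoRA up-projection

W_merged = merge_lora(W, A, B, alpha, r)

# The merged weight should give the same output as base + adapter applied separately.
x = [random.gauss(0, 1) for _ in range(d_in)]
y_adapter = [base + (alpha / r) * delta
             for base, delta in zip(matvec(W, x), matvec(B, matvec(A, x)))]
y_merged = matvec(W_merged, x)
max_diff = max(abs(a - b) for a, b in zip(y_adapter, y_merged))
print(f"max difference between adapter and merged outputs: {max_diff:.2e}")
```

After merging, the adapter matrices are no longer needed at inference time, which is why a merged checkpoint can be loaded like an ordinary model.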
|
|
|
|
|
To fine-tune the model, I used [this training code](https://github.com/voicepowered-ai/VibeVoice-finetuning). |
|
|
|
|
|
Thanks to [JPGallegoar](https://github.com/jpgallegoar-vpai) for that amazing VibeVoice trainer! |
|
|
|
|
|
## Inference |
|
|
For inference, you can use either of the following: |
|
|
- [This ComfyUI node](https://github.com/Enemyx-net/VibeVoice-ComfyUI) |
|
|
- The demo code in the [VibeVoice Community repository](https://github.com/vibevoice-community/VibeVoice) |
|
|
|
|
|
## Examples |
|
|
These examples were generated with 4-bit quantized inference. You can get even better results without quantization. |
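To give a rough intuition for why unquantized inference can sound better, here is a minimal sketch of a 4-bit round trip in plain Python. It assumes simple symmetric absmax quantization, which is almost certainly not the exact scheme used by any particular inference tool (real 4-bit backends typically use block-wise formats). The function names and the weight values are hypothetical.

```python
def quantize_4bit(values):
    """Symmetric absmax quantization to signed 4-bit codes in [-7, 7]."""
    # Fall back to a scale of 1.0 if every value is zero.
    scale = max(abs(v) for v in values) / 7 or 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map 4-bit codes back to approximate floating-point values."""
    return [c * scale for c in codes]

# Hypothetical weights, for illustration only.
weights = [0.82, -0.41, 0.05, -0.93, 0.27, 0.6]

codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print("codes:   ", codes)
print("restored:", [round(r, 3) for r in restored])
print("max abs error:", round(max(errors), 4))
```

Each weight is snapped to one of only 15 levels, so a small rounding error is introduced everywhere. Running the model without quantization avoids that error at the cost of more memory.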
|
|
|
|
|
#### Sample 1 |
|
|
``` |
|
|
"Az utcák lassan megteltek emberekkel, ahogy a város ébredezett. |
|
|
A kávézók teraszain gőzölgő csészék mellett beszélgettek az emberek, miközben a villamos csilingelve gördült el a sarkon. |
|
|
A levegőben friss péksütemény illata keveredett a tavaszi széllel. |
|
|
Minden arra utalt, hogy egy nyugodt, szép nap veszi kezdetét." |
|
|
``` |
|
|
<div style="display: flex; gap: 20px; align-items: center;"> |
|
|
<div style="flex: 1;"> |
|
|
<strong>Fine-tuned Model</strong> |
|
|
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_hun_v2/resolve/main/assets/test_withmodel_1.flac" style="width: 100%;"></audio> |
|
|
</div> |
|
|
<div style="flex: 1;"> |
|
|
<strong>Base Model</strong> |
|
|
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_hun_v2/resolve/main/assets/test_no_model_1.flac" style="width: 100%;"></audio> |
|
|
</div> |
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
#### Sample 2 |
|
|
``` |
|
|
"Őrült űző üldöző őz őrjöngő őrült őrzőjével ügető ürge üvöltözött. |
|
|
Örvénylő örömök örökös özönével ölelő ösvényeken ődöngő űrlények ütköztek össze önérzetesen." |
|
|
``` |
|
|
<div style="display: flex; gap: 20px; align-items: center;"> |
|
|
<div style="flex: 1;"> |
|
|
<strong>Fine-tuned Model</strong> |
|
|
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_hun_v2/resolve/main/assets/test_withmodel_2.flac" style="width: 100%;"></audio> |
|
|
</div> |
|
|
<div style="flex: 1;"> |
|
|
<strong>Base Model</strong> |
|
|
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_hun_v2/resolve/main/assets/test_no_model_2.flac" style="width: 100%;"></audio> |
|
|
</div> |
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
#### Sample 3 |
|
|
``` |
|
|
"Csapzott cserecsapat cserebogár csapongott cserepes cseresznyecsokrok csücskében, |
|
|
s közben csipcsup csipogással csipkedte csípős csipkebokor csúcsát." |
|
|
``` |
|
|
<div style="display: flex; gap: 20px; align-items: center;"> |
|
|
<div style="flex: 1;"> |
|
|
<strong>Fine-tuned Model</strong> |
|
|
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_hun_v2/resolve/main/assets/test_withmodel_3.flac" style="width: 100%;"></audio> |
|
|
</div> |
|
|
<div style="flex: 1;"> |
|
|
<strong>Base Model</strong> |
|
|
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_hun_v2/resolve/main/assets/test_no_model_3.flac" style="width: 100%;"></audio> |
|
|
</div> |
|
|
</div> |
|
|
|
|
|
**Important Notes:** This model was created as part of a fan project for research purposes only and is not intended for commercial use. |
|
|
The dataset I used might contain material that is protected by copyright. Users use the model at their own risk. |
|
|
Users are obligated to comply with copyright laws and applicable regulations. |
|
|
The model has been developed for research purposes, and it is not my intention to infringe on any copyright. |