---
base_model:
- aoi-ot/VibeVoice-Large
tags:
- text-to-speech
- tts
- lora
- sft
- full-finetune
- vibevoice
language:
- hu
---

# VibeVoice_7B_Hun_v2

This is my newest fine-tuned VibeVoice 7B (Large) model, tailored to the Hungarian language.

I trained a LoRA for the LLM module and ran a full fine-tune of the diffusion head modules, then merged both into the base model.

To fine-tune the model I used [this code](https://github.com/voicepowered-ai/VibeVoice-finetuning).

Thanks to [JPGallegoar](https://github.com/jpgallegoar-vpai) for that amazing VibeVoice trainer!

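For readers unfamiliar with what merging a LoRA into a base model means, here is a minimal sketch of the idea in plain Python. It is not the trainer's code and not how the merge was actually run for this model. The helper names (`matmul`, `merge_lora`, `apply`), the toy 2x2 matrices, and the `alpha` value are all assumptions made for illustration. It uses the usual LoRA convention that the update is `(alpha / rank) * B @ A`.

```python
# Illustrative sketch only: folds a low-rank LoRA update into a base weight.
# All names and numbers here are hypothetical, chosen to keep the example tiny.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, lora_a, lora_b, alpha):
    """Return base + (alpha / rank) * (lora_b @ lora_a).

    base:   out_dim x in_dim frozen weight
    lora_a: rank x in_dim   (down-projection)
    lora_b: out_dim x rank  (up-projection)
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]


def apply(weight, x):
    """Compute weight @ x for a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


base = [[1.0, 0.0], [0.0, 1.0]]   # toy base weight
lora_a = [[1.0, 2.0]]             # rank 1
lora_b = [[0.5], [-0.5]]
alpha = 2.0

merged = merge_lora(base, lora_a, lora_b, alpha)

# The merged weight gives the same output as running base and adapter separately.
x = [3.0, 4.0]
adapter_out = [alpha / len(lora_a) * v for v in apply(lora_b, apply(lora_a, x))]
separate = [b + a for b, a in zip(apply(base, x), adapter_out)]
```

Because the merged matrix already contains the update, a merged checkpoint can be loaded like an ordinary model, with no separate adapter file at inference time.
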
## Inference

For inference, you can use:

- [this ComfyUI node](https://github.com/Enemyx-net/VibeVoice-ComfyUI)
- the demo code in the [VibeVoice Community repository](https://github.com/vibevoice-community/VibeVoice)

## Examples

These examples were generated with 4-bit inference. Results can be even better without quantization.

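To give a rough feel for why quantization can cost some quality, the sketch below rounds a few weights to 4-bit integers and back, then measures the rounding error. This card does not say which 4-bit scheme the inference tools use, so the plain absmax rounding here is an assumption chosen for simplicity. The function names and sample weights are hypothetical.

```python
# Illustrative sketch only: symmetric absmax quantization to signed 4-bit codes.
# Real 4-bit inference backends may use a different scheme.

def quantize_4bit(values):
    """Map floats to integer codes in [-7, 7] plus one shared scale."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 7 if absmax else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.70, -0.33, 0.12, 0.04, -0.01]   # toy weights
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)

# Largest absolute difference between original and restored weights.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Small weights such as `0.04` and `-0.01` collapse to zero in this toy setup. That loss of detail is the kind of precision that unquantized inference keeps.
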
**Voice without LoRA**

<div style="display: flex; gap: 20px;">
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_nolora-1.wav"></audio>
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s98765_nolora-1.wav"></audio>
</div>

**Voice WITH LoRA**

<div style="display: flex; gap: 20px;">
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_hu-lora_srand3.wav"></audio>
<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_hu-lora-1.wav"></audio>
</div>

**Important Notes:** This model was created as part of a fan project for research purposes only and is not intended for commercial use.

The dataset I used might contain material that is protected by copyright. Users use the model at their own risk.

Users are obligated to comply with copyright laws and applicable regulations.

The model was developed for research purposes, and it is not my intention to infringe on any copyright.