Cseti
/

VibeVoice_7B_hun_v2


VibeVoice_7B_hun_v2 / README.md


Update README.md

fe0eca7 verified about 2 months ago


	---
	base_model:
	- aoi-ot/VibeVoice-Large
	tags:
	- text-to-speech
	- tts
	- lora
	- sft
	- full-finetune
	- vibevoice
	language:
	- hu
	---
	# VibeVoice_7B_Hun_v2
	This is my newest fine-tuned VibeVoice 7B (Large) model, tailored to the Hungarian language.
	I made it by training a LoRA for the LLM module and doing a full fine-tune of the diffusion head modules, then merging both into the base model.
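
To make the "merge" step more concrete, here is a minimal sketch of how a LoRA adapter is commonly folded into a base weight matrix: `W_merged = W + (alpha / rank) * B @ A`. This is the generic LoRA arithmetic, not the trainer's actual code; the tiny matrices and the `alpha` and `rank` values are made up for illustration.

```python
# Illustrative only: tiny hypothetical matrices stand in for one linear layer.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]   # shape (rank, in_features)
B = [[2.0], [4.0]]  # shape (out_features, rank)

merged = merge_lora(W, A, B, alpha=2.0, rank=1)
print(merged)
```

After merging, the adapter no longer has to be loaded separately, which is why this repository can be used like an ordinary checkpoint.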

	To fine-tune the model, I used [this code](https://github.com/voicepowered-ai/VibeVoice-finetuning).

	Thank you to [JPGallegoar](https://github.com/jpgallegoar-vpai) for that amazing VibeVoice trainer!

	## Inference
	For inference, you can use:
	- [this ComfyUI node](https://github.com/Enemyx-net/VibeVoice-ComfyUI)
	- the demo code in the [VibeVoice Community repository](https://github.com/vibevoice-community/VibeVoice)

	## Examples
	These examples were made with 4-bit inference. You can get even better results without quantization.

	Voice without LoRA
	<div style="display: flex; gap: 20px;">
	<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_nolora-1.wav"></audio>
	<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s98765_nolora-1.wav"></audio>
	</div>


	Voice WITH LoRA
	<div style="display: flex; gap: 20px;">
	<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_hu-lora_srand3.wav"></audio>
	<audio controls src="https://huggingface.co/Cseti/VibeVoice_7B_Diffusion-head-LoRA_Hungarian-CV17/resolve/main/assets/synth_s42_hu-lora-1.wav"></audio>
	</div>
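
As a rough illustration of why the 4-bit examples above may lose some quality, the sketch below rounds a few weights onto 4-bit integer levels and measures the round-trip error. It assumes a simple symmetric round-to-nearest scheme with a single scale; real 4-bit inference backends typically use more elaborate block-wise formats, and the scheme used for the examples above is not specified here. The weight values are made up.

```python
def quantize_4bit(values):
    """Symmetric round-to-nearest onto integer codes in [-7, 7]."""
    # Fall back to a scale of 1.0 if every value is zero.
    scale = max(abs(v) for v in values) / 7 or 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.7, -0.33, 0.12, 0.04]

codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(codes)
print(max(errors))
```

Each weight can be off by up to half a quantization step, and small weights such as `0.04` collapse to zero entirely. Running the model without quantization avoids this rounding error at the cost of more memory.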

	Important Notes: This model was created as part of a fan project for research purposes only and is not intended for commercial use.
	The dataset I used might contain material that is protected by copyright. Users use the model at their own risk.
	Users are obligated to comply with copyright laws and applicable regulations.
	The model was developed for research purposes, and it is not my intention to infringe on any copyright.