The important question

by yukiarimo - opened Nov 5, 2025

Discussion

yukiarimo

Nov 5, 2025

So, how about releasing the full dataset? Or have you just illegally ripped off stolen voices from the web?

xscMpOV

Nov 6, 2025

surely this is the best way to ask for anything

mueller91

Nov 6, 2025

most companies/developers wouldn't release a training dataset, even when the model is open source. this is not unusual.

yukiarimo

Nov 7, 2025 (edited Nov 8, 2025)

most companies/developers wouldn't release a training dataset, even when the model is open source. this is not unusual.

A bunch of projects like VITS, Tacotron, etc., have released! (And usually they use LJSpeech)
If you don't even say where the data is coming from, it's definitely 100% stolen and they MUST be banned from HF!

flowring-luyiourwong

Nov 10, 2025

A bunch of projects like VITS, Tacotron, etc., have released! (And usually they use LJSpeech)

If you don't even say where the data is coming from, it's definitely 100% stolen and they MUST be banned from HF!

right, hf should ban 95% of models, including gpt, llama, and gemma as well. none of them have released datasets lol

btw, maya actually notes training data in the metadata

simzhou

Nov 11, 2025

such an aggressive post...

DheemanthReddy

Maya Research org Nov 20, 2025

We’re building voice intelligence for everyone and releasing it freely. That mission stays the same.

The internet is a shared resource. We’ll use every open audio source we can find to train models that talk naturally and push the frontier forward, available to all at no cost.

yukiarimo

Nov 20, 2025

Thieves

bharathkumarK changed discussion status to closed Nov 20, 2025

DheemanthReddy

Maya Research org Nov 20, 2025

Thank you
