How can we access the acoustic encoder and semantic encoder?

#20

by hebangwen - opened 11 days ago

11 days ago

The acoustic encoder and semantic encoder are available in VibeVoice-1.5B, but both are missing from VibeVoice-Realtime-0.5B, where each predefined voice is provided as a precomputed KV cache. If we had these two encoders, could we clone voices zero-shot?
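A KV cache stores only the attention keys and values computed from a prompt. The toy single-head attention below illustrates why that matters here. It is pure Python with made-up weights and vectors, not VibeVoice code, and `voice_prompt` is a hypothetical stand-in for whatever prompt embeddings a voice would be built from. It shows that reusing a cached prefix gives the same outputs as recomputing with the full prompt. It also shows that building a cache for a *new* voice still needs those prompt embeddings in the first place. This assumes, as the question implies, that the shipped caches were produced by running encoder outputs through the model.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over lists of keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[i] for e, v in zip(exps, values))
            for i in range(len(values[0]))]

class ToyAttention:
    """One attention head with fixed projections (illustrative only)."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv

    def prefill(self, prompt):
        """Run the prompt once and keep only its keys and values (the KV cache)."""
        keys = [matvec(self.Wk, x) for x in prompt]
        values = [matvec(self.Wv, x) for x in prompt]
        return keys, values

    def step(self, x, cache):
        """Process one new token against the cache; return output and grown cache."""
        keys, values = cache
        keys = keys + [matvec(self.Wk, x)]
        values = values + [matvec(self.Wv, x)]
        return attend(matvec(self.Wq, x), keys, values), (keys, values)

    def full(self, sequence):
        """Causal attention over the whole sequence, recomputed with no cache."""
        outs = []
        for t, x in enumerate(sequence):
            keys = [matvec(self.Wk, y) for y in sequence[: t + 1]]
            values = [matvec(self.Wv, y) for y in sequence[: t + 1]]
            outs.append(attend(matvec(self.Wq, x), keys, values))
        return outs

# Made-up 2-d weights and vectors; "voice_prompt" is a hypothetical stand-in
# for prompt embeddings, "text_tokens" for the tokens generated afterwards.
head = ToyAttention(Wq=[[1.0, 0.2], [0.0, 1.0]],
                    Wk=[[0.5, 0.1], [0.3, 0.9]],
                    Wv=[[1.0, 0.0], [0.4, 0.6]])
voice_prompt = [[0.9, -0.1], [0.2, 0.7], [-0.5, 0.3]]
text_tokens = [[0.1, 0.4], [0.6, -0.2]]

cache = head.prefill(voice_prompt)  # computed once; this is what gets shipped
cached_outputs = []
for x in text_tokens:
    out, cache = head.step(x, cache)
    cached_outputs.append(out)

# Recomputing from scratch with the prompt gives the same outputs.
full_outputs = head.full(voice_prompt + text_tokens)[len(voice_prompt):]
```

A real model keeps one such cache per layer. A cache reproduces only the prompt it was computed from, so by itself it offers no way to encode new reference audio.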

sailorjs0804

5 days ago

Same question. We tried to reproduce the acoustic encoder, but our version only performs well for the first 6 seconds.
