How can we access the acoustic encoder and semantics encoder?

#20
by hebangwen - opened

The acoustic encoder and semantics encoder are available in VibeVoice-1.5B, but both are missing from VibeVoice-Realtime-0.5B, where the predefined voice is encoded as a KV cache. Could we do zero-shot voice cloning if we had these two encoders?

Same question. We tried to reproduce the acoustic encoder, but it only performs well for the first 6 seconds.
