How can we access the acoustic encoder and semantics encoder?
#20, opened by hebangwen
The acoustic encoder and semantics encoder are available in VibeVoice-1.5B. However, these two encoders are missing from VibeVoice-Realtime-0.5B, where each predefined voice is encoded as a KV cache. Could we do zero-shot voice cloning if we had these two encoders?
Same question. We tried to reproduce the acoustic encoder, but it only performs well for the first 6 seconds.