What is this?
This is a custom audio codec.
The encoder was borrowed from kyutai/mimi, while the decoder is trained from scratch using a different architecture, a higher sampling rate (44.1 kHz), and other modifications.
It should sound much better in most use cases. It is backward compatible with any TTS model trained on Mimi codes.
It was trained on tens of thousands of multilingual data (English, Japanese, Persian, Russian, Arabic, etc.).
Inference
import librosa
import torch
from IPython.display import Audio as Sawt
from audio_processing.kanadec_audio_tokenizer import load_avadec_audio_tokenizer, encode_batch

# Define the device once and reuse it everywhere.
device = 'cuda'
dac_model = load_avadec_audio_tokenizer("Respair/Avadec_12hz_44khz", device=device)

# Load the input at 24 kHz, the rate passed to the encoder as orig_sr below.
wav, sr = librosa.load("path_to/audio.mp3", sr=24000)
tensor = torch.from_numpy(wav).unsqueeze(0).to(device)

with torch.no_grad():
    encoded = encode_batch(dac_model, tensor, orig_sr=24000, return_quantized=False)
    recon = dac_model.decode(encoded.audio_codes.to(device))

# The decoder outputs 44.1 kHz audio; move it to the CPU before playback.
Sawt(recon.squeeze().cpu().numpy(), rate=44100)
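As a rough sanity check on the shapes involved, the sketch below estimates how many code frames a clip produces and how many samples the decoder should return. It rests on assumptions that the card does not state explicitly: that the borrowed Mimi encoder runs at 12.5 frames per second on 24 kHz input (the "12hz" in the model name is read as 12.5 Hz), and that a final partial frame is padded rather than dropped. The helper name `expected_shapes` is hypothetical and not part of this repository.

```python
# Assumptions (not confirmed by the model card):
# - the Mimi encoder consumes 24 kHz audio at 12.5 code frames per second,
#   i.e. 1920 input samples per frame;
# - the decoder emits 44.1 kHz audio, i.e. 3528 output samples per frame;
# - a trailing partial frame is padded up to a full frame.
ENCODER_SR = 24_000
DECODER_SR = 44_100
INPUT_SAMPLES_PER_FRAME = 1920   # 24000 / 12.5
OUTPUT_SAMPLES_PER_FRAME = 3528  # 44100 / 12.5


def expected_shapes(num_input_samples: int) -> dict:
    """Estimate frame count and decoded length for a 24 kHz input clip."""
    if num_input_samples < 0:
        raise ValueError("num_input_samples must be non-negative")
    # Integer ceiling division: round a partial frame up to a whole one.
    num_frames = -(-num_input_samples // INPUT_SAMPLES_PER_FRAME)
    return {
        "duration_s": num_input_samples / ENCODER_SR,
        "num_frames": num_frames,
        "num_output_samples": num_frames * OUTPUT_SAMPLES_PER_FRAME,
    }


if __name__ == "__main__":
    # A 10-second clip loaded at 24 kHz.
    print(expected_shapes(10 * ENCODER_SR))
```

If the real frame count or output length differs from this estimate, the padding assumption is the first thing to check.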