What is this?

This is a custom audio codec.

  • The encoder is borrowed from kyutai/mimi. The decoder is trained from scratch with a different architecture, a higher sampling rate (44.1 kHz), and other modifications.
    It should sound noticeably better than the original Mimi decoder in most use cases.

  • Backward compatible with any TTS model trained on Mimi codes, because the encoder (and therefore the code space) is the same as Mimi's.

  • It was trained on a multilingual dataset on the scale of tens of thousands (English, Japanese, Persian, Russian, Arabic, etc.).
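
When sizing TTS outputs or batches, it can help to relate code frames to audio length. The sketch below is minimal and uses only the standard library. It assumes Mimi's 24 kHz input rate and 12.5 Hz frame rate, and it assumes that "12hz" in the repo name refers to that frame rate. It also assumes the 44.1 kHz decoder emits exactly 44100 / 12.5 = 3528 samples per frame. The helper names are hypothetical and are not part of this repo's API.

```python
import math

# ASSUMED constants: Mimi's encoder runs on 24 kHz audio at 12.5 frames/s.
# The "12hz" in the repo name is assumed to refer to this frame rate.
ENCODER_SR = 24_000
FRAME_RATE = 12.5
DECODER_SR = 44_100  # output rate stated in this card


def samples_per_frame(sample_rate: int, frame_rate: float = FRAME_RATE) -> float:
    """Audio samples covered by one code frame at the given sample rate."""
    return sample_rate / frame_rate


def num_code_frames(num_input_samples: int) -> int:
    """Approximate code frames for a 24 kHz clip (assumes padding up to a whole frame)."""
    return math.ceil(num_input_samples / samples_per_frame(ENCODER_SR))


def num_output_samples(num_frames: int) -> int:
    """Samples the 44.1 kHz decoder would emit (assumes exact per-frame upsampling)."""
    return round(num_frames * samples_per_frame(DECODER_SR))


if __name__ == "__main__":
    clip = 10 * ENCODER_SR          # a 10-second clip at 24 kHz
    frames = num_code_frames(clip)  # 240000 / 1920 = 125 frames
    print(frames, num_output_samples(frames))  # prints "125 441000"
```

Real encoders may pad or trim differently at clip boundaries, so treat these counts as estimates rather than exact guarantees.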

Inference

import librosa
import torch
from IPython.display import Audio as Sawt

from audio_processing.kanadec_audio_tokenizer import load_avadec_audio_tokenizer, encode_batch


device = 'cuda'

# Load the codec: Mimi encoder + the new 44.1 kHz decoder
dac_model = load_avadec_audio_tokenizer("Respair/Avadec_12hz_44khz", device=device)

# The Mimi encoder expects 24 kHz mono input
wav, sr = librosa.load("path_to/audio.mp3", sr=24000)
tensor = torch.from_numpy(wav).unsqueeze(0).to(device)

with torch.no_grad():
    # Encode to Mimi-compatible discrete codes
    encoded = encode_batch(dac_model, tensor, orig_sr=24000, return_quantized=False)
    # Decode the codes back to a 44.1 kHz waveform
    recon = dac_model.decode(encoded.audio_codes.to(device))

# Move the result to CPU/NumPy so IPython can play it
Sawt(recon.squeeze().cpu().numpy(), rate=44100)
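
The inference example above plays the result inline. To write it to disk instead, you could use a helper like the one below, which relies only on the standard library. `save_wav_pcm16` is a hypothetical helper and is not part of this repo. It assumes `recon` squeezes to a 1-D mono signal with values roughly in [-1, 1].

```python
import array
import sys
import wave


def save_wav_pcm16(path: str, samples, sample_rate: int = 44_100) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Values outside [-1, 1] are clipped before conversion.
    """
    pcm = array.array("h", (
        int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    ))
    # WAV stores little-endian samples; swap bytes on big-endian hosts.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


# Usage (assumes `recon` from the inference example is a mono torch tensor):
# save_wav_pcm16("recon.wav", recon.squeeze().cpu().tolist(), sample_rate=44100)
```

Converting to 16-bit PCM quantizes the signal slightly. If `soundfile` is installed, `soundfile.write("recon.wav", audio, 44100)` writes the float NumPy array with less code.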

Base model: kyutai/mimi