What is this?

This is a custom audio codec.

The encoder is borrowed from kyutai/mimi, while the decoder is trained from scratch with a different architecture, a higher output sampling rate (44.1 kHz), and other modifications.
It should sound much better than the original Mimi decoder in most use cases.
It is backward compatible with any TTS model trained on Mimi codes.
It was trained on tens of thousands of multilingual data points (English, Japanese, Persian, Russian, Arabic, etc.).
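
The decoder consumes the same tokens as Mimi, so the code stream presumably keeps Mimi's frame rate. Only the number of output samples per frame changes. The sketch below works through that arithmetic. It assumes Mimi's published figures: 24 kHz input, 12.5 frames per second (1920 input samples per frame), and 2048-entry codebooks. The "12hz" in the model name presumably refers to this rate. The number of codebooks `n_q` is a hypothetical parameter, because the card does not say how many Avadec uses. The helper names are illustrative and not part of any library.

```python
import math

# Mimi's published figures. Avadec is assumed to share the encoder-side
# values because it decodes the same codes; only OUTPUT_SR comes from this card.
INPUT_SR = 24_000     # encoder input sample rate (Hz)
FRAME_RATE = 12.5     # code frames per second
OUTPUT_SR = 44_100    # Avadec decoder output sample rate (Hz)
CODEBOOK_SIZE = 2048  # entries per codebook, i.e. 11 bits per code


def num_frames(n_input_samples: int) -> int:
    """Code frames for a 24 kHz input, assuming a partial last frame is padded."""
    hop = round(INPUT_SR / FRAME_RATE)  # 1920 input samples per frame
    return math.ceil(n_input_samples / hop)


def output_samples(frames: int) -> int:
    """44.1 kHz samples the decoder would emit for `frames` code frames."""
    return round(frames * OUTPUT_SR / FRAME_RATE)  # 3528 output samples per frame


def bitrate_bps(n_q: int) -> float:
    """Token bitrate for n_q codebooks (n_q is an assumption, not from the card)."""
    return n_q * math.log2(CODEBOOK_SIZE) * FRAME_RATE


frames = num_frames(10 * INPUT_SR)  # 10 seconds of 24 kHz audio
print(frames, output_samples(frames), bitrate_bps(8))  # 125 441000 1100.0
```

With 8 codebooks, the configuration Moshi uses with Mimi, the token stream is about 1.1 kbps. The same frames decode to 44.1 kHz instead of 24 kHz, so the extra output resolution costs no extra tokens.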

Inference

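```python
import librosa
import torch
from IPython.display import Audio as Sawt  # notebook audio player

from audio_processing.kanadec_audio_tokenizer import load_avadec_audio_tokenizer, encode_batch

device = 'cuda'

# Load the Avadec tokenizer (Mimi encoder + 44.1 kHz decoder)
dac_model = load_avadec_audio_tokenizer("Respair/Avadec_12hz_44khz", device=device)

# The encoder expects 24 kHz mono input, so resample on load
wav, sr = librosa.load("path_to/audio.mp3", sr=24000)
tensor = torch.from_numpy(wav).unsqueeze(0).to(device)  # shape: (batch=1, samples)

with torch.no_grad():
    # Encode to discrete Mimi-compatible codes, then decode them at 44.1 kHz
    encoded = encode_batch(dac_model, tensor, orig_sr=24000, return_quantized=False)
    recon = dac_model.decode(encoded.audio_codes.to(device))

# Move the output to CPU as a NumPy array for playback; it is sampled at 44.1 kHz
Sawt(recon.squeeze().float().cpu().numpy(), rate=44100)
```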
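
`Sawt` only plays audio inside a notebook. Outside Jupyter, one option is to write the decoded waveform to a 16-bit mono WAV file at 44.1 kHz. The sketch below uses only the Python standard library. It assumes the samples have already been flattened to a sequence of floats in [-1, 1], for example with `recon.squeeze().float().cpu().tolist()` if `recon` is the tensor from the example above. The function name `write_wav_16bit` is hypothetical. `torchaudio.save` or `soundfile.write` can do the same job in one call.

```python
import array
import sys
import wave


def write_wav_16bit(path, samples, sample_rate=44_100):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip so scaling cannot overflow int16
        pcm.append(int(round(x * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)  # Avadec decodes at 44.1 kHz
        wf.writeframes(pcm.tobytes())
```

Usage: `write_wav_16bit("recon.wav", recon.squeeze().float().cpu().tolist())`. Clipping before scaling keeps occasional decoder overshoot from wrapping around to the opposite sign.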

