Reconstruct audio from a mel-spectrogram with a 10 ms frame shift

To use Vocos only in inference mode, install it using:

pip install vocos

Load the model and run inference:

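import torch
from vocos import Vocos

# Load the pretrained 10 ms frame-shift model from the Hugging Face Hub
vocos = Vocos.from_pretrained("meaningteam/vocos-mel-10ms-24khz")

# One second of random noise at 24 kHz stands in for real audio: (batch, samples)
audio = torch.randn(1, 24000)
# Extract mel-spectrogram features from the waveform
mel = vocos.feature_extractor(audio)
# Decode the mel-spectrogram back into a waveform
prediction = vocos.decode(mel)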

Model details

This model was trained on the DNS Challenge dataset for 1M steps. Unlike charactr/vocos-mel-24khz, it uses a 10 ms frame shift.
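
To make the frame shift concrete, the sketch below converts 10 ms into a hop size in samples and estimates how many mel frames a clip produces. It assumes a 24 kHz sample rate (taken from the model name and the example input above) and that the frame shift equals the STFT hop length. The helper names `hop_length` and `approx_num_frames` are hypothetical and not part of the Vocos API, and the real feature extractor may differ by a frame because of edge padding.

```python
SAMPLE_RATE = 24_000   # Hz; assumed from "24khz" in the model name
FRAME_SHIFT_MS = 10    # ms; from the model description


def hop_length(sample_rate: int, frame_shift_ms: float) -> int:
    """Convert a frame shift in milliseconds to a hop size in samples."""
    return round(sample_rate * frame_shift_ms / 1000)


def approx_num_frames(num_samples: int, hop: int) -> int:
    """Estimate the frame count, ignoring any edge padding."""
    return num_samples // hop


hop = hop_length(SAMPLE_RATE, FRAME_SHIFT_MS)
print(hop)                             # 240 samples per frame
print(approx_num_frames(24_000, hop))  # 100 frames for one second of audio
```

Under these assumptions, one second of audio maps to roughly 100 mel frames, which is convenient when aligning the vocoder with an acoustic model that also emits 100 frames per second.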

License

The code in this repository is released under the MIT license.
