Reconstruct audio from mel-spectrogram with 10 ms frame shift
To use Vocos only in inference mode, install it using:
pip install vocos
Load the model and run inference:
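import torch
from vocos import Vocos

# Download the pretrained checkpoint from the Hugging Face Hub
vocos = Vocos.from_pretrained("meaningteam/vocos-mel-10ms-24khz")

# Placeholder input: 1 second of random noise at 24 kHz, shape (batch, samples)
audio = torch.randn(1, 24000)
# Compute the mel-spectrogram, then reconstruct a waveform from it
mel = vocos.feature_extractor(audio)
prediction = vocos.decode(mel)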
Model details
This model was trained on the DNS Challenge dataset for 1M steps. Unlike charactr/vocos-mel-24khz, it uses a 10 ms frame shift.
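As a rough illustration of what a 10 ms frame shift means in samples, the sketch below converts the frame shift to a hop length. It assumes the 24 kHz sample rate implied by the model name. The helper names `hop_length_samples` and `frames_per_second` are hypothetical and are not part of the Vocos API.

```python
def hop_length_samples(frame_shift_ms: float, sample_rate_hz: int) -> int:
    """Convert a frame shift in milliseconds to a hop length in samples."""
    hop = frame_shift_ms * sample_rate_hz / 1000
    if hop != int(hop):
        raise ValueError("frame shift is not a whole number of samples")
    return int(hop)


def frames_per_second(sample_rate_hz: int, hop_length: int) -> float:
    """Number of spectrogram frames produced per second of audio."""
    return sample_rate_hz / hop_length


# Assumption: 24 kHz sample rate, taken from the model name
SAMPLE_RATE_HZ = 24000
FRAME_SHIFT_MS = 10

hop = hop_length_samples(FRAME_SHIFT_MS, SAMPLE_RATE_HZ)
rate = frames_per_second(SAMPLE_RATE_HZ, hop)
print(f"hop length: {hop} samples, frame rate: {rate:.0f} frames/s")
```

Under these assumptions, each mel frame advances by 240 samples, which gives 100 frames per second of audio. The exact frame count for a given clip also depends on how the feature extractor pads the signal.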
License
The code in this repository is released under the MIT license.