Hinglish Parakeet FastConformer RNNT (110M)
Model Details
- Architecture: FastConformer RNNT
- Parameters: ~110M
- Streaming: Yes
- Language: English (Streaming-focused)
- Framework: NVIDIA NeMo
- Training: Stage-1 RNNT fine-tuning
Results
- 0.05 WER on the LibriSpeech dataset
- 0.06 WER on the CommonVoice dataset
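For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate (WER) such as the ones above is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not taken from the evaluation used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative 20-word reference with a single substituted word in the hypothesis.
reference = ("the quick brown fox jumps over the lazy dog and then "
             "runs back home before the sun sets again today")
hypothesis = reference.replace("lazy", "hazy")

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2f}")
```

One wrong word out of 20 gives a WER of 0.05, which is the same scale as the figures reported above.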
Training
The model was trained on 1,100 hours of data for around 20 epochs.
Training notebook: https://www.kaggle.com/code/nijajohww/script-stage-1-53ebaf-19bb76-819eda
Training Infrastructure: The model was trained entirely on a Kaggle-provided NVIDIA Tesla P100 GPU. Due to compute constraints, the full training process took approximately 200 hours to complete, covering all epochs of Stage-1 RNNT fine-tuning.
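As a rough sanity check on these figures, the arithmetic below derives the time per epoch and the effective training throughput. All inputs are the approximate values quoted in this card (1100 hours of data, around 20 epochs, approximately 200 hours of wall-clock time), so the results are estimates only.

```python
# Approximate figures quoted in this model card.
audio_hours = 1100          # hours of training audio
epochs = 20                 # "around 20 epochs"
wall_clock_hours = 200      # "approximately 200 hours" on one P100

# Wall-clock time spent on each pass over the data.
hours_per_epoch = wall_clock_hours / epochs

# Total audio seen by the model across all epochs.
audio_hours_processed = audio_hours * epochs

# Hours of audio processed per hour of training time.
throughput = audio_hours_processed / wall_clock_hours

print(f"{hours_per_epoch:.0f} h per epoch")
print(f"{audio_hours_processed} audio hours processed in total")
print(f"about {throughput:.0f}x real time during training")
```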
How to Use this Model
The model is available for use in the NeMo toolkit [3] and can be used as a pre-trained checkpoint for streaming inference or for fine-tuning on another dataset. To use it, install NVIDIA NeMo. We recommend installing it after you have installed the latest PyTorch version.
pip install "nemo_toolkit[all]"
Transcribing using Python
Cache-aware models are designed so that their predictions are the same in offline and streaming mode.
You can therefore use the regular transcribe function to get transcriptions. First, download a sample:
wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
Then simply do:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(model_name="nvidia/stt_en_fastconformer_hybrid_large_streaming_multi")
# Optional: change the default latency. The default latency is 1040ms. Supported latencies: {0: 0ms, 1: 80ms, 16: 480ms, 33: 1040ms}.
# Note: these are worst-case latencies; the average latency is about half of these values.
asr_model.encoder.set_default_att_context_size([70,13])
# Optional: change the default decoder. The default decoder is Transducer (RNNT). Supported decoders: {ctc, rnnt}.
asr_model.change_decoding_strategy(decoder_type='rnnt')
output = asr_model.transcribe(['2086-149220-0033.wav'])
print(output[0].text)
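The relationship between the attention context passed to set_default_att_context_size and the latency mentioned in the comments can be sketched as follows. This assumes each encoder output frame covers 80 ms, which is not stated in this card. Under that assumption, the right context of 13 frames in [70,13] corresponds to the 1040 ms default. The helper name lookahead_ms is hypothetical and is not part of NeMo.

```python
FRAME_MS = 80  # assumption: duration covered by one encoder output frame


def lookahead_ms(att_context_size, frame_ms=FRAME_MS):
    """Worst-case look-ahead latency for a [left, right] attention context."""
    _left, right = att_context_size
    # Only the right (future) context delays the output; the left context
    # looks at frames that have already been received.
    return right * frame_ms


for context in ([70, 0], [70, 1], [70, 13]):
    worst = lookahead_ms(context)
    # The comments above note that average latency is about half the worst case.
    print(f"{context}: worst case {worst} ms, average about {worst / 2:.0f} ms")
```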