Extends nvidia/parakeet-tdt-0.6b-v3 from a TDT model to a hybrid TDT-CTC model:

  • Keeps the original encoder and TDT decoder, and adds a reinitialized CTC decoder with the same 8192-token vocabulary
  • Can be used for pure CTC or hybrid CTC-RNNT fine-tuning
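
In a hybrid model, the TDT and CTC heads share one encoder, so fine-tuning mixes their two losses. The sketch below assumes the weighted-sum scheme used by NeMo's hybrid RNNT-CTC models. The `ctc_loss_weight` name mirrors NeMo's `aux_ctc.ctc_loss_weight` config key, and `hybrid_loss` is a hypothetical helper for illustration. Check both against the NeMo version you use before relying on them.

```python
def hybrid_loss(transducer_loss: float, ctc_loss: float, ctc_loss_weight: float) -> float:
    """Hypothetical helper: weighted mix of the TDT (transducer) loss and the CTC loss.

    Assumption: this mirrors NeMo's hybrid RNNT-CTC weighting.
      ctc_loss_weight = 0.0 -> only the transducer (TDT) loss is used
      ctc_loss_weight = 1.0 -> only the CTC loss drives the shared encoder
    """
    if not 0.0 <= ctc_loss_weight <= 1.0:
        raise ValueError("ctc_loss_weight must be in [0, 1]")
    return (1.0 - ctc_loss_weight) * transducer_loss + ctc_loss_weight * ctc_loss


# Example: a hybrid setup that gives 30% of the weight to the CTC branch.
combined = hybrid_loss(transducer_loss=2.0, ctc_loss=4.0, ctc_loss_weight=0.3)
print(combined)
```

Setting the weight to 1.0 roughly approximates the "pure CTC" use case, because the TDT decoder then receives no gradient. A true CTC-only model would drop the TDT branch entirely.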

The sanity check below passed. Decoding with TDT gives the same transcriptions as the original model, while the reinitialized CTC decoder produces gibberish, as expected:

from nemo.collections.asr.models import ASRModel

nemo_model_path = "bofenghuang/parakeet-tdt-0.6b-v3-hybrid"
asr_model = ASRModel.from_pretrained(model_name=nemo_model_path)

audio_path = "example.wav"
result = asr_model.transcribe([audio_path])[0]
print(result.text)
# expect the same output as nvidia/parakeet-tdt-0.6b-v3

asr_model.change_decoding_strategy(decoder_type="ctc")
result = asr_model.transcribe([audio_path])[0]
print(result.text)
# expect gibberish output, since the CTC decoder has not been trained yet
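
The gibberish is expected. A CTC decoder outputs one score vector per encoder frame, and greedy decoding picks the best class per frame, merges repeats, and drops the blank symbol. With an untrained projection, those per-frame picks are essentially arbitrary tokens. The sketch below shows greedy (best-path) CTC decoding in plain Python. It assumes the blank is appended as an extra class after the 8192 tokens, as in NeMo's CTC decoders. `ctc_greedy_decode` is a hypothetical helper, not a NeMo function.

```python
from typing import List, Optional, Sequence

VOCAB_SIZE = 8192       # token vocabulary shared by the TDT and CTC decoders
BLANK_ID = VOCAB_SIZE   # assumption: CTC blank is the last class (index 8192)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank_id: int = BLANK_ID) -> List[int]:
    """Hypothetical helper: best-path CTC decoding over per-frame class scores."""
    tokens: List[int] = []
    prev: Optional[int] = None
    for scores in frame_scores:
        # Highest-scoring class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the label changes and is not the blank symbol.
        if best != prev and best != blank_id:
            tokens.append(best)
        prev = best
    return tokens


# Toy example with 3 tokens plus a blank at index 3.
frames = [
    [0.9, 0.1, 0.0, 0.0],  # token 0
    [0.8, 0.0, 0.0, 0.0],  # token 0 again (merged)
    [0.0, 0.0, 0.0, 1.0],  # blank (separates repeats)
    [1.0, 0.0, 0.0, 0.0],  # token 0 (emitted again after the blank)
    [0.0, 0.0, 0.7, 0.1],  # token 2
    [0.0, 0.0, 0.6, 0.0],  # token 2 again (merged)
]
print(ctc_greedy_decode(frames, blank_id=3))  # [0, 0, 2]
```

Fine-tuning with a CTC loss teaches the new decoder to place high scores on the right tokens and on blank, after which this decoding yields readable text.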