Extends nvidia/parakeet-tdt-0.6b-v3 from a TDT model to a hybrid TDT-CTC model:
- The encoder and TDT decoder are kept from the original checkpoint; the CTC decoder is reinitialized with the same 8192-token vocabulary
- The model can be used for pure CTC fine-tuning or hybrid CTC-RNNT fine-tuning
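For intuition, hybrid fine-tuning typically optimizes a weighted sum of the transducer loss and the auxiliary CTC loss. To our understanding, NeMo's hybrid models expose this weight as `aux_ctc.ctc_loss_weight` in the model config, but the `hybrid_loss` function below is a hypothetical, framework-free sketch for illustration, not NeMo code. Under this formulation a weight of 1.0 trains only the CTC branch and 0.0 trains only the TDT branch.

```python
def hybrid_loss(tdt_loss: float, ctc_loss: float, ctc_loss_weight: float) -> float:
    """Hypothetical sketch: combine transducer (TDT) and auxiliary CTC losses.

    Assumes the common interpolation
        loss = (1 - w) * tdt_loss + w * ctc_loss
    where w is the CTC loss weight in [0, 1].
    """
    if not 0.0 <= ctc_loss_weight <= 1.0:
        raise ValueError("ctc_loss_weight must lie in [0, 1]")
    return (1.0 - ctc_loss_weight) * tdt_loss + ctc_loss_weight * ctc_loss


if __name__ == "__main__":
    # Placeholder loss values, not measurements from this model
    print(hybrid_loss(tdt_loss=2.0, ctc_loss=4.0, ctc_loss_weight=0.5))  # balanced hybrid
    print(hybrid_loss(tdt_loss=2.0, ctc_loss=4.0, ctc_loss_weight=1.0))  # CTC branch only
```

Interpolating with a single weight keeps the total loss on the same scale as either component, so the learning rate does not need retuning when the weight changes.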
The sanity check below passed: TDT decoding produces the same transcriptions as the original model, while the reinitialized CTC decoder produces gibberish, as expected:
```python
from nemo.collections.asr.models import ASRModel

nemo_model_path = "bofenghuang/parakeet-tdt-0.6b-v3-hybrid"
asr_model = ASRModel.from_pretrained(model_name=nemo_model_path)

audio_path = "example.wav"

# Default decoding uses the pretrained TDT decoder
result = asr_model.transcribe([audio_path])[0]
print(result.text)
# Expected: same output as nvidia/parakeet-tdt-0.6b-v3

# Switch to the reinitialized (untrained) CTC decoder
asr_model.change_decoding_strategy(decoder_type="ctc")
result = asr_model.transcribe([audio_path])[0]
print(result.text)
# Expected: gibberish, since the CTC decoder has not been trained yet
```
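To make the "same transcriptions" check quantitative across many files, one option is to compute the word error rate (WER) between the original model's TDT output and this model's TDT output; a WER of 0.0 means the transcripts are identical. The `word_error_rate` helper below is a hypothetical, dependency-free sketch based on word-level Levenshtein distance, and the sample strings are placeholders rather than real outputs of either model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical sketch: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming Levenshtein distance, keeping only the previous row
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder transcripts standing in for result.text from each model
    original_tdt = "the quick brown fox"
    hybrid_tdt = "the quick brown fox"
    print(word_error_rate(original_tdt, hybrid_tdt))
```

Normalizing by the reference length keeps the score comparable across utterances of different lengths; in practice you may also want to lowercase and strip punctuation before comparing.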