Bhili ASR
Automatic Speech Recognition model for Bhili (भीली), an Indo-Aryan language spoken by the Bhil people in western India.
Model Description
| Property | Details |
|---|---|
| Architecture | Hybrid RNN-T/CTC Conformer |
| Framework | NVIDIA NeMo |
| Parameters | ~120M |
| Audio | 16 kHz, mono |
| Language | Bhili (भीली) |
Training Data
This model was fine-tuned on ~100 hours of Bhili speech data:
- Read speech: Carefully recorded sentences read by native speakers
- Spontaneous speech: Natural conversational recordings
All data was collected manually to ensure quality and linguistic accuracy.
Quick Start
1. Download Files
# Install huggingface CLI
pip install huggingface_hub
# Download all files
huggingface-cli download <YOUR_USERNAME>/bhili-asr --local-dir bhili-asr
cd bhili-asr
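The same download can also be scripted from Python. The following is a minimal sketch that assumes the `huggingface_hub` package installed above; `download_model` is a hypothetical helper name, and the repository ID is left as an argument because this page only gives the placeholder `<YOUR_USERNAME>/bhili-asr`.

```python
def download_model(repo_id: str, local_dir: str = "bhili-asr") -> str:
    """Download every file of a Hub repository into local_dir.

    repo_id must be supplied by the caller, for example the value that
    replaces the <YOUR_USERNAME>/bhili-asr placeholder.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the local folder it wrote to.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model("<YOUR_USERNAME>/bhili-asr")` with the placeholder filled in should leave the files in `bhili-asr/`, mirroring the CLI command.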
2. Extract Tokenizers
tar -xzvf tokenizers.tar.gz
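On systems without `tar`, the archive can be unpacked with Python's standard library instead. This is a sketch rather than part of the released tooling: `extract_tokenizers` is a hypothetical helper, and the path check is only a basic guard against archive members that would land outside the target directory.

```python
import tarfile
from pathlib import Path


def extract_tokenizers(archive: str = "tokenizers.tar.gz", dest: str = ".") -> list[str]:
    """Extract a gzip-compressed tar archive into dest and return member names."""
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse members whose path would resolve outside the destination.
        for member in members:
            target = (dest_path / member.name).resolve()
            if not target.is_relative_to(dest_path):
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(dest_path)
    return [member.name for member in members]
```

Run from inside `bhili-asr/`, `extract_tokenizers()` with its defaults is intended to match `tar -xzvf tokenizers.tar.gz`.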
3. Update Model Paths
Update the tokenizer paths to match your local directory:
python update_paths.py --root_dir /full/path/to/bhili-asr/tokenizers/tokenizers_v3
This creates bhili_asr_finetune_v1_updated.nemo with correct paths.
4. Verify Setup
Your directory should look like:
bhili-asr/
├── bhili_asr_finetune_v1.nemo
├── bhili_asr_finetune_v1_updated.nemo  (created after step 3)
├── tokenizers/
│   └── tokenizers_v3/
│       ├── as_256/
│       ├── bn_256/
│       ├── hi_256/
│       ├── mr_256/
│       └── ...
└── update_paths.py
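The layout can also be checked programmatically. The sketch below tests only the entries named in the listing above, since the remaining language folders are elided there; `missing_entries` and `EXPECTED` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Entries copied from the directory listing; folders hidden behind "..."
# in that listing are unknown and therefore not checked.
EXPECTED = [
    "bhili_asr_finetune_v1.nemo",
    "bhili_asr_finetune_v1_updated.nemo",
    "update_paths.py",
    "tokenizers/tokenizers_v3/as_256",
    "tokenizers/tokenizers_v3/bn_256",
    "tokenizers/tokenizers_v3/hi_256",
    "tokenizers/tokenizers_v3/mr_256",
]


def missing_entries(root: str = "bhili-asr") -> list[str]:
    """Return the expected relative paths that do not exist under root."""
    root_path = Path(root)
    return [rel for rel in EXPECTED if not (root_path / rel).exists()]
```

An empty list from `missing_entries("/full/path/to/bhili-asr")` suggests steps 1 to 3 completed. A missing `bhili_asr_finetune_v1_updated.nemo` usually means step 3 has not been run yet.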
Inference
For detailed inference instructions, a Gradio web app, and CLI tools, see the project's GitHub repository.
Basic Usage
from nemo.collections.asr.models import EncDecHybridRNNTCTCBPEModel
model = EncDecHybridRNNTCTCBPEModel.restore_from("bhili_asr_finetune_v1_updated.nemo")
model.eval()
result = model.transcribe(["audio.wav"], language_id="mr")
print(result[0][0])
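The model card lists 16 kHz mono audio as the expected input, so it can help to check a file before transcribing it. The sketch below uses only the standard-library `wave` module and therefore handles uncompressed PCM WAV files only; `check_wav` is a hypothetical helper, not part of NeMo.

```python
import wave


def check_wav(path: str, expected_rate: int = 16000, expected_channels: int = 1) -> list[str]:
    """Return a list of format mismatches; an empty list means the file fits."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if channels != expected_channels:
            problems.append(f"file has {channels} channels, expected {expected_channels}")
    return problems
```

Files that report problems can be resampled and downmixed with a tool such as ffmpeg or sox before they are passed to `model.transcribe`.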
Requirements
- Python 3.10
- NVIDIA NeMo (custom version with multilingual tokenizer support)
- PyTorch 2.0+
- CUDA 11.8+ (for GPU inference)
See the GitHub repository for complete setup instructions.
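A quick way to compare an environment against the list above is to print what is actually installed. This is a rough sketch: `environment_report` is a hypothetical helper, and it can only tell whether a `nemo` package is importable, not whether it is the custom multilingual build this model needs.

```python
import importlib.util
import sys


def environment_report() -> dict:
    """Collect the Python, NeMo, PyTorch and CUDA details relevant to this model."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "nemo_installed": importlib.util.find_spec("nemo") is not None,
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    # Only import torch if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report
```

Printing `environment_report()` shows at a glance whether the interpreter is Python 3.10, whether PyTorch is 2.0 or newer, and whether a CUDA device is visible for GPU inference.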
Limitations
- Optimized for clear audio recordings
- Performance may vary with heavy background noise
- Trained primarily on adult speakers