Bhili ASR

Automatic Speech Recognition model for Bhili (भीली), an Indo-Aryan language spoken by the Bhil people in western India.

Model Description

Property      Details
Architecture  Hybrid RNN-T/CTC Conformer
Framework     NVIDIA NeMo
Parameters    ~120M
Audio         16 kHz, mono
Language      Bhili (भीली)

Training Data

This model was fine-tuned on ~100 hours of Bhili speech data:

  • Read speech: Carefully recorded sentences read by native speakers
  • Spontaneous speech: Natural conversational recordings

All data was collected manually to ensure quality and linguistic accuracy.

Quick Start

1. Download Files

# Install huggingface CLI
pip install huggingface_hub

# Download all files
huggingface-cli download <YOUR_USERNAME>/bhili-asr --local-dir bhili-asr
cd bhili-asr

2. Extract Tokenizers

tar -xzvf tokenizers.tar.gz

3. Update Model Paths

Update the tokenizer paths to match your local directory:

python update_paths.py --root_dir /full/path/to/bhili-asr/tokenizers/tokenizers_v3

This creates bhili_asr_finetune_v1_updated.nemo with correct paths.

4. Verify Setup

Your directory should look like:

bhili-asr/
├── bhili_asr_finetune_v1.nemo
├── bhili_asr_finetune_v1_updated.nemo  (created after step 3)
├── tokenizers/
│   └── tokenizers_v3/
│       ├── as_256/
│       ├── bn_256/
│       ├── hi_256/
│       ├── mr_256/
│       └── ...
└── update_paths.py
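
The layout check can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the `EXPECTED` list are illustrative, and only the entries named in the listing are checked, since the remaining tokenizer folders are elided there.

```python
from pathlib import Path

# Entries copied from the directory listing; the listing elides further
# tokenizer folders ("..."), so only the named ones are checked here.
EXPECTED = [
    "bhili_asr_finetune_v1.nemo",
    "bhili_asr_finetune_v1_updated.nemo",
    "update_paths.py",
    "tokenizers/tokenizers_v3/as_256",
    "tokenizers/tokenizers_v3/bn_256",
    "tokenizers/tokenizers_v3/hi_256",
    "tokenizers/tokenizers_v3/mr_256",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from inside the bhili-asr directory.
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Setup looks complete.")
```

An empty result means every named file and folder is present; it says nothing about whether the paths stored inside the .nemo file are correct.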

Inference

For detailed inference instructions, a Gradio web app, and CLI tools, visit the GitHub repository:

GitHub: bhili-asr

Basic Usage

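from nemo.collections.asr.models import EncDecHybridRNNTCTCBPEModel

# Load the checkpoint with updated tokenizer paths (created in step 3)
model = EncDecHybridRNNTCTCBPEModel.restore_from("bhili_asr_finetune_v1_updated.nemo")
model.eval()

# Transcribe a 16 kHz mono WAV file
result = model.transcribe(["audio.wav"], language_id="mr")
# Print the transcription of the first file
print(result[0][0])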
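
Since the model expects 16 kHz mono audio, it may help to check a file before transcribing it. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV files; the helper name `check_wav` is hypothetical and not part of the repository or of NeMo.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of mismatches between a WAV file and the expected format."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

An empty list means the file already matches the expected format; otherwise the audio would need to be resampled or downmixed with a tool of your choice before transcription.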

Requirements

  • Python 3.10
  • NVIDIA NeMo (custom version with multilingual tokenizer support)
  • PyTorch 2.0+
  • CUDA 11.8+ (for GPU inference)

See the GitHub repository for complete setup instructions.
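
As a rough pre-flight check against the list above, the following sketch reports whether the interpreter is Python 3.10 and whether the `torch` and `nemo` packages can be found. The function name `environment_report` is illustrative. It does not verify the PyTorch or CUDA versions, nor that the installed NeMo is the custom build.

```python
import importlib.util
import sys


def environment_report():
    """Report whether the interpreter and packages roughly match the requirements."""
    return {
        # The requirements list names Python 3.10 specifically.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # find_spec locates a top-level package without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "nemo_installed": importlib.util.find_spec("nemo") is not None,
    }


if __name__ == "__main__":
    for name, ok in environment_report().items():
        print(f"{name}: {'ok' if ok else 'missing or mismatched'}")
```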

Limitations

  • Optimized for clear audio recordings
  • Performance may vary with heavy background noise
  • Trained primarily on adult speakers