
mimba/whisper-ngiemboon

This repository hosts a fine-tuned version of openai/whisper-medium adapted for Automatic Speech Recognition (ASR) of Ngiemboon (ISO 639-3: nnh).

🧠 Model Details

  • Model name: mimba/whisper-ngiemboon
  • Architecture: Transformer encoder–decoder (fine-tuned)
  • Language: Ngiemboon (Bantu language spoken in Cameroon)
  • Task: Automatic Speech Recognition (ASR)
  • Base model: openai/whisper-medium
  • Author: Mimba

🎯 Intended Use

  • Use case: Transcribe spoken Ngiemboon into text.
  • Audience: Linguists, researchers, developers working on low-resource ASR.
  • Input: 16kHz mono audio waveform in Ngiemboon.
  • Output: Transcribed text in Ngiemboon.
  • Not suitable for: Noisy environments, dialects not represented in training data.

📚 Training Data

  • Source: Community-collected Ngiemboon speech corpus.
  • Size: Approximately 24 hours of transcribed audio.
  • Preprocessing:
    • Audio resampled to 16kHz mono.
    • Normalized and tokenized using a custom vocabulary.
  • Split: Train / Test
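The resampling step above (16 kHz mono is what Whisper models expect) can be sketched as follows. This is an illustrative, numpy-only version using linear interpolation; the function name `to_mono_16k` is my own, and in practice `librosa.resample` or `torchaudio.transforms.Resample` is the better choice:

```python
import numpy as np

TARGET_SR = 16_000  # Whisper models expect 16 kHz input

def to_mono_16k(waveform: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz via linear interpolation."""
    # Downmix: average channels if the signal is stereo/multichannel
    if waveform.ndim == 2:
        waveform = waveform.mean(axis=1)
    if sr == TARGET_SR:
        return waveform.astype(np.float32)
    # Resample: map new sample positions back onto the old time axis
    duration = waveform.shape[0] / sr
    n_out = int(round(duration * TARGET_SR))
    old_t = np.linspace(0.0, duration, num=waveform.shape[0], endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(new_t, old_t, waveform).astype(np.float32)
```

For example, one second of 44.1 kHz stereo audio comes out as 16,000 mono samples.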

📈 Evaluation

  • Metric: Word Error Rate (WER)
  • Test set: Held-out Ngiemboon recordings
  • Results:
    | Training Loss | Epoch | Step | Validation Loss | WER    |
    |--------------:|------:|-----:|----------------:|-------:|
    | 0.7846        | 1.0   | 589  | 0.7358          | 0.6419 |
    | 0.5542        | 2.0   | 1178 | 0.5998          | 0.6358 |
    | 0.4704        | 3.0   | 1767 | 0.5379          | 0.5331 |
    | 0.4088        | 4.0   | 2356 | 0.5138          | 0.5010 |
    | 0.3807        | 5.0   | 2945 | 0.4872          | 0.5061 |
    | 0.3395        | 6.0   | 3534 | 0.4809          | 0.4807 |
    | 0.3426        | 7.0   | 4123 | 0.4710          | 0.4997 |
    | 0.3215        | 8.0   | 4712 | 0.4676          | 0.4730 |
    | 0.3045        | 9.0   | 5301 | 0.4636          | 0.4844 |
    | 0.2959        | 10.0  | 5890 | 0.4636          | 0.4744 |
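WER is the word-level edit distance between hypothesis and reference, normalized by the number of reference words. Evaluation toolkits such as jiwer or Hugging Face `evaluate` are typically used in practice; the standalone sketch below is only to make the metric concrete:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For instance, `wer("a b c d", "a x c")` is 0.5: one substitution plus one deletion over four reference words. The final WER of 0.4744 above means roughly 47% of reference words required an edit.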

Framework versions

  • PEFT 0.18.0
  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 2.18.0
  • Tokenizers 0.22.2
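Since PEFT appears in the framework versions, the repository may ship a LoRA/PEFT adapter rather than full fine-tuned weights. If that is the case, the adapter would be loaded on top of the base model roughly as follows (a sketch, assuming the repo contains a PEFT adapter config; skip this entirely if full weights are hosted):

```python
from peft import PeftModel
from transformers import WhisperForConditionalGeneration

# Assumption: mimba/whisper-ngiemboon hosts a PEFT/LoRA adapter for whisper-medium
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
model = PeftModel.from_pretrained(base, "mimba/whisper-ngiemboon")

# Optionally merge the adapter into the base weights for faster inference
model = model.merge_and_unload()
```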

🔭 Future Work

  • Expand training corpus with more speakers.
  • Improve robustness to noise and real-world conditions.
  • Release open-source Ngiemboon dataset for community use.
  • Explore multilingual fine-tuning with other Bantu languages.

⚠️ Limitations and Risks

  • May perform poorly on dialects or accents not seen during training.
  • Not robust to background noise or overlapping speech.
  • Limited training data may affect generalization.

💻 Usage Example

from transformers import AutoProcessor, WhisperForConditionalGeneration
import torch
import soundfile as sf

# Load the processor and model (from the Hub repo or a local directory)
processor = AutoProcessor.from_pretrained("mimba/whisper-ngiemboon")
model = WhisperForConditionalGeneration.from_pretrained("mimba/whisper-ngiemboon")

# Load audio (expects 16 kHz mono; resample beforehand if needed)
speech, rate = sf.read("example_ngiemboon.wav")

# Prepare the input features
inputs = processor(speech, sampling_rate=rate, return_tensors="pt")

# Predict
with torch.no_grad():
    predicted_ids = model.generate(inputs["input_features"])

# Decode the transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)

📬 BibTeX entry and citation info

@misc{mimba2026whisperngiemboon,
      title={Afrilang: Small Out-of-domain Resource for Various African Languages},
      author={Mimba Ngouana Fofou},
      year={2026},
      howpublished={\url{https://huggingface.co/mimba/whisper-ngiemboon}}
}
Contact: for all questions, contact @Mimba.
Model size: 0.8B parameters (F32, Safetensors)