🗣️ Kiswahili Sahihi ASR — Swahili Audio Transcription
This model enables high-quality, long-form Kiswahili speech transcription from multiple audio formats (e.g., .mp3, .wav, .m4a, .aac, .ogg, .flac, .amr) using a simple, efficient pipeline.
It’s optimized for speed, accuracy, and real-world usability, even on modest hardware.
🚀 Key Features
- ✅ Supports multiple audio formats via FFmpeg + Pydub
- 🧠 Built on 🤗 Transformers
- 🪶 Automatically converts audio to 16 kHz mono
- ⏳ Transcribes long recordings by splitting them into fixed-length chunks (default: 60 s per chunk)
- 🖥️ Works seamlessly on both CPU and GPU
- 🌍 Focused on Kiswahili language transcription
🦊 Example Using the Model
```python
# ============================================
# 🪄 Full Swahili Audio Transcription Script
# ============================================
# 📦 Install (notebook/Colab syntax: a leading "!" runs a shell command)
!pip install transformers
!pip install "datasets<4.0.0"
!pip install torchvision==0.21.0 torchaudio==2.6.0 jiwer evaluate
# Quote version specifiers so the shell does not read ">" as an output redirect
!pip install -U soundfile librosa "accelerate>=0.26.0" tensorboard bitsandbytes pydub
!apt-get -y install ffmpeg
```
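pydub hands decoding of compressed formats such as .mp3 and .m4a to FFmpeg, so conversion fails if the `ffmpeg` binary is not on the PATH. A minimal pre-flight sketch using only the standard library; `ffmpeg_available` is a hypothetical helper name, not part of pydub.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("FFmpeg not found: install it (e.g. `apt-get -y install ffmpeg`) before converting audio.")
```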
```python
import numpy as np
import librosa
import torch
from pydub import AudioSegment
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

# =============================
# 1. 🔸 Model Setup
# =============================
model_id = "keystats/kiswahili_sahihi_asr"
processor = AutoProcessor.from_pretrained(model_id)
# Run on the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
# Use float32 to avoid half-precision mismatches between inputs and weights
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device, dtype=torch.float32)
```
```python
# =============================
# 2. 🔸 Convert any format to WAV
# =============================
def convert_to_wav(input_path, output_path="converted.wav"):
    """Decode any FFmpeg-supported file and save it as 16 kHz mono WAV."""
    try:
        audio = AudioSegment.from_file(input_path)
        audio = audio.set_frame_rate(16000).set_channels(1)
        audio.export(output_path, format="wav")
        return output_path
    except Exception as e:
        raise RuntimeError(
            f"❌ Could not convert file. Check that FFmpeg is installed and the format is supported. Error: {e}"
        ) from e

# 👇 Change this to the path of your audio file
audio_path = "your_swahili_audio.mp3"
wav_path = convert_to_wav(audio_path)
```
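After conversion, a quick standard-library check can confirm the file really is 16 kHz mono before the model sees it. A minimal sketch, assuming the exported file is plain PCM WAV that Python's `wave` module can read; `describe_wav` and `check_asr_ready` are hypothetical helper names, not part of pydub or Transformers.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read a WAV header and report its sample rate, channel count and duration."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration_s = wav.getnframes() / sample_rate
    return {"sample_rate": sample_rate, "channels": channels, "duration_s": duration_s}


def check_asr_ready(path: str, expected_rate: int = 16000) -> dict:
    """Raise ValueError unless the file is mono at expected_rate; otherwise return its info."""
    info = describe_wav(path)
    if info["sample_rate"] != expected_rate or info["channels"] != 1:
        raise ValueError(f"Expected {expected_rate} Hz mono audio, got {info}")
    return info
```

In the script this would run right after conversion, for example `check_asr_ready(wav_path)`.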
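```python
# =============================
# 3. 🔸 Load audio and chunk
# =============================
audio_input, sr = librosa.load(wav_path, sr=16000, mono=True)
chunk_length_s = 60  # seconds per chunk
# Assumption: if this checkpoint is Whisper-based, its feature extractor
# truncates each input to 30 s; use chunk_length_s = 30 in that case.
chunk_size = chunk_length_s * sr  # samples per chunk
num_chunks = int(np.ceil(len(audio_input) / chunk_size))
print(f"🔹 Total length: {len(audio_input)/sr:.2f} sec | Splitting into {num_chunks} chunks...")
```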
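The chunking arithmetic can be isolated into a small pure function, which makes it easy to check the boundaries, including the shorter final chunk. A sketch that mirrors the script's loop; `chunk_bounds` is a hypothetical name.

```python
import math


def chunk_bounds(num_samples: int, sample_rate: int = 16000, chunk_length_s: int = 60):
    """Return (start, end) sample indices for consecutive, non-overlapping chunks.

    Every chunk is chunk_length_s long except possibly the last one,
    which holds whatever audio remains.
    """
    chunk_size = chunk_length_s * sample_rate
    num_chunks = math.ceil(num_samples / chunk_size)
    return [
        (i * chunk_size, min((i + 1) * chunk_size, num_samples))
        for i in range(num_chunks)
    ]


# A 150-second recording at 16 kHz: two full 60 s chunks plus a 30 s remainder
bounds = chunk_bounds(150 * 16000)
print([(start / 16000, end / 16000) for start, end in bounds])
# [(0.0, 60.0), (60.0, 120.0), (120.0, 150.0)]
```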
```python
# =============================
# 4. 🔸 Transcribe each chunk
# =============================
full_transcription = []
for i in range(num_chunks):
    start = i * chunk_size
    end = min((i + 1) * chunk_size, len(audio_input))
    chunk = audio_input[start:end]
    # One chunk at a time, so no batch padding is requested here;
    # the feature extractor applies its own default padding.
    inputs = processor(
        chunk,
        sampling_rate=16000,
        return_tensors="pt",
    ).to(device, dtype=torch.float32)
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_length=20000)
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    full_transcription.append(text.strip())
```
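Because the loop body depends on the model only through "audio in, text out", it can be factored so the chunking logic is testable without downloading the checkpoint or owning a GPU. A minimal sketch; `transcribe_in_chunks` and the stub `fake_transcribe` are hypothetical, and in real use `transcribe_fn` would wrap the processor, `model.generate` and `batch_decode` calls shown above.

```python
from typing import Callable, List, Sequence


def transcribe_in_chunks(
    samples: Sequence[float],
    transcribe_fn: Callable[[Sequence[float]], str],
    sample_rate: int = 16000,
    chunk_length_s: int = 60,
) -> List[str]:
    """Apply transcribe_fn to each fixed-length slice of samples and collect the texts."""
    chunk_size = chunk_length_s * sample_rate
    texts = []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        texts.append(transcribe_fn(chunk).strip())
    return texts


def fake_transcribe(chunk: Sequence[float]) -> str:
    """Stub "model" that reports each chunk's duration in seconds."""
    return f" {len(chunk) / 16000:.0f}s "


# 130 s of silence at 16 kHz: chunks of 60 s, 60 s and 10 s
print(transcribe_in_chunks([0.0] * (130 * 16000), fake_transcribe))
# ['60s', '60s', '10s']
```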
```python
# =============================
# 5. 🔸 Combine final transcript
# =============================
final_text = " ".join(full_transcription)
print("📝 Final Transcription:")
print(final_text)
```
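A plain `" ".join` leaves doubled spaces whenever a chunk decodes to an empty string (for example, a silent tail). A small sketch of a more defensive join that also collapses stray whitespace; `join_chunks` is a hypothetical helper, not part of the original script.

```python
import re
from typing import Iterable


def join_chunks(texts: Iterable[str]) -> str:
    """Join per-chunk transcripts, dropping empty results and collapsing whitespace."""
    parts = [t.strip() for t in texts if t and t.strip()]
    return re.sub(r"\s+", " ", " ".join(parts))


print(join_chunks(["Habari yako,", "", "  karibu   tena kesho. "]))
# Habari yako, karibu tena kesho.
```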
🧪 Example Output
| 🎧 Input Audio | 📝 Transcription Output |
|---|---|
| mashairi_sauti.mp3 | “Karibu kwenye mfumo wetu wa Kiswahili Sahihi.” |
| mazungumzo_flac.flac | “Habari yako, karibu tena kesho kwa mahojiano mengine.” |
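Outputs like these are usually scored against a human reference with word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The install step already pulls in `jiwer`, whose `jiwer.wer(reference, hypothesis)` computes this metric; the dependency-free sketch below shows the underlying dynamic program. `word_error_rate` is a hypothetical name, and the hypothesis string is invented for illustration, not real model output.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the edit-distance row for the reference words processed so far
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when the words match)
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "Karibu kwenye mfumo wetu wa Kiswahili Sahihi"
hypothesis = "Karibu kwenye mfumo wa Kiswahili sahihi"  # invented for illustration
# Comparison is case-sensitive: "Sahihi" vs "sahihi" counts as a substitution
print(f"{word_error_rate(reference, hypothesis):.3f}")
# 0.286 (1 deletion + 1 substitution over 7 reference words)
```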
🛠️ Tips for Best Results
- Use clear audio without background noise.
- Long recordings are automatically split into 60-second chunks.
- Works with .mp3, .wav, .m4a, .aac, .ogg, .flac, .amr and more.
- Audio must be 16 kHz mono; the script converts it automatically.
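The script leaves the 16 kHz mono conversion to pydub/FFmpeg (and librosa). Purely to illustrate what that conversion means, here is a naive sketch: channels are averaged into one, and the signal is resampled by linear interpolation. `to_mono` and `resample_linear` are hypothetical helpers; real resamplers also low-pass filter to avoid aliasing, so keep using FFmpeg or librosa in practice.

```python
from typing import List, Sequence, Tuple


def to_mono(frames: Sequence[Tuple[float, ...]]) -> List[float]:
    """Downmix multi-channel frames to mono by averaging the channels."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int = 16000) -> List[float]:
    """Naive linear-interpolation resampling (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate  # position of output sample k, in input-sample units
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last input sample
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


print(to_mono([(0.5, -0.5), (1.0, 0.0), (0.25, 0.75)]))  # [0.0, 0.5, 0.5]
print(resample_linear([0.0, 1.0], src_rate=8000))       # [0.0, 0.5, 1.0, 1.0]
```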
🌟 Acknowledgements
📢 Contribute
- 🧪 Share more Swahili audio samples
- 🧑‍💻 Report issues or suggest improvements
- 🌍 Help expand coverage for different accents and dialects
🧭 Citation
```bibtex
@misc{kiswahili_sahihi_asr,
  author    = {Jackson Kahungu},
  title     = {Kiswahili Sahihi ASR — Swahili Audio Transcription},
  year      = {2025},
  publisher = {Hugging Face}
}
```
✨ Final Note
“If you like the model, leave a like ❤🧡❤”
This model may not be perfect, but it provides a strong baseline for building future Swahili transcription systems.
Together, we can make Swahili voice technology accessible to everyone. ✨ 🎊 KISWAHILI KITUKUZWE (Let Kiswahili be glorified!) 🎉