Whisper Medium Uzbek v1 by Kotibai & Rubai Team

Developed by Kotibai & Rubai Team

Uzbek Automatic Speech Recognition (ASR) model fine-tuned from Whisper Medium.

Model Description

  • Base Model: OpenAI Whisper Medium (769M parameters)
  • Language: Uzbek (uz)
  • Training Data: ~1,600 hours of Uzbek audio
  • Precision: BF16
  • Script: Latin (handles Russian loanwords in Latin script: "brat", "davay", "prosto", etc.)

Evaluation Results

Category          WER
Overall           16.7%
Clean Speech      ~6-11%
Noisy/Augmented   ~12-24%
Dialects          ~16-25%

Evaluated on 1,864 samples across 8 diverse test sets.
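
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition. It is not the evaluation script behind the numbers above, and this card does not state which text normalization (casing, punctuation) or aggregation was applied, so scores from this function may not match the table. The example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling single-row Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (hypothetical): one substituted word out of four.
wer = word_error_rate("men bugun ishga bordim", "men bugun ishka bordim")
print(f"{wer:.1%}")  # 25.0%
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.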

Usage

Using Transformers

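from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = WhisperProcessor.from_pretrained("Kotib/uzbek_stt_v1")
model = WhisperForConditionalGeneration.from_pretrained("Kotib/uzbek_stt_v1")

# Whisper expects 16 kHz mono audio; librosa resamples on load.
# The processor pads or truncates to a 30-second window, so use the
# pipeline below for longer recordings.
audio, sr = librosa.load("audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=sr, return_tensors="pt").input_features

# Force Uzbek transcription instead of relying on language detection
predicted_ids = model.generate(input_features, language="uz", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)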
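
The model description lists BF16 as the precision, but does not say whether that refers to training or to the intended inference dtype. Assuming reduced-precision inference is acceptable for your use case, a variant of the snippet above could cast the weights and input features to bfloat16 as sketched here. The function name `transcribe_bf16` is hypothetical, and the effect of this cast on accuracy has not been measured for this card.

```python
def transcribe_bf16(audio_path: str, model_id: str = "Kotib/uzbek_stt_v1") -> str:
    """Transcribe one audio file (up to 30 s) with the weights cast to bfloat16."""
    # Heavy dependencies are imported inside the function so that
    # defining it does not require them to be loaded.
    import librosa
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)
    # Assumption: casting after loading is acceptable; this halves weight memory.
    model = model.to(torch.bfloat16).eval()

    audio, _ = librosa.load(audio_path, sr=16000)
    features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
    # Inputs must use the same dtype as the model weights.
    features = features.to(model.dtype)

    with torch.no_grad():
        predicted_ids = model.generate(features, language="uz", task="transcribe")
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Calling `transcribe_bf16("audio.wav")` returns the transcription as a string. On hardware without native bfloat16 support this may be slower than the default float32 path.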

Using Pipeline

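import torch
from transformers import pipeline

# chunk_length_s=30 splits long recordings into 30-second windows
pipe = pipeline(
    "automatic-speech-recognition",
    model="Kotib/uzbek_stt_v1",
    chunk_length_s=30,
    # Fall back to CPU when no CUDA device is available
    device="cuda" if torch.cuda.is_available() else "cpu",
)

result = pipe("audio.wav", generate_kwargs={"language": "uz", "task": "transcribe"})
print(result["text"])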

Training

Trained in 3 stages using curriculum learning, 1,593 hours in total (the ~1,600 hours listed under Model Description):

Stage               Hours
Foundation          725h
Robustness          394h
Domain Adaptation   474h

Intended Use

  • Uzbek speech-to-text transcription
  • Voice assistants and dictation
  • Media transcription and subtitling
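
For the subtitling use case, one possible approach is to request timestamps from the pipeline (`return_timestamps=True`), which returns a `chunks` list of `{"timestamp": (start, end), "text": ...}` entries, and convert those to SRT. The helper names `format_timestamp` and `chunks_to_srt` and the `fallback_duration` parameter are hypothetical. This is a minimal sketch that does no line wrapping or cue merging, and timestamp quality for this model has not been evaluated here.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks, fallback_duration: float = 2.0) -> str:
    """Convert pipeline-style timestamped chunks into SRT text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # The final chunk can lack an end time; assume a short fixed duration.
            end = start + fallback_duration
        entries.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive SRT cues.
    return "\n".join(entries)


# Illustrative input in the shape the pipeline returns under result["chunks"].
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " salom"},
    {"timestamp": (2.5, None), "text": " xayr"},
]
print(chunks_to_srt(example_chunks))
```

With a pipeline object `pipe` created as in the Usage section, the input would come from `pipe("audio.wav", return_timestamps=True, generate_kwargs={"language": "uz", "task": "transcribe"})["chunks"]`.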

Limitations

  • Performance degrades on very noisy audio
  • May struggle with heavy code-switching
  • Optimized for Uzbek only

License

Apache 2.0
