This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the UA-Speech dataset.
## Model description
This model fine-tunes Whisper-small for English transcription on dysarthric speech. Training used LoRA (PEFT) on the attention and feed-forward modules, and the adapter was merged into the base model weights for deployment, so PEFT is not required at inference time. The deliberately constrained adapter capacity is intended to demonstrate generalization under a small trainable-parameter budget.
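Because the adapter is already merged, the checkpoint loads like any standard Whisper model. A minimal inference sketch (the audio filename is a hypothetical placeholder):

```python
from transformers import pipeline

# The LoRA weights are merged into the checkpoint, so no peft import is needed.
asr = pipeline(
    "automatic-speech-recognition",
    model="charleslwang/uaspeech-whisper-small-constrained",
)

# "sample.wav" is a placeholder; mono 16 kHz English audio works best.
print(asr("sample.wav")["text"])
```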
## Intended uses & limitations
This model is intended for automatic speech recognition (ASR) on English speech, with an emphasis on robustness to atypical/dysarthric speech patterns resembling UA-Speech-style data. Performance may degrade on out-of-domain audio, heavy noise, non-English speech, or audio sampled far from 16 kHz. For best results, provide mono 16 kHz audio.
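If your audio is not already mono 16 kHz, resample it before inference. A minimal sketch using librosa (an assumed dependency, not one this card requires; any resampler producing mono 16 kHz audio is equivalent):

```python
import librosa

# librosa downmixes to mono and resamples to 16 kHz in one call.
audio, sr = librosa.load("sample.wav", sr=16000, mono=True)
```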
## Audio preprocessing
- Audio loaded with `soundfile.read(file_path)`
- If stereo/multi-channel, converted to mono by averaging channels
- Features extracted with `WhisperProcessor.feature_extractor(..., sampling_rate=16000)`
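Putting these steps together, a minimal sketch of the preprocessing pipeline (loading the processor from the base model is an assumption; the merged checkpoint's own processor would behave the same):

```python
import soundfile as sf
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def extract_features(file_path):
    # soundfile returns (samples, sample_rate)
    audio, sr = sf.read(file_path)
    # Collapse stereo/multi-channel input to mono by averaging channels
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    # Whisper expects 16 kHz input; resample beforehand if sr != 16000
    return processor.feature_extractor(
        audio, sampling_rate=16000, return_tensors="pt"
    ).input_features
```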
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-4
- train_batch_size: 16
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8
- mixed_precision_training: Native AMP
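A hedged sketch of how these values could map onto `Seq2SeqTrainingArguments`; the output directory and the `fp16` flag are assumptions inferred from the settings above, not taken from the actual training script:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./uaspeech-whisper-small-constrained",  # hypothetical path
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=8,
    fp16=True,  # "Native AMP" mixed precision; assumes a CUDA device
)
```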
### LoRA / PEFT configuration
- task_type: SEQ_2_SEQ_LM
- r: 64
- lora_alpha: 128
- lora_dropout: 0.1
- target_modules: ["q_proj", "v_proj", "fc1", "fc2"]
- modules_to_save: None
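A minimal sketch of this configuration with the `peft` library, including the post-training merge described above (the training loop itself is elided):

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# LoRA configuration matching the values listed above
lora_config = LoraConfig(
    task_type="SEQ_2_SEQ_LM",
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "fc1", "fc2"],
)
model = get_peft_model(base, lora_config)

# ... training happens here ...

# Merge the adapter into the base weights so inference needs no PEFT
merged = model.merge_and_unload()
```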
### Model config modifications
- `model.config.forced_decoder_ids = None`
- `model.config.suppress_tokens = []`

These settings disable Whisper's default forced decoder tokens and token suppression so the fine-tuned model predicts all output tokens freely.
### Training results
WER/CER were computed offline after training.
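A sketch of one way such an offline evaluation could look, using the `evaluate` library; the evaluation pairs below are hypothetical placeholders, and this is not the author's actual evaluation script:

```python
import evaluate
import soundfile as sf
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "charleslwang/uaspeech-whisper-small-constrained"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id).eval()

wer = evaluate.load("wer")
cer = evaluate.load("cer")

def transcribe(path):
    audio, sr = sf.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)  # downmix to mono
    feats = processor.feature_extractor(
        audio, sampling_rate=16000, return_tensors="pt"
    ).input_features
    with torch.no_grad():
        ids = model.generate(feats)
    return processor.tokenizer.batch_decode(ids, skip_special_tokens=True)[0]

# Hypothetical (audio_path, reference_transcript) evaluation pairs
pairs = [("utt1.wav", "reference text one"), ("utt2.wav", "reference text two")]
preds = [transcribe(p) for p, _ in pairs]
refs = [r for _, r in pairs]
print("WER:", wer.compute(predictions=preds, references=refs))
print("CER:", cer.compute(predictions=preds, references=refs))
```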
### Framework versions
- Transformers 4.56.2
- Pytorch 2.8.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1