This model is a fine-tuned version of openai/whisper-small on the UA-Speech dataset.

Model description

This model fine-tunes Whisper-small for English transcription on dysarthric speech. Training used LoRA (PEFT) on the attention and feed-forward modules, and the adapter was merged into the base model weights for deployment, so PEFT is not required at inference time. The deliberately constrained adapter capacity is intended to probe how well the model generalizes to dysarthric speech with a small number of trainable parameters.
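
For reference, merging a LoRA adapter into the base weights typically looks like the sketch below. The adapter and output paths are placeholders, and this step is not needed to use the released checkpoint, which already contains the merged weights.

```python
# Minimal sketch of merging a LoRA adapter into the base Whisper weights.
# The adapter and output paths are placeholders, not the actual training artifacts.
from transformers import WhisperForConditionalGeneration
from peft import PeftModel

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
peft_model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path
merged = peft_model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("path/to/merged-model")  # placeholder output directory
```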

Intended uses & limitations

This model is intended for automatic speech recognition (ASR) on English speech, with an emphasis on robustness to atypical/dysarthric speech patterns resembling UA-Speech-style data. Performance may degrade on out-of-domain audio, heavy noise, non-English speech, or audio sampled far from 16 kHz. For best results, provide mono 16 kHz audio.
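
A minimal transcription sketch, assuming mono 16 kHz input and that the repository ships the processor files (otherwise the processor can be loaded from openai/whisper-small):

```python
# Minimal inference sketch: transcribe a mono 16 kHz WAV file with the merged model.
import soundfile as sf
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "charleslwang/uaspeech-whisper-small-constrained"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

audio, sr = sf.read("sample.wav")  # expects mono 16 kHz audio
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```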

Audio preprocessing

  • Audio loaded with soundfile.read(file_path)
  • If stereo/multi-channel, converted to mono by averaging channels
  • Features extracted with WhisperProcessor.feature_extractor(..., sampling_rate=16000)
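
A sketch of the preprocessing steps above; the helper name is illustrative, and the audio is assumed to already be sampled at 16 kHz (the card does not describe resampling):

```python
# Sketch of the preprocessing above: load audio, downmix to mono,
# and extract log-mel features with the Whisper feature extractor.
import soundfile as sf
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def load_features(file_path):  # illustrative helper name
    audio, sr = sf.read(file_path)
    if audio.ndim > 1:          # stereo/multi-channel -> mono by averaging channels
        audio = audio.mean(axis=1)
    return processor.feature_extractor(
        audio, sampling_rate=16000, return_tensors="pt"
    ).input_features
```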

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-4
train_batch_size: 16
seed: 42
optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 8
mixed_precision_training: Native AMP
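
These hyperparameters map onto Hugging Face Seq2SeqTrainingArguments roughly as follows; output_dir and any arguments not listed in the card are assumptions:

```python
# Sketch of Seq2SeqTrainingArguments matching the hyperparameters above.
# output_dir and any unlisted arguments are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./uaspeech-whisper-small-constrained",  # assumption
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=8,
    fp16=True,  # "Native AMP" mixed precision
)
```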

LoRA / PEFT configuration

  • task_type: SEQ_2_SEQ_LM
  • r: 64
  • lora_alpha: 128
  • lora_dropout: 0.1
  • target_modules: ["q_proj", "v_proj", "fc1", "fc2"]
  • modules_to_save: None
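
The configuration above corresponds to a PEFT LoraConfig along these lines:

```python
# Sketch of the LoRA configuration above, applied to the base Whisper model.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import WhisperForConditionalGeneration

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "fc1", "fc2"],
)

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows the small trainable fraction
```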

Model config modifications

  • model.config.forced_decoder_ids = None
  • model.config.suppress_tokens = []
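
In context, these settings are applied to the loaded model before training so that no language/task tokens are forced and no tokens are suppressed during generation:

```python
# Applying the config modifications listed above to the loaded model.
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
```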

Training results

Word error rate (WER) and character error rate (CER) were computed offline after training.
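
For reference, offline WER/CER computation with the evaluate library follows this pattern; the prediction and reference lists are placeholders:

```python
# Sketch of offline WER/CER computation with the `evaluate` library.
# `predictions` and `references` are placeholder lists of transcripts.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["example hypothesis"]  # placeholder
references = ["example reference"]    # placeholder

wer = wer_metric.compute(predictions=predictions, references=references)
cer = cer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.3f}  CER: {cer:.3f}")
```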

Framework versions

Transformers 4.56.2
PyTorch 2.8.0+cu128
Datasets 4.4.1
Tokenizers 0.22.1
