---
language:
  - en
tags:
  - automatic-speech-recognition
  - speech-to-text
  - whisper
  - peft
  - lora
  - seq2seq
base_model: openai/whisper-small
pipeline_tag: automatic-speech-recognition
---

This model is a fine-tuned version of openai/whisper-small on the UA-Speech dataset.

Model description

This model fine-tunes Whisper-small for English transcription of dysarthric speech. Training used LoRA (PEFT) on the attention and feed-forward modules, and the adapter was merged into the base model weights for deployment, so PEFT is not required at inference time. The low-rank adapter limits trainable capacity, which is intended to demonstrate generalization under a constrained parameter budget.
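
Because the adapter was merged, the checkpoint loads like any standard Whisper model. A minimal sketch of how such a merge is typically done with peft (the adapter and output paths here are hypothetical):

```python
from peft import PeftModel
from transformers import WhisperForConditionalGeneration

# "./whisper-small-uaspeech-lora" is a hypothetical local adapter path.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
peft_model = PeftModel.from_pretrained(base, "./whisper-small-uaspeech-lora")

merged = peft_model.merge_and_unload()  # fold LoRA deltas into the base weights
merged.save_pretrained("./whisper-small-uaspeech-merged")
```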

Intended uses & limitations

This model is intended for automatic speech recognition (ASR) of English speech, with an emphasis on robustness to atypical/dysarthric speech patterns such as those in the UA-Speech corpus. Performance may degrade on out-of-domain audio, heavy noise, non-English speech, or audio sampled far from 16 kHz. For best results, provide mono 16 kHz audio.
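
A minimal usage sketch with the transformers pipeline; `<this-repo-id>` is a placeholder for this model's Hub id and sample.wav is an illustrative path. When ffmpeg is available, the ASR pipeline decodes and resamples files to 16 kHz mono on its own:

```python
from transformers import pipeline

# "<this-repo-id>" is a placeholder; substitute this model's Hub id.
asr = pipeline("automatic-speech-recognition", model="<this-repo-id>")

result = asr("sample.wav")  # decoded and resampled to 16 kHz mono (needs ffmpeg)
print(result["text"])
```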

Audio preprocessing

  • Audio loaded with soundfile.read(file_path)
  • If stereo/multi-channel, converted to mono by averaging channels
  • Features extracted with WhisperProcessor.feature_extractor(..., sampling_rate=16000), as in the sketch after this list
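
A sketch of that preprocessing path, assuming the input file is already sampled at 16 kHz (the file name is illustrative):

```python
import soundfile as sf
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Load waveform and native sample rate (assumed to be 16 kHz here).
audio, sr = sf.read("utterance.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # stereo/multi-channel -> mono by channel averaging

inputs = processor.feature_extractor(
    audio, sampling_rate=16000, return_tensors="pt"
)
input_features = inputs.input_features  # log-mel spectrogram features
```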

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-4
  • train_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08)
  • lr_scheduler_type: linear
  • num_epochs: 8
  • mixed_precision_training: Native AMP
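
One plausible mapping of these values onto Seq2SeqTrainingArguments; anything not listed above (such as output_dir) is an assumption:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-uaspeech-lora",  # hypothetical
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    seed=42,
    optim="adamw_torch",          # AdamW defaults: betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=8,
    fp16=True,                    # Native AMP mixed-precision training
)
```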

LoRA / PEFT configuration

  • task_type: SEQ_2_SEQ_LM
  • r: 64
  • lora_alpha: 128
  • lora_dropout: 0.1
  • target_modules: ["q_proj", "v_proj", "fc1", "fc2"]
  • modules_to_save: None
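
Expressed as a peft LoraConfig, this configuration looks as follows (a sketch; the actual training script is not included in this card):

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

lora_config = LoraConfig(
    task_type="SEQ_2_SEQ_LM",
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "fc1", "fc2"],
)

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```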

Model config modifications

  • model.config.forced_decoder_ids = None
  • model.config.suppress_tokens = []
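
These two changes disable Whisper's forced language/task decoder prompts and its default token suppression list; a minimal sketch:

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.config.forced_decoder_ids = None  # do not force language/task tokens
model.config.suppress_tokens = []       # disable default token suppression
```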

Training results

Word error rate (WER) and character error rate (CER) were computed offline after training; no evaluation numbers are reported in this card.
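
A sketch of how such offline metrics are commonly computed with the evaluate library (the reference/hypothesis strings are illustrative):

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["place the blue ball on the red block"]   # ground-truth transcripts
predictions = ["place the blue ball on the red block"]  # model outputs

print("WER:", wer_metric.compute(references=references, predictions=predictions))
print("CER:", cer_metric.compute(references=references, predictions=predictions))
```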

Framework versions

  • Transformers 4.56.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1