---
language:
- en
tags:
- automatic-speech-recognition
- speech-to-text
- whisper
- peft
- lora
- seq2seq
base_model: openai/whisper-small
pipeline_tag: automatic-speech-recognition
---
This model is a fine-tuned version of openai/whisper-small on the UA-Speech dataset.
## Model description
This model fine-tunes Whisper-small for English transcription of dysarthric speech. Training used LoRA (PEFT) on the attention and feed-forward modules, and the adapter was merged into the base model weights for deployment, so no PEFT is required at inference time. The low-rank adapter setup is also intended to demonstrate generalization under constrained capacity.
## Intended uses & limitations
This model is intended for automatic speech recognition (ASR) on English speech, with an emphasis on robustness to atypical/dysarthric speech patterns resembling UA-Speech-style data. Performance may degrade on out-of-domain audio, heavy noise, non-English speech, or audio sampled far from 16 kHz. For best results, provide mono 16 kHz audio.
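Because the adapter is merged, the checkpoint loads like any plain Whisper model. A minimal inference sketch (the repository id below is a placeholder; substitute the actual model id):

```python
import torch
from transformers import pipeline

# The LoRA adapter is already folded into the weights, so no PEFT is needed.
asr = pipeline(
    "automatic-speech-recognition",
    model="charleslwang/whisper-small-ua-speech",  # placeholder id; replace with the real repo
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device=0 if torch.cuda.is_available() else -1,
)

# The pipeline decodes the file and resamples it to 16 kHz via ffmpeg.
print(asr("sample.wav")["text"])
```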
## Audio preprocessing
- Audio loaded with `soundfile.read(file_path)`
- If stereo/multi-channel, converted to mono by averaging channels
- Features extracted with `WhisperProcessor.feature_extractor(..., sampling_rate=16000)`
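A sketch of the preprocessing steps above (the helper name is illustrative, and the input file is assumed to already be 16 kHz):

```python
import soundfile as sf
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def load_features(file_path: str):
    # soundfile returns an array of shape (frames,) or (frames, channels).
    audio, sr = sf.read(file_path)
    # Downmix stereo/multi-channel audio to mono by averaging channels.
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    # Extract log-Mel features at the 16 kHz rate Whisper expects
    # (resample beforehand if the file's sample rate differs).
    inputs = processor.feature_extractor(
        audio, sampling_rate=16000, return_tensors="pt"
    )
    return inputs.input_features
```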
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-4
- train_batch_size: 16
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8
- mixed_precision_training: Native AMP
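These settings map onto `Seq2SeqTrainingArguments` roughly as follows (a sketch, not the exact training script; the output directory is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ua-speech-lora",  # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    seed=42,
    optim="adamw_torch",          # AdamW; betas=(0.9, 0.999), eps=1e-08 are the defaults
    lr_scheduler_type="linear",
    num_train_epochs=8,
    fp16=True,                    # native AMP mixed-precision training
)
```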
## LoRA / PEFT configuration
- task_type: SEQ_2_SEQ_LM
- r: 64
- lora_alpha: 128
- lora_dropout: 0.1
- target_modules: ["q_proj", "v_proj", "fc1", "fc2"]
- modules_to_save: None
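This configuration corresponds to the following `peft` setup (a sketch under the settings listed above; `modules_to_save=None` is the default and is omitted):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "fc1", "fc2"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # LoRA trains only a small fraction of the weights
```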
## Model config modifications
- `model.config.forced_decoder_ids = None`
- `model.config.suppress_tokens = []`
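These settings clear Whisper's forced language/task decoder prompt and its token suppression list. Combined with the adapter merge, the deployment flow looks roughly like this (a sketch; paths are placeholders):

```python
from peft import PeftModel
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(base, "./whisper-small-ua-speech-lora")  # placeholder path

# Clear generation constraints as listed above.
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []

# Fold the LoRA weights into the base model so inference needs no PEFT.
merged = model.merge_and_unload()
merged.save_pretrained("./whisper-small-ua-speech-merged")  # placeholder path
```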
## Training results
Word error rate (WER) and character error rate (CER) were computed offline after training; a sketch of the evaluation follows.
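An illustrative way to compute these metrics with the `evaluate` library (the exact evaluation script is not included in this card; the strings below are dummy data):

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# predictions / references: lists of model transcripts and ground-truth strings
predictions = ["the quick brown fox"]
references = ["the quick brown fox jumps"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```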
## Framework versions
- Transformers 4.56.2
- PyTorch 2.8.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1