---
license: cc-by-nc-3.0
datasets:
- google/fleurs
metrics:
- wer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
---

# Whisper Fine-tuning for Cebuano Language

This project provides a configurable way to fine-tune OpenAI's Whisper model specifically on the Cebuano language using the Google FLEURS dataset (ceb_ph).

## Features

- **Flexible Configuration**: All parameters are configurable through YAML files
- **Multi-GPU Support**: Automatic detection and support for multiple GPUs
- **Dynamic Language Selection**: Train on any subset of supported languages
- **On-the-fly Processing**: Efficient memory usage with dynamic audio preprocessing
- **Comprehensive Evaluation**: Automatic evaluation on test sets

## Configuration

All parameters are configurable through the `config.yaml` file. This configuration is specifically set up for Cebuano language training using the Google FLEURS dataset.

### Model Configuration
- Model checkpoint (default: `openai/whisper-large-v3`)
- Maximum target length for sequences

### Dataset Configuration
- Uses the Google FLEURS Cebuano (`ceb_ph`) dataset (see the dataset inspection sketch after this list)
- Dataset sources and splits
- Language-specific settings
- Training subset ratio (25% of the data, for faster training)

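
For reference, the FLEURS Cebuano split can be inspected directly with the `datasets` library. This is only an illustrative sketch of the data source, not part of `finetune.py`; the column names shown are the standard FLEURS ones.

```python
# Illustrative only: inspect the Google FLEURS Cebuano (ceb_ph) data consumed
# by the training script. Not part of finetune.py.
from datasets import load_dataset

# Depending on your `datasets` version you may need trust_remote_code=True.
fleurs_ceb = load_dataset("google/fleurs", "ceb_ph", split="train")

print(fleurs_ceb)                        # number of examples and column names
sample = fleurs_ceb[0]
print(sample["transcription"])           # reference text
print(sample["audio"]["sampling_rate"])  # FLEURS audio is 16 kHz
```
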
### Training Configuration
- Learning rate, batch sizes, training steps
- Multi-GPU vs. single-GPU settings
- Evaluation and logging parameters

### Environment Configuration
- CPU core limits
- Environment variables for optimization

### Pushing to Hub
The configuration does not push to the Hugging Face Hub by default. You can enable this by setting `push_to_hub: true` in your config file.

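
If you enable pushing, the flag presumably ends up in the Transformers training arguments. The snippet below is only a sketch of the standard Transformers mechanism, assuming the script builds `Seq2SeqTrainingArguments`; the output directory and repository name are placeholders.

```python
# Sketch of the standard Transformers Hub-push mechanism (not a copy of finetune.py).
# The output_dir and hub_model_id values are placeholders.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-cebuano",                # local checkpoint directory
    push_to_hub=True,                                       # mirrors push_to_hub: true in config.yaml
    hub_model_id="your-username/whisper-large-v3-cebuano",  # target repository on the Hub
)
```
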
## Usage

### Basic Usage
```bash
python finetune.py --config config.yaml
```

### Custom Configuration
```bash
python finetune.py --config my_custom_config.yaml
```

### Multi-GPU Training
Since there is very little training data (around 2.5 hours), multi-GPU training is not recommended.

## Configuration File Structure

The `config.yaml` file is organized into the following sections (a config-loading sketch follows the list):

1. **model**: Model checkpoint and sequence length settings
2. **output**: Output directory configuration
3. **environment**: Environment variables and CPU settings
4. **audio**: Audio processing settings (sampling rate)
5. **languages**: Cebuano language configuration
6. **datasets**: Google FLEURS Cebuano dataset configuration
7. **training**: All training hyperparameters
8. **data_processing**: Data processing settings

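
As a rough illustration (not a copy of `finetune.py`), a script can load this structure with PyYAML, which is listed under Dependencies below. Only the section names are taken from the list above; the leaf key names are hypothetical.

```python
# Minimal sketch of loading config.yaml with PyYAML.
# Section names follow the list above; the leaf key names are hypothetical.
import yaml

with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

model_checkpoint = config["model"]["checkpoint"]  # e.g. "openai/whisper-large-v3"
output_dir = config["output"]["dir"]              # where checkpoints are written
sampling_rate = config["audio"]["sampling_rate"]  # Whisper expects 16 kHz audio
training_params = config["training"]              # learning rate, batch sizes, steps, ...
```
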
## Customizing Your Training

### Adjusting Training Parameters
Modify the `training` section in `config.yaml`:
- Change learning rate, batch sizes, or training steps
- Adjust evaluation frequency
- Configure multi-GPU settings

### Environment Optimization
Adjust the `environment` section to optimize for your system:
- Set CPU core limits
- Configure memory usage settings

## Training Commands

### Basic Training (Single GPU)
```bash
python finetune.py
```

## Inference Guide

After training your model, you can use the provided `inference.py` script for speech recognition:

```bash
python inference.py
```

The inference script includes:
- Model loading from the trained checkpoint
- Audio preprocessing pipeline
- Text generation with proper formatting
- Support for Cebuano language transcription

### Using the Trained Model

The inference script automatically handles the following (an illustrative snippet follows this list):
- Loading the fine-tuned model weights
- Audio preprocessing with the proper sampling rate
- Generating transcriptions for Cebuano speech
- Output formatting for evaluation metrics

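
If you prefer not to go through `inference.py`, a fine-tuned Whisper checkpoint can also be used directly with the Transformers ASR pipeline. This is only a sketch: the checkpoint directory and audio file name below are placeholders.

```python
# Illustrative sketch (not inference.py itself): transcribe a Cebuano recording
# with the fine-tuned checkpoint via the Transformers ASR pipeline.
# The model path and audio file name are placeholders.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="./whisper-large-v3-cebuano",             # your training output directory
    device=0 if torch.cuda.is_available() else -1,  # GPU if available, otherwise CPU
)

result = asr("sample_cebuano.wav")                  # audio is resampled to 16 kHz internally
print(result["text"])
```
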
## Dependencies

Install required packages:
```bash
pip install -r requirements.txt
```

Key dependencies:
- PyYAML (for configuration loading)
- torch, transformers, datasets
- librosa (for audio processing)
- evaluate (for metrics)

## Evaluation Results

| Language | Metric | Fine-tuned | Zero-shot |
|----------|:------:|-----------:|----------:|
| Cebuano  | WER    | 16.10%     | 47.33%    |

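
The reported WER can be reproduced with the `evaluate` package listed under Dependencies. A minimal sketch of the metric computation follows; the prediction and reference strings are placeholders, not actual model output.

```python
# Minimal sketch of the WER computation with the `evaluate` library
# (the WER metric pulls in the jiwer package).
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["ang bata nagdula sa gawas"]      # hypothetical model transcription
references = ["ang mga bata nagdula sa gawas"]   # hypothetical FLEURS reference

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer * 100:.2f}%")
```
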
**Note**: If you encounter issues running `finetune.py`, you can use the `finetune-backup.py` file, which contains the original hardcoded configuration.