---
license: cc-by-nc-3.0
datasets:
- google/fleurs
metrics:
- wer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
---
# Whisper Fine-tuning for Cebuano Language

This project provides a configurable pipeline for fine-tuning OpenAI's Whisper model on the Cebuano language using the Google FLEURS dataset (`ceb_ph`).

## Features

- **Flexible Configuration**: All parameters are configurable through YAML files
- **Multi-GPU Support**: Automatic detection and support for multiple GPUs
- **Dynamic Language Selection**: Train on any subset of supported languages
- **On-the-fly Processing**: Efficient memory usage with dynamic audio preprocessing (see the sketch after this list)
- **Comprehensive Evaluation**: Automatic evaluation on test sets
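
As a rough illustration of the on-the-fly processing mentioned above, here is a minimal sketch using Hugging Face Datasets' `set_transform`; the exact column names and preprocessing steps in `finetune.py` may differ.

```python
from datasets import load_dataset, Audio
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")

dataset = load_dataset("google/fleurs", "ceb_ph", split="train")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    # Compute log-mel features and label ids at access time, so features
    # never have to be materialized for the whole dataset up front.
    audio = [a["array"] for a in batch["audio"]]
    batch["input_features"] = processor(
        audio, sampling_rate=16_000
    ).input_features
    batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
    return batch

# set_transform applies `prepare` lazily on every access instead of
# precomputing and caching features in advance.
dataset.set_transform(prepare)
```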

## Configuration

All parameters are configurable through the `config.yaml` file. This configuration is specifically set up for Cebuano language training using the Google FLEURS dataset.

### Model Configuration
- Model checkpoint (default: `openai/whisper-large-v3`)
- Maximum target length for generated sequences (see the sketch below)
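
For reference, these two settings map onto standard `transformers` calls roughly as follows; this is a sketch, and the actual key names in `config.yaml` may differ.

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_checkpoint = "openai/whisper-large-v3"
max_target_length = 448  # illustrative; Whisper's decoder supports at most 448 tokens

model = WhisperForConditionalGeneration.from_pretrained(model_checkpoint)
processor = WhisperProcessor.from_pretrained(model_checkpoint)

# Cap generated sequence length to the configured maximum.
model.generation_config.max_length = max_target_length
```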

### Dataset Configuration
- Uses Google FLEURS Cebuano (ceb_ph) dataset
- Dataset sources and splits
- Language-specific settings
- Training subset ratio (25% of the data for faster training; see the loading sketch below)
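
Loading the dataset and taking the 25% training subset might look like the following sketch; the shuffling seed and selection strategy are assumptions, not the project's exact code.

```python
from datasets import load_dataset

fleurs_ceb = load_dataset("google/fleurs", "ceb_ph")

# Keep 25% of the training split for faster experimentation.
train = fleurs_ceb["train"].shuffle(seed=42)
train = train.select(range(int(0.25 * len(train))))

print(f"{len(train)} training examples, {len(fleurs_ceb['test'])} test examples")
```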

### Training Configuration
- Learning rate, batch sizes, training steps
- Multi-GPU vs single GPU settings
- Evaluation and logging parameters (see the sketch below)
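
These hyperparameters typically end up in a `Seq2SeqTrainingArguments` object; the values below are illustrative placeholders, not the ones shipped in `config.yaml`.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-ceb",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    max_steps=2000,
    eval_strategy="steps",  # named evaluation_strategy in older transformers releases
    eval_steps=200,
    logging_steps=50,
    predict_with_generate=True,
    fp16=True,
)
```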

### Environment Configuration
- CPU core limits
- Environment variables for optimization

### Pushing to Hub
- By default, the configuration does not push to the Hugging Face Hub. You can enable pushing by setting `push_to_hub: true` in your config file.

## Usage

### Basic Usage
```bash
python finetune.py --config config.yaml
```

### Custom Configuration
```bash
python finetune.py --config my_custom_config.yaml
```

### Multi-GPU Training
Since there is very little training data (around 2.5 hours), multi-GPU training is not recommended.

## Configuration File Structure

The `config.yaml` file is organized into the following sections (a loading sketch follows the list):

1. **model**: Model checkpoint and sequence length settings
2. **output**: Output directory configuration
3. **environment**: Environment variables and CPU settings
4. **audio**: Audio processing settings (sampling rate)
5. **languages**: Cebuano language configuration
6. **datasets**: Google FLEURS Cebuano dataset configuration
7. **training**: All training hyperparameters
8. **data_processing**: Data processing settings
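
Loading such a file with PyYAML is straightforward; the nested key names below are assumptions about the file's layout, not guaranteed to match it.

```python
import yaml  # provided by the PyYAML package

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Each top-level key corresponds to one of the sections listed above.
model_cfg = config["model"]
training_cfg = config["training"]
print(model_cfg.get("checkpoint", "openai/whisper-large-v3"))  # "checkpoint" is an assumed key name
```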

## Customizing Your Training

### Adjusting Training Parameters
Modify the `training` section in `config.yaml`:
- Change learning rate, batch sizes, or training steps
- Adjust evaluation frequency
- Configure multi-GPU settings

### Environment Optimization
Adjust the `environment` section to optimize for your system:
- Set CPU core limits
- Configure memory usage settings (see the sketch below)
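
A minimal sketch of such environment tuning, assuming the limits are applied before any heavy libraries spin up their thread pools; the exact variables `finetune.py` sets may differ.

```python
import os

# Thread limits must be set before torch/numpy initialize their pools.
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["MKL_NUM_THREADS"] = "8"
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # silence tokenizer fork warnings

import torch

torch.set_num_threads(8)
```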

## Training Commands

### Basic Training
```bash
python finetune.py
```

When run without an explicit `--config` argument, the script falls back to the default `config.yaml`. Given the small dataset, single-GPU training is the recommended setup.

## Inference Guide

After training your model, you can use the provided `inference.py` script for speech recognition:

```bash
python inference.py
```

The inference script includes:
- Model loading from the trained checkpoint
- Audio preprocessing pipeline
- Text generation with proper formatting
- Support for Cebuano language transcription

### Using the Trained Model

The inference script automatically handles:
- Loading the fine-tuned model weights
- Audio preprocessing with proper sampling rate
- Generating transcriptions for Cebuano speech
- Output formatting for evaluation metrics (see the pipeline sketch below)
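
If you prefer not to use `inference.py` directly, a fine-tuned checkpoint can also be loaded with the standard `transformers` pipeline; `./whisper-large-v3-ceb` and `sample_cebuano.wav` below are hypothetical paths.

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="./whisper-large-v3-ceb",  # hypothetical path to your fine-tuned weights
    chunk_length_s=30,               # handle clips longer than Whisper's 30 s window
)

# Accepts a path to an audio file; resampling is handled internally.
result = asr("sample_cebuano.wav")
print(result["text"])
```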

## Dependencies

Install required packages:
```bash
pip install -r requirements.txt
```

Key dependencies:
- PyYAML (for configuration loading)
- torch, transformers, datasets
- librosa (for audio processing)
- evaluate (for metrics; see the sketch below)
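
The WER and CER numbers reported below can be computed with the `evaluate` library; the strings here are toy examples, not real model output.

```python
import evaluate

wer = evaluate.load("wer")
cer = evaluate.load("cer")

predictions = ["nindot ang adlaw karon"]       # toy hypothesis
references = ["nindot kaayo ang adlaw karon"]  # toy reference

print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```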

## Zero-shot Results
WER of the base `openai/whisper-large-v3` model on the Cebuano test set before fine-tuning. Whisper has no native Cebuano language token, so decoding uses a borrowed language ID or automatic language detection:

| Language ID | Metric | Error Rate |
|-------------|:------:|-----------:|
| Khmer       |  WER   |       355% |
| Tagalog     |  WER   |     40.14% |
| Auto        |  WER   |     40.13% |

## Evaluation Results
CER of the fine-tuned model on the same test set:

| Language ID | Metric | Error Rate |
|-------------|:------:|-----------:|
| Tagalog     |  CER   |     13.42% |
| Auto        |  CER   |     13.40% |


**Note**: If you encounter issues running `finetune.py`, you can use the `finetune-backup.py` file, which contains the original hardcoded configuration.