# Whisper Large v3 Turbo - French V6 (Anti-Overfitting)

Fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) for French ASR, trained with aggressive anti-overfitting strategies.
## 🎯 Performance Overview

Average WER: 7.84% → 6.83% (-1.01pp, a 12.9% relative improvement)

Evaluated with the official Open ASR Leaderboard normalizer on 5 French datasets.
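As a reminder of the metric, WER is the word-level edit distance divided by the reference length. A minimal pure-Python sketch for illustration (the leaderboard implementation additionally applies its text normalizer before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("le chat dort bien", "le chien dort")` gives 0.5: one substitution plus one deletion over four reference words.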
## 📊 Detailed Benchmark Results

| Dataset | Base Model | Fine-tuned V6 | Change | Status |
|---|---|---|---|---|
| VoxPopuli | 12.11% | 8.88% | -3.23pp (-27%) | 🏆 Excellent |
| CommonVoice | 7.57% | 5.12% | -2.45pp (-32%) | 🏆 Excellent |
| TEDx | 8.52% | 8.58% | +0.06pp (+0.7%) | ➖ Neutral |
| MLS | 5.61% | 5.58% | -0.03pp (-0.5%) | ➖ Neutral |
| FLEURS | 5.41% | 6.01% | +0.60pp (+11%) | ⚠️ Regression |
### Key Findings

- ✅ Strong improvements on VoxPopuli (-27%) and CommonVoice (-32%)
- ⚠️ FLEURS regression: the model is optimized for European French accents, not African accents
- ➖ TEDx and MLS essentially unchanged (~0%)
- 📊 3,176 samples evaluated across 5 datasets
## 🛡️ Anti-Overfitting Strategy V6

This version addresses the overfitting observed in V5 through multiple strategies:
### Reduced Model Capacity

- LoRA rank: 16 (vs 64 in V5, -75%)
- Target modules: 6 (q_proj, k_proj, v_proj, out_proj, fc1, fc2)
- Trainable params: ~14M (vs 34M in V5, -59%)
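The ~14M figure is consistent with a back-of-envelope count, assuming Whisper large-v3-turbo's published dimensions (d_model = 1280, FFN = 5120, 32 encoder + 4 decoder layers) and rank-16 adapters on all six module types, including the decoder's cross-attention projections:

```python
# Rough LoRA parameter count for whisper-large-v3-turbo, r = 16.
# Each adapted linear adds r * (in_features + out_features) params (the A and B matrices).
r, d_model, d_ffn = 16, 1280, 5120
enc_layers, dec_layers = 32, 4

attn_proj = r * (d_model + d_model)       # q/k/v/out_proj: 1280 -> 1280
ffn_proj = r * (d_model + d_ffn)          # fc1 (1280 -> 5120) or fc2 (5120 -> 1280)

enc_layer = 4 * attn_proj + 2 * ffn_proj  # self-attention block + MLP
dec_layer = 8 * attn_proj + 2 * ffn_proj  # self-attention + cross-attention + MLP

total = enc_layers * enc_layer + dec_layers * dec_layer
print(f"{total / 1e6:.1f}M trainable LoRA parameters")  # ~13.9M
```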
### Strong Regularization

- Dropout: 0.2 (vs 0.05 in V5, +300%)
- Weight decay: 0.05 (vs 0.01 in V5, +400%)
- Label smoothing: 0.15 (vs 0.05 in V5, +200%)
### Conservative Training

- Learning rate: 3e-5 (vs 1e-4 in V5, -70%)
- Epochs: 2 (vs 4 in V5, -50%)
- Training samples: 97k (vs 122k in V5, -19%)
### Strict Monitoring

- Eval frequency: every 200 steps
- Early stopping patience: 2 (vs 3 in V5)
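These settings map naturally onto the Hugging Face `Seq2SeqTrainingArguments` plus an `EarlyStoppingCallback`. A sketch, not the author's actual script: argument names follow recent versions of the `transformers` Trainer API, the values come from this card, and everything else (output dir, mixed precision) is illustrative:

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-french-v6",     # illustrative
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,      # effective batch size 64
    learning_rate=3e-5,
    weight_decay=0.05,
    label_smoothing_factor=0.15,
    num_train_epochs=2,
    eval_strategy="steps",
    eval_steps=200,                     # strict monitoring: evaluate every 200 steps
    load_best_model_at_end=True,        # required for early stopping
    metric_for_best_model="wer",
    greater_is_better=False,
    fp16=True,                          # assumption
)
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
```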
## 📋 Training Details

- Base model: openai/whisper-large-v3-turbo
- Method: LoRA (r=16, alpha=32)
- Training samples: 97,193
- Validation samples: 2,000
- Epochs: 2
- Batch size: 64 (16 per device × 4 gradient accumulation)
- Learning rate: 3e-5
- Weight decay: 0.05
- Label smoothing: 0.15
- Training time: 4.0 h on an A100 80GB
- Normalizer: official Open ASR Leaderboard normalizer
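The LoRA setup above corresponds to a `peft` configuration along these lines (a sketch; whether the 0.2 dropout was applied as `lora_dropout` or in the base model is an assumption):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                               # rank, as listed above
    lora_alpha=32,
    lora_dropout=0.2,                   # assumption: dropout applied on the adapters
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)
# model = get_peft_model(base_model, lora_config)
# model.print_trainable_parameters()   # should report roughly 14M trainable params
```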
## 💾 Datasets (Reduced for Anti-Overfitting)

A carefully selected subset focused on generalization:

- Common Voice 23.0: 18k samples (-50%) - reduced to limit CommonVoice overfitting
- Multilingual LibriSpeech: 30k samples - best generalizer
- VoxPopuli: 30k samples - matches the test distribution
- TEDx French: 12k samples (-20%) - quality over quantity
- African Accented French: 6k samples (-25%) - accent diversity
- MediaSpeech: 3.2k samples - fully retained (small dataset)

Total: 97k samples (vs 122k in V5, -19%)
## 🚀 Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch

# Load the processor, the fp16 base model, and the LoRA adapter on top of it
processor = WhisperProcessor.from_pretrained("Mathos34400/whisper-large-v3-turbo-french-v6")
base_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Mathos34400/whisper-large-v3-turbo-french-v6")

# Inference on a 16 kHz mono waveform
audio = ...  # 16 kHz numpy array
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to("cuda")
input_features = inputs["input_features"].to(torch.float16)  # match the model's dtype
generated_ids = model.generate(input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
## 🔬 V5 → V6 Changes

### Problems in V5

- Overfitting on FLEURS (6.25% WER)
- Overfitting on VoxPopuli (13.24% WER)
- Too much capacity (34M trainable params)
- Average WER: 8.33%

### V6 Solutions & Results

- Capacity reduction: rank 64 → 16 (-75%)
- Regularization: dropout 0.05 → 0.2, weight decay 0.01 → 0.05
- Less data: 122k → 97k samples (-19%)
- Shorter training: 4 → 2 epochs
- More frequent evaluation: every 200 steps

Result: average WER improved from 8.33% to 6.83% (-1.50pp, -18%)
## 🎯 Use Cases

### ✅ Recommended For

- Political speeches & debates (VoxPopuli-like)
- General conversational French (CommonVoice)
- Presentations & conferences (TEDx)
- European French accents
- Production deployments requiring both speed and accuracy

### ⚠️ Not Optimal For

- African French accents (FLEURS)
- Ultra-low WER requirements on audiobook-quality audio
## 📊 Comparison with V5

| Metric | V5 | V6 | Change |
|---|---|---|---|
| VoxPopuli | 13.24% | 8.88% | ✅ -4.36pp |
| CommonVoice | N/A | 5.12% | 🆕 New |
| FLEURS | 6.25% | 6.01% | ✅ -0.24pp |
| MLS | 5.50% | 5.58% | ⚠️ +0.08pp |
| Average | 8.33% | 6.83% | ✅ -1.50pp |
## 📚 Citation

```bibtex
@misc{whisper-french-v6,
  author       = {Mathis Lacombe},
  title        = {Whisper Large v3 Turbo French V6 - Anti-Overfitting},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {https://huggingface.co/Mathos34400/whisper-large-v3-turbo-french-v6}
}
```
## 📄 License

Apache 2.0
**Model Card Authors:** Mathis Lacombe
**Evaluation Date:** December 2025
**Evaluation Method:** official Open ASR Leaderboard normalizer
