# Whisper Large v3 Turbo - French V6 (Anti-Overfitting)

Fine-tuned version of `openai/whisper-large-v3-turbo` for French ASR, trained with aggressive anti-overfitting strategies.

## 🎯 Performance Overview

**Average WER: 7.84% → 6.83%** (-1.01 pp, a 12.9% relative improvement over the base model)

Evaluated with the official Open ASR Leaderboard normalizer on 5 French datasets.

## 📊 Detailed Benchmark Results

| Dataset     | Base Model | Fine-tuned V6 | Change (WER)     | Status        |
|-------------|-----------:|--------------:|------------------|---------------|
| VoxPopuli   | 12.11%     | 8.88%         | -3.23 pp (-27%)  | 🚀 Excellent  |
| CommonVoice | 7.57%      | 5.12%         | -2.45 pp (-32%)  | 🚀 Excellent  |
| TEDx        | 8.52%      | 8.58%         | +0.06 pp (+0.7%) | ≈ Neutral     |
| MLS         | 5.61%      | 5.58%         | -0.03 pp (-0.5%) | ≈ Neutral     |
| FLEURS      | 5.41%      | 6.01%         | +0.60 pp (+11%)  | ⚠️ Regression |


### Key Findings

- ✅ **Strong improvements** on VoxPopuli (-27%) and CommonVoice (-32%)
- ⚠️ **FLEURS regression** - the model is optimized for European French accents, not African accents
- ≈ **TEDx and MLS neutral** (~0% change)
- 📊 **3,176 samples** evaluated across the 5 datasets
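
All WER figures above are computed on normalized text, following the Open ASR Leaderboard convention. Below is a minimal sketch of that scoring step, assuming the multilingual `BasicTextNormalizer` from 🤗 Transformers and the `evaluate` WER metric; the official leaderboard harness may differ in details.

```python
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

# Multilingual normalizer (lowercasing, punctuation removal) used for non-English languages
normalizer = BasicTextNormalizer()
wer_metric = evaluate.load("wer")

def normalized_wer(references, hypotheses):
    """Return WER (%) after normalizing both references and hypotheses."""
    refs = [normalizer(r) for r in references]
    hyps = [normalizer(h) for h in hypotheses]
    return 100 * wer_metric.compute(references=refs, predictions=hyps)
```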

πŸ›‘οΈ Anti-Overfitting Strategy V6

This version addresses overfitting issues observed in V5 through multiple strategies:

### Reduced Model Capacity

- **LoRA rank:** 16 (vs. 64 in V5, -75%)
- **Target modules:** 6 (`q_proj`, `k_proj`, `v_proj`, `out_proj`, `fc1`, `fc2`)
- **Trainable parameters:** ~14M (vs. 34M in V5, -59%)

### Strong Regularization

- **Dropout:** 0.2 (vs. 0.05 in V5, +300%)
- **Weight decay:** 0.05 (vs. 0.01 in V5, +400%)
- **Label smoothing:** 0.15 (vs. 0.05 in V5, +200%)

### Conservative Training

- **Learning rate:** 3e-5 (vs. 1e-4 in V5, -70%)
- **Epochs:** 2 (vs. 4 in V5, -50%)
- **Training samples:** 97k (vs. 122k in V5, -19%)

### Strict Monitoring

- **Evaluation frequency:** every 200 steps
- **Early stopping patience:** 2 (vs. 3 in V5)
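
As an illustration, the capacity and dropout choices above map roughly onto the following PEFT configuration. This is a sketch under the standard `peft` API, not the exact training script used for this model.

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

lora_config = LoraConfig(
    r=16,                      # reduced rank (V5 used 64)
    lora_alpha=32,
    lora_dropout=0.2,          # strong dropout for regularization
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # roughly 14M trainable parameters
```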

## 📊 Training Details

- **Base model:** `openai/whisper-large-v3-turbo`
- **Method:** LoRA (r=16, alpha=32)
- **Training samples:** 97,193
- **Validation samples:** 2,000
- **Epochs:** 2
- **Batch size:** 64 (16 per device × 4 gradient accumulation steps)
- **Learning rate:** 3e-5
- **Weight decay:** 0.05
- **Label smoothing:** 0.15
- **Training time:** 4.0 h on an A100 80GB
- **Normalizer:** Open ASR Leaderboard official
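
For reference, these hyperparameters correspond roughly to the 🤗 `Seq2SeqTrainingArguments` sketched below. This is not the exact training script: argument names such as `eval_strategy` vary slightly across Transformers versions, and the dataset, collator, and metric plumbing is elided.

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-turbo-french-v6",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,      # effective batch size 64
    learning_rate=3e-5,
    weight_decay=0.05,
    label_smoothing_factor=0.15,
    num_train_epochs=2,
    fp16=True,
    predict_with_generate=True,
    eval_strategy="steps",              # evaluate every 200 steps
    eval_steps=200,
    save_steps=200,
    load_best_model_at_end=True,        # required for early stopping
    metric_for_best_model="wer",
    greater_is_better=False,
)

trainer = Seq2SeqTrainer(
    model=model,                        # PEFT-wrapped model from the sketch above
    args=training_args,
    # train_dataset=..., eval_dataset=..., data_collator=..., compute_metrics=...
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
```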

## 💾 Datasets (Reduced for Anti-Overfitting)

A carefully selected subset chosen to favor generalization:

- **Common Voice 23.0:** 18k samples (-50%) - reduced to prevent Common Voice overfitting
- **Multilingual LibriSpeech:** 30k samples - best generalizer
- **VoxPopuli:** 30k samples - matches the test distribution
- **TEDx French:** 12k samples (-20%) - quality over quantity
- **African Accented French:** 6k samples (-25%) - accent diversity
- **MediaSpeech:** 3.2k samples - fully retained (small dataset)

**Total:** 97k samples (vs. 122k in V5, -19%)
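
A minimal sketch of assembling such a mixture with 🤗 Datasets follows. The repository IDs, config names, and column layout are placeholders (assumptions), since the exact sources and preprocessing used for V6 are not spelled out in this card.

```python
from datasets import load_dataset, concatenate_datasets

# Hypothetical repo IDs / config names -- substitute the actual French corpora
# and versions listed above.
MIXTURE = [
    ("mozilla-foundation/common_voice_17_0", "fr", 18_000),
    ("facebook/multilingual_librispeech", "french", 30_000),
    ("facebook/voxpopuli", "fr", 30_000),
]

parts = []
for repo_id, config, budget in MIXTURE:
    ds = load_dataset(repo_id, config, split="train")
    # Shuffle, then cap each corpus at its per-dataset sample budget
    parts.append(ds.shuffle(seed=42).select(range(min(budget, len(ds)))))

# Audio/transcription columns must be renamed to a common schema and resampled
# to 16 kHz before concatenation; that harmonization step is omitted here.
train_dataset = concatenate_datasets(parts)
```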

## 🚀 Usage

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

# Load the processor, the fp16 base model, and the LoRA adapter
processor = WhisperProcessor.from_pretrained("Mathos34400/whisper-large-v3-turbo-french-v6")
base_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Mathos34400/whisper-large-v3-turbo-french-v6")

# Inference
audio = ...  # 16 kHz mono numpy array
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to("cuda", torch.float16)  # match the fp16 model
generated_ids = model.generate(inputs["input_features"])
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
```
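
Optionally, the LoRA adapter can be folded into the base weights for deployment so that PEFT is not needed at inference time. This uses the standard `merge_and_unload` call from PEFT; verify it fits your serving setup.

```python
# Merge the adapter into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./whisper-v6-merged")
processor.save_pretrained("./whisper-v6-merged")
```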

## 🔬 V5 → V6 Changes

### Problems in V5

- Overfitting on FLEURS (6.25% WER)
- Overfitting on VoxPopuli (13.24% WER)
- Too much capacity (34M trainable parameters)
- Average WER: 8.33%

### V6 Solutions & Results

1. **Capacity reduction:** rank 64 → 16 (-75%)
2. **Regularization:** dropout 0.05 → 0.2, weight decay 0.01 → 0.05
3. **Less data:** 122k → 97k samples (-19%)
4. **Shorter training:** 4 → 2 epochs
5. **Frequent evaluation:** every 200 steps

**Result:** average WER improved from 8.33% → 6.83% (-1.50 pp, -18%)

## 🎯 Use Cases

### ✅ Recommended For

- Political speeches & debates (VoxPopuli-like)
- General conversational French (CommonVoice)
- Presentations & conferences (TEDx)
- European French accents
- Production deployments requiring both speed and accuracy

### ⚠️ Not Optimal For

- African French accents (FLEURS)
- Ultra-low-WER requirements on audiobook-quality audio

## 📈 Comparison with V5

| Metric      | V5     | V6    | Change      |
|-------------|-------:|------:|-------------|
| VoxPopuli   | 13.24% | 8.88% | ✅ -4.36 pp |
| CommonVoice | N/A    | 5.12% | ✅ New      |
| FLEURS      | 6.25%  | 6.01% | ✅ -0.24 pp |
| MLS         | 5.50%  | 5.58% | ≈ +0.08 pp  |
| Average     | 8.33%  | 6.83% | ✅ -1.50 pp |

πŸ“ Citation

```bibtex
@misc{whisper-french-v6,
  author       = {Mathis Lacombe},
  title        = {Whisper Large v3 Turbo French V6 - Anti-Overfitting},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {https://huggingface.co/Mathos34400/whisper-large-v3-turbo-french-v6}
}
```

## 📄 License

Apache 2.0


**Model Card Authors:** Mathis Lacombe
**Evaluation Date:** December 2025
**Evaluation Method:** Official Open ASR Leaderboard normalizer
