# Whisper for Swiss German Dialects
This model is a fine-tuned version of openai/whisper-base on the SPC [1] and SwissDial [2] datasets.
## Model description
This model is a QLoRA (8-bit) adapter for OpenAI's Whisper model (base version). It was fine-tuned for Swiss German to Standard German speech translation.
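For orientation, inference with such an adapter typically looks like the sketch below, which loads the frozen base model and attaches the adapter with PEFT. The repository id is taken from this card; the audio file path is a placeholder.

```python
import soundfile as sf
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the frozen base model, then attach the fine-tuned LoRA adapter.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
model = PeftModel.from_pretrained(base, "kuma-rtin/whisper_swissdial-spc")
processor = WhisperProcessor.from_pretrained("openai/whisper-base")

# "sample.wav" is a placeholder; Whisper expects 16 kHz mono audio.
audio, sr = sf.read("sample.wav")
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

# Depending on the fine-tuning setup, language/task tokens may need to be forced.
with torch.no_grad():
    ids = model.generate(input_features=inputs.input_features)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```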
## Intended uses & limitations
The model was built as a follow-up to my Master's thesis [5], where I fine-tuned XLS-R on varying data compositions for Swiss German to Standard German speech-to-text translation. Because those models were trained on SDS-200 and STT4SG-350, datasets with very strict licences, they can neither be published nor used for my personal projects. This model was trained with the goal of later serving as the basis for a speech translation app from Alemannic dialects to Standard German.
## Training and evaluation data
The datasets are very limited: SPC only contains parliamentary speech from the Berne region of Switzerland, while SwissDial contains read speech from various regions but is very small.
We oversampled the data from SwissDial as it contains more dialectal variability than SPC. This should improve the model's performance on dialects other than Bernese [4][5].
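A minimal sketch of such oversampling with 🤗 Datasets is shown below; the data loading and the repetition factor are illustrative, not the exact recipe used for this model.

```python
from datasets import concatenate_datasets, load_dataset

# Illustrative loaders; substitute the actual SPC / SwissDial preparation.
spc = load_dataset("audiofolder", data_dir="data/spc", split="train")
swissdial = load_dataset("audiofolder", data_dir="data/swissdial", split="train")

# Oversample SwissDial by repeating it (factor 3 is purely illustrative),
# then mix it with SPC and shuffle.
train_set = concatenate_datasets([spc] + [swissdial] * 3).shuffle(seed=42)
```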
The model was evaluated on the SPC test set [1] and the Zurich (ZH) part of the All Swiss German Dialects Test Set from SwissText 2021 [3]. At the end of training (28k steps or 1.2 epochs), the model had the following evaluation results:
- SPC test: 40.04 BLEU
- ASGD (ZH): 31 BLEU
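For context, BLEU scores of this kind can be computed with sacreBLEU through the `evaluate` library; the sketch below uses placeholder sentences rather than the actual test data.

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

# Placeholder data: model hypotheses and gold Standard German references
# (one list of references per sample).
predictions = ["Das ist ein Test."]
references = [["Das ist ein Test."]]

result = sacrebleu.compute(predictions=predictions, references=references)
print(f"BLEU: {result['score']:.2f}")
```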
## Training procedure
We trained the model on Modal on a single NVIDIA A10G Tensor Core GPU with 24 GB of memory. After 28k steps, the budget limit was reached and training was stopped. We plan to continue fine-tuning the model at a later point, as performance on both validation sets was still improving. The code will be published on my GitHub once training is complete.
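As a rough illustration, a Modal job targeting an A10G can be declared as in the sketch below; the app name, image packages, and timeout are assumptions, since the actual training script is not yet published.

```python
import modal

app = modal.App("whisper-swiss-german")  # app name is an assumption
image = modal.Image.debian_slim().pip_install(
    "transformers", "peft", "bitsandbytes", "datasets", "evaluate"
)

@app.function(gpu="A10G", timeout=12 * 60 * 60, image=image)
def train():
    # The actual fine-tuning loop (data loading, QLoRA setup, Trainer)
    # would run here; it is omitted until the code is published.
    ...
```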
### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 10
- mixed_precision_training: Native AMP
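These values map onto `transformers` training arguments roughly as in the following sketch; the output directory is a placeholder.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="whisper-swissdial-spc",  # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=10,
    fp16=True,  # native AMP mixed-precision training
)
```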
QLoRA parameters (see the configuration sketch after this list):
- quantization: 8-bit
- r: 32
- alpha: 64
- dropout: 0.05
- bias: none
- target_modules: q_proj, v_proj
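These adapter settings correspond to a PEFT configuration like the following sketch, with 8-bit loading via bitsandbytes; the exact argument placement in the actual training code may differ.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration

# Load the base model in 8-bit (bitsandbytes) and prepare it for k-bit training.
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-base",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
base = prepare_model_for_kbit_training(base)

# LoRA adapter matching the parameters listed above.
lora = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```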
### Framework versions
- PEFT 0.17.1
- Transformers 4.56.1
- PyTorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.22.0
## References
[1] Plüss, M., Neukom, L., Scheller, C., & Vogel, M. (2020). Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus. arXiv preprint arXiv:2010.02810.
[2] Dogan-Schönberger, P., Mäder, J., & Hofmann, T. (2021). SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German. arXiv preprint arXiv:2103.11401.
[3] Plüss, M., Neukom, L., & Vogel, M. (2021). SwissText 2021 Task 3: Swiss German Speech to Standard German Text. In Proceedings of the Swiss Text Analytics Conference (Vol. 2021).
[4] Paonessa, C., Schraner, Y., Deriu, J. M., Huerlimann, M., Vogel, M., & Cieliebak, M. (2023, December). Dialect Transfer for Swiss German Speech Translation. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 15240-15254).
[5] Bär, M., DeMarco, A., & Labaka, G. (2025, July). Swiss German Speech Translation and the Curse of Multidialectality. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025) (pp. 165-179).