# Whisper for Swiss German Dialects
This model is a fine-tuned version of openai/whisper-base on the SPC [1] and SwissDial [2] datasets.
## Model description
This model is a QLoRA (8-bit) adapter for OpenAI's Whisper model (base version). It was fine-tuned for Swiss German to Standard German speech translation.
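For orientation, inference with such an adapter typically looks like the sketch below, which loads the frozen base model and attaches the adapter with PEFT. The repository id is taken from this card; the audio file path is a placeholder.

```python
import soundfile as sf
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the frozen base model, then attach the fine-tuned LoRA adapter.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
model = PeftModel.from_pretrained(base, "kuma-rtin/whisper_swissdial-spc")
processor = WhisperProcessor.from_pretrained("openai/whisper-base")

# "sample.wav" is a placeholder; Whisper expects 16 kHz mono audio.
audio, sr = sf.read("sample.wav")
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

# Depending on the fine-tuning setup, language/task tokens may need to be forced.
with torch.no_grad():
    ids = model.generate(input_features=inputs.input_features)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```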
## Intended uses & limitations
The model was built as a follow-up to my Master's thesis [5], where I fine-tuned XLS-R on varying data compositions for Swiss German to Standard German speech-to-text translation. Because those models were trained on SDS-200 and STT4SG-350, datasets with very strict licences, they can neither be published nor used for my personal projects. This model was trained with the goal of later serving as the basis for a speech translation app from Alemannic dialects to Standard German.
## Training and evaluation data
The datasets are very limited: SPC only contains parliamentary speech from the Berne region of Switzerland, while SwissDial contains read speech from various regions but is very small.
We oversampled the data from SwissDial as it contains more dialectal variability than SPC. This should improve the model's performance on dialects other than Bernese [4][5].
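A minimal sketch of such oversampling with 🤗 Datasets is shown below; the data loading and the repetition factor are illustrative, not the exact recipe used for this model.

```python
from datasets import concatenate_datasets, load_dataset

# Illustrative loaders; substitute the actual SPC / SwissDial preparation.
spc = load_dataset("audiofolder", data_dir="data/spc", split="train")
swissdial = load_dataset("audiofolder", data_dir="data/swissdial", split="train")

# Oversample SwissDial by repeating it (factor 3 is purely illustrative),
# then mix it with SPC and shuffle.
train_set = concatenate_datasets([spc] + [swissdial] * 3).shuffle(seed=42)
```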
The model was evaluated on the SPC test set [1] and the Zurich (ZH) part of the All Swiss German Dialects Test Set from SwissText 2021 [3]. At the end of training (28k steps or 1.2 epochs), the model had the following evaluation results:
- SPC test: 40.04 BLEU
- ASGD (ZH): 31 BLEU
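For context, BLEU scores of this kind can be computed with sacreBLEU through the `evaluate` library; the sketch below uses placeholder sentences rather than the actual test data.

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

# Placeholder data: model hypotheses and gold Standard German references
# (one list of references per sample).
predictions = ["Das ist ein Test."]
references = [["Das ist ein Test."]]

result = sacrebleu.compute(predictions=predictions, references=references)
print(f"BLEU: {result['score']:.2f}")
```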
## Training procedure
We trained the model on Modal on a single NVIDIA A10G Tensor Core GPU with 24 GB of memory. After 28k steps, the budget limit was reached and training was stopped. We plan to continue fine-tuning the model at a later point, as performance on both validation sets was still improving. The code will be published on my GitHub once training is complete.
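As a rough illustration, a Modal job targeting an A10G can be declared as in the sketch below; the app name, image packages, and timeout are assumptions, since the actual training script is not yet published.

```python
import modal

app = modal.App("whisper-swiss-german")  # app name is an assumption
image = modal.Image.debian_slim().pip_install(
    "transformers", "peft", "bitsandbytes", "datasets", "evaluate"
)

@app.function(gpu="A10G", timeout=12 * 60 * 60, image=image)
def train():
    # The actual fine-tuning loop (data loading, QLoRA setup, Trainer)
    # would run here; it is omitted until the code is published.
    ...
```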
### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 10
- mixed_precision_training: Native AMP
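These values map onto `transformers` training arguments roughly as in the following sketch; the output directory is a placeholder.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="whisper-swissdial-spc",  # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=10,
    fp16=True,  # native AMP mixed-precision training
)
```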
QLoRA parameters (see the configuration sketch after this list):
- quantization: 8-bit
- r: 32
- alpha: 64
- dropout: 0.05
- bias: none
- target_modules: q_proj, v_proj
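These adapter settings correspond to a PEFT configuration like the following sketch, with 8-bit loading via bitsandbytes; the exact argument placement in the actual training code may differ.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration

# Load the base model in 8-bit (bitsandbytes) and prepare it for k-bit training.
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-base",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
base = prepare_model_for_kbit_training(base)

# LoRA adapter matching the parameters listed above.
lora = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```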
### Framework versions
- PEFT 0.17.1
- Transformers 4.56.1
- PyTorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.22.0
## References
[1] Plüss, M., Neukom, L., Scheller, C., & Vogel, M. (2020). Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus. arXiv preprint arXiv:2010.02810.
[2] Dogan-Schönberger, P., Mäder, J., & Hofmann, T. (2021). SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German. arXiv preprint arXiv:2103.11401.
[3] Plüss, M., Neukom, L., & Vogel, M. (2021). SwissText 2021 Task 3: Swiss German Speech to Standard German Text. In Proceedings of the Swiss Text Analytics Conference (Vol. 2021).
[4] Paonessa, C., Schraner, Y., Deriu, J. M., Huerlimann, M., Vogel, M., & Cieliebak, M. (2023, December). Dialect Transfer for Swiss German Speech Translation. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 15240-15254).
[5] Bär, M., DeMarco, A., & Labaka, G. (2025, July). Swiss German Speech Translation and the Curse of Multidialectality. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025) (pp. 165-179).