---
library_name: peft
license: apache-2.0
base_model: openai/whisper-base
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
- tunis-ai/arabic_speech_corpus
- THCHS-30
model-index:
- name: lowhipa-base-comb
  results: []
pipeline_tag: automatic-speech-recognition
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# lowhipa-base-comb

This Whisper-for-IPA (WhIPA) adapter is a PEFT LoRA fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base), trained on subsets of the following corpora:
- CommonVoice11 dataset (1k samples each from Greek, Finnish, Hungarian, Japanese, Maltese, Polish, Tamil) with G2P-based IPA transcriptions
- Mandarin THCHS-30 database (https://arxiv.org/pdf/1512.01882) with IPA transcriptions by Taubert (2023, https://zenodo.org/records/7528596) (1k samples)
- Arabic Speech Corpus (https://en.arabicspeechcorpus.com) with custom IPA transcriptions transliterated from the provided Buckwalter transcriptions (1k samples) (https://doi.org/10.5281/zenodo.17111977)

## Model description


For deployment details and a full model description, please refer to https://github.com/jshrdt/whipa. The adapter can be loaded on top of the base model as follows:

```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base", task="transcribe")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|ip|>"] + tokenizer.all_special_tokens})

base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
base_model.generation_config.lang_to_id["<|ip|>"] = tokenizer.convert_tokens_to_ids(["<|ip|>"])[0]
base_model.resize_token_embeddings(len(tokenizer))

whipa_model = PeftModel.from_pretrained(base_model, "jshrdt/lowhipa-base-comb")

whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_processor = WhisperProcessor.from_pretrained("openai/whisper-base", task="transcribe")

```
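Once loaded, the adapter can be used for IPA transcription. The snippet below is a minimal inference sketch, not part of the original card: the audio path, the use of `librosa` for loading, and the decoding choices are assumptions. It reuses the `whipa_model`, `whipa_processor`, and extended `tokenizer` defined above.

```python
# Minimal inference sketch; "sample.wav" is a placeholder audio file.
import torch
import librosa

# Whisper expects 16 kHz mono audio.
audio, _ = librosa.load("sample.wav", sr=16000)

# Convert the waveform into log-Mel input features.
inputs = whipa_processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate IPA token ids; language and task were already set on the generation config above.
with torch.no_grad():
    predicted_ids = whipa_model.generate(input_features=inputs.input_features)

# Decode with the extended tokenizer so the added <|ip|> token is handled correctly.
ipa_transcription = tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(ipa_transcription)
```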

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

### Training results

| Training Loss | Epoch   | Validation Loss |
|:-------------:|:-------:|:---------------:|
| 1.5428        | 2.0323  | 1.2981          |
| 0.7498        | 4.0645  | 0.8458          |
| 0.5968        | 6.0968  | 0.7599          |
| 0.5156        | 8.1290  | 0.7213          |
| 0.4603        | 10.1613 | 0.7065          |


### Framework versions

- PEFT 0.15.1
- Transformers 4.48.3
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0