---
library_name: peft
license: apache-2.0
base_model: openai/whisper-base
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
- tunis-ai/arabic_speech_corpus
- THCHS-30
model-index:
- name: lowhipa-base-comb
results: []
pipeline_tag: automatic-speech-recognition
---
# lowhipa-base-comb
This Whisper-for-IPA (WhIPA) model adapter is a PEFT LoRA fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on a subset of:
- the Common Voice 11 dataset (1k samples each from Greek, Finnish, Hungarian, Japanese, Maltese, Polish, Tamil) with G2P-based IPA transcriptions
- Mandarin THCHS-30 database (https://arxiv.org/pdf/1512.01882) with IPA transcriptions by Taubert (2023, https://zenodo.org/records/7528596) (1k samples)
- Arabic Speech Corpus (https://en.arabicspeechcorpus.com) with custom IPA transcriptions transliterated from the provided Buckwalter transcriptions (1k samples) (https://doi.org/10.5281/zenodo.17111977)
## Model description
For deployment and description, please refer to https://github.com/jshrdt/whipa.
```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel

# Extend the Whisper tokenizer with the custom <|ip|> IPA language token,
# keeping the existing special tokens intact.
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base", task="transcribe")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|ip|>"] + tokenizer.all_special_tokens})

# Register <|ip|> as a generation language and resize the embeddings to match.
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
base_model.generation_config.lang_to_id["<|ip|>"] = tokenizer.convert_tokens_to_ids(["<|ip|>"])[0]
base_model.resize_token_embeddings(len(tokenizer))

# Attach the LoRA adapter and configure generation for IPA transcription.
whipa_model = PeftModel.from_pretrained(base_model, "jshrdt/lowhipa-base-comb")
whipa_model.generation_config.language = "<|ip|>"
whipa_model.generation_config.task = "transcribe"

whipa_processor = WhisperProcessor.from_pretrained("openai/whisper-base", task="transcribe")
```
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
More information needed
### Training results
| Training Loss | Epoch   | Validation Loss |
|:-------------:|:-------:|:---------------:|
| 1.5428        | 2.0323  | 1.2981          |
| 0.7498        | 4.0645  | 0.8458          |
| 0.5968        | 6.0968  | 0.7599          |
| 0.5156        | 8.1290  | 0.7213          |
| 0.4603        | 10.1613 | 0.7065          |
### Framework versions
- PEFT 0.15.1
- Transformers 4.48.3
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0