| | --- |
| | license: gpl-3.0 |
| |
|
| | language: |
| | - be |
| |
|
| | tags: |
| | - audio |
| | - speech |
| | - automatic-speech-recognition |
| |
|
| | datasets: |
| | - mozilla-foundation/common_voice_8_0 |
| |
|
| | metrics: |
| | - wer |
| |
|
| | model-index: |
| | - name: wav2vec2 |
| | results: |
| | - task: |
| | name: Automatic Speech Recognition |
| | type: automatic-speech-recognition |
| | dataset: |
| | name: Common Voice 8 |
| | type: mozilla-foundation/common_voice_8_0 |
| | args: be |
| | metrics: |
| | - name: Dev WER |
| | type: wer |
| | value: 17.61 |
| | - name: Test WER |
| | type: wer |
| | value: 18.7 |
| | - name: Dev WER (with LM) |
| | type: wer |
| | value: 11.5 |
| | - name: Test WER (with LM) |
| | type: wer |
| | value: 12.4 |
| | --- |
| | |
| | # Automatic Speech Recognition for Belarusian language |
| |
|
| | Fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on `mozilla-foundation/common_voice_8_0 be` dataset. |
| |
|
| | `Train`, `Dev`, `Test` splits were used as they are present in the dataset. No additional data was used from `Validated` split, |
| | only 1 voicing of each sentence was used - the way the data was split by [CommonVoice CorporaCreator](https://github.com/common-voice/CorporaCreator). |
| | To build a better model **one can use additional voicings from `Validated` split** for sentences already present in `Train`, `Dev`, `Test` splits, |
| | i.e. enlarge mentioned splits. |
| |
|
| | Language model was built using [KenLM](https://kheafield.com/code/kenlm/estimation/). |
| | 5-gram Language model was built on sentences from `Train + (Other - Dev - Test)` splits of `mozilla-foundation/common_voice_8_0 be` dataset. |
| |
|
| | Source code is available [here](https://github.com/yks72p/stt_be). |
| |
|
| | ## Run model in a browser |
| |
|
| | This page contains interactive demo widget that lets you test this model right in a browser. |
| |
|
| | However, this widget uses Acoustic model only **without** Language model that significantly improves overall performance. |
| |
|
| | You can play with **full pipeline of Acoustic model + Language model** on the following [spaces page](https://huggingface.co/spaces/ales/wav2vec2-cv-be-lm) |
| | (also works from browser). |
| |
|