W2V-BERT 2.0 ASR Adapters

This repository contains 14 per-language bottleneck adapters for automatic speech recognition (ASR), trained on top of the frozen facebook/w2v-bert-2.0 encoder.

Model Description

  • Base Model: facebook/w2v-bert-2.0 (600M parameters, frozen)
  • Adapter Architecture: MMS-style bottleneck adapters (dim=64)
  • Decoder: Lightweight transformer decoder (2 layers)
  • Training: CTC loss with an extended vocabulary for double vowels (see the sketch below)
  • Average WER: 45.13%
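
The double-vowel extension treats long vowels as single CTC units rather than as two repeated characters. Below is a minimal sketch of how such a vocabulary could be assembled; the actual token inventory and special symbols are defined by each adapter's vocab.json, so the tokens shown here are illustrative assumptions only.

# Illustrative sketch: extend a character vocabulary with double-vowel units
# so that long vowels map to a single CTC token. The real token set and
# special symbols come from each adapter's vocab.json.
def build_extended_vocab(corpus_lines):
    chars = sorted({ch for line in corpus_lines for ch in line.lower() if ch != " "})
    double_vowels = [v * 2 for v in "aeiou"]   # "aa", "ee", "ii", "oo", "uu"
    tokens = chars + double_vowels + ["|"]     # "|" as the word delimiter
    vocab = {token: idx for idx, token in enumerate(tokens)}
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)                # commonly doubles as the CTC blank
    return vocab

vocab = build_extended_vocab(["an example transcript line"])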

Trained Adapters

| Adapter | Language | WER | Train Samples |
|---------------|--------------------|--------|---------------|
| ach_Latn | Acholi | 22.08% | 4825 |
| eng_Latn_salt | English (SALT) | 99.33% | 4804 |
| eng_Latn_tts | English (TTS) | 99.80% | 3030 |
| ful_Latn | Fulah | 99.02% | 2355 |
| kam_Latn | Kamba | 31.91% | 14968 |
| kik_Latn | Kikuyu | 15.36% | 14966 |
| lug_Latn_salt | Luganda (SALT) | 28.15% | 5002 |
| luo_Latn | Luo | 17.69% | 14922 |
| mer_Latn | Kimeru | 34.70% | 14981 |
| nyn_Latn | Runyankole | 30.46% | 4884 |
| swh_Latn_salt | Swahili (SALT) | 95.23% | 3835 |
| swh_Latn_v1 | Swahili (Filtered) | 20.94% | 15000 |
| swh_Latn_v2 | Swahili (Bible) | 3.31% | 10458 |
| teo_Latn | Ateso | 33.88% | 4901 |

Architecture

The model uses:

  1. Frozen w2v-bert-2.0 encoder - Extracts audio representations
  2. Bottleneck adapter - Language-specific adaptation (trainable)
  3. Lightweight decoder - Transformer decoder block (trainable)
  4. LM head - Per-language vocabulary projection (trainable)

Audio → Encoder (frozen) → Adapter → Decoder → LayerNorm → LM Head → Text
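
To make the adapter structure concrete, here is a minimal sketch of an MMS-style bottleneck adapter (down-projection to 64 dimensions, non-linearity, up-projection, residual connection). The exact layer names, activation, and placement used during training are assumptions; the saved adapter_weights.pt files define the authoritative structure.

import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Assumed MMS-style bottleneck adapter:
    # hidden -> LayerNorm -> down-project (dim=64) -> ReLU -> up-project -> residual
    def __init__(self, hidden_size=1024, adapter_dim=64):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.down = nn.Linear(hidden_size, adapter_dim)
        self.act = nn.ReLU()
        self.up = nn.Linear(adapter_dim, hidden_size)

    def forward(self, hidden_states):
        residual = hidden_states
        hidden_states = self.up(self.act(self.down(self.norm(hidden_states))))
        return residual + hidden_states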

Usage

Each adapter folder contains:

  • adapter_weights.pt - Bottleneck adapter weights
  • decoder_weights.pt - Decoder block weights
  • lm_head_weights.pt - Language model head weights
  • final_norm_weights.pt - Final layer norm weights
  • vocab.json - Language-specific vocabulary
  • adapter_config.json - Adapter configuration
  • metrics.json - Training metrics

Loading an Adapter

import json
import torch
from transformers import Wav2Vec2BertProcessor
from huggingface_hub import hf_hub_download

repo_id = "mutisya/w2v-bert-adapters-14lang-e10-25_52-v6"

# Load the processor for a specific language (e.g., kik_Latn for Kikuyu)
adapter_id = "kik_Latn"
processor = Wav2Vec2BertProcessor.from_pretrained(repo_id, subfolder=adapter_id)

# Load the adapter configuration
config_path = hf_hub_download(repo_id, f"{adapter_id}/adapter_config.json")
with open(config_path) as f:
    adapter_config = json.load(f)

# Load the trainable component weights
adapter_weights = torch.load(
    hf_hub_download(repo_id, f"{adapter_id}/adapter_weights.pt"), map_location="cpu"
)
decoder_weights = torch.load(
    hf_hub_download(repo_id, f"{adapter_id}/decoder_weights.pt"), map_location="cpu"
)
lm_head_weights = torch.load(
    hf_hub_download(repo_id, f"{adapter_id}/lm_head_weights.pt"), map_location="cpu"
)
final_norm_weights = torch.load(
    hf_hub_download(repo_id, f"{adapter_id}/final_norm_weights.pt"), map_location="cpu"
)
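
The snippet below continues from the loading code above and sketches how the pieces could be assembled for inference: run the frozen encoder, apply the adapter (see the BottleneckAdapter sketch in the Architecture section), project with the LM head, and greedily decode the CTC output using vocab.json. The module classes, and therefore the state-dict keys, are assumptions that must mirror the original training code, and the decoder block and final LayerNorm are omitted for brevity, so treat this as an outline rather than a drop-in implementation.

import librosa
from transformers import Wav2Vec2BertModel

# Frozen base encoder
encoder = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0").eval()

# Language-specific vocabulary for CTC decoding
vocab_path = hf_hub_download(repo_id, f"{adapter_id}/vocab.json")
with open(vocab_path) as f:
    vocab = json.load(f)
id_to_token = {idx: tok for tok, idx in vocab.items()}

# Rebuild the trainable pieces; these classes and key names are assumptions and
# must match the training code for load_state_dict to succeed.
adapter = BottleneckAdapter(hidden_size=encoder.config.hidden_size, adapter_dim=64)
adapter.load_state_dict(adapter_weights)
lm_head = torch.nn.Linear(encoder.config.hidden_size, len(vocab))
lm_head.load_state_dict(lm_head_weights)
# decoder_weights and final_norm_weights would be loaded into the matching
# decoder block and LayerNorm modules in the same way.

# Transcribe a 16 kHz waveform (sample.wav is a hypothetical file)
audio, _ = librosa.load("sample.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(inputs.input_features).last_hidden_state
    logits = lm_head(adapter(hidden))   # decoder block and final norm omitted here

# Greedy CTC decoding: collapse repeats, drop the pad token (assumed CTC blank)
pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens, prev = [], None
for idx in pred_ids:
    if idx != prev and id_to_token.get(idx) not in (None, "[PAD]"):
        tokens.append(id_to_token[idx])
    prev = idx
print("".join(tokens).replace("|", " "))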

Training Configuration

  • Epochs: 10
  • Learning Rate: 0.0005
  • Batch Size: 48 × 1 (effective: 48)
  • Extended Vocabulary: True
  • Adapter Dimension: 64
  • Decoder Layers: 2
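
For reference, the same hyperparameters expressed as a plain Python config dict; the field names here are illustrative and are not necessarily the keys used in adapter_config.json.

# Hyperparameters as listed above; key names are illustrative only.
training_config = {
    "num_epochs": 10,
    "learning_rate": 5e-4,
    "per_device_batch_size": 48,
    "gradient_accumulation_steps": 1,   # effective batch size 48
    "extended_vocabulary": True,        # adds double-vowel CTC units
    "adapter_dim": 64,
    "decoder_layers": 2,
}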

Supported Languages

The following languages have trained adapters:

  • Acholi (ach_Latn): WER 22.08%
  • English (SALT) (eng_Latn_salt): WER 99.33%
  • English (TTS) (eng_Latn_tts): WER 99.80%
  • Fulah (ful_Latn): WER 99.02%
  • Kamba (kam_Latn): WER 31.91%
  • Kikuyu (kik_Latn): WER 15.36%
  • Luganda (SALT) (lug_Latn_salt): WER 28.15%
  • Luo (luo_Latn): WER 17.69%
  • Kimeru (mer_Latn): WER 34.70%
  • Runyankole (nyn_Latn): WER 30.46%
  • Swahili (SALT) (swh_Latn_salt): WER 95.23%
  • Swahili (Filtered) (swh_Latn_v1): WER 20.94%
  • Swahili (Bible) (swh_Latn_v2): WER 3.31%
  • Ateso (teo_Latn): WER 33.88%

License

Apache 2.0

Citation

@misc{w2vbert-asr-adapters,
  author = {Mutisya},
  title = {W2V-BERT 2.0 ASR Adapters for African Languages},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/mutisya/w2v-bert-adapters-14lang-e10-25_52-v6}
}