
Swahili-English Translation Model (General Domain Expansion)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-mul-en, trained on a large corpus of general-domain Swahili-English translations while preserving helpline translation quality.

Model Details

  • Base Model: Helsinki-NLP/opus-mt-mul-en
  • Language Pair: Swahili (sw) → English (en)
  • Training Data:
    • CCAligned general corpus (~200k+ samples)
    • Helpline conversation data (oversampled 5x for domain retention; see the data-preparation sketch after this list)
  • Special Features:
    • Domain-aware with <HELPLINE> and <GENERAL> tags
    • Optimized for both general and helpline translations
    • Knowledge distillation from helpline-specialized model

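A minimal sketch of the data preparation implied above (not the exact training script): each Swahili source sentence is prefixed with its domain tag, and the helpline pairs are repeated five times so the much larger general corpus does not wash out the specialized domain. The loading functions are hypothetical placeholders.

# Illustrative data preparation: domain tagging + 5x helpline oversampling.
# load_general_pairs / load_helpline_pairs are hypothetical helpers that
# return lists of (swahili, english) sentence pairs.
def tag_pairs(pairs, tag):
    """Prefix every Swahili source sentence with a domain tag."""
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]

general = tag_pairs(load_general_pairs(), "<GENERAL>")     # ~200k CCAligned pairs
helpline = tag_pairs(load_helpline_pairs(), "<HELPLINE>")  # helpline conversations

train_pairs = general + helpline * 5  # 5x oversampling for domain retention
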
Training Procedure

Memory Optimizations

  • CPU teacher offloading (distillation sketched after this list)
  • Gradient checkpointing
  • Batch size: 8, Gradient accumulation: 16

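A rough sketch of how distillation with a CPU-offloaded teacher could look: the student stays on the GPU, the helpline-specialized teacher runs forward passes on the CPU, and the two are blended with a KL term on softened logits. The teacher checkpoint path, alpha, and temperature below are illustrative assumptions, not values from this card.

# Sketch of knowledge distillation with the teacher offloaded to CPU to save GPU memory.
# The teacher path, alpha, and temperature are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import MarianMTModel

student = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-mul-en").cuda()
teacher = MarianMTModel.from_pretrained("path/to/helpline-teacher")  # hypothetical path; model stays on CPU
teacher.eval()

def distillation_loss(batch, alpha=0.5, temperature=2.0):
    # Student forward pass on GPU (batch contains input_ids, attention_mask, labels)
    student_out = student(**{k: v.cuda() for k, v in batch.items()})
    # Teacher forward pass on CPU, without gradients
    with torch.no_grad():
        teacher_out = teacher(**batch)
    # KL divergence between softened student and teacher distributions
    kd = F.kl_div(
        F.log_softmax(student_out.logits / temperature, dim=-1),
        F.softmax(teacher_out.logits.to(student_out.logits.device) / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Blend the usual cross-entropy loss with the distillation term
    return alpha * student_out.loss + (1 - alpha) * kd
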
Training Hyperparameters

  • Learning rate: 1.5e-5
  • Epochs: 1
  • Optimizer: AdamW
  • LR Scheduler: Cosine with warmup (see the configuration sketch after this list)

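The settings listed above map roughly onto Hugging Face Seq2SeqTrainingArguments as sketched below; values not stated on this card (output_dir, warmup_ratio) are assumptions.

# Possible mapping of the listed hyperparameters onto Seq2SeqTrainingArguments.
# output_dir and warmup_ratio are assumptions; the rest comes from this card.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="sw-en-opus-mt-general-expanded",
    learning_rate=1.5e-5,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,   # effective batch size of 128
    gradient_checkpointing=True,      # trade recompute for memory
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                 # assumption: warmup fraction is not stated
    save_strategy="steps",
    save_steps=500,                   # checkpoint every 500 steps (power-failure recovery)
)
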
Performance

Domain     BLEU    chrF
Helpline   X.XX    XX.X
General    X.XX    XX.X

(Replace with actual metrics from training)

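Once the held-out test sets are scored, per-domain BLEU and chrF can be computed with sacreBLEU as sketched below; the hypothesis and reference lists are placeholders for each domain's test set.

# Sketch of per-domain scoring with sacreBLEU; the hypothesis/reference lists
# are placeholders for the helpline and general test sets.
import sacrebleu

def score(hypotheses, references):
    bleu = sacrebleu.corpus_bleu(hypotheses, [references]).score
    chrf = sacrebleu.corpus_chrf(hypotheses, [references]).score
    return bleu, chrf

# helpline_bleu, helpline_chrf = score(helpline_hypotheses, helpline_references)
# general_bleu, general_chrf = score(general_hypotheses, general_references)
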
Usage

from transformers import MarianMTModel, MarianTokenizer

# Load model and tokenizer
model_name = "brendaogutu/sw-en-opus-mt-general-expanded"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# For general translations
text = "<GENERAL> Habari za asubuhi"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "Good morning"

# For helpline translations
text = "<HELPLINE> Ninahitaji msaada wa haraka"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "I need urgent help"

Limitations

  • Optimized for Swahili to English (not bidirectional)
  • Best performance when inputs include the domain tags (<HELPLINE> or <GENERAL>)
  • May struggle with very technical or specialized vocabulary outside training domains

Training Details

  • Framework: Transformers + PyTorch
  • Hardware: Single GPU training
  • Training Time: ~X hours
  • Checkpoint Strategy: Every 500 steps for power-failure recovery (resume sketch below)

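With checkpoints written every 500 steps, an interrupted run can be resumed from the latest checkpoint in the output directory. The abbreviated trainer setup below assumes the model, arguments, and dataset from the earlier sketches.

# Resuming after an interruption (model, training_args, and train_dataset are
# assumed to be defined as in the sketches above).
from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train(resume_from_checkpoint=True)  # picks up the latest checkpoint automatically
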
Citation

If you use this model, please cite:

@misc{sw-en-general-expanded,
  author = {Your Name/Organization},
  title = {Swahili-English General Domain Translation Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/brendaogutu/sw-en-opus-mt-general-expanded}
}

License

This model inherits the license from Helsinki-NLP/opus-mt-mul-en.

Contact

For questions or issues, please open an issue on the model repository.
