# Swahili-English Translation Model (General Domain Expansion)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-mul-en, trained on a large corpus of general-domain Swahili-English translations while maintaining translation quality on helpline conversations.
## Model Details
- Base Model: Helsinki-NLP/opus-mt-mul-en
- Language Pair: Swahili (sw) → English (en)
- Training Data:
  - CCAligned general corpus (~200k+ samples)
  - Helpline conversation data (oversampled 5x for domain retention)
- Special Features:
  - Domain-aware via `<HELPLINE>` and `<GENERAL>` tags (see the tag-handling sketch below)
  - Optimized for both general and helpline translations
  - Knowledge distillation from the helpline-specialized model
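The preprocessing code is not included in this card. The snippet below is a minimal sketch of how the domain tags can be wired in, assuming they were registered as special tokens and prepended to each source sentence; the `domain` field and the example values are illustrative, not part of the released pipeline.

```python
from transformers import MarianMTModel, MarianTokenizer

base = "Helsinki-NLP/opus-mt-mul-en"
tokenizer = MarianTokenizer.from_pretrained(base)
model = MarianMTModel.from_pretrained(base)

# Assumption: the domain markers are registered as special tokens so they are
# kept as single units instead of being split into subwords.
tokenizer.add_special_tokens({"additional_special_tokens": ["<HELPLINE>", "<GENERAL>"]})
model.resize_token_embeddings(len(tokenizer))

# Each training example is prefixed with the tag matching its domain.
example = {"sw": "Ninahitaji msaada wa haraka", "en": "I need urgent help", "domain": "helpline"}
tag = "<HELPLINE>" if example["domain"] == "helpline" else "<GENERAL>"
batch = tokenizer(f"{tag} {example['sw']}", text_target=example["en"], return_tensors="pt")
```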
## Training Procedure

### Memory Optimizations
- CPU teacher offloading (the distillation teacher stays on CPU; see the sketch below)
- Gradient checkpointing
- Batch size: 8, Gradient accumulation: 16
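The distillation code itself is not published with this card. The sketch below shows one way the CPU teacher offloading could look, with only the student on the GPU and the helpline-specialized teacher evaluated on CPU; the function name and the `alpha`/`temperature` values are assumptions, not the exact implementation used here.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, temperature=2.0, alpha=0.5):
    """One hypothetical training step with the teacher offloaded to CPU.

    `batch` is assumed to hold input_ids, attention_mask and labels already on
    the student's (GPU) device, while `teacher` lives on the CPU.
    """
    student_out = student(**batch)  # forward pass + cross-entropy loss on GPU

    # Run the teacher on CPU copies of the same inputs so that only one full
    # model occupies GPU memory at a time.
    cpu_batch = {k: v.to("cpu") for k, v in batch.items()}
    with torch.no_grad():
        teacher_logits = teacher(**cpu_batch).logits.to(student_out.logits.device)

    # Soft-target loss: KL divergence between teacher and student distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_out.logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Blend the hard-label loss with the distillation loss.
    return alpha * student_out.loss + (1 - alpha) * kd_loss
```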
### Training Hyperparameters
- Learning rate: 1.5e-5
- Epochs: 1
- Optimizer: AdamW
- LR Scheduler: Cosine with warmup (full configuration sketched below)
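The two lists above map directly onto standard `Seq2SeqTrainingArguments`. This is a hedged reconstruction rather than the exact training script: values not stated in this card (output directory, warmup size, mixed precision, generation during evaluation) are marked as assumptions.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="sw-en-opus-mt-general-expanded",  # assumption
    learning_rate=1.5e-5,
    num_train_epochs=1,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                    # assumption: warmup size not stated in the card
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,      # effective batch size of 128
    gradient_checkpointing=True,
    save_steps=500,                      # checkpoint every 500 steps (see Training Details)
    predict_with_generate=True,          # assumption
    fp16=True,                           # assumption: mixed precision on a single GPU
)
```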
## Performance
| Domain | BLEU | chrF |
|---|---|---|
| Helpline | X.XX | XX.X |
| General | X.XX | XX.X |
(Replace with actual metrics from training)
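The evaluation script is not part of this card. BLEU and chrF scores like those in the table are commonly computed with the sacrebleu package; the snippet below is a generic sketch, assuming the model outputs and the gold English references are plain lists of strings.

```python
import sacrebleu

# hypotheses: model outputs; references: gold English translations (one per sentence)
hypotheses = ["Good morning", "I need urgent help"]
references = ["Good morning", "I need urgent help"]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.1f}")
```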
## Usage

```python
from transformers import MarianMTModel, MarianTokenizer

# Load model and tokenizer
model_name = "brendaogutu/sw-en-opus-mt-general-expanded"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# For general translations
text = "<GENERAL> Habari za asubuhi"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "Good morning"

# For helpline translations
text = "<HELPLINE> Ninahitaji msaada wa haraka"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "I need urgent help"
```
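For several sentences at once, the same tokenizer and model handle batched input; this is a plain extension of the snippet above, not an API specific to this checkpoint.

```python
# Batched translation: pad to the longest sentence in the batch
texts = [
    "<GENERAL> Habari za asubuhi",
    "<HELPLINE> Ninahitaji msaada wa haraka",
]
inputs = tokenizer(texts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```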
## Limitations
- Optimized for Swahili to English (not bidirectional)
- Best performance when inputs are prefixed with a domain tag (`<HELPLINE>` or `<GENERAL>`)
- May struggle with very technical or specialized vocabulary outside training domains
## Training Details
- Framework: Transformers + PyTorch
- Hardware: Single GPU training
- Training Time: ~X hours
- Checkpoint Strategy: Every 500 steps for power failure recovery
## Citation
If you use this model, please cite:
```bibtex
@misc{sw-en-general-expanded,
  author    = {Your Name/Organization},
  title     = {Swahili-English General Domain Translation Model},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/brendaogutu/sw-en-opus-mt-general-expanded}
}
```
## License
This model inherits the license from Helsinki-NLP/opus-mt-mul-en.
## Contact
For questions or issues, please open an issue on the model repository.