# Swahili-English Translation Model (General Domain Expansion)

This model is a fine-tuned version of [Helsinki-NLP/opus-mt-mul-en](https://huggingface.co/Helsinki-NLP/opus-mt-mul-en) on a large corpus of general Swahili-English translations, while preserving helpline translation quality.

## Model Details

- **Base Model:** Helsinki-NLP/opus-mt-mul-en
- **Language Pair:** Swahili (sw) → English (en)
- **Training Data:**
  - CCAligned general corpus (200k+ samples)
  - Helpline conversation data (oversampled 5x for domain retention)
- **Special Features:**
  - Domain-aware via domain tags for general and helpline input
  - Optimized for both general and helpline translations
  - Knowledge distillation from a helpline-specialized model

## Training Procedure

### Memory Optimizations

- CPU teacher offloading
- Gradient checkpointing
- Batch size: 8, gradient accumulation: 16

### Training Hyperparameters

- Learning rate: 1.5e-5
- Epochs: 1
- Optimizer: AdamW
- LR scheduler: cosine with warmup

An illustrative training-configuration sketch appears in the appendix near the end of this card.

## Performance

| Domain   | BLEU | chrF |
|----------|------|------|
| Helpline | X.XX | XX.X |
| General  | X.XX | XX.X |

*(Replace with actual metrics from training.)* A sketch showing how BLEU and chrF can be computed appears in the appendix near the end of this card.

## Usage

```python
from transformers import MarianMTModel, MarianTokenizer

# Load model and tokenizer
model_name = "brendaogutu/sw-en-opus-mt-general-expanded"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# For general translations
text = " Habari za asubuhi"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "Good morning"

# For helpline translations
text = " Ninahitaji msaada wa haraka"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "I need urgent help"
```

## Limitations

- Optimized for Swahili to English only (not bidirectional)
- Best performance when the appropriate domain tag (general or helpline) is included in the input
- May struggle with very technical or specialized vocabulary outside the training domains

## Training Details

- **Framework:** Transformers + PyTorch
- **Hardware:** Single GPU
- **Training Time:** ~X hours
- **Checkpoint Strategy:** Every 500 steps for power-failure recovery

## Citation

If you use this model, please cite:

```bibtex
@misc{sw-en-general-expanded,
  author = {Your Name/Organization},
  title = {Swahili-English General Domain Translation Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/brendaogutu/sw-en-opus-mt-general-expanded}
}
```

## License

This model inherits the license from Helsinki-NLP/opus-mt-mul-en.

## Contact

For questions or issues, please open an issue on the model repository.
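
## Appendix: Illustrative Training Configuration

The sketch below shows one way the memory optimizations and hyperparameters listed above could be wired into a Hugging Face `Seq2SeqTrainer` run. It is a minimal, hypothetical reconstruction, not the actual training script: the output directory, warmup ratio, and fp16 flag are assumptions, and dataset preparation, domain-tag injection, and the knowledge-distillation loss with the CPU-offloaded teacher are not shown.

```python
from transformers import (
    MarianMTModel,
    MarianTokenizer,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    DataCollatorForSeq2Seq,
)

base_model = "Helsinki-NLP/opus-mt-mul-en"
tokenizer = MarianTokenizer.from_pretrained(base_model)
model = MarianMTModel.from_pretrained(base_model)
model.gradient_checkpointing_enable()  # trade compute for memory, as listed above

training_args = Seq2SeqTrainingArguments(
    output_dir="sw-en-opus-mt-general-expanded",  # assumed output path
    per_device_train_batch_size=8,       # batch size 8
    gradient_accumulation_steps=16,      # effective batch size of 128
    learning_rate=1.5e-5,
    num_train_epochs=1,
    optim="adamw_torch",                 # AdamW
    lr_scheduler_type="cosine",          # cosine schedule with warmup
    warmup_ratio=0.05,                   # assumed warmup fraction (not stated in the card)
    save_steps=500,                      # checkpoint every 500 steps for recovery
    save_total_limit=2,
    predict_with_generate=True,
    fp16=True,                           # assumed; depends on the GPU used
    logging_steps=100,
)

# The trainer wiring is commented out because the tokenized dataset
# (CCAligned + oversampled helpline data) is not defined here.
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```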
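
## Appendix: Computing BLEU and chrF

The BLEU and chrF scores in the performance table can be computed with `sacrebleu`, for example as in the sketch below. This is not necessarily the exact evaluation script used for the reported numbers; the two-sentence evaluation set is purely illustrative and should be replaced with a held-out test split per domain, with domain tags applied as in the Usage section.

```python
import sacrebleu
from transformers import MarianMTModel, MarianTokenizer

model_name = "brendaogutu/sw-en-opus-mt-general-expanded"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tiny illustrative eval set; replace with a real held-out test split.
sources = ["Habari za asubuhi", "Ninahitaji msaada wa haraka"]
references = ["Good morning", "I need urgent help"]

# Translate the source sentences in a single batch.
inputs = tokenizer(sources, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
hypotheses = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Corpus-level BLEU and chrF against a single reference set.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.1f}")
```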