LFM2-1.2B-RAG Arabic (LoRA Fine-tuned)
Fine-tuned version of LiquidAI/LFM2-1.2B-RAG for Arabic reading comprehension and question answering, trained with the LoRA (Low-Rank Adaptation) technique.
Model Description
This model specializes in extractive question answering for Arabic text. It has been fine-tuned using LoRA on the Arabic Reading Comprehension Dataset (ARCD) to improve its ability to answer questions based on provided context in Modern Standard Arabic.
Key Features:
- Optimized for Arabic extractive QA
- Context-based question answering
- Maintains faithfulness to source documents
- Efficient fine-tuning via LoRA (rank=16)
Intended Use
Direct Use
- Arabic question answering systems
- RAG (Retrieval-Augmented Generation) applications for Arabic content (see the sketch after this list)
- Information extraction from Arabic documents
- Educational tools for Arabic reading comprehension
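To illustrate the RAG use case, here is a minimal sketch of how retrieved passages could be folded into the model's prompt. `retrieve_passages` and `build_rag_prompt` are hypothetical helpers, not part of this model: swap the stub for your own retriever (BM25, dense embeddings, etc.). The template matches the one used in "Basic Usage" below.

def retrieve_passages(query: str, k: int = 3) -> list[str]:
    # Hypothetical retriever stub: replace with BM25 / dense retrieval
    # over your own Arabic document store.
    return ["نيوم هو مشروع ضخم في شمال غرب السعودية بتكلفة 500 مليار دولار."][:k]

def build_rag_prompt(query: str) -> str:
    # Join the retrieved passages into one context block, then wrap it in
    # the Arabic QA template from "Basic Usage" below.
    context = "\n".join(retrieve_passages(query))
    return f"استخدم السياق التالي للإجابة على السؤال:\n\n{context}\n\nالسؤال: {query}"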
Downstream Use
Can be further fine-tuned for:
- Domain-specific QA (medical, legal, financial)
- Multi-turn conversational QA
- Document summarization with Q&A
Out-of-Scope Use
Not recommended for:
- Open-domain question answering without context
- Creative writing or content generation
- Translation tasks
- Code generation
How to Use
Basic Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model_id = "azeddinShr/LFM2-1.2B-RAG-ARABIC-LoRA"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Prepare input
context = "نيوم هو مشروع ضخم في شمال غرب السعودية بتكلفة 500 مليار دولار."  # "NEOM is a huge project in northwest Saudi Arabia costing 500 billion dollars."
question = "ما هي تكلفة مشروع نيوم؟"  # "What is the cost of the NEOM project?"
# Template: "Use the following context to answer the question: {context} Question: {question}"
prompt = f"استخدم السياق التالي للإجابة على السؤال:\n\n{context}\n\nالسؤال: {question}"
# Generate answer
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=150,
        do_sample=False,  # greedy decoding for deterministic, extractive answers
        pad_token_id=tokenizer.eos_token_id,
    )
answer = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(answer)  # Output: 500 مليار دولار ("500 billion dollars")
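If the repository ships only the LoRA adapter rather than merged weights (an assumption; check the files on the Hub), the adapter can instead be attached to the base model with PEFT:

from peft import PeftModel

# Assumes the repo at model_id hosts adapter files rather than merged weights.
base = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-1.2B-RAG",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, model_id)
model = model.merge_and_unload()  # optional: merge the adapter for faster inference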
Training Details
Training Data
- Dataset: hsseinmz/arcd
- Training samples: 693
- Validation samples: 351
- Test samples: 351
- Language: Modern Standard Arabic
- Task: Extractive question answering
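A minimal sketch for pulling the dataset from the Hub. The split names exposed by `load_dataset` may not match the counts above one-to-one; it is an assumption here that the data was re-partitioned for fine-tuning.

from datasets import load_dataset

# Load ARCD and inspect the available splits before re-partitioning.
arcd = load_dataset("hsseinmz/arcd")
print(arcd)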
Training Procedure
Fine-tuning method: LoRA (Low-Rank Adaptation)
Hyperparameters:
- Base model: LiquidAI/LFM2-1.2B-RAG
- Epochs: 10
- Batch size: 16 (4 per device × 4 gradient accumulation steps)
- Learning rate: 2e-4
- Optimizer: AdamW (8-bit paged)
- LR scheduler: Cosine
- Warmup steps: 50
- Weight decay: 0.01
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: w1, w2, w3, q_proj, k_proj, v_proj, out_proj, in_proj
Training infrastructure:
- Precision: bfloat16
- Gradient checkpointing: Enabled
- Framework: Hugging Face Transformers + PEFT + TRL
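For reference, a sketch of how the hyperparameters above map onto a PEFT + TRL setup. This assumes a recent TRL version with `SFTConfig`; `model` is the base LiquidAI/LFM2-1.2B-RAG loaded as in "How to Use", the `output_dir` name is hypothetical, and the ARCD prompt formatting that produces `train_dataset` is omitted.

from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["w1", "w2", "w3", "q_proj", "k_proj", "v_proj", "out_proj", "in_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    num_train_epochs=10,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_steps=50,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    output_dir="lfm2-arabic-qa-lora",  # hypothetical output path
)

trainer = SFTTrainer(
    model=model,                  # base model, loaded as in "How to Use"
    args=training_args,
    train_dataset=train_dataset,  # ARCD examples rendered as chat prompts (not shown)
    peft_config=lora_config,
)
trainer.train()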
Ethical Considerations
- This model should not be used for generating misleading or false information
- Users should verify factual claims, especially for sensitive topics
- The model's responses reflect patterns in training data and may not represent complete or unbiased information
Citation
If you use this model in your research or application, please cite:
@misc{lfm2-rag-arabic-lora,
  author = {Azeddin Sahir},
  title = {LFM2-1.2B-RAG Arabic (LoRA Fine-tuned)},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/azeddinShr/lfm2-1.2b-arabic-qa-lora}}
}
Acknowledgments
- Base Model: LiquidAI for LFM2-1.2B-RAG
- Dataset: ARCD - Arabic Reading Comprehension Dataset
- Framework: Hugging Face Transformers, PEFT, TRL
License
Same as the base model, LiquidAI/LFM2-1.2B-RAG.
Contact
For questions, issues, or collaboration opportunities, please open an issue in the model repository, contact via Hugging Face, or email me directly at azdinsahir11@gmail.com.