# MARBERT Model for Arabic Sentiment Analysis (Positive/Negative)
This is a fine-tuned version of [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2) for Arabic sentiment analysis.
The model classifies Arabic text (specifically tweets) into two categories: Positive (`LABEL_1`) or Negative (`LABEL_0`).
## Live Demo
You can test the model live on the Hugging Face Space: https://huggingface.co/spaces/iMeshal/arabic-sentiment-app
## Model Performance
The model was trained on 80% of the training data and validated on 20%. The final evaluation was performed on a separate, unseen test set.
### Final Test Set Results (Accuracy: 94.40%)
| Metric | Score |
|---|---|
| Accuracy | 94.40% |
| F1 (Macro) | 94.40% |
| Precision (Macro) | 94.40% |
| Recall (Macro) | 94.40% |
| Loss | 0.1667 |
The model achieved its best validation accuracy of 93.4% at epoch 2, and `load_best_model_at_end` was enabled, so the epoch-2 checkpoint was kept as the final model.
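For reference, the macro-averaged metrics in the table can be reproduced with scikit-learn along these lines (a minimal sketch; `y_true` and `y_pred` are placeholder arrays, not the actual test-set outputs):

```python
# Minimal sketch: computing accuracy and macro-averaged precision/recall/F1
# with scikit-learn. y_true/y_pred are placeholders for the test-set gold
# labels and the model's predictions (1 = Positive, 0 = Negative).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 0, 1, 0]  # gold labels (placeholder)
y_pred = [1, 0, 1, 0, 1, 1]  # model predictions (placeholder)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision (Macro): {precision:.4f}")
print(f"Recall (Macro): {recall:.4f}")
print(f"F1 (Macro): {f1:.4f}")
```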
## Intended Use (How to Use)
You can use this model directly with the `transformers` pipeline:
```python
from transformers import pipeline

# Load the pipeline
pipe = pipeline(
    "sentiment-analysis",
    model="iMeshal/arabic-sentiment-classifier-marbert"
)

# Test with new texts
texts = [
    "هذا المنتج رائع جداً أنصح به",   # "This product is great, I recommend it"
    "أسوأ خدمة عملاء على الإطلاق",    # "Worst customer service ever"
    "الجو اليوم جميل"                 # "The weather today is beautiful"
]

results = pipe(texts)
print(results)

# Output:
# [
#   {'label': 'LABEL_1', 'score': 0.99...},  # Positive
#   {'label': 'LABEL_0', 'score': 0.99...},  # Negative
#   {'label': 'LABEL_1', 'score': 0.98...}   # Positive
# ]
```
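If you need more control than the pipeline offers (e.g., batch scoring or custom post-processing), a lower-level sketch using `AutoTokenizer` and `AutoModelForSequenceClassification` looks roughly like this; the label mapping follows this card (`LABEL_0` = Negative, `LABEL_1` = Positive):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "iMeshal/arabic-sentiment-classifier-marbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Tokenize a batch of texts the same way the model expects
inputs = tokenizer(
    ["هذا المنتج رائع جداً أنصح به"],  # "This product is great, I recommend it"
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)

# Forward pass without gradient tracking, then softmax over the two classes
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)

pred = probs.argmax(dim=-1).item()
print("Positive" if pred == 1 else "Negative",
      f"(score: {probs[0, pred].item():.4f})")
```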
## Training Data
The model was trained on the Arabic Sentiment Twitter Corpus dataset from Kaggle.
- Preprocessing: abnormally long or concatenated tweets, which appeared to be noise, were filtered out during cleaning.
- Training Set: ~24,163 samples.
- Validation Set: ~6,041 samples.
- Test Set: ~11,508 samples.
- Balance: all splits were balanced at approximately 50% Positive / 50% Negative.
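For illustration, the 80/20 stratified split described above could be produced along these lines (a hypothetical sketch; `texts` and `labels` stand in for the cleaned Kaggle corpus, not the real data):

```python
# Hypothetical sketch of the 80/20 stratified train/validation split.
from sklearn.model_selection import train_test_split

texts = ["تغريدة إيجابية", "تغريدة سلبية"] * 50  # placeholder tweets
labels = [1, 0] * 50                             # 1 = Positive, 0 = Negative

train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts,
    labels,
    test_size=0.20,     # 80% train / 20% validation
    stratify=labels,    # preserve the ~50/50 class balance in both splits
    random_state=42,    # assumed seed, for reproducibility
)
print(len(train_texts), len(val_texts))  # 80, 20
```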
## Training Procedure
The model was trained using the `transformers.Trainer` class with the following key hyperparameters:
- Framework: PyTorch
- Base Model: UBC-NLP/MARBERTv2
- Epochs: 3 (with early stopping)
- Early Stopping: patience of 2 (training stopped at epoch 3, but epoch 2 was the best)
- Batch Size: 16
- Learning Rate: 2e-5
- Tokenizer: `AutoTokenizer` with `padding="max_length"`, `truncation=True`, `max_length=512`
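Put together, the setup presumably looked roughly like the sketch below (not the exact training script; the dataset objects and the `compute_metrics` function are placeholders you would supply):

```python
# Rough sketch of the Trainer setup with the hyperparameters listed above.
# train_dataset, val_dataset and compute_metrics are placeholders.
from transformers import (
    AutoModelForSequenceClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "UBC-NLP/MARBERTv2", num_labels=2
)

args = TrainingArguments(
    output_dir="marbert-arabic-sentiment",  # assumed output path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    evaluation_strategy="epoch",   # evaluate after every epoch
    save_strategy="epoch",         # required for load_best_model_at_end
    load_best_model_at_end=True,   # restore the best (epoch-2) checkpoint
    metric_for_best_model="accuracy",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,      # placeholder: tokenized training split
    eval_dataset=val_dataset,         # placeholder: tokenized validation split
    compute_metrics=compute_metrics,  # placeholder: returns {"accuracy": ...}
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```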
## Contact
- Name: Meshal AL-Qushaym
- Email: meshalqushim@outlook.com
- Kaggle: kaggle.com/meshalfalah