MARBERT Model for Arabic Sentiment Analysis (Positive/Negative)

This is a fine-tuned version of UBC-NLP/MARBERTv2 for Arabic Sentiment Analysis. The model is trained to classify Arabic text (specifically tweets) into two categories: Positive (LABEL_1) or Negative (LABEL_0).

🚀 Live Demo

You can test the model live on the Hugging Face Space: https://huggingface.co/spaces/iMeshal/arabic-sentiment-app


📊 Model Performance

The model was trained on 80% of the training data and validated on 20%. The final evaluation was performed on a separate, unseen test set.

Final Test Set Results (Accuracy: 94.40%)

Metric             Score
-----------------  ------
Accuracy           94.40%
F1 (Macro)         94.40%
Precision (Macro)  94.40%
Recall (Macro)     94.40%
Loss               0.1667

The best validation accuracy (93.4%) was reached at Epoch 2; because load_best_model_at_end was enabled, that checkpoint was restored for the final evaluation.
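
For reference, the macro-averaged scores in the table can be computed with scikit-learn. The arrays below are placeholders standing in for the real test labels and model predictions, not the actual evaluation data:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholders: substitute the real test labels and model predictions.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)
print(f"Accuracy: {accuracy:.4f} | F1 (Macro): {f1:.4f}")
print(f"Precision (Macro): {precision:.4f} | Recall (Macro): {recall:.4f}")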


💻 Intended Use (How to use)

You can use this model directly with the transformers pipeline.

from transformers import pipeline

# Load the pipeline
pipe = pipeline(
    "sentiment-analysis", 
    model="iMeshal/arabic-sentiment-classifier-marbert"
)

# Test with new texts
texts = [
    "هذا المنتج رائع جداً أنصح به",   # "This product is excellent, I recommend it"
    "أسوأ خدمة عملاء على الإطلاق",     # "Worst customer service ever"
    "الجو اليوم جميل"                  # "The weather today is nice"
]

results = pipe(texts)
print(results)
# Output:
# [
#   {'label': 'LABEL_1', 'score': 0.99...}, # Positive
#   {'label': 'LABEL_0', 'score': 0.99...}, # Negative
#   {'label': 'LABEL_1', 'score': 0.98...}  # Positive
# ]
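
The pipeline returns the raw ids LABEL_0/LABEL_1. A small post-processing step makes the output human-readable; the dictionary below simply encodes the convention stated at the top of this card (LABEL_1 = Positive, LABEL_0 = Negative):

# Map raw label ids to readable names
label_map = {"LABEL_0": "Negative", "LABEL_1": "Positive"}

for text, result in zip(texts, results):
    print(f"{label_map[result['label']]} ({result['score']:.3f}): {text}")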

📚 Training Data

The model was trained on the Arabic Sentiment Twitter Corpus dataset from Kaggle.

  • Preprocessing: Long/concatenated tweets, which appeared to be noise, were cleaned out of the data.
  • Training Set: ~24,163 samples.
  • Validation Set: ~6,041 samples (a sketch of the 80/20 split is shown below this list).
  • Test Set: ~11,508 samples.
  • Balance: All splits were balanced at approximately 50% Positive / 50% Negative.
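
As referenced above, here is a minimal sketch of an 80/20 stratified split with scikit-learn. The file name and the text/label column names are assumptions about the Kaggle corpus layout, not confirmed by this card:

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; adjust to the actual Kaggle corpus.
df = pd.read_csv("arabic_sentiment_tweets.csv")

# Stratifying on the label keeps the ~50/50 Positive/Negative balance
# in both the training and validation splits.
train_df, val_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)
print(len(train_df), len(val_df))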

โš™๏ธ Training Procedure

The model was trained using the transformers.Trainer class with the following key hyperparameters (a configuration sketch follows the list):

  • Framework: PyTorch
  • Base Model: UBC-NLP/MARBERTv2
  • Epochs: 3 (with Early Stopping)
  • Early Stopping: Patience set to 2 (training stopped at Epoch 3, but Epoch 2 was the best).
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Tokenizer: AutoTokenizer (with padding="max_length", truncation=True, max_length=512)
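
The snippet below is a minimal reconstruction of that setup from the hyperparameters listed above, not the exact training script: the tiny in-memory dataset, the output directory, and the choice of accuracy as the checkpoint-selection metric are illustrative assumptions.

import numpy as np
from datasets import Dataset
from sklearn.metrics import accuracy_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

model_name = "UBC-NLP/MARBERTv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    # Tokenizer settings listed above: padding="max_length", truncation, max_length=512
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=512)

# Tiny placeholder dataset ("a positive example" / "a negative example") so the
# sketch is self-contained; use the real splits in practice.
data = Dataset.from_dict({"text": ["مثال إيجابي", "مثال سلبي"], "label": [1, 0]})
data = data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": accuracy_score(labels, np.argmax(logits, axis=-1))}

training_args = TrainingArguments(
    output_dir="marbert-sentiment",    # assumption: output path not stated in the card
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    eval_strategy="epoch",             # "evaluation_strategy" on transformers < 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",  # assumption: selection metric not stated
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data,
    eval_dataset=data,   # placeholder: use the held-out validation split in practice
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()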

📞 Contact

For questions or feedback, please open a discussion on the model's Hugging Face repository.
