Arabic Medical Classification Model (AraBERTv2)

This model classifies Arabic medical question/answer pairs into one of 89 medical categories, such as Cardiology, Neurology, and Endocrinology.

It was fine-tuned on a private dataset of 800,000 Arabic (question, answer, category) triples. It is intended for building Arabic healthcare QA systems, virtual assistants, and medical chatbots.


Model Details

| Property | Value |
|---|---|
| Base model | aubmindlab/bert-base-arabertv2 |
| Language | Arabic (Modern Standard Arabic, with some dialectal text) |
| Task | Text classification |
| Number of classes | 89 |
| Training samples | ~800,000 |
| Evaluation metrics | Accuracy, macro F1, weighted F1 |
| Hardware | 1 × NVIDIA GPU (CUDA 12.8) |
| Framework | Transformers + PyTorch |
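The three evaluation metrics can tell different stories when there are 89 categories. Accuracy counts correct predictions overall, macro F1 averages the per-class F1 scores equally (so a rare category counts as much as a common one), and weighted F1 weights each class's F1 by its number of true examples. The sketch below, a hypothetical helper `classification_metrics` written in plain Python, illustrates how the three numbers are computed for single-label predictions. It is not the evaluation script used for this model, and the label strings are illustrative only.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro F1, weighted F1) for single-label predictions.

    Hypothetical helper for illustration; labels can be any hashable values.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class_f1 = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * tp + fp + fn
        per_class_f1[label] = 2 * tp / denom if denom else 0.0

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro: every class contributes equally.
    macro_f1 = sum(per_class_f1.values()) / len(labels)
    # Weighted: each class contributes in proportion to its true support.
    weighted_f1 = sum(per_class_f1[l] * support[l] for l in labels) / len(y_true)
    return accuracy, macro_f1, weighted_f1


y_true = ["Cardiology", "Cardiology", "Cardiology", "Neurology"]
y_pred = ["Cardiology", "Cardiology", "Neurology", "Neurology"]
print(classification_metrics(y_true, y_pred))
```

These definitions follow the same conventions as scikit-learn's `accuracy_score` and `f1_score(y_true, y_pred, average="macro")` / `average="weighted"`, which are the more practical choice on a real evaluation set.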

⚙️ How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_id = "AymanElbery/arabic-medical-classifier-arabertv2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# "What are the symptoms of high blood pressure?"
question = "ما هي أعراض ارتفاع ضغط الدم؟"
# "The main symptoms are headache, dizziness, and occasionally nosebleeds."
answer = "من أهم الأعراض الصداع والدوخة والنزيف الأنفي أحيانًا."
# Join the question and answer with the tokenizer's separator token.
text = question + tokenizer.sep_token + answer

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=-1)  # shape: (1, num_labels)
    pred_id = torch.argmax(probs, dim=-1).item()

print("Predicted class:", model.config.id2label[pred_id])
print("Confidence:", probs[0][pred_id].item())
```
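Some questions plausibly belong to more than one specialty, so inspecting the few most likely categories can be more informative than the single argmax. The sketch below applies the same softmax as the example above, but in plain Python over a list of logits, and returns the top `k` labels with their probabilities. The function name `top_k_labels` and the toy labels are hypothetical.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


toy_labels = {0: "Cardiology", 1: "Neurology", 2: "Endocrinology"}
for label, p in top_k_labels([2.0, 1.0, 0.1], toy_labels, k=2):
    print(f"{label}: {p:.3f}")
```

With the model loaded as in the example above, the same helper could be called as `top_k_labels(outputs.logits[0].tolist(), model.config.id2label, k=3)`.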
Model size: 0.1B parameters (Safetensors, F32 tensors).

Hub repository: AymanElbery/arabic-medical-classifier-arabertv2