DIMI Arabic OCR v2

Accurate Arabic OCR model V2 for extracting printed Arabic text from images


Model Description

DIMI Arabic OCR v2 is a specialized Arabic Optical Character Recognition (OCR) model fine-tuned from Qwen2.5-VL-7B-Instruct (via the v1 checkpoint) using LoRA adapters. It is the second iteration and builds on v1 with improved diacritics handling and lower word and character error rates on both the test and validation sets.

  • Developed by: Ahmed Zaky
  • Base Model: AhmedZaky1/DIMI-Arabic-OCR (v1)
  • Original Base: Qwen/Qwen2.5-VL-7B-Instruct
  • Model Type: Vision-Language Model (VLM) for Arabic OCR
  • Language: Arabic (ar)
  • License: Apache 2.0
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
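To give a rough sense of why LoRA keeps fine-tuning cheap, the sketch below counts trainable parameters for a single weight matrix when it is updated directly versus through a rank-r pair of adapter matrices. The matrix size and rank are hypothetical illustration values; this card does not state the adapter rank or which layers were adapted.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # updating the d_out x d_in matrix W directly
    lora = rank * (d_out + d_in)   # B is d_out x rank, A is rank x d_in
    return full, lora


# Hypothetical sizes, NOT taken from this model's configuration.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed sizes the adapter trains well under 1% of the parameters of the full matrix, which is the general appeal of the method when combined with a quantized base model.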

Key Improvements Over v1

  • 30% reduction in WER on diacritics-heavy text
  • Enhanced training dataset with balanced diacritics representation
  • Improved generalization across news articles and formal documents
  • Better preservation of text formatting and structure

📊 Performance Metrics

Test Set Results (500 samples from 2,600)

| Metric | Score | Description |
|---|---|---|
| WER | 0.3049 | Word Error Rate (↓ lower is better) |
| CER | 0.1119 | Character Error Rate (↓ lower is better) |
| Perfect Predictions | 23% | Exact matches with ground truth |
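For readers unfamiliar with these metrics, here is a minimal sketch of the standard definitions: WER and CER are the Levenshtein edit distance between reference and prediction, counted over words or characters and divided by the reference length. The card does not say which tool or text normalization produced the scores above, so this is an illustration of the metric, not the evaluation script.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "ذهب الولد إلى المدرسة"
prediction = "ذهب الولد الى المدرسة"  # hamza dropped in one word
print(f"WER = {wer(reference, prediction):.2f}")  # 1 of 4 words wrong -> 0.25
print(f"CER = {cer(reference, prediction):.4f}")
print("exact match:", reference == prediction)
```

A single wrong character costs a whole word in WER but only one character in CER, which is why the WER figures in this card are consistently higher than the CER figures.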

Validation Set Results (100 samples)

| Metric | Score |
|---|---|
| WER | 0.2315 |
| CER | 0.0776 |

Comparison with v1

| Model | Test WER | Test CER | Val WER | Val CER |
|---|---|---|---|---|
| v1 | 0.404 | 0.226 | 0.3308 | 0.1820 |
| v2 | 0.3049 | 0.1119 | 0.2315 | 0.0776 |

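Relative improvements on the test set:

  • WER reduced by ~24.5% (0.404 → 0.3049)
  • CER reduced by ~50.5% (0.226 → 0.1119)

Relative improvements on the validation set:

  • WER reduced by ~30.0% (0.3308 → 0.2315)
  • CER reduced by ~57.4% (0.1820 → 0.0776)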
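The percentages are relative reductions, (v1 − v2) / v1. A short sketch that recomputes them from the comparison table, for both the test and validation columns:

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# (v1 score, v2 score) pairs copied from the comparison table.
scores = {
    "Test WER": (0.404, 0.3049),
    "Test CER": (0.226, 0.1119),
    "Val WER": (0.3308, 0.2315),
    "Val CER": (0.1820, 0.0776),
}

for name, (v1, v2) in scores.items():
    print(f"{name}: {relative_reduction(v1, v2):.1f}% lower than v1")
```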

🎯 Intended Use

Direct Use

This model is designed for extracting Arabic text from images, including:

  • 📰 News articles and printed documents
  • 📝 Formal Arabic text with diacritics (تشكيل)
  • 🔢 Mixed Arabic text and numbers
  • 📄 Scanned documents and screenshots

Example Use Case

from unsloth import FastVisionModel
from PIL import Image
import torch

# Load model
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR-v2",
    load_in_4bit=True,
    device_map="auto"
)
FastVisionModel.for_inference(model)

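# Load image (convert to RGB so grayscale or RGBA files are handled uniformly)
image = Image.open("arabic_document.jpg").convert("RGB")

# Prepare prompt. The instruction stays in Arabic; in English it reads:
# "Extract the Arabic text and numbers in this image with high accuracy."
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية."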

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction},
        ],
    }
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize
inputs = tokenizer(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",
    truncation=False
).to("cuda")

# Generate
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False
    )

# Decode
generated_ids = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, outputs)
]
prediction = tokenizer.batch_decode(
    generated_ids, 
    skip_special_tokens=True
)[0]

print(prediction)
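The example above processes a single image. One possible way to run it over a folder is to wrap the load, prompt, generate and decode steps in a function and pass that function to a small driver like the one sketched here. Both `ocr_folder` and the `ocr_fn` callable are hypothetical names introduced for illustration; they are not part of the model or of unsloth.

```python
from pathlib import Path
from typing import Callable, Dict

# File extensions treated as images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_folder(folder: str, ocr_fn: Callable[[Path], str]) -> Dict[str, str]:
    """Apply `ocr_fn` to every image in `folder`, in sorted filename order.

    `ocr_fn` is expected to take an image path and return the recognized
    text, for instance a wrapper around the single-image steps shown above.
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ocr_fn(path).strip()
    return results


# Usage, assuming a user-defined run_ocr(path) -> str that wraps the model call:
# texts = ocr_folder("scans/", run_ocr)
```

Keeping the model call behind a plain callable means the model is loaded once and the driver can be tested with a stub function, without a GPU.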

🧾 Training Data

Fine-tuned on 11,000 Arabic text images combining:

  1. oddadmix/qari-0.2.2-news-dataset-large
  2. oddadmix/qari-0.2.2-diacritics-dataset-large

The combined dataset covers Modern Standard Arabic, both with and without diacritics.
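When comparing predictions against ground truth, it can help to separate diacritic mistakes from letter mistakes. The sketch below detects and strips the common Arabic diacritics (tashkeel) by Unicode range. It is an illustrative helper, not a step from this model's training or evaluation pipeline, and the chosen range is an assumption that omits rarer Quranic annotation marks.

```python
import re

# Fathatan through sukun (U+064B to U+0652) plus superscript alef (U+0670).
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def has_diacritics(text: str) -> bool:
    """True if the text contains at least one tashkeel mark."""
    return TASHKEEL.search(text) is not None


def strip_diacritics(text: str) -> str:
    """Remove tashkeel marks while leaving the base letters untouched."""
    return TASHKEEL.sub("", text)


word = "كَتَبَ"
print(has_diacritics(word))    # True
print(strip_diacritics(word))  # كتب
```

Scoring a prediction once as-is and once after `strip_diacritics` on both sides shows how much of the error comes from diacritics alone.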


📚 Citation

If you use this model, please cite:

@misc{dimi-arabic-ocr-2025,
  author = {Ahmed Zaky},
  title = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}


Built with ❤️ by Ahmed Zaky

Advancing Arabic NLP through state-of-the-art embedding models