DIMI Arabic OCR v2

An accurate Arabic OCR model (v2) for extracting printed Arabic text from images.

Model Description

DIMI Arabic OCR v2 is a specialized Arabic Optical Character Recognition model fine-tuned from Qwen2.5-VL-7B-Instruct using LoRA adapters. This second iteration builds on v1, improving the handling of diacritics and accuracy across a wider range of Arabic text types.

Developed by: Ahmed Zaky
Base Model: AhmedZaky1/DIMI-Arabic-OCR (v1)
Original Base: Qwen/Qwen2.5-VL-7B-Instruct
Model Type: Vision-Language Model (VLM) for Arabic OCR
Language: Arabic (ar)
License: Apache 2.0
Fine-tuning Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
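LoRA keeps the pretrained weight matrix W frozen and trains only a low-rank update B·A, scaled by alpha / r. The NumPy sketch below illustrates this for a single linear layer. The layer sizes, rank r, and scaling alpha are illustrative assumptions and are not the hyperparameters used to train this model. The 4-bit quantization of the frozen base weights is not shown.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Linear layer with a LoRA update: y = W x + (alpha / r) * B (A x).

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16  # illustrative sizes only (assumption)

W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # small random init
B = np.zeros((d_out, r))                   # zero init: the adapter starts as a no-op
x = rng.standard_normal(d_in)

y = lora_forward(x, W, A, B, alpha, r)
print(np.allclose(y, W @ x))  # True: with B = 0 the output equals the base layer

# Trainable parameters: full matrix vs. the two LoRA factors
print(W.size, A.size + B.size)  # 4096 1024
```

Initializing B to zero means fine-tuning starts exactly from the base model. After training, the update can be merged into a single matrix, W + (alpha / r)·B·A.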

Key Improvements Over v1

✅ 30% reduction in WER on diacritics-heavy text
✅ Enhanced training dataset with balanced diacritics representation
✅ Improved generalization across news articles and formal documents
✅ Better preservation of text formatting and structure

📊 Performance Metrics

Test Set Results (500 samples from 2,600)

Metric	Score	Description
WER	0.3049	Word Error Rate (↓ lower is better)
CER	0.1119	Character Error Rate (↓ lower is better)
Perfect Predictions	23%	Exact matches with ground truth
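WER and CER are both edit-distance rates: the minimum number of insertions, deletions, and substitutions needed to turn the prediction into the reference, divided by the reference length in words or characters. The sketch below computes corpus-level rates and the exact-match ("perfect prediction") rate in plain Python. The card does not state whether its scores are corpus-level or averaged per sample, so treat this as one plausible definition, not the exact evaluation script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_tok in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                     # deletion
                curr[j - 1] + 1,                 # insertion
                prev[j - 1] + (r_tok != h_tok),  # substitution (0 if tokens match)
            )
        prev = curr
    return prev[-1]

def corpus_error_rate(refs, hyps, unit):
    """Total edits divided by total reference length; unit is 'word' or 'char'."""
    split = str.split if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return edits / total

def exact_match_rate(refs, hyps):
    """Fraction of predictions identical to their reference."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)

refs = ["the cat sat", "hello world"]
hyps = ["the cat sit", "hello world"]
print(corpus_error_rate(refs, hyps, "word"))            # 0.2
print(round(corpus_error_rate(refs, hyps, "char"), 4))  # 0.0455
print(exact_match_rate(refs, hyps))                     # 0.5
```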

Validation Set Results (100 samples)

Metric	Score
WER	0.2315
CER	0.0776

Comparison with v1

Model	Test WER	Test CER	Val WER	Val CER
v1	0.404	0.226	0.3308	0.1820
v2	0.3049 ↓	0.1119 ↓	0.2315	0.0776

Improvements on the test set (relative to v1):

WER reduced by ~24.5% (0.404 → 0.3049)
CER reduced by ~50.5% (0.226 → 0.1119)
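These percentages are relative reductions, (v1 − v2) / v1, not absolute differences in percentage points. The short sketch below recomputes them from the comparison table, applying the same formula to the validation columns.

```python
def relative_reduction(old, new):
    """Relative improvement of an error rate, as a percentage of the old value."""
    return (old - new) / old * 100

# (v1, v2) scores taken from the comparison table
scores = {
    "Test WER": (0.404, 0.3049),
    "Test CER": (0.226, 0.1119),
    "Val WER": (0.3308, 0.2315),
    "Val CER": (0.1820, 0.0776),
}
for name, (v1, v2) in scores.items():
    print(f"{name}: {relative_reduction(v1, v2):.1f}%")
# Test WER: 24.5%
# Test CER: 50.5%
# Val WER: 30.0%
# Val CER: 57.4%
```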

🎯 Intended Use

Direct Use

This model is designed for extracting Arabic text from images, including:

📰 News articles and printed documents
📝 Formal Arabic text with diacritics (تشكيل)
🔢 Mixed Arabic text and numbers
📄 Scanned documents and screenshots

Example Use Case

from unsloth import FastVisionModel
from PIL import Image
import torch

# Load model
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR-v2",
    load_in_4bit=True,
    device_map="auto"
)
FastVisionModel.for_inference(model)

# Load image (convert to RGB so PNG screenshots with an alpha channel also work)
image = Image.open("arabic_document.jpg").convert("RGB")

# Prepare prompt (in English: "Extract the Arabic text and numbers in this image with high accuracy.")
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction},
        ],
    }
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize
inputs = tokenizer(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",
    truncation=False
).to("cuda")

# Generate
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False
    )

# Decode
generated_ids = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, outputs)
]
prediction = tokenizer.batch_decode(
    generated_ids, 
    skip_special_tokens=True
)[0]

print(prediction)

🧾 Training Data

Fine-tuned on 11,000 Arabic text images combining:

The dataset covers Modern Standard Arabic (MSA), both with and without diacritics.
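Because the data mixes diacritized and undiacritized text, you may want to score predictions in both forms. The sketch below strips the common tashkeel marks before comparison, assuming diacritic-insensitive matching is what you need. The Unicode set chosen (U+064B to U+0652 plus the superscript alef U+0670) is an assumption and is not necessarily the normalization behind the reported metrics.

```python
import re

# Harakat and tanween (U+064B-U+0652) plus superscript alef (U+0670).
# Assumed set: extend it (e.g. with Quranic annotation marks) if your data needs it.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritics (tashkeel) so text can be compared without them."""
    return ARABIC_DIACRITICS.sub("", text)

# "marhaban" written with fatha, sukun, fatha, and tanween
word = "\u0645\u064e\u0631\u0652\u062d\u064e\u0628\u064b\u0627"
print(strip_diacritics(word) == "\u0645\u0631\u062d\u0628\u0627")  # True
```

Applying this to both the reference and the prediction before computing WER/CER separates recognition errors from diacritic errors.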

📚 Citation

If you use this model, please cite:

@misc{dimi-arabic-ocr-2025,
  author = {Ahmed Zaky},
  title = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}

🔗 Related Projects

DIMI Models Series — Arabic Vision & Language Models

Built with ❤️ by Ahmed Zaky

Advancing Arabic NLP through state-of-the-art embedding models


Model tree for AhmedZaky1/DIMI-Arabic-OCR-V2

Base model: Qwen/Qwen2.5-VL-7B-Instruct
Quantized: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
Adapter: AhmedZaky1/DIMI-Arabic-OCR