Glimpse: Multi-Modal OCR for RTL Scripts (Arabic & Persian)

Glimpse is a high-precision OCR model specialized for Right-to-Left (RTL) languages. It was developed as part of the ERNIE AI Developer Challenge 2025 to address the "script bias" commonly found in general-purpose vision models.

Model Details

  • Developed by: surfiniaburger
  • License: apache-2.0
  • Finetuned from model: unsloth/PaddleOCR-VL
  • Architecture: ERNIE-4.5-0.3B Vision-Language Model

📊 Performance Metrics

After 500 steps of fine-tuning, Glimpse achieved the following results on unseen RTL text lines:

  • Character Error Rate (CER): 6.97% (reduced from a baseline of ~59%)
  • Validation Loss: 0.25 (Near-perfect convergence)
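
For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the predicted and reference text, divided by the number of reference characters. The sketch below shows that standard definition; it is not the evaluation script used for Glimpse, which this card does not include. The function names `edit_distance` and `cer` are illustrative, and the NFC normalization step is an assumption that often matters for Arabic and Persian, where the same visible text can be encoded in more than one way.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses, form="NFC"):
    """Corpus-level CER: total edits divided by total reference characters.

    Normalizing both sides (an assumption, not taken from the model card)
    keeps equivalent encodings from being counted as errors.
    """
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        ref = unicodedata.normalize(form, ref)
        hyp = unicodedata.normalize(form, hyp)
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars


# Hypothetical example: one substituted letter in a nine-character Persian line.
refs = ["سلام دنیا"]
hyps = ["سلام دنبا"]
print(f"CER: {cer(refs, hyps):.2%}")
```

Strings are compared in logical (storage) order, so the right-to-left display direction does not affect the result. Summing edits and lengths over the whole test set, rather than averaging per-line scores, stops very short lines from dominating the figure.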

📚 Dataset Credit

This model was trained using the Persian & Arabic Text-Line Image OCR Dataset (Medium) curated by Mohammad Reza Hajesmaeili.

🛠 Training Tools

This model was trained 2x faster with Unsloth and Hugging Face's TRL library. We used QLoRA (4-bit) for memory-efficient fine-tuning.
