Glimpse: Multi-Modal OCR for RTL Scripts (Arabic & Persian)
Glimpse is a high-precision OCR model specialized for Right-to-Left (RTL) languages. It was developed as part of the ERNIE AI Developer Challenge 2025 to address the "script bias" that general-purpose vision models typically show against non-Latin scripts.
Model Details
- Developed by: surfiniaburger
- License: apache-2.0
- Finetuned from model: unsloth/PaddleOCR-VL
- Architecture: ERNIE-4.5-0.3B Vision-Language Model
Performance Metrics
After 500 steps of fine-tuning, Glimpse achieved the following results on unseen RTL text lines:
- Character Error Rate (CER): 6.97% (reduced from a baseline of ~59%)
- Validation Loss: 0.25 (Near-perfect convergence)
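For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between the predicted and reference text, divided by the number of reference characters. The sketch below is a minimal pure-Python illustration of that common definition, not the evaluation script used for Glimpse; the function names `levenshtein` and `cer` are hypothetical, and it counts Unicode code points, so Arabic diacritics are scored as separate characters.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (counted per Unicode code point)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Illustrative strings only, not taken from the dataset:
# the hypothesis drops one of the four reference characters.
example_cer = cer(["سلام"], ["سلم"])
```

Python stores strings in logical order, so the right-to-left display direction does not affect the computation. Under this definition, a CER of 6.97% corresponds to roughly 7 character edits per 100 reference characters.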
Dataset Credit
This model was trained using the Persian & Arabic Text-Line Image OCR Dataset (Medium) curated by Mohammad Reza Hajesmaeili.
- Dataset Source: mohajesmaeili/Persian_Arabic_TextLine_Image_Ocr_Medium
- Content: Diverse text-line images containing complex ligatures and various Arabic/Persian fonts.
Training Tools
This model was trained 2x faster using Unsloth and Hugging Face's TRL library. We used QLoRA, which trains low-rank adapters on top of a 4-bit quantized base model, to keep memory usage low while keeping the trained adapter weights in higher precision.
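To give a rough sense of why 4-bit quantization reduces memory, the following back-of-the-envelope sketch compares weight storage at 16 and 4 bits per parameter. It assumes about 0.3 billion parameters, taken from the "0.3B" in the architecture name, and it ignores the vision encoder, quantization constants, the LoRA adapters, optimizer state, and activations, so the numbers are illustrative only. The helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~0.3e9 parameters, inferred from the "0.3B" in the model name.
ASSUMED_PARAMS = 0.3e9

fp16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(ASSUMED_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.2f} GB")
print(f" 4-bit weights: ~{int4_gb:.2f} GB")
print(f"reduction: {fp16_gb / int4_gb:.0f}x")
```

Actual training memory is higher than either figure, since the adapters, gradients, and activations still need to fit alongside the quantized base weights.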
