DeepSeek-OCR Dhivehi
This model is a fine-tuned version of unsloth/DeepSeek-OCR, trained for Dhivehi multi-line sentence recognition. It was fine-tuned on 20,000 samples from the alakxender/dhivehi-vrd-images dataset.
- Base model: unsloth/DeepSeek-OCR
- More info on the model: deepseek-ai/DeepSeek-OCR
- Dataset: alakxender/vrd-images-224x224
- Samples used: 20k multi-line Dhivehi sentences
Usage
Run inference with Hugging Face Transformers on an NVIDIA GPU. The setup below was tested with Python 3.12.9 and CUDA 11.8:
from transformers import AutoModel, AutoTokenizer
import torch
import os
# Expose only the first GPU to this process
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'alakxender/deepseek-ocr-3b-vrd-dhivehi-20k-ml'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
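# flash_attention_2 requires the flash-attn package to be installed
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
# Switch to eval mode, move the model to the GPU, and cast the weights to bfloat16
model = model.eval().cuda().to(torch.bfloat16)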
# Plain OCR prompt
prompt = "<image>\nFree OCR. "
image_file = 'sl.png'
output_path = 'your/output/dir'
# infer(self, tokenizer, prompt='', image_file='', output_path = ' ', base_size = 1024, image_size = 640, crop_mode = True, test_compress = False, save_results = False):
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = output_path, base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = True)
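The resolution modes listed in the comments above can be collected into a small lookup so that a mode is selected by name instead of retyping three arguments. This is a minimal sketch: `RESOLUTION_PRESETS` and `infer_kwargs` are hypothetical helper names and are not part of the model's API. Only the `base_size`, `image_size`, and `crop_mode` values are taken from the comments above.

```python
# Hypothetical helper (not part of the model's API): maps the mode names
# from the comments above to keyword arguments for model.infer().
RESOLUTION_PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}


def infer_kwargs(mode: str = "gundam") -> dict:
    """Return a copy of the size/crop settings for the named mode."""
    key = mode.lower()
    if key not in RESOLUTION_PRESETS:
        valid = ", ".join(sorted(RESOLUTION_PRESETS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}")
    # Copy so callers can modify the result without changing the presets.
    return dict(RESOLUTION_PRESETS[key])
```

With the model and tokenizer loaded as shown earlier, a call might then look like `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, **infer_kwargs("base"))`.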