How Much GPU Memory (VRAM) is Needed to Run This Model?

#3 opened by Vishva007

I'm deploying the datalab-to/chandra OCR model for document processing and need to understand GPU memory requirements for inference.

Key Questions:

What's the minimum GPU VRAM needed for FP16/BF16 inference?

Does the model support quantization (AWQ, GPTQ, 8-bit)?

How does memory scale with variable document resolution?

Any real-world experience running on RTX 4090, A100, or consumer GPUs?

Context:

Python, using the Hugging Face transformers library

Docker with NVIDIA CUDA support

Batch document processing optimization

Any deployment insights appreciated!
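
For context, the loading path I have in mind looks roughly like this. It is a minimal sketch that assumes Chandra works with the standard transformers image-text-to-text auto classes and chat template; the class names, prompt text, and `page_001.png` path are placeholders, and the model card's own example code takes precedence:

```python
# Minimal sketch: bf16 inference on one rendered page, plus a peak-VRAM check.
# Assumes standard transformers support; verify classes/prompt against the model card.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "datalab-to/chandra"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16/fp16 roughly halves weight memory vs fp32
    device_map="auto",
)

image = Image.open("page_001.png")  # placeholder: one rendered PDF page
messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": "Extract the text on this page as markdown."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(out, skip_special_tokens=True)[0])

# Peak memory seen by the PyTorch allocator for this page
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```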

Here are my test results:
Chandra method: HF
GPU: RTX 5090 32GB
Memory usage: around 18.9 GB (utilization 30–50%)
PDF: 40 MB, 340 pages
Speed: roughly 2–3 minutes per page
Just for reference.
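
In case it helps anyone reproducing numbers like these: the memory and utilization figures above are the whole-GPU values that nvidia-smi reports, which can be polled from Python via NVML. A generic monitoring sketch, not specific to Chandra (requires `pip install nvidia-ml-py`):

```python
# Generic sketch: poll whole-GPU memory and utilization (the values nvidia-smi
# reports) while an OCR run is in progress.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

for _ in range(12):  # sample for about a minute
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"memory used: {mem.used / 1e9:.1f} GB | GPU util: {util.gpu}%")
    time.sleep(5)

pynvml.nvmlShutdown()
```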

Thanks a lot, @LastXuanShen42! That's exactly the data I needed. Appreciate you sharing the benchmarks!

Can you folks help me with this? I'm trying to run Chandra on Kaggle, but running the OCR model on even a single image demands about 36 GB of GPU memory. I can't get around this on my own, but I really need this model. Can you help me with this?

We ran into the same issue when we tried the model. Can anyone help?

Why don't you use a more lightweight model instead (Docling or Paddle)?

Datalab org

Hi folks - you will need 18 GB+ of GPU memory to run this model unquantized. Having flash attention installed helps bring down memory usage, and running it via vLLM is best.
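
In transformers terms, that looks roughly like the following. This is a sketch using the generic attn_implementation flag and a standard vLLM launch, not an official Chandra example; check the model card and the vLLM docs for the exact invocation:

```python
# Sketch: load in bf16 with FlashAttention-2 to reduce activation memory.
# Assumes `pip install flash-attn` and standard transformers support.
import torch
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "datalab-to/chandra",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# vLLM alternative (run from a shell; these are the usual vLLM server flags):
#   vllm serve datalab-to/chandra --dtype bfloat16 --gpu-memory-utilization 0.90
```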

For GPU-constrained setups, you can use our other model stack, marker. Another option is to use our hosted API at https://www.datalab.to/
@Oblivion07 @PreyumKr
