---
library_name: transformers
language:
  - en
  - th
base_model:
  - Qwen/Qwen2.5-VL-3B
tags:
  - OCR
  - vision-language
  - document-understanding
  - multilingual
  - QAT
license: apache-2.0
---

# Typhoon-OCR-1.5-3B-QAT


A quantization-aware trained (QAT) version of [**Typhoon OCR v1.5**](https://huggingface.co/scb10x/typhoon-ocr1.5-2b), designed for robust and efficient on-device vision-language OCR in English and Thai.  
This release maintains strong accuracy while significantly improving performance when running under low-bit quantization (e.g., 4-bit), making it ideal for lightweight environments.

This model is released in **bfloat16** and is intended to be used as the **pre-quantization base** before converting to low-bit formats.  
For the 4-bit model, please use the Ollama build here:  
**https://ollama.com/scb10x/typhoon-ocr1.5-3b**

QAT is applied on top of **Qwen2.5-VL-3B**, enabling improved stability and reduced degradation when deployed below 16-bit precision.


**Demo:** [ocr.opentyphoon.ai](https://ocr.opentyphoon.ai/)

**Code / examples:** [GitHub](https://github.com/scb-10x/typhoon-ocr)

**Release blog:** [OpenTyphoon Blog](https://opentyphoon.ai/blog/en/typhoon-ocr-release)

---

## Highlights
- **Quantization-Aware Training (QAT):** Maintains strong OCR accuracy even under aggressive quantization.  
- **Optimized for On-Device Inference:** Faster and more consistent performance on low-resource hardware.  
- **Enhanced Handwriting & Form Parsing:** Retains the v1.5 improvements in handling handwritten notes, forms, irregular layouts, and structured documents.  
- **Supports Text-Rich & Image-Rich Documents:** Effective on tables, diagrams, annotated pages, charts, receipts, and dense reports.  
- **Thai + English Multilingual OCR:** Trained for reliable extraction across bilingual real-world documents.

---

## Intended Use
This is a **task-specific OCR model** and is intended to be used **only with the provided prompt format**.  
It does **not** include general VQA or safety guardrails.  
Some hallucination may still occur, and users should validate outputs for production scenarios.

---


## Prompting
```python
prompt = """Extract all text from the image.

Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.

Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap any clearly defined visual areas (e.g. charts, diagrams, pictures) in:

<figure>
Describe the image's main elements (people, objects, text), note any contextual clues (place, event, culture), mention visible text and its meaning, provide deeper analysis when relevant (especially for financial charts, graphs, or documents), comment on style or architecture if relevant, then give a concise overall summary. Describe in Thai.
</figure>

- Page Numbers: Wrap page numbers in <page_number>...</page_number> (e.g., <page_number>14</page_number>).
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes."""
```
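In a typical transformers workflow, this prompt is paired with a page image using the Qwen2.5-VL chat message format. The sketch below only assembles that payload; the commented lines show where the processor and model calls would go. The repo id `scb10x/typhoon-ocr1.5-3b-qat` is an assumption for illustration — check the actual model id on the Hub:

```python
# Sketch: build a Qwen2.5-VL-style chat payload pairing the OCR prompt
# with one page image. The repo id below is an assumption, not verified.

OCR_PROMPT = "Extract all text from the image."  # use the full prompt above in practice

def build_messages(prompt: str, image_path: str) -> list[dict]:
    """Assemble the single-turn user message expected by Qwen2.5-VL processors."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]

# Downstream (not run here -- requires downloading the model):
# from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
# processor = AutoProcessor.from_pretrained("scb10x/typhoon-ocr1.5-3b-qat")
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
#     "scb10x/typhoon-ocr1.5-3b-qat", torch_dtype="bfloat16", device_map="auto"
# )
# text = processor.apply_chat_template(
#     build_messages(OCR_PROMPT, "page.png"), tokenize=False, add_generation_prompt=True
# )
```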

---

## Quickstart (Ollama)

```shell
ollama run scb10x/typhoon-ocr1.5-3b
```
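Beyond the CLI, the same local model can be reached through Ollama's HTTP API (`POST /api/generate`), which accepts page images as base64 strings. A minimal sketch of building that request body — the image bytes here are a placeholder, and the model name assumes the Ollama tag above:

```python
import base64
import json

def build_ollama_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    Vision models take base64-encoded images in the `images` field.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a token stream
    }

body = build_ollama_request(
    "scb10x/typhoon-ocr1.5-3b",
    "Extract all text from the image.",
    b"\x89PNG...",  # placeholder; read a real page image in practice
)
payload = json.dumps(body)

# To send (not run here -- requires a running Ollama server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=payload.encode(), headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```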

---

## Support & Community

* Twitter: [https://twitter.com/opentyphoon](https://twitter.com/opentyphoon)
* Discord: [https://discord.gg/us5gAYmrxw](https://discord.gg/us5gAYmrxw)

---

## Citation

If you use Typhoon OCR or Typhoon models, please cite:

```
@misc{typhoon2,
  title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
  author={Kunat Pipatanakul and others},
  year={2024},
  eprint={2412.13702},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{nonesung2025thaiocrbench,
  title={ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai},
  author={Surapon Nonesung and others},
  year={2025},
  eprint={2511.04479},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```