DeepSeek-OCR Inference Endpoint
A minimal, production-ready setup for deploying DeepSeek-OCR as a HuggingFace Inference Endpoint API.
What This Is
This folder contains everything you need to deploy DeepSeek-OCR to HuggingFace Inference Endpoints:
- handler.py - Custom inference handler for HuggingFace
- requirements.txt - Python dependencies
- deploy.py - Full deployment script (first-time setup)
- update.py - Quick update script (fast iterations)
- test_endpoint.py - Comprehensive test suite
- quick_test.py - Simple quick test script
- This README with API documentation
Quick Start
1. Install HuggingFace CLI
pip install huggingface_hub
huggingface-cli login
2. Deploy to HuggingFace
First-time deployment:
python deploy.py YOUR_USERNAME
This creates a repository at https://huggingface.co/YOUR_USERNAME/deepseek-ocr-inference
Quick updates (after initial deployment):
python update.py YOUR_USERNAME
⚡ Much faster! Only uploads your local files (handler.py, requirements.txt) without copying from the source repo.
When to use which script:
| Script | Use When | Speed | What it Does |
|---|---|---|---|
| deploy.py | First-time setup | ~2-3 min | Creates repo, copies config files from source, uploads your files |
| update.py | Updating code/requirements | ~5-10 sec | Only uploads your local files |

Pro tip: After initial deployment with deploy.py, use update.py for all future updates!
3. Create Inference Endpoint
- Go to https://huggingface.co/YOUR_USERNAME/deepseek-ocr-inference
- Click "Deploy" → "Inference Endpoints"
- Choose GPU instance (minimum: 1x A10G, recommended: 1x A100)
- Click "Create Endpoint"
- Wait ~5 minutes for deployment
4. Test Your API
Quick Test (easiest):
# Edit quick_test.py with your endpoint URL and token
python quick_test.py
Comprehensive Test Suite:
python test_endpoint.py --url YOUR_ENDPOINT_URL --token YOUR_HF_TOKEN --comprehensive
Test with specific image:
python test_endpoint.py --url YOUR_ENDPOINT_URL --token YOUR_HF_TOKEN --image path/to/image.jpg
curl test:
curl https://YOUR_ENDPOINT_URL \
-H "Authorization: Bearer YOUR_HF_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"inputs": "https://example.com/document.jpg",
"parameters": {
"prompt": "<image>\n<|grounding|>Convert the document to markdown."
}
}'
Testing
Quick Test Script
The easiest way to test your endpoint:
- Open quick_test.py
- Edit the configuration at the top:
  ENDPOINT_URL = "https://your-endpoint-url"
  HF_TOKEN = "hf_your_token"
  IMAGE_URL = "https://example.com/image.jpg"  # or use LOCAL_IMAGE
- Run: python quick_test.py
Comprehensive Test Suite
For thorough testing with multiple scenarios:
# Run all test cases
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --comprehensive
# Test with a specific image URL
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --image-url "https://example.com/doc.jpg"
# Test with a local image
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --image path/to/document.jpg
# Test with custom prompt
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN \
--image-url "https://example.com/table.png" \
--prompt "<image>\n<|grounding|>Extract tables as markdown."
What the Endpoint Supports
✅ Supported:
- One image per request (URL, base64, or data URI)
- Text prompts to guide OCR behavior
- Custom parameters (base_size, image_size, crop_mode)
❌ Not Supported:
- Multiple images in a single request (send separate requests)
- Batch processing (process images sequentially; see the sketch below)
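Since batch requests are not supported, a multi-page or multi-image job is handled client-side by looping over the images and sending one request each. A minimal sketch in Python, where the endpoint URL, token, and page URLs are placeholders:

```python
import requests

ENDPOINT_URL = "https://YOUR_ENDPOINT_URL"  # placeholder
HF_TOKEN = "YOUR_HF_TOKEN"                  # placeholder

HEADERS = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

def ocr_one(image_url: str) -> str:
    """Send a single image to the endpoint and return the extracted text."""
    payload = {
        "inputs": image_url,
        "parameters": {
            "prompt": "<image>\n<|grounding|>Convert the document to markdown."
        },
    }
    response = requests.post(ENDPOINT_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()[0]["text"]

# One request per image, processed sequentially
pages = [
    "https://example.com/page1.jpg",
    "https://example.com/page2.jpg",
]
results = [ocr_one(url) for url in pages]
```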
Example Prompts
The endpoint accepts custom text prompts to guide OCR behavior:
# Default - Markdown conversion
"<image>\n<|grounding|>Convert the document to markdown."
# Extract tables
"<image>\n<|grounding|>Extract all tables as markdown tables."
# Plain text only
"<image>\n<|grounding|>Extract only the text without formatting."
# Form extraction
"<image>\n<|grounding|>Extract form fields and their values."
# Structured extraction
"<image>\n<|grounding|>Identify titles, headers, and body text."
# Multilingual
"<image>\n<|grounding|>Extract text in original language."
API Documentation
Endpoint
POST https://YOUR_ENDPOINT_URL
Headers
Authorization: Bearer YOUR_HF_TOKEN
Content-Type: application/json
Request Body
{
"inputs": "IMAGE_INPUT",
"parameters": {
"prompt": "CUSTOM_PROMPT",
"base_size": 1024,
"image_size": 640,
"crop_mode": true,
"save_results": false,
"test_compress": false
}
}
Field Descriptions
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| inputs | string | Yes | - | Base64 encoded image, image URL, or data URI |
| parameters.prompt | string | No | "<image>\n<\|grounding\|>Convert the document to markdown." | Custom OCR prompt |
| parameters.base_size | int | No | 1024 | Base image size for processing |
| parameters.image_size | int | No | 640 | Crop image size |
| parameters.crop_mode | bool | No | true | Whether to use crop mode |
| parameters.save_results | bool | No | false | Save detailed results |
| parameters.test_compress | bool | No | false | Test compression |
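Only inputs is required; if parameters is omitted, the defaults listed above are applied server-side, so the smallest valid request body is simply:

```json
{
  "inputs": "https://example.com/document.jpg"
}
```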
Response
[
{
"text": "# Document Title\n\nExtracted markdown content..."
}
]
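The success response is a JSON array with a single object, so the text sits at index 0. Client code should still guard against non-200 statuses (for example 503 while the endpoint is starting) and non-array bodies; the error shape checked below is an assumption, not a documented contract:

```python
import requests

def extract_text(response: requests.Response) -> str:
    """Pull the OCR text out of an endpoint response, with basic error handling."""
    if response.status_code != 200:
        # 503 usually means the endpoint is still starting up (cold start)
        raise RuntimeError(f"Endpoint returned {response.status_code}: {response.text}")
    body = response.json()
    if isinstance(body, dict) and "error" in body:  # assumed error shape, not guaranteed
        raise RuntimeError(f"Inference error: {body['error']}")
    return body[0]["text"]
```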
Usage Examples
Example 1: URL Input
import requests
url = "https://YOUR_ENDPOINT_URL"
headers = {
"Authorization": "Bearer YOUR_HF_TOKEN",
"Content-Type": "application/json"
}
payload = {
"inputs": "https://example.com/invoice.pdf",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract all text from this invoice."
}
}
response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result[0]["text"])
Example 2: Base64 Image
import base64
import requests
# Read and encode image
with open("document.jpg", "rb") as f:
image_data = base64.b64encode(f.read()).decode()
url = "https://YOUR_ENDPOINT_URL"
headers = {
"Authorization": "Bearer YOUR_HF_TOKEN",
"Content-Type": "application/json"
}
payload = {
"inputs": image_data,
"parameters": {
"prompt": "<image>\n<|grounding|>Convert the document to markdown.",
"base_size": 1024,
"crop_mode": True
}
}
response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result[0]["text"])
Example 3: Data URI
import requests
payload = {
"inputs": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract tables and text."
}
}
response = requests.post(
"https://YOUR_ENDPOINT_URL",
headers={
"Authorization": "Bearer YOUR_HF_TOKEN",
"Content-Type": "application/json"
},
json=payload
)
print(response.json()[0]["text"])
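To build a data URI like the one above from a local file, prepend the MIME type to the base64 payload. A small helper (the file name is a placeholder, and image/jpeg is only a fallback guess):

```python
import base64
import mimetypes

def to_data_uri(path: str) -> str:
    """Encode a local image file as a data URI suitable for the inputs field."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    return f"data:{mime or 'image/jpeg'};base64,{encoded}"

payload = {
    "inputs": to_data_uri("document.jpg"),
    "parameters": {
        "prompt": "<image>\n<|grounding|>Extract tables and text."
    }
}
```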
Example 4: Custom Prompts
# Table extraction
payload = {
"inputs": "https://example.com/table.png",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract all tables as markdown tables."
}
}
# Form extraction
payload = {
"inputs": "https://example.com/form.jpg",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract form fields and values as JSON."
}
}
# Multilingual OCR
payload = {
"inputs": "https://example.com/chinese.jpg",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract text in original language."
}
}
JavaScript/TypeScript Example
const endpoint = "https://YOUR_ENDPOINT_URL";
const token = "YOUR_HF_TOKEN";
async function ocr(imageUrl: string): Promise<string> {
const response = await fetch(endpoint, {
method: "POST",
headers: {
"Authorization": `Bearer ${token}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
inputs: imageUrl,
parameters: {
prompt: "<image>\n<|grounding|>Convert the document to markdown.",
},
}),
});
const result = await response.json();
return result[0].text;
}
// Usage
const text = await ocr("https://example.com/document.jpg");
console.log(text);
Cost & Performance
Recommended GPU Instance
- Minimum: 1x NVIDIA A10G (24GB VRAM)
- Recommended: 1x NVIDIA A100 (40GB VRAM)
- Cold start: ~30-60 seconds (see the retry sketch below)
- Inference time: ~2-5 seconds per image
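Because of the cold start noted above, the first request after the endpoint has been idle (or has scaled to zero) may return 503 until a replica is ready. A simple client-side retry loop smooths this over; the retry count and wait time below are arbitrary defaults:

```python
import time
import requests

def post_with_retry(url: str, headers: dict, payload: dict,
                    retries: int = 10, wait_seconds: float = 10.0) -> requests.Response:
    """Retry while the endpoint is still warming up (HTTP 503), then return the response."""
    for _ in range(retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 503:
            return response
        time.sleep(wait_seconds)  # model still loading; wait and try again
    return response
```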
Pricing (approximate)
- A10G: $1.00/hour ($720/month for dedicated)
- A100: $3.00/hour ($2,160/month for dedicated)
- Autoscaling: Pay only when processing requests
Troubleshooting
Error: "Model not found"
Make sure you've run deploy.py and the model weights are uploaded.
Error: "Out of memory"
Try using a smaller base_size or upgrade to a larger GPU instance.
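For example, requesting a smaller processing size reduces GPU memory use during inference. The values below are illustrative, not tuned recommendations:

```python
payload = {
    "inputs": "https://example.com/document.jpg",
    "parameters": {
        "prompt": "<image>\n<|grounding|>Convert the document to markdown.",
        "base_size": 640,    # smaller than the 1024 default
        "image_size": 640,
        "crop_mode": False   # skip cropping to keep memory use down
    }
}
```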
Slow inference
- Enable Flash Attention (already configured in handler)
- Use A100 instead of A10G
- Reduce base_size for smaller documents
Handler not loading
Check that handler.py exists in your repository root and trust_remote_code=True is enabled in endpoint settings.
Advanced Configuration
Custom Model Path
Edit lines 19-22 of handler.py to use a different model:
self.tokenizer = AutoTokenizer.from_pretrained(
"deepseek-ai/DeepSeek-OCR", # Change this
trust_remote_code=True
)
Optimize for Batch Processing
The current handler processes one image at a time. For batch processing, modify the __call__ method to handle lists.
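A hedged sketch of what that modification could look like, assuming the handler's single-image logic is factored into a helper (process_single here is a hypothetical name, not something that exists in handler.py):

```python
# Sketch only: adapt the names to the actual handler.py.
def __call__(self, data: dict) -> list:
    inputs = data.get("inputs")
    parameters = data.get("parameters", {})

    # Accept either a single image or a list of images
    images = inputs if isinstance(inputs, list) else [inputs]

    results = []
    for image in images:
        # Still processed sequentially; only the request shape changes
        results.append({"text": self.process_single(image, parameters)})  # hypothetical helper
    return results
```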
Support
- Model: DeepSeek-OCR
- HuggingFace Docs: https://huggingface.co/docs/inference-endpoints
- Issues: Create an issue in your repository
License
Follows DeepSeek-OCR's original license terms.