DeepSeek-OCR Inference Endpoint
A minimal, production-ready setup for deploying DeepSeek-OCR as a HuggingFace Inference Endpoint API.
What This Is
This folder contains everything you need to deploy DeepSeek-OCR to HuggingFace Inference Endpoints:
- handler.py - Custom inference handler for HuggingFace
- requirements.txt - Python dependencies
- deploy.py - Full deployment script (first-time setup)
- update.py - Quick update script (fast iterations)
- test_endpoint.py - Comprehensive test suite
- quick_test.py - Simple quick test script
- This README with API documentation
Quick Start
1. Install HuggingFace CLI
pip install huggingface_hub
huggingface-cli login
2. Deploy to HuggingFace
First-time deployment:
python deploy.py YOUR_USERNAME
This creates a repository at https://huggingface.co/YOUR_USERNAME/deepseek-ocr-inference
Quick updates (after initial deployment):
python update.py YOUR_USERNAME
⚡ Much faster! Only uploads your local files (handler.py, requirements.txt) without copying from the source repo.
When to use which script:
| Script | Use When | Speed | What it Does |
|---|---|---|---|
| deploy.py | First-time setup | ~2-3 min | Creates repo, copies config files from source, uploads your files |
| update.py | Updating code/requirements | ~5-10 sec | Only uploads your local files |

Pro tip: After initial deployment with deploy.py, use update.py for all future updates!
3. Create Inference Endpoint
- Go to https://huggingface.co/YOUR_USERNAME/deepseek-ocr-inference
- Click "Deploy" → "Inference Endpoints"
- Choose GPU instance (minimum: 1x A10G, recommended: 1x A100)
- Click "Create Endpoint"
- Wait ~5 minutes for deployment
4. Test Your API
Quick Test (easiest):
# Edit quick_test.py with your endpoint URL and token
python quick_test.py
Comprehensive Test Suite:
python test_endpoint.py --url YOUR_ENDPOINT_URL --token YOUR_HF_TOKEN --comprehensive
Test with specific image:
python test_endpoint.py --url YOUR_ENDPOINT_URL --token YOUR_HF_TOKEN --image path/to/image.jpg
curl test:
curl https://YOUR_ENDPOINT_URL \
-H "Authorization: Bearer YOUR_HF_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"inputs": "https://example.com/document.jpg",
"parameters": {
"prompt": "<image>\n<|grounding|>Convert the document to markdown."
}
}'
Testing
Quick Test Script
The easiest way to test your endpoint:
- Open quick_test.py
- Edit the configuration at the top:
  ENDPOINT_URL = "https://your-endpoint-url"
  HF_TOKEN = "hf_your_token"
  IMAGE_URL = "https://example.com/image.jpg"  # or use LOCAL_IMAGE
- Run: python quick_test.py
Comprehensive Test Suite
For thorough testing with multiple scenarios:
# Run all test cases
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --comprehensive
# Test with a specific image URL
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --image-url "https://example.com/doc.jpg"
# Test with a local image
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --image path/to/document.jpg
# Test with custom prompt
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN \
--image-url "https://example.com/table.png" \
--prompt "<image>\n<|grounding|>Extract tables as markdown."
What the Endpoint Supports
✅ Supported:
- One image per request (URL, base64, or data URI)
- Text prompts to guide OCR behavior
- Custom parameters (base_size, image_size, crop_mode)
❌ Not Supported:
- Multiple images in a single request (send separate requests)
- Batch processing (process images sequentially; see the sketch below)
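Since batch requests are not supported, a multi-page or multi-image job is handled client-side by looping over the images and sending one request each. A minimal sketch in Python, where the endpoint URL, token, and page URLs are placeholders:

```python
import requests

ENDPOINT_URL = "https://YOUR_ENDPOINT_URL"  # placeholder
HF_TOKEN = "YOUR_HF_TOKEN"                  # placeholder

HEADERS = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

def ocr_one(image_url: str) -> str:
    """Send a single image to the endpoint and return the extracted text."""
    payload = {
        "inputs": image_url,
        "parameters": {
            "prompt": "<image>\n<|grounding|>Convert the document to markdown."
        },
    }
    response = requests.post(ENDPOINT_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()[0]["text"]

# One request per image, processed sequentially
pages = [
    "https://example.com/page1.jpg",
    "https://example.com/page2.jpg",
]
results = [ocr_one(url) for url in pages]
```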
Example Prompts
The endpoint accepts custom text prompts to guide OCR behavior:
# Default - Markdown conversion
"<image>\n<|grounding|>Convert the document to markdown."
# Extract tables
"<image>\n<|grounding|>Extract all tables as markdown tables."
# Plain text only
"<image>\n<|grounding|>Extract only the text without formatting."
# Form extraction
"<image>\n<|grounding|>Extract form fields and their values."
# Structured extraction
"<image>\n<|grounding|>Identify titles, headers, and body text."
# Multilingual
"<image>\n<|grounding|>Extract text in original language."
API Documentation
Endpoint
POST https://YOUR_ENDPOINT_URL
Headers
Authorization: Bearer YOUR_HF_TOKEN
Content-Type: application/json
Request Body
{
"inputs": "IMAGE_INPUT",
"parameters": {
"prompt": "CUSTOM_PROMPT",
"base_size": 1024,
"image_size": 640,
"crop_mode": true,
"save_results": false,
"test_compress": false
}
}
Field Descriptions
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| inputs | string | Yes | - | Base64 encoded image, image URL, or data URI |
| parameters.prompt | string | No | "<image>\n<\|grounding\|>Convert the document to markdown." | Custom OCR prompt |
| parameters.base_size | int | No | 1024 | Base image size for processing |
| parameters.image_size | int | No | 640 | Crop image size |
| parameters.crop_mode | bool | No | true | Whether to use crop mode |
| parameters.save_results | bool | No | false | Save detailed results |
| parameters.test_compress | bool | No | false | Test compression |
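Only inputs is required; if parameters is omitted, the defaults listed above are applied server-side, so the smallest valid request body is simply:

```json
{
  "inputs": "https://example.com/document.jpg"
}
```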
Response
[
{
"text": "# Document Title\n\nExtracted markdown content..."
}
]
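The success response is a JSON array with a single object, so the text sits at index 0. Client code should still guard against non-200 statuses (for example 503 while the endpoint is starting) and non-array bodies; the error shape checked below is an assumption, not a documented contract:

```python
import requests

def extract_text(response: requests.Response) -> str:
    """Pull the OCR text out of an endpoint response, with basic error handling."""
    if response.status_code != 200:
        # 503 usually means the endpoint is still starting up (cold start)
        raise RuntimeError(f"Endpoint returned {response.status_code}: {response.text}")
    body = response.json()
    if isinstance(body, dict) and "error" in body:  # assumed error shape, not guaranteed
        raise RuntimeError(f"Inference error: {body['error']}")
    return body[0]["text"]
```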
Usage Examples
Example 1: URL Input
import requests
url = "https://YOUR_ENDPOINT_URL"
headers = {
"Authorization": "Bearer YOUR_HF_TOKEN",
"Content-Type": "application/json"
}
payload = {
"inputs": "https://example.com/invoice.pdf",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract all text from this invoice."
}
}
response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result[0]["text"])
Example 2: Base64 Image
import base64
import requests
# Read and encode image
with open("document.jpg", "rb") as f:
image_data = base64.b64encode(f.read()).decode()
url = "https://YOUR_ENDPOINT_URL"
headers = {
"Authorization": "Bearer YOUR_HF_TOKEN",
"Content-Type": "application/json"
}
payload = {
"inputs": image_data,
"parameters": {
"prompt": "<image>\n<|grounding|>Convert the document to markdown.",
"base_size": 1024,
"crop_mode": True
}
}
response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result[0]["text"])
Example 3: Data URI
import requests
payload = {
"inputs": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract tables and text."
}
}
response = requests.post(
"https://YOUR_ENDPOINT_URL",
headers={
"Authorization": "Bearer YOUR_HF_TOKEN",
"Content-Type": "application/json"
},
json=payload
)
print(response.json()[0]["text"])
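To build a data URI like the one above from a local file, prepend the MIME type to the base64 payload. A small helper (the file name is a placeholder, and image/jpeg is only a fallback guess):

```python
import base64
import mimetypes

def to_data_uri(path: str) -> str:
    """Encode a local image file as a data URI suitable for the inputs field."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    return f"data:{mime or 'image/jpeg'};base64,{encoded}"

payload = {
    "inputs": to_data_uri("document.jpg"),
    "parameters": {
        "prompt": "<image>\n<|grounding|>Extract tables and text."
    }
}
```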
Example 4: Custom Prompts
# Table extraction
payload = {
"inputs": "https://example.com/table.png",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract all tables as markdown tables."
}
}
# Form extraction
payload = {
"inputs": "https://example.com/form.jpg",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract form fields and values as JSON."
}
}
# Multilingual OCR
payload = {
"inputs": "https://example.com/chinese.jpg",
"parameters": {
"prompt": "<image>\n<|grounding|>Extract text in original language."
}
}
JavaScript/TypeScript Example
const endpoint = "https://YOUR_ENDPOINT_URL";
const token = "YOUR_HF_TOKEN";
async function ocr(imageUrl: string): Promise<string> {
const response = await fetch(endpoint, {
method: "POST",
headers: {
"Authorization": `Bearer ${token}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
inputs: imageUrl,
parameters: {
prompt: "<image>\n<|grounding|>Convert the document to markdown.",
},
}),
});
const result = await response.json();
return result[0].text;
}
// Usage
const text = await ocr("https://example.com/document.jpg");
console.log(text);
Cost & Performance
Recommended GPU Instance
- Minimum: 1x NVIDIA A10G (24GB VRAM)
- Recommended: 1x NVIDIA A100 (40GB VRAM)
- Cold start: ~30-60 seconds (see the retry sketch below)
- Inference time: ~2-5 seconds per image
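Because of the cold start noted above, the first request after the endpoint has been idle (or has scaled to zero) may return 503 until a replica is ready. A simple client-side retry loop smooths this over; the retry count and wait time below are arbitrary defaults:

```python
import time
import requests

def post_with_retry(url: str, headers: dict, payload: dict,
                    retries: int = 10, wait_seconds: float = 10.0) -> requests.Response:
    """Retry while the endpoint is still warming up (HTTP 503), then return the response."""
    for _ in range(retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 503:
            return response
        time.sleep(wait_seconds)  # model still loading; wait and try again
    return response
```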
Pricing (approximate)
- A10G: $1.00/hour ($720/month for dedicated)
- A100: $3.00/hour ($2,160/month for dedicated)
- Autoscaling: Pay only when processing requests
Troubleshooting
Error: "Model not found"
Make sure you've run deploy.py and the model weights are uploaded.
Error: "Out of memory"
Try using a smaller base_size or upgrade to a larger GPU instance.
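For example, requesting a smaller processing size reduces GPU memory use during inference. The values below are illustrative, not tuned recommendations:

```python
payload = {
    "inputs": "https://example.com/document.jpg",
    "parameters": {
        "prompt": "<image>\n<|grounding|>Convert the document to markdown.",
        "base_size": 640,    # smaller than the 1024 default
        "image_size": 640,
        "crop_mode": False   # skip cropping to keep memory use down
    }
}
```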
Slow inference
- Enable Flash Attention (already configured in handler)
- Use A100 instead of A10G
- Reduce base_size for smaller documents
Handler not loading
Check that handler.py exists in your repository root and trust_remote_code=True is enabled in endpoint settings.
Advanced Configuration
Custom Model Path
Edit lines 19-22 of handler.py to use a different model:
self.tokenizer = AutoTokenizer.from_pretrained(
"deepseek-ai/DeepSeek-OCR", # Change this
trust_remote_code=True
)
Optimize for Batch Processing
The current handler processes one image at a time. For batch processing, modify the __call__ method to handle lists.
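A hedged sketch of what that modification could look like, assuming the handler's single-image logic is factored into a helper (process_single here is a hypothetical name, not something that exists in handler.py):

```python
# Sketch only: adapt the names to the actual handler.py.
def __call__(self, data: dict) -> list:
    inputs = data.get("inputs")
    parameters = data.get("parameters", {})

    # Accept either a single image or a list of images
    images = inputs if isinstance(inputs, list) else [inputs]

    results = []
    for image in images:
        # Still processed sequentially; only the request shape changes
        results.append({"text": self.process_single(image, parameters)})  # hypothetical helper
    return results
```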
Support
- Model: DeepSeek-OCR
- HuggingFace Docs: https://huggingface.co/docs/inference-endpoints
- Issues: Create an issue in your repository
License
Follows DeepSeek-OCR's original license terms.