DeepSeek-OCR Inference Endpoint

A minimal, production-ready setup for deploying DeepSeek-OCR as a HuggingFace Inference Endpoint API.

What This Is

This folder contains everything you need to deploy DeepSeek-OCR to HuggingFace Inference Endpoints:

  • handler.py - Custom inference handler for HuggingFace
  • requirements.txt - Python dependencies
  • deploy.py - Full deployment script (first-time setup)
  • update.py - Quick update script (fast iterations)
  • test_endpoint.py - Comprehensive test suite
  • quick_test.py - Simple quick test script
  • This README with API documentation

Quick Start

1. Install HuggingFace CLI

pip install huggingface_hub
huggingface-cli login

2. Deploy to HuggingFace

First-time deployment:

python deploy.py YOUR_USERNAME

This creates a repository at https://huggingface.co/YOUR_USERNAME/deepseek-ocr-inference

Quick updates (after initial deployment):

python update.py YOUR_USERNAME

Much faster! It only uploads your local files (handler.py, requirements.txt) without copying from the source repo.

When to use which script:

  • deploy.py - First-time setup (~2-3 min): creates the repo, copies config files from the source repo, and uploads your local files
  • update.py - Updating code or requirements (~5-10 sec): only uploads your local files

Pro tip: After initial deployment with deploy.py, use update.py for all future updates!
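
For reference, a quick update only needs a couple of huggingface_hub calls. A minimal sketch of what a script like update.py can do (the actual script may differ; the repo name comes from the deploy step):

import sys
from huggingface_hub import HfApi

username = sys.argv[1]  # e.g. python update.py YOUR_USERNAME
repo_id = f"{username}/deepseek-ocr-inference"

api = HfApi()
for filename in ["handler.py", "requirements.txt"]:
    # Upload only the local files; nothing is copied from the source model repo
    api.upload_file(
        path_or_fileobj=filename,
        path_in_repo=filename,
        repo_id=repo_id,
        repo_type="model",
        commit_message=f"Update {filename}",
    )
print(f"Updated {repo_id}")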

3. Create Inference Endpoint

  1. Go to https://huggingface.co/YOUR_USERNAME/deepseek-ocr-inference
  2. Click "Deploy" → "Inference Endpoints"
  3. Choose a GPU instance (minimum: 1x A10G; recommended: 1x A100)
  4. Click "Create Endpoint"
  5. Wait ~5 minutes for deployment

4. Test Your API

Quick Test (easiest):

# Edit quick_test.py with your endpoint URL and token
python quick_test.py

Comprehensive Test Suite:

python test_endpoint.py --url YOUR_ENDPOINT_URL --token YOUR_HF_TOKEN --comprehensive

Test with specific image:

python test_endpoint.py --url YOUR_ENDPOINT_URL --token YOUR_HF_TOKEN --image path/to/image.jpg

curl test:

curl https://YOUR_ENDPOINT_URL \
  -H "Authorization: Bearer YOUR_HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "https://example.com/document.jpg",
    "parameters": {
      "prompt": "<image>\n<|grounding|>Convert the document to markdown."
    }
  }'

Testing

Quick Test Script

The easiest way to test your endpoint:

  1. Open quick_test.py
  2. Edit the configuration at the top:
    ENDPOINT_URL = "https://your-endpoint-url"
    HF_TOKEN = "hf_your_token"
    IMAGE_URL = "https://example.com/image.jpg"  # or use LOCAL_IMAGE
    
  3. Run: python quick_test.py

Comprehensive Test Suite

For thorough testing with multiple scenarios:

# Run all test cases
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --comprehensive

# Test with a specific image URL
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --image-url "https://example.com/doc.jpg"

# Test with a local image
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN --image path/to/document.pdf

# Test with custom prompt
python test_endpoint.py --url YOUR_URL --token YOUR_TOKEN \
  --image-url "https://example.com/table.png" \
  --prompt "<image>\n<|grounding|>Extract tables as markdown."

What the Endpoint Supports

Supported:

  • One image per request (URL, base64, or data URI)
  • Text prompts to guide OCR behavior
  • Custom parameters (base_size, image_size, crop_mode)

Not Supported:

  • Multiple images in a single request (send separate requests)
  • Batch processing (process images sequentially; see the sketch below)
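
To process multiple images, loop over them client-side and send one request each. A minimal sketch, reusing the request format from the Quick Start (fill in your own endpoint URL and token):

import requests

ENDPOINT_URL = "https://YOUR_ENDPOINT_URL"
HF_TOKEN = "YOUR_HF_TOKEN"

headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

image_urls = [
    "https://example.com/page-1.jpg",
    "https://example.com/page-2.jpg",
]

# One request per image; the endpoint does not accept lists of images
results = []
for image_url in image_urls:
    response = requests.post(ENDPOINT_URL, headers=headers, json={"inputs": image_url})
    response.raise_for_status()
    results.append(response.json()[0]["text"])

print("\n\n---\n\n".join(results))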

Example Prompts

The endpoint accepts custom text prompts to guide OCR behavior:

# Default - Markdown conversion
"<image>\n<|grounding|>Convert the document to markdown."

# Extract tables
"<image>\n<|grounding|>Extract all tables as markdown tables."

# Plain text only
"<image>\n<|grounding|>Extract only the text without formatting."

# Form extraction
"<image>\n<|grounding|>Extract form fields and their values."

# Structured extraction
"<image>\n<|grounding|>Identify titles, headers, and body text."

# Multilingual
"<image>\n<|grounding|>Extract text in original language."

API Documentation

Endpoint

POST https://YOUR_ENDPOINT_URL

Headers

Authorization: Bearer YOUR_HF_TOKEN
Content-Type: application/json

Request Body

{
  "inputs": "IMAGE_INPUT",
  "parameters": {
    "prompt": "CUSTOM_PROMPT",
    "base_size": 1024,
    "image_size": 640,
    "crop_mode": true,
    "save_results": false,
    "test_compress": false
  }
}

Field Descriptions

  • inputs (string, required) - Base64-encoded image, image URL, or data URI
  • parameters.prompt (string, optional, default "<image>\n<|grounding|>Convert the document to markdown.") - Custom OCR prompt
  • parameters.base_size (int, optional, default 1024) - Base image size for processing
  • parameters.image_size (int, optional, default 640) - Crop image size
  • parameters.crop_mode (bool, optional, default true) - Whether to use crop mode
  • parameters.save_results (bool, optional, default false) - Save detailed results
  • parameters.test_compress (bool, optional, default false) - Test compression

Response

[
  {
    "text": "# Document Title\n\nExtracted markdown content..."
  }
]

Usage Examples

Example 1: URL Input

import requests

url = "https://YOUR_ENDPOINT_URL"
headers = {
    "Authorization": "Bearer YOUR_HF_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "inputs": "https://example.com/invoice.pdf",
    "parameters": {
        "prompt": "<image>\n<|grounding|>Extract all text from this invoice."
    }
}

response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result[0]["text"])

Example 2: Base64 Image

import base64
import requests

# Read and encode image
with open("document.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

url = "https://YOUR_ENDPOINT_URL"
headers = {
    "Authorization": "Bearer YOUR_HF_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "inputs": image_data,
    "parameters": {
        "prompt": "<image>\n<|grounding|>Convert the document to markdown.",
        "base_size": 1024,
        "crop_mode": True
    }
}

response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result[0]["text"])

Example 3: Data URI

import requests

payload = {
    "inputs": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
    "parameters": {
        "prompt": "<image>\n<|grounding|>Extract tables and text."
    }
}

response = requests.post(
    "https://YOUR_ENDPOINT_URL",
    headers={
        "Authorization": "Bearer YOUR_HF_TOKEN",
        "Content-Type": "application/json"
    },
    json=payload
)

print(response.json()[0]["text"])
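
To build a data URI from a local file, a small helper like the one below works (to_data_uri is a hypothetical helper, not part of this repo):

import base64
import mimetypes

def to_data_uri(path: str) -> str:
    # Encode a local image file as a data URI for the "inputs" field
    mime_type = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    return f"data:{mime_type};base64,{encoded}"

payload = {
    "inputs": to_data_uri("document.jpg"),
    "parameters": {
        "prompt": "<image>\n<|grounding|>Extract tables and text."
    }
}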

Example 4: Custom Prompts

# Table extraction
payload = {
    "inputs": "https://example.com/table.png",
    "parameters": {
        "prompt": "<image>\n<|grounding|>Extract all tables as markdown tables."
    }
}

# Form extraction
payload = {
    "inputs": "https://example.com/form.jpg",
    "parameters": {
        "prompt": "<image>\n<|grounding|>Extract form fields and values as JSON."
    }
}

# Multilingual OCR
payload = {
    "inputs": "https://example.com/chinese.jpg",
    "parameters": {
        "prompt": "<image>\n<|grounding|>Extract text in original language."
    }
}

JavaScript/TypeScript Example

const endpoint = "https://YOUR_ENDPOINT_URL";
const token = "YOUR_HF_TOKEN";

async function ocr(imageUrl: string): Promise<string> {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      inputs: imageUrl,
      parameters: {
        prompt: "<image>\n<|grounding|>Convert the document to markdown.",
      },
    }),
  });

  const result = await response.json();
  return result[0].text;
}

// Usage
const text = await ocr("https://example.com/document.jpg");
console.log(text);

Cost & Performance

Recommended GPU Instance

  • Minimum: 1x NVIDIA A10G (24GB VRAM)
  • Recommended: 1x NVIDIA A100 (40GB VRAM)
  • Cold start: ~30-60 seconds (see the retry sketch below)
  • Inference time: ~2-5 seconds per image
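
If the endpoint scales to zero, the first request after an idle period may hit the cold start. A hedged client-side retry sketch (the 503 status and wait times are assumptions; adjust for your setup):

import time
import requests

def ocr_with_retry(endpoint_url, token, payload, retries=5, wait_seconds=15):
    # POST to the endpoint, retrying while it warms up from a cold start
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    for _ in range(retries):
        response = requests.post(endpoint_url, headers=headers, json=payload, timeout=120)
        if response.status_code == 503:  # assumed: endpoint still initializing
            time.sleep(wait_seconds)
            continue
        response.raise_for_status()
        return response.json()[0]["text"]
    raise RuntimeError("Endpoint did not become ready in time")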

Pricing (approximate)

  • A10G: $1.00/hour ($720/month for dedicated)
  • A100: $3.00/hour ($2,160/month for dedicated)
  • Autoscaling: Pay only when processing requests

Troubleshooting

Error: "Model not found"

Make sure you've run deploy.py and the model weights are uploaded.

Error: "Out of memory"

Try using a smaller base_size or upgrade to a larger GPU instance.
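
For example, a request with a reduced base_size (values are illustrative; see Field Descriptions above for the defaults):

payload = {
    "inputs": "https://example.com/document.jpg",
    "parameters": {
        "prompt": "<image>\n<|grounding|>Convert the document to markdown.",
        "base_size": 640,   # below the 1024 default to reduce memory use
        "image_size": 640,
        "crop_mode": True
    }
}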

Slow inference

  • Enable Flash Attention (already configured in handler)
  • Use A100 instead of A10G
  • Reduce base_size for smaller documents

Handler not loading

Check that handler.py exists in your repository root and trust_remote_code=True is enabled in endpoint settings.
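
For reference, HuggingFace custom handlers are expected to expose an EndpointHandler class. A simplified skeleton of that interface (not the actual handler.py in this repo):

from typing import Any, Dict, List

class EndpointHandler:
    def __init__(self, path: str = ""):
        # path points at the repository root; load the model and tokenizer here
        ...

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        image = data["inputs"]                   # URL, base64 string, or data URI
        parameters = data.get("parameters", {})  # prompt, base_size, image_size, ...
        text = "..."                             # run OCR here
        return [{"text": text}]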

Advanced Configuration

Custom Model Path

Edit handler.py lines 19-22 to use a different model:

self.tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-OCR",  # Change this
    trust_remote_code=True
)

Optimize for Batch Processing

The current handler processes one image at a time. For batch processing, modify the __call__ method to handle lists.
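
A hedged sketch of that change, assuming the existing single-image logic is factored into a helper (process_one below is hypothetical):

# Inside EndpointHandler - illustrative sketch, not the current handler code
def __call__(self, data):
    inputs = data["inputs"]
    parameters = data.get("parameters", {})

    # Accept either a single image or a list of images
    images = inputs if isinstance(inputs, list) else [inputs]

    # process_one is a hypothetical helper wrapping the existing single-image path
    return [{"text": self.process_one(image, parameters)} for image in images]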

License

Follows DeepSeek-OCR's original license terms.
