ONNX
text-detection
craft
inference4j

CRAFT Text Detection (MLT 25K) - ONNX

ONNX export of CRAFT (Character Region Awareness for Text Detection), trained on SynthText and ICDAR 2013/2017 (the MLT 25K variant).

Converted for use with inference4j, an inference-only AI library for Java.

Usage with inference4j

try (CraftTextDetector detector = CraftTextDetector.builder().build()) {
    List<TextRegion> regions = detector.detect(Path.of("document.jpg"));
    for (TextRegion r : regions) {
        System.out.printf("Text at [%.0f, %.0f, %.0f, %.0f] (confidence=%.2f)%n",
            r.box().x1(), r.box().y1(), r.box().x2(), r.box().y2(),
            r.confidence());
    }
}

Model Details

| Property | Value |
|---|---|
| Architecture | VGG16-BN backbone + U-Net decoder |
| Task | Text detection (character-level region + affinity maps) |
| Training data | SynthText + ICDAR 2013/2017 (MLT) |
| Weights | craft_mlt_25k.pth from clovaai/CRAFT-pytorch |
| ONNX opset | 17 |
| Input | [batch, 3, height, width]: RGB, ImageNet-normalized, dimensions must be multiples of 32 |
| Output: score_map | [batch, height/2, width/2, 2]: channel 0 is the region score, channel 1 is the affinity score |
| Output: feature_map | [batch, 32, height/2, width/2]: intermediate features (optional, for refinement) |
| Dynamic axes | Batch, height, and width are dynamic |
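
Because score_map is channel-last, the region and affinity scores for one pixel sit next to each other in the flat output buffer. The following is a minimal sketch of splitting a single-image output into two planes. It assumes the output has already been copied into a row-major float[] of length (height/2) * (width/2) * 2. The class and method names are illustrative and are not part of inference4j.

```java
public final class ScoreMapSplit {
    /**
     * Splits a flat, row-major [h, w, 2] buffer into two planes of length h * w.
     * Returns {region, affinity}.
     */
    public static float[][] split(float[] scoreMap, int h, int w) {
        if (scoreMap.length != h * w * 2) {
            throw new IllegalArgumentException(
                "expected " + (h * w * 2) + " values, got " + scoreMap.length);
        }
        float[] region = new float[h * w];
        float[] affinity = new float[h * w];
        for (int i = 0; i < h * w; i++) {
            region[i] = scoreMap[2 * i];       // channel 0: region score
            affinity[i] = scoreMap[2 * i + 1]; // channel 1: affinity score
        }
        return new float[][] {region, affinity};
    }
}
```

Here h and w are the score-map dimensions, that is, half the input height and width.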

Preprocessing

  1. Resize maintaining aspect ratio (long side to target size, e.g. 1280)
  2. Round both dimensions to nearest multiple of 32
  3. ImageNet normalization: (pixel / 255 - mean) / std
    • mean = [0.485, 0.456, 0.406]
    • std = [0.229, 0.224, 0.225]
  4. NCHW layout: [1, 3, H, W]
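
The steps above can be sketched in plain Java with java.awt. This is an illustration under stated assumptions, not the inference4j implementation: the class and method names are hypothetical, bilinear interpolation is an arbitrary choice, and clamping each side to a minimum of 32 is an added safeguard for very thin images.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public final class CraftPreprocess {
    private static final float[] MEAN = {0.485f, 0.456f, 0.406f};
    private static final float[] STD = {0.229f, 0.224f, 0.225f};

    /** Steps 1-2: scale the long side to longSide, then round each side to a multiple of 32. */
    public static int[] targetDims(int width, int height, int longSide) {
        double scale = (double) longSide / Math.max(width, height);
        // Assumption: never go below 32 so the model always gets a valid size.
        int w = Math.max(32, (int) Math.round(width * scale / 32.0) * 32);
        int h = Math.max(32, (int) Math.round(height * scale / 32.0) * 32);
        return new int[] {w, h};
    }

    /** Steps 3-4: resize to w x h, normalize, and pack as a flat [1, 3, H, W] array. */
    public static float[] toTensor(BufferedImage src, int w, int h) {
        BufferedImage resized = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = resized.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();

        float[] out = new float[3 * h * w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = resized.getRGB(x, y);
                int[] c = {(rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF}; // R, G, B
                for (int ch = 0; ch < 3; ch++) {
                    // Channel-major (NCHW) index: one full plane per channel.
                    out[ch * h * w + y * w + x] = (c[ch] / 255f - MEAN[ch]) / STD[ch];
                }
            }
        }
        return out;
    }
}
```

For a 1000 x 500 image and a target of 1280, targetDims returns 1280 x 640. Keep the original width and height around, since postprocessing needs them to map boxes back to the source image.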

Postprocessing

  1. Combine: clip(region_score + affinity_score, 0, 1)
  2. Binary threshold at low_text_threshold (default 0.4)
  3. Connected component labeling (4-connectivity)
  4. For each component: compute mean region score, filter by text_threshold (default 0.7)
  5. Extract axis-aligned bounding box, scale back to original image coordinates
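
A minimal sketch of these five steps, using a breadth-first flood fill for the connected components. It assumes the region and affinity planes are separate row-major float[] arrays of size h * w (the score-map resolution). The class, the Box type, the scaleX/scaleY parameters, and the inclusive >= comparisons are illustrative assumptions, not the inference4j implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public final class CraftPostprocess {
    /** Axis-aligned box in original-image pixels plus the component's mean region score. */
    public static final class Box {
        public final float x1, y1, x2, y2, confidence;
        Box(float x1, float y1, float x2, float y2, float confidence) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2; this.confidence = confidence;
        }
    }

    /** scaleX/scaleY map score-map pixels to original pixels, e.g. 2 * originalWidth / inputWidth. */
    public static List<Box> detect(float[] region, float[] affinity, int h, int w,
                                   float lowTextThreshold, float textThreshold,
                                   float scaleX, float scaleY) {
        boolean[] mask = new boolean[h * w];
        for (int i = 0; i < h * w; i++) {
            float combined = Math.min(1f, Math.max(0f, region[i] + affinity[i])); // step 1
            mask[i] = combined >= lowTextThreshold;                                // step 2
        }
        boolean[] seen = new boolean[h * w];
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        List<Box> boxes = new ArrayList<>();
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int start = 0; start < h * w; start++) {
            if (!mask[start] || seen[start]) continue;
            // Step 3: flood-fill one 4-connected component.
            int minX = w, minY = h, maxX = -1, maxY = -1, count = 0;
            double sum = 0;
            seen[start] = true;
            queue.add(start);
            while (!queue.isEmpty()) {
                int p = queue.poll();
                int x = p % w, y = p / w;
                minX = Math.min(minX, x); maxX = Math.max(maxX, x);
                minY = Math.min(minY, y); maxY = Math.max(maxY, y);
                sum += region[p];
                count++;
                for (int k = 0; k < 4; k++) {
                    int nx = x + dx[k], ny = y + dy[k];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    int q = ny * w + nx;
                    if (mask[q] && !seen[q]) { seen[q] = true; queue.add(q); }
                }
            }
            float mean = (float) (sum / count);
            if (mean < textThreshold) continue; // step 4: drop weak components
            // Step 5: pixel-inclusive box, scaled back to original coordinates.
            boxes.add(new Box(minX * scaleX, minY * scaleY,
                    (maxX + 1) * scaleX, (maxY + 1) * scaleY, mean));
        }
        return boxes;
    }
}
```

With the defaults listed above, a call would pass 0.4f for lowTextThreshold and 0.7f for textThreshold. Note that the mask is built from the combined score, while the confidence filter uses the region score alone.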

Original Paper

Baek, Y., Lee, B., Han, D., Yun, S., & Lee, H. (2019). Character Region Awareness for Text Detection. CVPR 2019. arXiv:1904.01941

License

The original CRAFT model weights are released under the MIT License by Clova AI Research (NAVER Corp).
