Cannot run int8 version on CoreML provider

#1
by pzoltowski - opened

I'm doing some benchmarking of the D-FINE model on my MacBook M2 Max (latest macOS, latest coremltools 8.3, Python 3.12.8, latest onnxruntime 1.21.1). I'm trying to run it with the ONNX Runtime CoreML execution provider. It works with the other quantized D-FINE models but fails with int8. I found fp16 quite slow compared to the YOLO families, so that's the reason to try int8 and see what kind of speedup I can get (before trying to convert to a CoreML model without ONNX):

>>ONNX model: /Users/patryk/Developer/workspace/python/vscode/playground/models_dfine/nano/dfine_n_coco_int8.onnx, provider: coreml
2025-05-06 14:26:57.823029 [W:onnxruntime:, coreml_execution_provider.cc:112 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 158 number of nodes in the graph: 1871 number of nodes supported by CoreML: 987
Error benchmarking /Users/patryk/Developer/workspace/python/vscode/playground/models_dfine/nano/dfine_n_coco_int8.onnx: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name '/model/backbone/model/embedder/stem1/convolution/Conv_quant'
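For context, the session setup in my script is nothing exotic; a minimal sketch of the repro (paths shortened, timing loop omitted):

```python
# Minimal repro sketch: ORT session with the CoreML EP, CPU fallback for
# unsupported nodes; the int8 model fails with NOT_IMPLEMENTED as above.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "models_dfine/nano/dfine_n_coco_int8.onnx",  # shortened local path
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
# The model declares symbolic dims ['batch_size', 3, 'height', 'width'];
# feed a concrete 1x3x640x640 tensor for benchmarking.
x = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {"pixel_values": x})
```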

Below are some results from an improvised benchmarking script:

>>>> STARTING BENCHMARK for:  /Users/patryk/Developer/workspace/python/vscode/playground/models_dfine/nano with provider:  coreml and model types:  onnx
Benchmarking Progress:   0%|                                           | 0/6 [00:00<?, ?model/s]>>ONNX model: /Users/patryk/Developer/workspace/python/vscode/playground/models_dfine/nano/dfine_n_coco_uint8.onnx, provider: coreml
2025-05-06 14:26:33.696688 [W:onnxruntime:, coreml_execution_provider.cc:112 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 158 number of nodes in the graph: 1871 number of nodes supported by CoreML: 987
Original Input shape: ['batch_size', 3, 'height', 'width']
Info: Input shape contains symbolic dimensions. Replacing with concrete values.
Info: Replacing symbolic dimension 'batch_size' with 1
Info: Replacing symbolic dimension 'height' with 640
Info: Replacing symbolic dimension 'width' with 640
Using Concrete Input shape for benchmark: [1, 3, 640, 640]
>>   Inference [ms]: avg: 129.27, min: 126.38, max: 159.20,
>>     Inputs:  Name: pixel_values, Shape: ['batch_size', 3, 'height', 'width'], Type: tensor(float)
>>     Outputs: Name: logits, Shape: ['batch_size', 300, 80], Type: tensor(float)
Benchmarking Progress:  17%|β–ˆβ–ˆβ–ˆ               | 1/6 [00:10<00:50, 10.15s/model, provider=coreml]>>ONNX (FP16) model: /Users/patryk/Developer/workspace/python/vscode/playground/models_dfine/nano/dfine_n_coco_fp16.onnx, provider: coreml
2025-05-06 14:26:43.821048 [W:onnxruntime:, coreml_execution_provider.cc:112 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 3 number of nodes in the graph: 1221 number of nodes supported by CoreML: 5
Original Input shape: ['batch_size', 3, 'height', 'width']
Info: Input shape contains symbolic dimensions. Replacing with concrete values.
Info: Replacing symbolic dimension 'batch_size' with 1
Info: Replacing symbolic dimension 'height' with 640
Info: Replacing symbolic dimension 'width' with 640
Using Concrete Input shape for benchmark: [1, 3, 640, 640]
>>   Inference [ms]: avg: 57.32, min: 54.13, max: 60.38,
>>     Inputs:  Name: pixel_values, Shape: ['batch_size', 3, 'height', 'width'], Type: tensor(float)
>>     Outputs: Name: logits, Shape: ['batch_size', 300, 80], Type: tensor(float)
Benchmarking Progress:  33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ            | 2/6 [00:13<00:24,  6.19s/model, provider=coreml]>>ONNX model: /Users/patryk/Developer/workspace/python/vscode/playground/models_dfine/nano/dfine_n_coco_model_quantized.onnx, provider: coreml
2025-05-06 14:26:47.263249 [W:onnxruntime:, coreml_execution_provider.cc:112 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 158 number of nodes in the graph: 1871 number of nodes supported by CoreML: 987
Original Input shape: ['batch_size', 3, 'height', 'width']
Info: Input shape contains symbolic dimensions. Replacing with concrete values.
Info: Replacing symbolic dimension 'batch_size' with 1
Info: Replacing symbolic dimension 'height' with 640
Info: Replacing symbolic dimension 'width' with 640
Using Concrete Input shape for benchmark: [1, 3, 640, 640]
>>   Inference [ms]: avg: 135.41, min: 127.90, max: 144.69,
>>     Inputs:  Name: pixel_values, Shape: ['batch_size', 3, 'height', 'width'], Type: tensor(float)
>>     Outputs: Name: logits, Shape: ['batch_size', 300, 80], Type: tensor(float)
Benchmarking Progress:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ         | 3/6 [00:24<00:24,  8.18s/model, provider=coreml]>>ONNX model: /Users/patryk/Developer/workspace/python/vscode/playground/models_dfine/nano/dfine_n_coco_int8.onnx, provider: coreml
2025-05-06 14:26:57.823029 [W:onnxruntime:, coreml_execution_provider.cc:112 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 158 number of nodes in the graph: 1871 number of nodes supported by CoreML: 987
Error benchmarking /Users/patryk/Developer/workspace/python/vscode/playground/models_dfine/nano/dfine_n_coco_int8.onnx: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name '/model/backbone/model/embedder/stem1/convolution/Conv_quant'
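Side note: the symbolic-dimension replacement above only fixes the input feed; the model file keeps its dynamic dims. Here is a sketch (plain onnx, paths are my local ones) of baking the concrete shape into the model itself, which the onnxruntime docs recommend for the CoreML EP since it can take more of the graph with fixed shapes:

```python
# Sketch: overwrite the symbolic input dims with concrete values so the
# CoreML EP sees a fixed 1x3x640x640 input shape.
import onnx

model = onnx.load("models_dfine/nano/dfine_n_coco_uint8.onnx")  # local path
dims = model.graph.input[0].type.tensor_type.shape.dim
for dim, value in zip(dims, (1, 3, 640, 640)):
    if dim.HasField("dim_param"):  # symbolic dim such as 'batch_size'
        dim.ClearField("dim_param")
        dim.dim_value = value
onnx.save(model, "models_dfine/nano/dfine_n_coco_uint8_fixed.onnx")
```

onnxruntime also ships a helper for this (python -m onnxruntime.tools.make_dynamic_shape_fixed).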
ONNX Community org

Hi there πŸ‘‹

The reason is that the int8 quantization is specifically exported with INT8 Conv weights (instead of UINT8), which may not have an implementation in some runtimes (as seen above). To use the uint8 quantization, you can use the model_uint8.onnx or model_quantized.onnx file.

You can learn more about the error here: https://github.com/microsoft/onnxruntime/issues/15888
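And if you are quantizing the model yourself, a rough sketch with onnxruntime's dynamic quantizer (the fp32 source path is just an example):

```python
# Sketch: dynamic quantization with UInt8 weights, avoiding the int8
# ConvInteger kernel that onnxruntime has no implementation for.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="dfine_n_coco.onnx",    # example fp32 source model
    model_output="dfine_n_coco_uint8.onnx",
    weight_type=QuantType.QUInt8,       # UInt8 instead of Int8 weights
)
```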
