Request for ONNX version
I have tried to convert this model to ONNX using the current Hugging Face ONNX conversion workflow (via transformers.js/scripts/convert.py and optimum.exporters.onnx). Unfortunately, the conversion fails because this model uses the idefics3 architecture, which is currently not supported by optimum.exporters.onnx.
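For reference, here is a minimal sketch of the export call that fails for me; the repo ID is an assumption on my part, so substitute the actual model ID:

```python
# Minimal sketch of the failing export attempt. The model ID below is an
# assumption; substitute the actual repository ID for this model.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="ibm-granite/granite-docling-258M",
    output="granite-docling-258M-onnx",
    task="auto",  # let optimum infer the task from the model config
)
# This fails because the idefics3 architecture has no ONNX export config
# registered in optimum.exporters.onnx.
```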
Could the maintainers or the model authors provide an official ONNX version? Having an ONNX version would greatly help with deployment in environments that require ONNX Runtime and improve inference efficiency.
Yes, that would be nice; I second the idea. It would let us run the model in a browser environment, for example.
I have some initial work on this up at: https://github.com/gabe-l-hart/optimum-onnx. It was done entirely with our internal developer assistant (see the commit message for the full prompt). I haven't validated either the code changes or the output model, but the auto-evaluated maximum delta is fairly low, so I think it should be reasonably close. I'm not at all familiar with how VLMs are handled in ONNX runtime environments, particularly the preprocessing portions. For llama.cpp, some significant preprocessing changes were needed before the model performed well, so the same may well be needed here.
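In case it helps anyone validating the export, here is a rough, unvalidated sketch of how one might spot-check the exported decoder against the PyTorch model. The repo ID, ONNX filename, input names, and Auto class are all assumptions; inspect the exported graph for the real ones:

```python
# Rough, unvalidated sketch: compare ONNX decoder logits against the PyTorch
# model for one text-only forward pass. The ONNX filename and input names are
# assumptions; print sess.get_inputs() to discover the real ones first.
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoModelForVision2Seq  # assumed Auto class

model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-docling-258M"  # assumed repo ID
)
model.eval()

sess = ort.InferenceSession("onnx/decoder_model.onnx")  # assumed filename
print([(i.name, i.shape) for i in sess.get_inputs()])  # check real inputs

input_ids = torch.tensor([[1, 2, 3, 4]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    ref = model(input_ids=input_ids, attention_mask=attention_mask).logits.numpy()

onnx_out = sess.run(None, {
    "input_ids": input_ids.numpy(),
    "attention_mask": attention_mask.numpy(),
})[0]

# The maximum absolute delta gives a quick sanity check on export fidelity.
print("max abs delta:", float(np.max(np.abs(ref - onnx_out))))
```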
Hi, I have generated an ONNX version of this model here: https://huggingface.co/lamco-development/granite-docling-258M-onnx using @gabegoodhart's work. Check it out!
Posting here for others to take a look at: https://huggingface.co/onnx-community/granite-docling-258M-ONNX
:)
Shalom Joshua, thanks for the ONNX version!