InternVL3.5-8B for RK3588 NPU

This repository provides a hardware-accelerated port of InternVL3.5-8B, optimized for the Rockchip RK3588 NPU.

User:<image>Describe the image.

Answer: The image depicts an astronaut lying on the moon's surface, holding a green bottle with 'Coke' written on it. The backdrop features Earth in the distance, creating a surreal and humorous scene.

Model Files

Component	File	Precision
LLM	`internvl3_5-8b-instruct_w8a8_rk3588.rkllm`	W8A8
Vision Encoder	`internvl3_5-8b_vision_rk3588.rknn`	FP16
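
Both files are needed at runtime, so it can help to confirm they are present before launching anything. The following is a minimal sketch, not part of the repository: the helper name `missing_model_files` and the idea of keeping both files in one directory are assumptions made for illustration.

```python
from pathlib import Path

# File names taken from the Model Files table above.
REQUIRED_FILES = (
    "internvl3_5-8b-instruct_w8a8_rk3588.rkllm",  # LLM, W8A8
    "internvl3_5-8b_vision_rk3588.rknn",          # vision encoder, FP16
)


def missing_model_files(model_dir):
    """Return the required file names that are not present in model_dir.

    Hypothetical helper: assumes both files sit in the same directory.
    """
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```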

Hardware Requirements

Rockchip RK3588 / RK3588S
RKNPU2 driver
Tested on:
- Rock 5C
- Ubuntu 22.04 / 24.04 (Joshua Riek)

Runtime Requirements

RKLLM runtime
RKNN runtime (rknpu2)
OpenCV (for image preprocessing)
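
The preprocessing done by the example code is not described here. One common way to feed an arbitrary image to a fixed 448 x 448 vision encoder is to pad it to a square and then resize it. The sketch below only computes the geometry for that approach, in plain Python so it runs without OpenCV. The pad-then-resize strategy and the helper name `square_pad_geometry` are assumptions, not the repository's confirmed method.

```python
from typing import NamedTuple

MODEL_INPUT_SIZE = 448  # vision encoder resolution listed for InternVL3.5-8B


class SquarePad(NamedTuple):
    """Border widths in pixels that make an image square, plus the resize factor."""
    top: int
    bottom: int
    left: int
    right: int
    scale: float  # factor that maps the padded square to the target size


def square_pad_geometry(width, height, target=MODEL_INPUT_SIZE):
    """Compute centered padding to a square and the scale to reach `target`.

    Assumption: pad-to-square followed by a resize; the real pipeline may differ.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    side = max(width, height)
    pad_w = side - width    # total horizontal padding
    pad_h = side - height   # total vertical padding
    left = pad_w // 2
    top = pad_h // 2
    return SquarePad(
        top=top,
        bottom=pad_h - top,
        left=left,
        right=pad_w - left,
        scale=target / side,
    )


if __name__ == "__main__":
    print(square_pad_geometry(1920, 1080))
```

With OpenCV, the four border values would typically be passed to `copyMakeBorder` and the padded square then passed to `resize` with the target size.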

Model performance benchmark (FPS)

All models, with C++ examples, can be found on the Q-engineering GitHub.

All LLMs are quantized to W8A8, while the VLM vision encoders use FP16.

Model	RAM (GB)¹	LLM cold (s)²	LLM warm (s)³	VLM cold (s)²	VLM warm (s)³	Resolution	Tokens/s
Qwen3-2B	3.1	21.9	2.6	10.0	0.9	448 x 448	11.5
Qwen3-4B	8.7	49.6	5.6	10.6	1.1	448 x 448	5.7
InternVL3.5-1B	1.9	8.3	8.0	1.5	0.8	448 x 448	24
InternVL3.5-2B	3.0	22	8.0	2.7	0.8	448 x 448	11.2
InternVL3.5-4B	5.4	50	8.0	5.9	0.8	448 x 448	5
InternVL3.5-8B	8.8	92	8.0	50.5	5.8	448 x 448	3.5
Qwen2.5-3B	4.8	48.3	4.0	17.9	1.8	392 x 392	7.0
Qwen2-7B	8.7	86.6	34.5	37.1	20.7	392 x 392	3.7
Qwen2-2.2B	3.3	29.1	2.5	17.1	1.7	392 x 392	12.5
InternVL3-1B	1.3	6.8	1.1	7.8	0.75	448 x 448	30
SmolVLM2-2.2B	3.4	21.2	2.6	10.5	0.9	384 x 384	11
SmolVLM2-500M	0.8	4.8	0.7	2.5	0.25	384 x 384	31
SmolVLM2-256M	0.5	1.1	0.4	2.5	0.25	384 x 384	54

¹ Total memory used by the LLM plus the vision encoder.
² A cold start is the first load of an LLM/VLM model from disk into RAM or the NPU. Its duration depends on your OS, I/O transfer rate, and memory mapping.
³ A warm start is any subsequent load. It reuses the data already mapped in RAM, so mostly only a few pointers need to be restored.
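
To get a feel for what these figures mean in practice, the sketch below turns the InternVL3.5-8B row into rough estimates. It rests on several assumptions: the two models load one after the other, decoding runs at the listed tokens/s, and vision encoding and prompt prefill time are ignored. The function names are illustrative only.

```python
# Figures copied from the InternVL3.5-8B row of the benchmark table.
RAM_GB = 8.8        # total memory, LLM plus vision encoder
LLM_WARM_S = 8.0    # LLM warm-start load time in seconds
VLM_WARM_S = 5.8    # vision encoder warm-start load time in seconds
TOKENS_PER_S = 3.5  # decoding throughput


def warm_load_seconds():
    """Warm-start load time, assuming the two models load sequentially."""
    return LLM_WARM_S + VLM_WARM_S


def decode_seconds(n_tokens, tokens_per_s=TOKENS_PER_S):
    """Time to generate n_tokens at a constant decoding rate."""
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return n_tokens / tokens_per_s


def fits_in_ram(board_ram_gb, model_ram_gb=RAM_GB):
    """True if the board has more RAM than the models use (ignores OS overhead)."""
    return board_ram_gb > model_ram_gb


if __name__ == "__main__":
    print(f"warm load: {warm_load_seconds():.1f} s")
    print(f"70 tokens: {decode_seconds(70):.1f} s")
    print(f"fits on an 8 GB board: {fits_in_ram(8)}")
```

Under these assumptions, a 70-token answer takes about 20 s to decode at 3.5 tokens/s, and the 8.8 GB footprint exceeds the RAM of an 8 GB board.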


Example Usage

See: https://github.com/Qengineering/InternVL3.5-8B-NPU

Notes

This is not a Transformers-compatible model
This repository provides precompiled NPU binaries
CPU fallback is not supported
