InternVL3.5-8B for RK3588 NPU

This repository provides a hardware-accelerated port of InternVL3.5-8B, optimized for the Rockchip RK3588 NPU.

[Demo image: an astronaut on the moon holding a Coke bottle]

User: <image>Describe the image.

Answer: The image depicts an astronaut lying on the moon's surface, holding a green bottle with 'Coke' written on it. The backdrop features Earth in the distance, creating a surreal and humorous scene.


Model Files

| Component      | File                                      | Precision |
|----------------|-------------------------------------------|-----------|
| LLM            | internvl3_5-8b-instruct_w8a8_rk3588.rkllm | W8A8      |
| Vision Encoder | internvl3_5-8b_vision_rk3588.rknn         | FP16      |
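
The two files target different runtimes: the FP16 vision encoder is loaded by the RKNN C API (rknn_api.h), the W8A8 language model by the RKLLM C API (rkllm.h). A minimal loading sketch; the init calls come from the public headers, but struct fields and defaults can differ between runtime releases, so treat this as illustrative:

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>
#include "rknn_api.h"  // vision encoder runtime (rknpu2)
#include "rkllm.h"     // LLM runtime (rknn-llm)

// rknn_init() takes a memory buffer, so the .rknn file is read in whole;
// rkllm_init() opens the .rkllm file by path itself.
static std::vector<unsigned char> read_file(const char* path) {
    FILE* f = std::fopen(path, "rb");
    if (!f) { std::perror(path); std::exit(1); }
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<unsigned char> buf((size_t)size);
    std::fread(buf.data(), 1, buf.size(), f);
    std::fclose(f);
    return buf;
}

// No-op result callback; real code streams tokens here (see Example Usage).
static void noop(RKLLMResult*, void*, LLMCallState) {}

int main() {
    auto model = read_file("internvl3_5-8b_vision_rk3588.rknn");
    rknn_context vision_ctx = 0;
    rknn_init(&vision_ctx, model.data(), (uint32_t)model.size(), 0, nullptr);

    RKLLMParam param = rkllm_createDefaultParam();
    param.model_path = "internvl3_5-8b-instruct_w8a8_rk3588.rkllm";
    LLMHandle llm = nullptr;
    rkllm_init(&llm, &param, noop);

    rkllm_destroy(llm);
    rknn_destroy(vision_ctx);
    return 0;
}
```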

Hardware Requirements

  • Rockchip RK3588 / RK3588S
  • RKNPU2 driver
  • Tested on:
    • Rock 5C
    • Ubuntu 22.04 / 24.04 (Joshua Riek's Ubuntu Rockchip builds)

Runtime Requirements

  • RKLLM runtime
  • RKNN runtime (rknpu2)
  • OpenCV (for image preprocessing; see the sketch below)
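
A typical OpenCV preprocessing pipeline resizes the image to the encoder's input resolution (448 x 448 for this model) and normalizes it. A sketch, assuming RGB input, CHW float layout, and the ImageNet mean/std used by stock InternVL; the exported encoder may instead expect NHWC or uint8 input, so verify against the reference C++ example:

```cpp
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

// Preprocess an image for the 448 x 448 FP16 vision encoder.
std::vector<float> preprocess(const std::string& path) {
    cv::Mat img = cv::imread(path);                 // BGR, 8-bit
    CV_Assert(!img.empty());
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);      // encoder assumed RGB
    cv::resize(img, img, cv::Size(448, 448));
    img.convertTo(img, CV_32FC3, 1.0 / 255.0);      // [0, 255] -> [0, 1]

    // Assumed normalization constants (ImageNet, as in stock InternVL).
    const float mean[3] = {0.485f, 0.456f, 0.406f};
    const float stdd[3] = {0.229f, 0.224f, 0.225f};

    std::vector<float> chw(3 * 448 * 448);
    for (int c = 0; c < 3; ++c)                     // HWC -> CHW + normalize
        for (int y = 0; y < 448; ++y)
            for (int x = 0; x < 448; ++x)
                chw[(c * 448 + y) * 448 + x] =
                    (img.at<cv::Vec3f>(y, x)[c] - mean[c]) / stdd[c];
    return chw;
}
```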

Model performance benchmarks

All models, together with C++ examples, can be found on the Q-engineering GitHub.

All LLMs are quantized to W8A8, while the vision encoders run in FP16.

| Model          | RAM (GB)¹ | LLM cold (s)² | LLM warm (s)³ | VLM cold (s)² | VLM warm (s)³ | Resolution | Tokens/s |
|----------------|-----------|---------------|---------------|---------------|---------------|------------|----------|
| Qwen3-2B       | 3.1       | 21.9          | 2.6           | 10.0          | 0.9           | 448 x 448  | 11.5     |
| Qwen3-4B       | 8.7       | 49.6          | 5.6           | 10.6          | 1.1           | 448 x 448  | 5.7      |
| InternVL3.5-1B | 1.9       | 8.3           | 8.0           | 1.5           | 0.8           | 448 x 448  | 24       |
| InternVL3.5-2B | 3.0       | 22            | 8.0           | 2.7           | 0.8           | 448 x 448  | 11.2     |
| InternVL3.5-4B | 5.4       | 50            | 8.0           | 5.9           | 0.8           | 448 x 448  | 5        |
| InternVL3.5-8B | 8.8       | 92            | 8.0           | 50.5          | 5.8           | 448 x 448  | 3.5      |
| Qwen2.5-3B     | 4.8       | 48.3          | 4.0           | 17.9          | 1.8           | 392 x 392  | 7.0      |
| Qwen2-7B       | 8.7       | 86.6          | 34.5          | 37.1          | 20.7          | 392 x 392  | 3.7      |
| Qwen2-2.2B     | 3.3       | 29.1          | 2.5           | 17.1          | 1.7           | 392 x 392  | 12.5     |
| InternVL3-1B   | 1.3       | 6.8           | 1.1           | 7.8           | 0.75          | 448 x 448  | 30       |
| SmolVLM2-2.2B  | 3.4       | 21.2          | 2.6           | 10.5          | 0.9           | 384 x 384  | 11       |
| SmolVLM2-500M  | 0.8       | 4.8           | 0.7           | 2.5           | 0.25          | 384 x 384  | 31       |
| SmolVLM2-256M  | 0.5       | 1.1           | 0.4           | 2.5           | 0.25          | 384 x 384  | 54       |

¹ Total memory used: LLM plus VLM.
² A cold start is the first time a model is loaded from disk into RAM or onto the NPU. Its duration depends on the OS, I/O transfer rate, and memory mapping.
³ Subsequent loads (warm starts) take advantage of data already mapped in RAM; mostly only a few pointers need to be restored.


Example Usage
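
The reference C++ examples live on the Q-engineering GitHub. The sketch below only illustrates the overall flow (preprocess → vision encoder → multimodal LLM) and reuses the helpers from the sketches above; the multimodal field names follow the rknn-llm demos and may differ between runtime versions, and the number of image tokens is a placeholder, so consult the reference implementation before relying on it:

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
#include "rknn_api.h"
#include "rkllm.h"

// Helpers defined in the sketches above.
std::vector<unsigned char> read_file(const char* path);
std::vector<float> preprocess(const std::string& path);

// Streaming callback: RKLLM delivers the generated text piece by piece.
void on_result(RKLLMResult* result, void* /*userdata*/, LLMCallState state) {
    if (state == RKLLM_RUN_NORMAL) std::printf("%s", result->text);
    if (state == RKLLM_RUN_FINISH) std::printf("\n");
}

int main() {
    // 1. Vision encoder (FP16 .rknn): image -> embeddings.
    auto model = read_file("internvl3_5-8b_vision_rk3588.rknn");
    rknn_context vision_ctx = 0;
    rknn_init(&vision_ctx, model.data(), (uint32_t)model.size(), 0, nullptr);

    std::vector<float> pixels = preprocess("demo.jpg");

    rknn_input in;
    std::memset(&in, 0, sizeof(in));
    in.index = 0;
    in.type  = RKNN_TENSOR_FLOAT32;
    in.fmt   = RKNN_TENSOR_NCHW;       // assumed layout; check the model
    in.buf   = pixels.data();
    in.size  = pixels.size() * sizeof(float);
    rknn_inputs_set(vision_ctx, 1, &in);
    rknn_run(vision_ctx, nullptr);

    rknn_output out;
    std::memset(&out, 0, sizeof(out));
    out.want_float = 1;
    rknn_outputs_get(vision_ctx, 1, &out, nullptr);   // image embeddings

    // 2. LLM (W8A8 .rkllm): embeddings + prompt -> streamed answer.
    RKLLMParam param = rkllm_createDefaultParam();
    param.model_path = "internvl3_5-8b-instruct_w8a8_rk3588.rkllm";
    param.max_new_tokens = 256;
    LLMHandle llm = nullptr;
    rkllm_init(&llm, &param, on_result);

    RKLLMInput input;
    std::memset(&input, 0, sizeof(input));
    input.input_type = RKLLM_INPUT_MULTIMODAL;
    input.multimodal_input.prompt = (char*)"<image>Describe the image.";
    input.multimodal_input.image_embed = (float*)out.buf;
    input.multimodal_input.n_image_tokens = 256;      // placeholder: model-specific

    RKLLMInferParam infer;
    std::memset(&infer, 0, sizeof(infer));
    infer.mode = RKLLM_INFER_GENERATE;
    rkllm_run(llm, &input, &infer, nullptr);          // blocks until finished

    rknn_outputs_release(vision_ctx, 1, &out);
    rkllm_destroy(llm);
    rknn_destroy(vision_ctx);
    return 0;
}
```

rkllm_run() blocks until generation is finished; the answer arrives incrementally through the result callback.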

Notes

  • This is not a Transformers-compatible model
  • This repository provides precompiled NPU binaries
  • CPU fallback is not supported