InternVL3.5-8B for RK3588 NPU
This repository provides a hardware-accelerated port of InternVL3.5-8B, optimized for the Rockchip RK3588 NPU.
User:<image>Describe the image.
Answer: The image depicts an astronaut lying on the moon's surface, holding a green bottle with 'Coke' written on it. The backdrop features Earth in the distance, creating a surreal and humorous scene.
Model Files
| Component | File | Precision |
|---|---|---|
| LLM | internvl3_5-8b-instruct_w8a8_rk3588.rkllm | W8A8 |
| Vision Encoder | internvl3_5-8b_vision_rk3588.rknn | FP16 |
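Both files are needed at run time, so it can help to check for them before starting an application. The following is a minimal sketch, not part of this repository: the helper name `missing_model_files` and the idea of keeping both files in one directory are assumptions made for illustration; only the file names come from the table.

```python
from pathlib import Path

# File names copied from the Model Files table.
MODEL_FILES = {
    "LLM": "internvl3_5-8b-instruct_w8a8_rk3588.rkllm",
    "Vision Encoder": "internvl3_5-8b_vision_rk3588.rknn",
}


def missing_model_files(model_dir):
    """Return the components whose file is absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [
        component
        for component, file_name in MODEL_FILES.items()
        if not (root / file_name).is_file()
    ]
```

An empty list means both files were found; otherwise the list names the components that still need to be downloaded.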
Hardware Requirements
- Rockchip RK3588 / RK3588S
- RKNPU2 driver
- Tested on:
  - Rock 5C
  - Ubuntu 22.04 / 24.04 (Joshua Riek)
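Because there is no CPU fallback, a quick check that the board really is an RK3588 variant can save a confusing failure later. The sketch below reads the standard device-tree `compatible` file; the helper name `is_rk3588` is hypothetical, and the assumption that the SoC shows up as a `rockchip,rk3588...` entry should be verified on your own board. It does not check whether the RKNPU2 driver is loaded.

```python
from pathlib import Path


def is_rk3588(compatible_path="/proc/device-tree/compatible"):
    """Return True if the device-tree compatible list names an RK3588 variant."""
    try:
        raw = Path(compatible_path).read_bytes()
    except OSError:
        # File missing or unreadable, e.g. on a non-device-tree system.
        return False
    # The file holds NUL-separated strings, for example "rockchip,rk3588s".
    entries = raw.decode("ascii", errors="ignore").split("\x00")
    # Assumption: RK3588 and RK3588S both start with "rockchip,rk3588".
    return any(entry.startswith("rockchip,rk3588") for entry in entries)
```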
Runtime Requirements
- RKLLM runtime
- RKNN runtime (rknpu2)
- OpenCV (for image preprocessing)
Model performance benchmark (FPS)
All models, with C++ examples, can be found on the Q-engineering GitHub.
All LLM models are quantized to w8a8, while the VLM vision encoders use fp16.
| Model | RAM (GB)¹ | LLM cold (s)² | LLM warm (s)³ | VLM cold (s)² | VLM warm (s)³ | Resolution | Tokens/s |
|---|---|---|---|---|---|---|---|
| Qwen3-2B | 3.1 | 21.9 | 2.6 | 10.0 | 0.9 | 448 x 448 | 11.5 |
| Qwen3-4B | 8.7 | 49.6 | 5.6 | 10.6 | 1.1 | 448 x 448 | 5.7 |
| InternVL3.5-1B | 1.9 | 8.3 | 8.0 | 1.5 | 0.8 | 448 x 448 | 24 |
| InternVL3.5-2B | 3.0 | 22 | 8.0 | 2.7 | 0.8 | 448 x 448 | 11.2 |
| InternVL3.5-4B | 5.4 | 50 | 8.0 | 5.9 | 0.8 | 448 x 448 | 5 |
| InternVL3.5-8B | 8.8 | 92 | 8.0 | 50.5 | 5.8 | 448 x 448 | 3.5 |
| Qwen2.5-3B | 4.8 | 48.3 | 4.0 | 17.9 | 1.8 | 392 x 392 | 7.0 |
| Qwen2-7B | 8.7 | 86.6 | 34.5 | 37.1 | 20.7 | 392 x 392 | 3.7 |
| Qwen2-2.2B | 3.3 | 29.1 | 2.5 | 17.1 | 1.7 | 392 x 392 | 12.5 |
| InternVL3-1B | 1.3 | 6.8 | 1.1 | 7.8 | 0.75 | 448 x 448 | 30 |
| SmolVLM2-2.2B | 3.4 | 21.2 | 2.6 | 10.5 | 0.9 | 384 x 384 | 11 |
| SmolVLM2-500M | 0.8 | 4.8 | 0.7 | 2.5 | 0.25 | 384 x 384 | 31 |
| SmolVLM2-256M | 0.5 | 1.1 | 0.4 | 2.5 | 0.25 | 384 x 384 | 54 |
¹ Total memory used by the LLM plus the VLM.
² The first time an LLM/VLM model is loaded from disk into RAM or the NPU is called a cold start. Its duration depends on your OS, I/O transfer rate, and memory mapping.
³ Subsequent loads (warm starts) take advantage of the data already mapped in RAM; mostly, only a few pointers need to be restored.
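The RAM and Tokens/s columns can be turned into rough planning numbers. The sketch below copies the InternVL3.5 rows from the table; the helper names and the 1 GB `headroom_gb` default for the OS and other processes are assumptions made for illustration, not measured values.

```python
# (RAM in GB, tokens/s) copied from the benchmark table, InternVL3.5 rows only.
BENCHMARKS = {
    "InternVL3.5-1B": (1.9, 24.0),
    "InternVL3.5-2B": (3.0, 11.2),
    "InternVL3.5-4B": (5.4, 5.0),
    "InternVL3.5-8B": (8.8, 3.5),
}


def fits_in_ram(model, board_ram_gb, headroom_gb=1.0):
    """True if the model's measured RAM use plus headroom fits on the board."""
    ram_gb, _ = BENCHMARKS[model]
    return ram_gb + headroom_gb <= board_ram_gb


def generation_seconds(model, n_tokens):
    """Estimated time to generate n_tokens at the benchmarked rate."""
    _, tokens_per_s = BENCHMARKS[model]
    return n_tokens / tokens_per_s
```

Under these assumptions, `fits_in_ram("InternVL3.5-8B", 8)` is false while a 16 GB board passes, and a 100-token answer from the 8B model takes roughly 29 seconds of generation, not counting model loading or image encoding.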
Example Usage
Notes
- This is not a Transformers-compatible model
- This repository provides precompiled NPU binaries
- CPU fallback is not supported