InternVL3_5-1B_GPTQ_INT4

This version of InternVL3_5-1B_GPTQ_INT4 has been converted to run on the Axera NPU using w4a16 quantization.

Compatible with Pulsar2 version: 5.1-patch1.

Please note that the model's context length is 2k tokens and the maximum prefill length is 1k tokens.
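These limits can be checked on the host before submitting a prompt. A minimal sketch of the bookkeeping, using the limits reported in the runtime init log below (`prefill_plan` and its constants are illustrative, not part of the released scripts):

```python
# Limits as reported by the AX650 runtime init log (illustrative constants)
CTX_LEN = 2047        # max_token_len
PREFILL_MAX = 1024    # prefill_max_token_num
PREFILL_CHUNK = 128   # prefill_token_num: prompts are prefilled in 128-token slices

def prefill_plan(num_input_tokens: int):
    """Return (prefill_split_num, max_new_tokens) for a prompt of the given length."""
    if num_input_tokens > PREFILL_MAX:
        raise ValueError(f"{num_input_tokens} input tokens exceed the prefill limit of {PREFILL_MAX}")
    slices = -(-num_input_tokens // PREFILL_CHUNK)  # ceiling division
    return slices, CTX_LEN - num_input_tokens

# A 284-token prompt (the image-chat example below) is prefilled in 3 slices: 128 + 128 + 28
print(prefill_plan(284))  # (3, 1763)
```

This matches the `input token num : 284, prefill_split_num : 3` line in the interactive demo log below.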

Conversion tool links:

If you are interested in model conversion, you can export the axmodel yourself starting from the original repo:

https://huggingface.co/OpenGVLab/InternVL3_5-1B

- How to Convert LLM from Huggingface to axmodel
- AXera NPU HOST LLM Runtime
- AXera NPU AXCL LLM Runtime

Support Platform

| Chips | image encoder (448) | TTFT | w8a16 |
|-------|---------------------|------|-------|
| AX650 | 364.412 ms | 883.458 ms | 28.09 tokens/sec |

How to use

Download all files from this repository to the device

$ tree -L 1
.
├── assets
├── config.json
├── examples
├── gradio_demo.py
├── infer_axmodel.py
├── infer_torch.py
├── internvl3-5_axmodel
├── internvl3-5_tokenizer
├── README.md
├── utils
└── vit-models

6 directories, 5 files

Install transformers

pip install transformers==4.57.1
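A quick sanity check that the installed version matches the pin above (the helper below is illustrative, not part of this repo, and assumes plain numeric version components):

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "4.57.1"  # the version pinned above

def parse(v: str) -> tuple:
    # "4.57.1" -> (4, 57, 1); assumes purely numeric major.minor.patch components
    return tuple(int(p) for p in v.split(".")[:3])

def transformers_ok(required: str = REQUIRED) -> bool:
    """True if the installed transformers version matches the required pin."""
    try:
        return parse(version("transformers")) == parse(required)
    except PackageNotFoundError:
        return False
```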

Inference on an AX650 host, such as the M4N-Dock (爱芯派Pro) or the AX650 DEMO board

Interactive conversations using the C++ Demo:

./run_internvl_3-5_1b_448_ax650.sh

The log information is as follows:

root@ax650 ~/yongqiang/push_hugging_face/InternVL3_5-1B_GPTQ_INT4 # ./run_internvl_3-5_1b_448_ax650.sh
[I][                            Init][ 135]: LLM init start
[I][                            Init][ 137]: Total CMM:7915 MB
tokenizer_type = 3
  3% | ██                                |   1 /  31 [0.71s<21.92s, 1.41 count/s] tokenizer init ok[I][                            Init][  26]: LLaMaEmbedSelector use mmap
  6% | ███                               |   2 /  31 [0.71s<11.05s, 2.81 count/s] embed_selector init ok[I][                            Init][ 182]: attr.axmodel_num:28
100% | ████████████████████████████████ |  31 /  31 [2.06s<2.06s, 15.03 count/s] init post axmodel ok,remain_cmm(6940 MB)[I][                            Init][ 240]: image encoder feature outputs:0
103% | ██████████████████████████████████ |  32 /  31 [2.32s<2.25s, 13.79 count/s] init vpm axmodel ok,remain_cmm(6588 MB)[I][                            Init][ 280]: image encoder input nhwc@uint8
[I][                            Init][ 305]: image encoder output float32

[I][                            Init][ 335]: max_token_len : 2047
[I][                            Init][ 340]: kv_cache_size : 1024, kv_cache_num: 2047
[I][                            Init][ 348]: prefill_token_num : 128
[I][                            Init][ 352]: grp: 1, prefill_max_token_num : 1
[I][                            Init][ 352]: grp: 2, prefill_max_token_num : 128
[I][                            Init][ 352]: grp: 3, prefill_max_token_num : 256
[I][                            Init][ 352]: grp: 4, prefill_max_token_num : 384
[I][                            Init][ 352]: grp: 5, prefill_max_token_num : 512
[I][                            Init][ 352]: grp: 6, prefill_max_token_num : 640
[I][                            Init][ 352]: grp: 7, prefill_max_token_num : 768
[I][                            Init][ 352]: grp: 8, prefill_max_token_num : 896
[I][                            Init][ 352]: grp: 9, prefill_max_token_num : 1024
[I][                            Init][ 356]: prefill_max_token_num : 1024
[I][                     load_config][ 281]: load config:
{
    "enable_repetition_penalty": true,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 30,
    "repetition_penalty": 1.2,
    "temperature": 0.7,
    "top_k": 10,
    "top_p": 0.9
}

[I][                            Init][ 373]: LLM init ok
[I][                            Init][ 375]: Left CMM:6588 MB
Type "q" to exit, Ctrl+c to stop current running
prompt(输入q退出) >> 介绍一下你自己
image(回车键跳过) >>
[I][                             Run][ 713]: input token num : 21, prefill_split_num : 1
[I][                             Run][ 747]: input_num_token:21
[I][                             Run][ 976]: ttft: 83.79 ms
我被称为"语言模型-1.0",来自上海人工智能实验室。我的开发团队致力于为用户提供高效、准确和个性化的AI服务。作为一款先进的自然语言处理(NLP)模型,我旨在帮助用户解决各种语言相关问题,并提供有用的信息和建议。我的设计目标是能够以自然流畅的方式与人类进行交互,无论是回答问题、提供建议还是执行任务。

[N][                             Run][1102]: hit eos,avg 19.79 token/s

prompt(输入q退出) >> 请你详细描述下面这幅图
image(回车键跳过) >> assets/image_1.jpg
[I][                     EncodeImage][ 481]: image encode time : 408.467987 ms, size : 1
[I][                          Encode][ 636]: input_ids size:284
[I][                          Encode][ 644]: offset 15
[I][                          Encode][ 673]: img_embed.size:1, 262144
[I][                          Encode][ 689]: out_embed size:290816
[I][                          Encode][ 690]: input_ids size 284
[I][                          Encode][ 692]: position_ids size:284
[I][                             Run][ 713]: input token num : 284, prefill_split_num : 3
[I][                             Run][ 747]: input_num_token:128
[I][                             Run][ 747]: input_num_token:128
[I][                             Run][ 747]: input_num_token:28
[I][                             Run][ 976]: ttft: 270.76 ms
这是一幅生动的图片,展示了一只大熊猫正在自然环境中觅食的情景。画面中,大熊猫正低头在植物丛中寻找食物。它的毛发呈白色,背部和腹部有黑色斑点。周围绿意盎然,各种灌木和植物环绕着它,显得生机勃勃。背景的木质结构可能是一把竹竿或长椅,进一步暗示这可能是动物园或野生动物保护区。整个场景充满了自然的气息,让人感受到大自然的可爱与生机。

[N][                             Run][1102]: hit eos,avg 19.86 token/s

prompt(输入q退出) >>

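The `load_config` block in the log above enables repetition penalty, temperature, and top-k sampling. A rough numpy sketch of how a sampler typically combines those settings (assumed behavior for illustration only; the actual C++ sampler in the runtime may differ in details such as the penalty form or tie-breaking):

```python
import numpy as np

def sample_next(logits, recent_ids, repetition_penalty=1.2, temperature=0.7,
                top_k=10, penalty_window=30, rng=np.random.default_rng(0)):
    """One decoding step: penalize recent tokens, apply temperature, sample from top-k."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    # 1) Repetition penalty over the last `penalty_window` generated tokens
    for tok in set(recent_ids[-penalty_window:]):
        logits[tok] = logits[tok] / repetition_penalty if logits[tok] > 0 \
            else logits[tok] * repetition_penalty
    # 2) Temperature scaling
    logits = logits / temperature
    # 3) Keep only the top-k candidates
    kth = np.sort(logits)[-min(top_k, logits.size)]
    logits[logits < kth] = -np.inf
    # 4) Softmax over the survivors and sample
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(probs.size, p=probs))
```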
Interactive conversations using the Gradio demo:

$ python3 gradio_demo.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --vit_model vit-models/internvl_vit_model_1x3x448x448.axmodel

Plain text dialogue:

(screenshot: demo_1)

Image understanding:

(screenshot: demo_2)


Run the following command on the Axera board to start a chat conversation:

$ python3 infer_axmodel.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --question "请计算函数[y=2x^2+2]的导数, 并提供 markdown 格式的推理过程"

output:

[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-dirty 0fdbfe15-dirty
Model loaded successfully!
slice_indices: [0]
Slice prefill done: 0
answer >> 函数 \( y = 2x^2 + 2 \) 的导数可以通过求导法则来计算。首先,我们对函数中的每一项分别求导:

1. 对于 \( 2x^2 \),使用幂法则求导:
   \[
   \frac{d}{dx}(2x^2) = 2 \cdot 2x = 4x
   \]

2. 对于常数项 \( 2 \),其导数为 0,因为常数的导数为 0。

将这两部分的结果相加,得到函数 \( y \) 的导数:
\[
y' = 4x
\]

因此,函数 \( y = 2x^2 + 2 \) 的导数为 \( y' = 4x \)。

Enter the following command to perform the single-image understanding task:

$ python3 infer_axmodel.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --question "请描述这幅图" -i examples/image_0.jpg --vit_model vit-models/internvl_vit_model_1x3x448x448.axmodel

(input image: examples/image_0.jpg)

output:

[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-dirty 0fdbfe15-dirty
Model loaded successfully!
slice_indices: [0, 1, 2]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
answer >> 这是一张红熊猫的照片。红熊猫是一种红棕色的哺乳动物,通常生活在亚洲的森林中。它们以捕食昆虫和小型无脊椎动物为生。图片中,红熊猫正坐在一个木制的平台上,背景是绿色的树木和植被,显得非常自然和生动。红熊猫的表情看起来很友好,似乎在观察或等待什么。
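Per the init log, the image encoder expects a single 448×448 image packed as an `nhwc@uint8` tensor. A minimal preprocessing sketch (`to_vit_input` is illustrative; `infer_axmodel.py` may use different resizing or padding):

```python
import numpy as np
from PIL import Image

def to_vit_input(img, size=448):
    """Resize to 448x448 and pack as a 1x448x448x3 uint8 tensor,
    matching the 'image encoder input nhwc@uint8' line in the init log."""
    if isinstance(img, str):
        img = Image.open(img)
    img = img.convert("RGB").resize((size, size), Image.BICUBIC)
    arr = np.asarray(img, dtype=np.uint8)  # HWC
    return arr[np.newaxis, ...]            # NHWC batch of 1
```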