InternVL3_5-1B_GPTQ_INT4

This version of InternVL3_5-1B_GPTQ_INT4 has been converted to run on the Axera NPU using w4a16 quantization.

Compatible with Pulsar2 version: 5.1-patch1.

Please note that the model's context length is 2k tokens and the maximum prefill length is 1k tokens.
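These limits can be checked on the host before submitting a prompt. A minimal sketch of the bookkeeping, using the limits reported in the runtime init log below (`prefill_plan` and its constants are illustrative, not part of the released scripts):

```python
# Limits as reported by the AX650 runtime init log (illustrative constants)
CTX_LEN = 2047        # max_token_len
PREFILL_MAX = 1024    # prefill_max_token_num
PREFILL_CHUNK = 128   # prefill_token_num: prompts are prefilled in 128-token slices

def prefill_plan(num_input_tokens: int):
    """Return (prefill_split_num, max_new_tokens) for a prompt of the given length."""
    if num_input_tokens > PREFILL_MAX:
        raise ValueError(f"{num_input_tokens} input tokens exceed the prefill limit of {PREFILL_MAX}")
    slices = -(-num_input_tokens // PREFILL_CHUNK)  # ceiling division
    return slices, CTX_LEN - num_input_tokens

# A 284-token prompt (the image-chat example below) is prefilled in 3 slices: 128 + 128 + 28
print(prefill_plan(284))  # (3, 1763)
```

This matches the `input token num : 284, prefill_split_num : 3` line in the interactive demo log below.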

Conversion tool links:

If you are interested in model conversion, you can export the axmodel yourself starting from the original repo:

https://huggingface.co/OpenGVLab/InternVL3_5-1B

- How to Convert LLM from Huggingface to axmodel
- AXera NPU HOST LLM Runtime
- AXera NPU AXCL LLM Runtime

Support Platform

| Chips | image encoder (448) | TTFT | w8a16 |
|-------|---------------------|------|-------|
| AX650 | 364.412 ms | 883.458 ms | 28.09 tokens/sec |

How to use

Download all files from this repository to the device

$ tree -L 1
.
├── assets
├── config.json
├── examples
├── gradio_demo.py
├── infer_axmodel.py
├── infer_torch.py
├── internvl3-5_axmodel
├── internvl3-5_tokenizer
├── README.md
├── utils
└── vit-models

6 directories, 5 files

Install transformers

pip install transformers==4.57.1
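A quick sanity check that the installed version matches the pin above (the helper below is illustrative, not part of this repo, and assumes plain numeric version components):

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "4.57.1"  # the version pinned above

def parse(v: str) -> tuple:
    # "4.57.1" -> (4, 57, 1); assumes purely numeric major.minor.patch components
    return tuple(int(p) for p in v.split(".")[:3])

def transformers_ok(required: str = REQUIRED) -> bool:
    """True if the installed transformers version matches the required pin."""
    try:
        return parse(version("transformers")) == parse(required)
    except PackageNotFoundError:
        return False
```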

Inference on an AX650 host, such as the M4N-Dock (爱芯派Pro) or the AX650 DEMO board

Interactive conversations using the C++ Demo:

./run_internvl_3-5_1b_448_ax650.sh

The log information is as follows:

root@ax650 ~/yongqiang/push_hugging_face/InternVL3_5-1B_GPTQ_INT4 # ./run_internvl_3-5_1b_448_ax650.sh
[I][                            Init][ 135]: LLM init start
[I][                            Init][ 137]: Total CMM:7915 MB
tokenizer_type = 3
  3% | ██                                |   1 /  31 [0.71s<21.92s, 1.41 count/s] tokenizer init ok[I][                            Init][  26]: LLaMaEmbedSelector use mmap
  6% | ███                               |   2 /  31 [0.71s<11.05s, 2.81 count/s] embed_selector init ok[I][                            Init][ 182]: attr.axmodel_num:28
100% | ████████████████████████████████ |  31 /  31 [2.06s<2.06s, 15.03 count/s] init post axmodel ok,remain_cmm(6940 MB)[I][                            Init][ 240]: image encoder feature outputs:0
103% | ██████████████████████████████████ |  32 /  31 [2.32s<2.25s, 13.79 count/s] init vpm axmodel ok,remain_cmm(6588 MB)[I][                            Init][ 280]: image encoder input nhwc@uint8
[I][                            Init][ 305]: image encoder output float32

[I][                            Init][ 335]: max_token_len : 2047
[I][                            Init][ 340]: kv_cache_size : 1024, kv_cache_num: 2047
[I][                            Init][ 348]: prefill_token_num : 128
[I][                            Init][ 352]: grp: 1, prefill_max_token_num : 1
[I][                            Init][ 352]: grp: 2, prefill_max_token_num : 128
[I][                            Init][ 352]: grp: 3, prefill_max_token_num : 256
[I][                            Init][ 352]: grp: 4, prefill_max_token_num : 384
[I][                            Init][ 352]: grp: 5, prefill_max_token_num : 512
[I][                            Init][ 352]: grp: 6, prefill_max_token_num : 640
[I][                            Init][ 352]: grp: 7, prefill_max_token_num : 768
[I][                            Init][ 352]: grp: 8, prefill_max_token_num : 896
[I][                            Init][ 352]: grp: 9, prefill_max_token_num : 1024
[I][                            Init][ 356]: prefill_max_token_num : 1024
[I][                     load_config][ 281]: load config:
{
    "enable_repetition_penalty": true,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 30,
    "repetition_penalty": 1.2,
    "temperature": 0.7,
    "top_k": 10,
    "top_p": 0.9
}

[I][                            Init][ 373]: LLM init ok
[I][                            Init][ 375]: Left CMM:6588 MB
Type "q" to exit, Ctrl+c to stop current running
prompt(输入q退出) >> 介绍一下你自己
image(回车键跳过) >>
[I][                             Run][ 713]: input token num : 21, prefill_split_num : 1
[I][                             Run][ 747]: input_num_token:21
[I][                             Run][ 976]: ttft: 83.79 ms
我被称为"语言模型-1.0",来自上海人工智能实验室。我的开发团队致力于为用户提供高效、准确和个性化的AI服务。作为一款先进的自然语言处理(NLP)模型,我旨在帮助用户解决各种语言相关问题,并提供有用的信息和建议。我的设计目标是能够以自然流畅的方式与人类进行交互,无论是回答问题、提供建议还是执行任务。

[N][                             Run][1102]: hit eos,avg 19.79 token/s

prompt(输入q退出) >> 请你详细描述下面这幅图
image(回车键跳过) >> assets/image_1.jpg
[I][                     EncodeImage][ 481]: image encode time : 408.467987 ms, size : 1
[I][                          Encode][ 636]: input_ids size:284
[I][                          Encode][ 644]: offset 15
[I][                          Encode][ 673]: img_embed.size:1, 262144
[I][                          Encode][ 689]: out_embed size:290816
[I][                          Encode][ 690]: input_ids size 284
[I][                          Encode][ 692]: position_ids size:284
[I][                             Run][ 713]: input token num : 284, prefill_split_num : 3
[I][                             Run][ 747]: input_num_token:128
[I][                             Run][ 747]: input_num_token:128
[I][                             Run][ 747]: input_num_token:28
[I][                             Run][ 976]: ttft: 270.76 ms
这是一幅生动的图片,展示了一只大熊猫正在自然环境中觅食的情景。画面中,大熊猫正低头在植物丛中寻找食物。它的毛发呈白色,背部和腹部有黑色斑点。周围绿意盎然,各种灌木和植物环绕着它,显得生机勃勃。背景的木质结构可能是一把竹竿或长椅,进一步暗示这可能是动物园或野生动物保护区。整个场景充满了自然的气息,让人感受到大自然的可爱与生机。

[N][                             Run][1102]: hit eos,avg 19.86 token/s

prompt(输入q退出) >>

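The `load_config` block in the log above enables repetition penalty, temperature, and top-k sampling. A rough numpy sketch of how a sampler typically combines those settings (assumed behavior for illustration only; the actual C++ sampler in the runtime may differ in details such as the penalty form or tie-breaking):

```python
import numpy as np

def sample_next(logits, recent_ids, repetition_penalty=1.2, temperature=0.7,
                top_k=10, penalty_window=30, rng=np.random.default_rng(0)):
    """One decoding step: penalize recent tokens, apply temperature, sample from top-k."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    # 1) Repetition penalty over the last `penalty_window` generated tokens
    for tok in set(recent_ids[-penalty_window:]):
        logits[tok] = logits[tok] / repetition_penalty if logits[tok] > 0 \
            else logits[tok] * repetition_penalty
    # 2) Temperature scaling
    logits = logits / temperature
    # 3) Keep only the top-k candidates
    kth = np.sort(logits)[-min(top_k, logits.size)]
    logits[logits < kth] = -np.inf
    # 4) Softmax over the survivors and sample
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(probs.size, p=probs))
```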
Interactive conversations using the Gradio demo:

$ python3 gradio_demo.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --vit_model vit-models/internvl_vit_model_1x3x448x448.axmodel

Plain text dialogue:

(screenshot: demo_1)

Image understanding:

(screenshot: demo_2)


Run the following command on the Axera board to start a chat conversation:

$ python3 infer_axmodel.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --question "请计算函数[y=2x^2+2]的导数, 并提供 markdown 格式的推理过程"

output:

[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-dirty 0fdbfe15-dirty
Model loaded successfully!
slice_indices: [0]
Slice prefill done: 0
answer >> 函数 \( y = 2x^2 + 2 \) 的导数可以通过求导法则来计算。首先,我们对函数中的每一项分别求导:

1. 对于 \( 2x^2 \),使用幂法则求导:
   \[
   \frac{d}{dx}(2x^2) = 2 \cdot 2x = 4x
   \]

2. 对于常数项 \( 2 \),其导数为 0,因为常数的导数为 0。

将这两部分的结果相加,得到函数 \( y \) 的导数:
\[
y' = 4x
\]

因此,函数 \( y = 2x^2 + 2 \) 的导数为 \( y' = 4x \)。

Enter the following command to perform the single-image understanding task:

$ python3 infer_axmodel.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --question "请描述这幅图" -i examples/image_0.jpg --vit_model vit-models/internvl_vit_model_1x3x448x448.axmodel

(input image: examples/image_0.jpg)

output:

[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-dirty 0fdbfe15-dirty
Model loaded successfully!
slice_indices: [0, 1, 2]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
answer >> 这是一张红熊猫的照片。红熊猫是一种红棕色的哺乳动物,通常生活在亚洲的森林中。它们以捕食昆虫和小型无脊椎动物为生。图片中,红熊猫正坐在一个木制的平台上,背景是绿色的树木和植被,显得非常自然和生动。红熊猫的表情看起来很友好,似乎在观察或等待什么。
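Per the init log, the image encoder expects a single 448×448 image packed as an `nhwc@uint8` tensor. A minimal preprocessing sketch (`to_vit_input` is illustrative; `infer_axmodel.py` may use different resizing or padding):

```python
import numpy as np
from PIL import Image

def to_vit_input(img, size=448):
    """Resize to 448x448 and pack as a 1x448x448x3 uint8 tensor,
    matching the 'image encoder input nhwc@uint8' line in the init log."""
    if isinstance(img, str):
        img = Image.open(img)
    img = img.convert("RGB").resize((size, size), Image.BICUBIC)
    arr = np.asarray(img, dtype=np.uint8)  # HWC
    return arr[np.newaxis, ...]            # NHWC batch of 1
```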