InternVL3_5-1B_GPTQ_INT4
This version of InternVL3_5-1B_GPTQ_INT4 has been converted to run on the Axera NPU using w4a16 quantization.
Compatible with Pulsar2 version: 5.1-patch1.
Please note that the model's context length is 2K tokens and the maximum prefill length is 1K tokens.
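As a quick sanity check, the two limits above can be combined into a simple budget test before sending a prompt to the runtime. This is an illustrative sketch only; the constants mirror the max_token_len and prefill_max_token_num values printed in the demo log further down, and the on-device runtime enforces these limits itself.

```python
# Illustrative token-budget check (values taken from this model card:
# 2047-token context window, 1024-token max prefill).
MAX_CONTEXT_LEN = 2047   # max_token_len reported by the runtime
MAX_PREFILL_LEN = 1024   # prefill_max_token_num reported by the runtime

def fits_budget(num_prompt_tokens: int, num_new_tokens: int) -> bool:
    """True if the prompt fits the prefill limit and the full
    conversation (prompt + generated tokens) fits the KV cache."""
    return (num_prompt_tokens <= MAX_PREFILL_LEN
            and num_prompt_tokens + num_new_tokens <= MAX_CONTEXT_LEN)

print(fits_budget(284, 200))   # True: a typical single-image prompt
print(fits_budget(1100, 10))   # False: prompt exceeds the prefill limit
```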
Conversion tool links:
If you are interested in model conversion, you can export the axmodel yourself from the original repo:
https://huggingface.co/OpenGVLab/InternVL3_5-1B
How to Convert LLM from Huggingface to axmodel
Supported Platforms
- AX650
- AX650N DEMO Board
- M4N-Dock(爱芯派Pro)
- M.2 Accelerator card
| Chip | Image encoder (448×448) | TTFT | Decode speed (w8a16) |
|---|---|---|---|
| AX650 | 364.412 ms | 883.458 ms | 28.09 tokens/sec |
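These figures can be turned into a rough end-to-end latency estimate: time to first token, plus steady-state decode time for the remaining tokens. A minimal sketch using the AX650 numbers from the table above (treat the result as a ballpark, not a guarantee):

```python
# Back-of-the-envelope latency estimate from the benchmark table.
TTFT_MS = 883.458                # time to first token, ms
DECODE_TOKENS_PER_SEC = 28.09    # steady-state decode throughput

def estimated_latency_ms(num_new_tokens: int) -> float:
    """TTFT plus decode time for the remaining (n - 1) tokens."""
    return TTFT_MS + (num_new_tokens - 1) / DECODE_TOKENS_PER_SEC * 1000.0

# e.g. a 100-token reply takes roughly 4.4 seconds end to end
print(round(estimated_latency_ms(100) / 1000.0, 1))
```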
How to use
Download all files from this repository to the device
$ tree -L 1
.
├── assets
├── config.json
├── examples
├── gradio_demo.py
├── infer_axmodel.py
├── infer_torch.py
├── internvl3-5_axmodel
├── internvl3-5_tokenizer
├── README.md
├── utils
└── vit-models
6 directories, 5 files
Install transformers:
pip install transformers==4.57.1
Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or the AX650 DEMO Board
Run an interactive conversation using the C++ demo:
./run_internvl_3-5_1b_448_ax650.sh
The log information is as follows:
root@ax650 ~/yongqiang/push_hugging_face/InternVL3_5-1B_GPTQ_INT4 # ./run_internvl_3-5_1b_448_ax650.sh
[I][ Init][ 135]: LLM init start
[I][ Init][ 137]: Total CMM:7915 MB
tokenizer_type = 3
3% | ██ | 1 / 31 [0.71s<21.92s, 1.41 count/s] tokenizer init ok[I][ Init][ 26]: LLaMaEmbedSelector use mmap
6% | ███ | 2 / 31 [0.71s<11.05s, 2.81 count/s] embed_selector init ok[I][ Init][ 182]: attr.axmodel_num:28
100% | ████████████████████████████████ | 31 / 31 [2.06s<2.06s, 15.03 count/s] init post axmodel ok,remain_cmm(6940 MB)[I][ Init][ 240]: image encoder feature outputs:0
103% | ██████████████████████████████████ | 32 / 31 [2.32s<2.25s, 13.79 count/s] init vpm axmodel ok,remain_cmm(6588 MB)[I][ Init][ 280]: image encoder input nhwc@uint8
[I][ Init][ 305]: image encoder output float32
[I][ Init][ 335]: max_token_len : 2047
[I][ Init][ 340]: kv_cache_size : 1024, kv_cache_num: 2047
[I][ Init][ 348]: prefill_token_num : 128
[I][ Init][ 352]: grp: 1, prefill_max_token_num : 1
[I][ Init][ 352]: grp: 2, prefill_max_token_num : 128
[I][ Init][ 352]: grp: 3, prefill_max_token_num : 256
[I][ Init][ 352]: grp: 4, prefill_max_token_num : 384
[I][ Init][ 352]: grp: 5, prefill_max_token_num : 512
[I][ Init][ 352]: grp: 6, prefill_max_token_num : 640
[I][ Init][ 352]: grp: 7, prefill_max_token_num : 768
[I][ Init][ 352]: grp: 8, prefill_max_token_num : 896
[I][ Init][ 352]: grp: 9, prefill_max_token_num : 1024
[I][ Init][ 356]: prefill_max_token_num : 1024
[I][ load_config][ 281]: load config:
{
"enable_repetition_penalty": true,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 30,
"repetition_penalty": 1.2,
"temperature": 0.7,
"top_k": 10,
"top_p": 0.9
}
[I][ Init][ 373]: LLM init ok
[I][ Init][ 375]: Left CMM:6588 MB
Type "q" to exit, Ctrl+c to stop current running
prompt(输入q退出) >> 介绍一下你自己
image(回车键跳过) >>
[I][ Run][ 713]: input token num : 21, prefill_split_num : 1
[I][ Run][ 747]: input_num_token:21
[I][ Run][ 976]: ttft: 83.79 ms
I am called "Language Model 1.0" and come from the Shanghai AI Laboratory. My development team is committed to providing users with efficient, accurate, and personalized AI services. As an advanced natural language processing (NLP) model, I aim to help users solve all kinds of language-related problems and offer useful information and suggestions. My design goal is to interact with humans in a natural, fluent way, whether answering questions, giving advice, or carrying out tasks.
[N][ Run][1102]: hit eos,avg 19.79 token/s
prompt(输入q退出) >> 请你详细描述下面这幅图
image(回车键跳过) >> assets/image_1.jpg
[I][ EncodeImage][ 481]: image encode time : 408.467987 ms, size : 1
[I][ Encode][ 636]: input_ids size:284
[I][ Encode][ 644]: offset 15
[I][ Encode][ 673]: img_embed.size:1, 262144
[I][ Encode][ 689]: out_embed size:290816
[I][ Encode][ 690]: input_ids size 284
[I][ Encode][ 692]: position_ids size:284
[I][ Run][ 713]: input token num : 284, prefill_split_num : 3
[I][ Run][ 747]: input_num_token:128
[I][ Run][ 747]: input_num_token:128
[I][ Run][ 747]: input_num_token:28
[I][ Run][ 976]: ttft: 270.76 ms
This is a vivid picture showing a giant panda foraging in a natural environment. In the frame, the panda is lowering its head to search for food among the plants. Its fur is white, with black patches on its back and belly. Lush greenery surrounds it, with shrubs and plants of all kinds making the scene feel full of life. The wooden structure in the background may be a bamboo pole or a bench, further suggesting this could be a zoo or a wildlife reserve. The whole scene is full of natural charm and vitality.
[N][ Run][1102]: hit eos,avg 19.86 token/s
prompt(输入q退出) >>
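The sampling settings printed during init (top-k 10, temperature 0.7, repetition penalty 1.2 over a 30-token window) follow a standard decoding recipe. Below is a minimal Python sketch of that recipe, assuming the common penalize-then-scale-then-top-k order used by libraries such as HF transformers; the on-device runtime's exact implementation may differ in detail.

```python
# Sketch of top-k sampling with temperature and a windowed
# repetition penalty, matching the config values shown above.
import math, random

def sample_next(logits, recent_ids, *, penalty=1.2, window=30,
                temperature=0.7, top_k=10, rng=random.Random(0)):
    logits = list(logits)
    # 1) Repetition penalty: damp tokens seen in the last `window` ids.
    for tid in set(recent_ids[-window:]):
        logits[tid] = logits[tid] / penalty if logits[tid] > 0 else logits[tid] * penalty
    # 2) Temperature scaling.
    logits = [x / temperature for x in logits]
    # 3) Top-k: keep the k largest logits, softmax over them, sample.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    mx = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - mx) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

With top_k=1 this degenerates to greedy decoding, which makes the effect of the penalty easy to observe: a strongly penalized recent token loses the argmax to the runner-up.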
Interactive conversations using the Gradio API:
$ python3 gradio_demo.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --vit_model vit-models/internvl_vit_model_1x3x448x448.axmodel
Screenshots: plain-text dialogue and image understanding.
Run the following command on the Axera board to start a chat conversation:
$ python3 infer_axmodel.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --question "请计算函数[y=2x^2+2]的导数, 并提供 markdown 格式的推理过程"
output:
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-dirty 0fdbfe15-dirty
Model loaded successfully!
slice_indices: [0]
Slice prefill done: 0
answer >> The derivative of the function \( y = 2x^2 + 2 \) can be computed with the standard differentiation rules. Differentiate each term separately:
1. For \( 2x^2 \), apply the power rule:
\[
\frac{d}{dx}(2x^2) = 2 \cdot 2x = 4x
\]
2. The constant term \( 2 \) has derivative 0, since the derivative of any constant is 0.
Adding the two results gives the derivative of \( y \):
\[
y' = 4x
\]
Therefore, the derivative of \( y = 2x^2 + 2 \) is \( y' = 4x \).
Enter the following command to perform the single-image understanding task:
$ python3 infer_axmodel.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --question "请描述这幅图" -i examples/image_0.jpg --vit_model vit-models/internvl_vit_model_1x3x448x448.axmodel
output:
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-dirty 0fdbfe15-dirty
Model loaded successfully!
slice_indices: [0, 1, 2]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
answer >> This is a photo of a red panda. The red panda is a reddish-brown mammal that typically lives in the forests of Asia, feeding on insects and small invertebrates. In the picture, the red panda is sitting on a wooden platform, with green trees and vegetation in the background, making the scene look very natural and lively. The red panda's expression appears friendly, as if it is watching or waiting for something.
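The prefill and embedding sizes in these logs are internally consistent and easy to verify. Assuming a hidden size of 1024 for this 1B model and the 128-token prefill chunk reported at init (both inferred from the log values, not taken from the model config), the arithmetic works out as follows:

```python
# Sanity-check arithmetic behind the prefill logs above.
HIDDEN = 1024          # assumed hidden size, inferred from the log values
PREFILL_CHUNK = 128    # prefill_token_num reported at init

def prefill_chunks(num_tokens):
    """Split a prompt into runtime prefill chunks, as in the demo log."""
    full, rem = divmod(num_tokens, PREFILL_CHUNK)
    return [PREFILL_CHUNK] * full + ([rem] if rem else [])

print(prefill_chunks(284))   # [128, 128, 28] -> prefill_split_num : 3
print(262144 // HIDDEN)      # 256 visual tokens per 448x448 image
print(290816 // HIDDEN)      # 284 -> out_embed length matches input_ids size
```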
Base model: OpenGVLab/InternVL3_5-1B-Pretrained