YummyYum committed on
Commit e00c0df · verified · 1 Parent(s): 247c6f3

Upload folder using huggingface_hub

.msc ADDED
Binary file (2.03 kB).
 
.mv ADDED
@@ -0,0 +1 @@
+ Revision:master,CreatedAt:1758550852
README.md ADDED
@@ -0,0 +1,162 @@
+ # Introduction
+
+ **FlagOS** is a unified heterogeneous computing software stack for large models, co-developed with leading global chip manufacturers. Built on core technologies such as the **FlagScale** distributed training/inference framework, the **FlagGems** universal operator library, the **FlagCX** communication library, and the **FlagTree** unified compiler, the **FlagRelease** platform leverages the FlagOS stack to automatically produce and release combinations of <chip + open-source model>. This enables efficient and automated model migration across diverse chips, opening a new chapter for large model deployment and application.
+
+ Based on this, the **Qwen3-Omni-30B-A3B-Instruct-FlagOS** model is adapted for NVIDIA GPUs using the FlagOS software stack, enabling:
+
+ ### Integrated Deployment
+
+ - Deep integration with the open-source [FlagScale framework](https://github.com/FlagOpen/FlagScale)
+ - Out-of-the-box inference scripts with pre-configured hardware and software parameters
+ - Released **FlagOS** container image supporting deployment within minutes
+
+ ### Consistency Validation
+
+ - Rigorously evaluated through benchmark testing: performance and results from the FlagOS software stack are compared against native stacks on multiple public benchmarks.
+
+ # Technical Overview
+
+ ## **FlagScale Distributed Training and Inference Framework**
+
+ FlagScale is an end-to-end framework for large models across heterogeneous computing resources, maximizing computational efficiency and ensuring model validity through core technologies. Its key advantages include:
+
+ - **Unified Deployment Interface:** Standardized command-line tools support one-click service deployment across multiple hardware platforms, significantly reducing adaptation costs in heterogeneous environments.
+ - **Intelligent Parallel Optimization:** Automatically generates optimal distributed parallel strategies based on chip computing characteristics, achieving dynamic load balancing of computation/communication resources.
+ - **Seamless Operator Switching:** Deep integration with the FlagGems operator library allows high-performance operators to be invoked via environment variables without modifying model code (see the sketch after the FlagGems section below).
+
+ ## **FlagGems Universal Large-Model Operator Library**
+
+ FlagGems is a Triton-based, cross-architecture operator library collaboratively developed with industry partners. Its core strengths include:
+
+ - **Full-stack Coverage**: Over 100 operators, with a broader range of operator types than competing libraries.
+ - **Ecosystem Compatibility**: Supports 7 accelerator backends. Ongoing optimizations have significantly improved performance.
+ - **High Efficiency**: Employs unique code generation and runtime optimization techniques for faster secondary development and better runtime performance compared to alternatives.
+
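+ As a rough illustration of operator switching, the sketch below patches supported PyTorch operators with FlagGems' Triton kernels from Python; it assumes the `flag_gems` package and a CUDA-capable PyTorch build are installed, and uses `flag_gems.enable()`, the library's global switch. The tensor workload is an arbitrary example, not part of the FlagOS release.
+
+ ```python
+ import torch
+ import flag_gems  # assumed installed alongside the FlagOS stack
+
+ # Route supported torch ops (matmul, softmax, ...) to FlagGems' Triton kernels.
+ flag_gems.enable()
+
+ x = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
+ y = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
+ print((x @ y).sum())  # this matmul now dispatches to the FlagGems implementation
+ ```
+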
+ ## **FlagEval Evaluation Framework**
+
+ **FlagEval (Libra)** is a comprehensive evaluation system and open platform for large models launched in 2023. It aims to establish scientific, fair, and open benchmarks, methodologies, and tools to help researchers assess model and training algorithm performance. It features:
+ - **Multi-dimensional Evaluation**: Supports 800+ model evaluations across NLP, CV, Audio, and Multimodal fields, covering 20+ downstream tasks including language understanding and image-text generation.
+ - **Industry-Grade Use Cases**: Has completed horizontal evaluations of mainstream large models, providing authoritative benchmarks for chip-model performance validation.
+
+ # Evaluation Results
+
+ ## Benchmark Results
+
+ | Metric | Qwen3-Omni-30B-A3B-Instruct-H100-CUDA | Qwen3-Omni-30B-A3B-Instruct-FlagOS |
+ |-------------------|--------------------------|-----------------------------|
+ | CMMMU | 50.440 | 50.890 |
+ | MMMU | 60.780 | 59.330 |
+ | MMMU_Pro_standard | 44.650 | 43.610 |
+ | MMMU_Pro_vision | 39.770 | 38.730 |
+ | MM-Vet v2 | 59.225 | 61.919 |
+ | OCRBench | 78.700 | 78.500 |
+ | MathVision | 57.530 | 57.570 |
+ | CII-Bench | 60.920 | 60.260 |
+ | Blink | 57.710 | 57.440 |
+ | AIME_0fewshot_@avg1 | 76.667 | 83.333 |
+ | GPQA_0fewshot_@avg1 | 60.403 | 61.829 |
+ | LiveBench-0fewshot_@avg1 | 68.926 | 69.598 |
+ | MMLU_5fewshot_@avg1 | 78.383 | 78.449 |
+ | MUSR_0fewshot_@avg | 59.392 | 60.185 |
+
+ # User Guide
+
+ **Environment Setup**
+
+ | Item | Version |
+ | ------------- | ------------------------------------------------------------ |
+ | Docker Version | Docker version 28.1.0, build 4d8c241 |
+ | Operating System | Ubuntu 22.04.5 LTS |
+ | FlagScale | 0.8.0 |
+ | FlagGems | 3.0 |
+
+ ## Operation Steps
+
+ ### Download Open-source Model Weights
+
+ ```bash
+ pip install modelscope
+ modelscope download --model Qwen/Qwen3-Omni-30B-A3B-Instruct --local_dir /share/Qwen3-Omni-30B-A3B-Instruct
+ ```
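+
+ As an optional sanity check, you can confirm that all 15 `model-*.safetensors` shards listed in this repository landed in the target directory; the path below is the one used in the download command above.
+
+ ```python
+ from pathlib import Path
+
+ weights_dir = Path("/share/Qwen3-Omni-30B-A3B-Instruct")
+ shards = sorted(weights_dir.glob("model-*-of-00015.safetensors"))
+ print(f"found {len(shards)} of 15 shards")
+ assert len(shards) == 15, "incomplete download; re-run the modelscope command"
+ ```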
+
+ ### Download FlagOS Image
+
+ ```bash
+ docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease_nvidia_qwen3o
+ ```
+
+ ### Start the Inference Service
+
+ ```bash
+ # Container startup
+ docker run --rm --init --detach --net=host --uts=host --ipc=host --security-opt=seccomp=unconfined --privileged=true --ulimit stack=67108864 --ulimit memlock=-1 --ulimit nofile=1048576:1048576 --shm-size=32G -v /share:/share --gpus all --name flagos harbor.baai.ac.cn/flagrelease-public/flagrelease_nvidia_qwen3o sleep infinity
+ ```
+
+ ### Serve
+
+ ```bash
+ flagscale serve qwen3_o
+ ```
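+
+ Once the service is up, a quick liveness check can save debugging time. The sketch below assumes the server exposes the standard OpenAI-compatible REST routes on port 9010, the same endpoint used by the invocation script in the next section; replace <server_ip> with the host running the flagos container.
+
+ ```python
+ import requests  # pip install requests
+
+ resp = requests.get("http://<server_ip>:9010/v1/models", timeout=10)
+ resp.raise_for_status()
+ print(resp.json())  # should list Qwen3-Omni-30B-A3B-Instruct-nvidia-flagos
+ ```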
+
+
+ ## Service Invocation
+
+ ### API-based Invocation Script
+
+ ```python
+ import openai
+
+ openai.api_key = "EMPTY"
+ openai.base_url = "http://<server_ip>:9010/v1/"
+ model = "Qwen3-Omni-30B-A3B-Instruct-nvidia-flagos"
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "What's the weather like today?"}
+ ]
+ response = openai.chat.completions.create(
+     model=model,
+     messages=messages,
+     stream=False,
+ )
+ # With stream=False the call returns a single ChatCompletion object.
+ print(response.choices[0].message.content)
+ ```
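+
+ If you want tokens as they are generated, the same endpoint can be called with `stream=True`; this is a minimal variant of the script above, assuming the openai Python package v1.x.
+
+ ```python
+ stream = openai.chat.completions.create(
+     model=model,
+     messages=messages,
+     stream=True,
+ )
+ for chunk in stream:
+     # Each chunk carries an incremental delta of the assistant message.
+     if chunk.choices and chunk.choices[0].delta.content:
+         print(chunk.choices[0].delta.content, end="", flush=True)
+ print()
+ ```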
+
+ ### AnythingLLM Integration Guide
+
+ #### 1. Download & Install
+
+ - Visit the official site: https://anythingllm.com/
+ - Choose the appropriate version for your OS (Windows/macOS/Linux)
+ - Follow the installation wizard to complete the setup
+
+ #### 2. Configuration
+
+ - Launch AnythingLLM
+ - Open settings (bottom left, fourth tab)
+ - Configure the core LLM parameters, e.g. point a generic OpenAI-compatible provider at the service endpoint used above (http://<server_ip>:9010/v1/) with model Qwen3-Omni-30B-A3B-Instruct-nvidia-flagos
+ - Click "Save Settings" to apply changes
+
+ #### 3. Model Interaction
+
+ - After model loading is complete:
+   - Click **"New Conversation"**
+   - Enter your question (e.g., "Explain the basics of quantum computing")
+   - Click the send button to get a response
+
+ # Contributing
+
+ We warmly welcome global developers to join us:
+
+ 1. Submit Issues to report problems
+ 2. Create Pull Requests to contribute code
+ 3. Improve technical documentation
+ 4. Expand hardware adaptation support
+
+ # License
+
+ The weights of this model are derived from Qwen/Qwen3-Omni-30B-A3B-Instruct and are released under the Apache 2.0 license (https://www.apache.org/licenses/LICENSE-2.0.txt).
+
chat_template.json ADDED
@@ -0,0 +1 @@
+ {"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {%- if messages[0].content is string %}\n {{- messages[0].content }}\n {%- else %}\n {%- for content in messages[0].content %}\n {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}\n {{- \"<|vision_start|><|image_pad|><|vision_end|>\" }}\n {%- elif content.type == 'audio' or 'audio' in content or 'audio_url' in content %}\n {{- \"<|audio_start|><|audio_pad|><|audio_end|>\" }}\n {%- elif content.type == 'video' or 'video' in content %}\n {{- \"<|vision_start|><|video_pad|><|vision_end|>\" }}\n {%- elif content.type == 'text' %}\n {{- content.text }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {%- endif %}\n {{- '\\n\\n' }}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {%- if messages[0].content is string %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- else %}\n {%- for content in messages[0].content %}\n {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}\n {{- '<|im_start|>system\\n' +\"<|vision_start|><|image_pad|><|vision_end|>\"+ '<|im_end|>\\n' }}\n {%- elif content.type == 'audio' or 'audio' in content or 'audio_url' in content %}\n {{- '<|im_start|>system\\n' +\"<|audio_start|><|audio_pad|><|audio_end|>\"+ '<|im_end|>\\n' }}\n {%- elif content.type == 'video' or 'video' in content %}\n {{- '<|im_start|>system\\n' +\"<|vision_start|><|video_pad|><|vision_end|>\"+ '<|im_end|>\\n' }}\n {%- elif content.type == 'text' %}\n {{- '<|im_start|>system\\n' +content.text+ '<|im_end|>\\n' }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = namespace(text=\"\") %}\n {%- for mcontent in message.content %}\n {%- if mcontent.type == 'image' or 'image' in mcontent or 'image_url' in mcontent %}\n {%- set content.text = content.text~\"<|vision_start|><|image_pad|><|vision_end|>\" %}\n {%- elif mcontent.type == 'audio' or 'audio' in mcontent or 'audio_url' in mcontent %}\n {%- set content.text = content.text~\"<|audio_start|><|audio_pad|><|audio_end|>\" %}\n {%- elif mcontent.type == 'video' or 'video' in mcontent %}\n {%- set content.text = content.text~\"<|vision_start|><|video_pad|><|vision_end|>\" %}\n {%- elif mcontent.type == 'text' %}\n {%- set content.text = content.text~mcontent.text %}\n {%- endif %}\n {%- endfor 
%}\n {%- set content = content.text %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = \"\" %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip(\"\\n\") + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}{{- '\\n' }}{%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}{{- '<|im_start|>user' }}{%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}{{- '<|im_end|>\\n' }}{%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}{{- '<think>\\n\\n</think>\\n\\n' }}{%- endif %}\n{%- endif %}"}
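
The single-line JSON above carries the Jinja chat template. It is consumed through the standard `apply_chat_template` API; a minimal sketch, assuming a sufficiently recent `transformers` and the local weights path used in the README:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("/share/Qwen3-Omni-30B-A3B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# Render the template into the <|im_start|>...-delimited prompt string.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
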
config.json ADDED
@@ -0,0 +1,485 @@
+ {
+ "architectures": [
+ "Qwen3OmniMoeForConditionalGeneration"
+ ],
+ "assistant_token_id": 77091,
+ "code2wav_config": {
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "codebook_dim": 512,
+ "codebook_size": 2048,
+ "decoder_dim": 1536,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "intermediate_size": 3072,
+ "layer_scale_initial_scale": 0.01,
+ "max_position_embeddings": 8000,
+ "model_type": "",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 8,
+ "num_key_value_heads": 16,
+ "num_quantizers": 16,
+ "num_semantic_quantizers": 1,
+ "rms_norm_eps": 1e-05,
+ "rope_theta": 10000,
+ "semantic_codebook_size": 4096,
+ "sliding_window": 72,
+ "upsample_rates": [
+ 8,
+ 5,
+ 4,
+ 3
+ ],
+ "upsampling_ratios": [
+ 2,
+ 2
+ ],
+ "vector_quantization_hidden_dimension": 512
+ },
+ "dtype": "bfloat16",
+ "enable_audio_output": true,
+ "im_end_token_id": 151645,
+ "im_start_token_id": 151644,
+ "model_type": "qwen3_omni_moe",
+ "system_token_id": 8948,
+ "talker_config": {
+ "text_config":{
+ "attention_bias": false,
+ "attention_dropout": 0,
+ "decoder_sparse_step": 1,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 2048,
+ "max_position_embeddings": 65536,
+ "mlp_only_layers": [],
+ "moe_intermediate_size": 384,
+ "norm_topk_prob": true,
+ "num_attention_heads": 16,
+ "num_experts": 128,
+ "num_experts_per_tok": 6,
+ "num_hidden_layers": 20,
+ "num_key_value_heads": 2,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": {
+ "interleaved": true,
+ "mrope_section": [
+ 24,
+ 20,
+ 20
+ ],
+ "rope_type": "default",
+ "type": "default"
+ },
+ "rope_theta": 1000000,
+ "router_aux_loss_coef": 0.001,
+ "shared_expert_intermediate_size": 768,
+ "sliding_window": null,
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 3072
+ },
+ "accept_hidden_layer": 24,
+ "audio_end_token_id": 151670,
+ "audio_start_token_id": 151669,
+ "audio_token_id": 151675,
+ "code_predictor_config": {
+ "_name_or_path": "",
+ "add_cross_attention": false,
+ "architectures": null,
+ "attention_bias": false,
+ "attention_dropout": 0,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": null,
+ "chunk_size_feed_forward": 0,
+ "cross_attention_hidden_size": null,
+ "decoder_start_token_id": null,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "dtype": null,
+ "early_stopping": false,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": null,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "layer_types": [
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention"
+ ],
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "max_position_embeddings": 32768,
+ "max_window_layers": 28,
+ "min_length": 0,
+ "model_type": "qwen3_omni_moe_talker_code_predictor",
+ "no_repeat_ngram_size": 0,
+ "num_attention_heads": 16,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_code_groups": 16,
+ "num_hidden_layers": 5,
+ "num_key_value_heads": 8,
+ "num_return_sequences": 1,
+ "output_attentions": false,
+ "output_hidden_states": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": null,
+ "rope_theta": 1000000,
+ "sep_token_id": null,
+ "sliding_window": null,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": false,
+ "tokenizer_class": null,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false,
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 2048
+ },
+ "codec_bos_id": 2149,
+ "codec_eos_token_id": 2150,
+ "codec_nothink_id": 2155,
+ "codec_pad_id": 2148,
+ "codec_think_bos_id": 2156,
+ "codec_think_eos_id": 2157,
+ "image_token_id": 151655,
+ "model_type": "qwen3_omni_moe_talker",
+ "num_code_groups": 16,
+ "output_router_logits": false,
+ "position_id_per_seconds": 13,
+ "seconds_per_chunk": 2,
+ "spatial_merge_size": 2,
+ "speaker_id": {
+ "chelsie": 2301,
+ "ethan": 2302,
+ "aiden": 2303
+ },
+ "thinker_hidden_size": 2048,
+ "video_token_id": 151656,
+ "vision_start_token_id": 151652
+ },
+ "thinker_config": {
+ "audio_config": {
+ "_name_or_path": "",
+ "activation_dropout": 0,
+ "activation_function": "gelu",
+ "add_cross_attention": false,
+ "architectures": null,
+ "attention_dropout": 0,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": null,
+ "chunk_size_feed_forward": 0,
+ "conv_chunksize": 500,
+ "cross_attention_hidden_size": null,
+ "d_model": 1280,
+ "decoder_start_token_id": null,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "downsample_hidden_size":480,
+ "dropout": 0,
+ "dtype": null,
+ "early_stopping": false,
+ "encoder_attention_heads": 20,
+ "encoder_ffn_dim": 5120,
+ "encoder_layers": 32,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": null,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "initializer_range": 0.02,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "max_source_positions": 1500,
+ "min_length": 0,
+ "model_type": "qwen3_omni_moe_audio_encoder",
+ "n_window": 50,
+ "n_window_infer": 800,
+ "no_repeat_ngram_size": 0,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_hidden_layers": 32,
+ "num_mel_bins": 128,
+ "num_return_sequences": 1,
+ "output_attentions": false,
+ "output_dim": 2048,
+ "output_hidden_states": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "scale_embedding": false,
+ "sep_token_id": null,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": true,
+ "tokenizer_class": null,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false
+ },
+ "audio_end_token_id": 151670,
+ "audio_start_token_id": 151669,
+ "audio_token_id": 151675,
+ "dtype": "bfloat16",
+ "image_token_id": 151655,
+ "initializer_range": 0.02,
+ "model_type": "qwen3_omni_moe_thinker",
+ "position_id_per_seconds": 13,
+ "seconds_per_chunk": 2,
+ "text_config": {
+ "_name_or_path": "",
+ "add_cross_attention": false,
+ "architectures": null,
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": null,
+ "chunk_size_feed_forward": 0,
+ "cross_attention_hidden_size": null,
+ "decoder_sparse_step": 1,
+ "decoder_start_token_id": null,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "dtype": null,
+ "early_stopping": false,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": null,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 2048,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 768,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "max_position_embeddings": 65536,
+ "min_length": 0,
+ "mlp_only_layers": [],
+ "model_type": "qwen3_omni_moe_text",
+ "moe_intermediate_size": 768,
+ "no_repeat_ngram_size": 0,
+ "norm_topk_prob": true,
+ "num_attention_heads": 32,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_experts": 128,
+ "num_experts_per_tok": 8,
+ "num_hidden_layers": 48,
+ "num_key_value_heads": 4,
+ "num_return_sequences": 1,
+ "output_attentions": false,
+ "output_hidden_states": false,
+ "output_router_logits": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": {
+ "interleaved": true,
+ "mrope_interleaved": true,
+ "mrope_section": [
+ 24,
+ 20,
+ 20
+ ],
+ "rope_type": "default",
+ "type": "default"
+ },
+ "rope_theta": 1000000,
+ "router_aux_loss_coef": 0.001,
+ "sep_token_id": null,
+ "shared_expert_intermediate_size": 0,
+ "sliding_window": null,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": false,
+ "tokenizer_class": null,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false,
+ "use_cache": true,
+ "use_qk_norm": true,
+ "use_sliding_window": false,
+ "vocab_size": 152064
+ },
+ "user_token_id": 872,
+ "video_token_id": 151656,
+ "vision_config": {
+ "_name_or_path": "",
+ "add_cross_attention": false,
+ "apply_vit_abs_pos_embed": true,
+ "architectures": null,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": null,
+ "chunk_size_feed_forward": 0,
+ "cross_attention_hidden_size": null,
+ "decoder_start_token_id": null,
+ "deepstack_visual_indexes": [
+ 8,
+ 16,
+ 24
+ ],
+ "depth": 27,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "dtype": null,
+ "early_stopping": false,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": null,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "hidden_act": "gelu_pytorch_tanh",
+ "hidden_size": 1152,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "image_size": 768,
+ "in_channels": 3,
+ "in_chans": 3,
+ "initializer_range": 0.02,
+ "intermediate_size": 4304,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "min_length": 0,
+ "model_type": "qwen3_omni_moe_vision_encoder",
+ "no_repeat_ngram_size": 0,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_heads": 16,
+ "num_return_sequences": 1,
+ "out_hidden_size": 2048,
+ "output_attentions": false,
+ "output_hidden_states": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "patch_size": 16,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "sep_token_id": null,
+ "spatial_merge_size": 2,
+ "spatial_patch_size": 16,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "temporal_patch_size": 2,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": true,
+ "tokenizer_class": null,
+ "tokens_per_second": 2,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false
+ },
+ "vision_end_token_id": 151653,
+ "vision_start_token_id": 151652
+ },
+ "transformers_version": "4.57.0.dev0",
+ "tts_bos_token_id": 151672,
+ "tts_eos_token_id": 151673,
+ "tts_pad_token_id": 151671,
+ "user_token_id": 872
+ }
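
The nesting above mirrors the model's thinker/talker/code2wav decomposition. A small sketch that pulls out a few of the fields shown, using only the standard library:

```python
import json

with open("/share/Qwen3-Omni-30B-A3B-Instruct/config.json") as f:
    cfg = json.load(f)

print(cfg["model_type"])  # qwen3_omni_moe
thinker_text = cfg["thinker_config"]["text_config"]
print(thinker_text["num_hidden_layers"])    # 48 layers in the thinker LLM
print(thinker_text["num_experts"],          # 128 experts,
      thinker_text["num_experts_per_tok"])  # 8 active per token
print(cfg["talker_config"]["text_config"]["num_hidden_layers"])  # 20
```
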
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"any-to-any"}
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "talker_max_new_tokens": 4096,
+ "talker_repetition_penalty": 1.05,
+ "talker_temperature": 0.9,
+ "talker_top_k": 50,
+ "talker_top_p": 1.0
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f95d142b727fbd0698f0ed9478e222d4f29467df855e21862e46bd1401049065
+ size 4997899632
model-00002-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:320c1f31f7ddc4efd2501891590eca2c481ce4eca3db33d70ebbb637dae85764
+ size 4997754216
model-00003-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:306f06b05ef032f1fa51eb62e72c3432324e7a084fa0d15ed17ff856adffcf33
+ size 4997754216
model-00004-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4cb551f433bfb4d83a5843e433e74688d71a96eeb1b62217b37d719026d559b5
+ size 4997755648
model-00005-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8aec5e8cc0e0b179a81f690af31a1149238f4b1d18ac60e3f690020fdf7d915f
+ size 4997755792
model-00006-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ff40a3d9e9c59a35a10257c1c5b216f472e6a102a4d6059bd31183eee03cb4c
+ size 4997755792
model-00007-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f6d84cc1d47178850f9ba1fe9c01f45ee63d9a8b9975f92821ee2b4a0387860b
+ size 4997755792
model-00008-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f369ccdc9d2d2a55df794005b23cdbce44494a72e4a0069a150ae471cfea720
+ size 4997755792
model-00009-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:980f754b21c4b20f36e02d184b792f5625f7c9d24b8a5c9da94287cf3cfde11b
+ size 4997755792
model-00010-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c5a798376321d74df2eb3f235e1c8c68669f09b0c753b5698a16e681204d808
+ size 4997755792
model-00011-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f7c051bd391c89279b43e7342ff4a243ad2ba2144f4aa62233fd2b02d72fce0
+ size 4997755792
model-00012-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8b09a9367cd31ebf928e87b9eaf002492379d9d21d352b480854c2fbe6d9978
+ size 4997755792
model-00013-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d9b7441c3a33e066d109cd6338e74ed680de9aa407d4df66eba5cb3c15c6d738
+ size 4999771808
model-00014-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73f09f187ff7b469a8937b39b8ac772e539e0d61311526c345558c5e6a6eb09a
+ size 4996618552
model-00015-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68880d566a50643807a4e73f5227c4f468bb31532925ed5fae78a51bdd39492a
+ size 553698794
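
The `.safetensors` entries above are Git LFS pointer stubs rather than the weights themselves; each records the SHA-256 and byte size of one shard. A standard-library sketch that checks a downloaded shard against its pointer, using shard 15 as the example:

```python
import hashlib
from pathlib import Path

path = Path("/share/Qwen3-Omni-30B-A3B-Instruct/model-00015-of-00015.safetensors")
expected_oid = "68880d566a50643807a4e73f5227c4f468bb31532925ed5fae78a51bdd39492a"
expected_size = 553698794

assert path.stat().st_size == expected_size, "size mismatch"
sha = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash 1 MiB at a time
        sha.update(chunk)
assert sha.hexdigest() == expected_oid, "hash mismatch"
print("shard 15 OK")
```
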
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
preprocessor_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "dither": 0.0,
+ "feature_extractor_type": "WhisperFeatureExtractor",
+ "feature_size": 128,
+ "hop_length": 160,
+ "image_mean": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "image_processor_type": "Qwen2VLImageProcessor",
+ "image_std": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "max_pixels": 12845056,
+ "merge_size": 2,
+ "min_pixels": 3136,
+ "n_fft": 400,
+ "n_samples": 4800000,
+ "nb_max_frames": 30000,
+ "padding_side": "right",
+ "padding_value": 0.0,
+ "patch_size": 16,
+ "processor_class": "Qwen3OmniMoeProcessor",
+ "return_attention_mask": true,
+ "sampling_rate": 16000,
+ "temporal_patch_size": 2
+ }
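
Two practical limits follow directly from the numbers above: on the audio side, `n_samples / sampling_rate` gives the clip duration the Whisper-style feature extractor is configured around, and `min_pixels`/`max_pixels` bound the per-image resolution budget. A quick check:

```python
n_samples, sampling_rate = 4_800_000, 16_000
print(n_samples / sampling_rate, "seconds of audio")  # 300.0
min_pixels, max_pixels = 3_136, 12_845_056
print(min_pixels, "to", max_pixels, "pixels per image")
```
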
tokenizer_config.json ADDED
@@ -0,0 +1,316 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151665": {
+ "content": "<tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151666": {
+ "content": "</tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151667": {
+ "content": "<think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151668": {
+ "content": "</think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151669": {
+ "content": "<|audio_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151670": {
+ "content": "<|audio_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151671": {
+ "content": "<tts_pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151672": {
+ "content": "<tts_text_bos>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151673": {
+ "content": "<tts_text_eod>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151674": {
+ "content": "<tts_text_bos_single>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151675": {
+ "content": "<|audio_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>",
+ "<|audio_start|>",
+ "<|audio_end|>",
+ "<tts_pad>",
+ "<tts_text_bos>",
+ "<tts_text_bos_single>",
+ "<|audio_pad|>"
+ ],
+ "extra_special_tokens": {
+ "image_token": "<|image_pad|>",
+ "audio_token": "<|audio_pad|>",
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>",
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>"
+ },
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null,
+ "image_token": "<|image_pad|>",
+ "audio_token": "<|audio_pad|>",
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>",
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>"
+ }
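
The special-token layout above can be spot-checked once the weights are local; a minimal sketch, assuming a `transformers` version recent enough for this repository:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/share/Qwen3-Omni-30B-A3B-Instruct")
print(tok.eos_token, tok.pad_token)  # <|im_end|> <|endoftext|>
# Special tokens map to the fixed ids listed above, e.g. <|im_end|> -> 151645.
print(tok.convert_tokens_to_ids("<|im_end|>"))
```
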
vocab.json ADDED
The diff for this file is too large to render. See raw diff