YummyYum committed on
Commit e00c0df · verified · 1 Parent(s): 247c6f3

Upload folder using huggingface_hub

.msc ADDED
Binary file (2.03 kB).
 
.mv ADDED
@@ -0,0 +1 @@
+ Revision:master,CreatedAt:1758550852
README.md ADDED
@@ -0,0 +1,162 @@
+ # Introduction
+
+ **FlagOS** is a unified heterogeneous computing software stack for large models, co-developed with leading global chip manufacturers. Built on core technologies such as the **FlagScale** distributed training/inference framework, the **FlagGems** universal operator library, the **FlagCX** communication library, and the **FlagTree** unified compiler, the **FlagRelease** platform leverages the FlagOS stack to automatically produce and release combinations of <chip + open-source model>. This enables efficient and automated model migration across diverse chips, opening a new chapter for large model deployment and application.
+
+ Based on this, the **Qwen3-Omni-30B-A3B-Instruct-FlagOS** model is adapted for NVIDIA GPUs using the FlagOS software stack, enabling:
+
+ ### Integrated Deployment
+
+ - Deep integration with the open-source [FlagScale framework](https://github.com/FlagOpen/FlagScale)
+ - Out-of-the-box inference scripts with pre-configured hardware and software parameters
+ - Released **FlagOS** container image supporting deployment within minutes
+
+ ### Consistency Validation
+
+ - Rigorously evaluated through benchmark testing: performance and results from the FlagOS software stack are compared against native stacks on multiple public benchmarks.
+
+ # Technical Overview
+
+ ## **FlagScale Distributed Training and Inference Framework**
+
+ FlagScale is an end-to-end framework for large models across heterogeneous computing resources, maximizing computational efficiency and ensuring model validity through core technologies. Its key advantages include:
+
+ - **Unified Deployment Interface:** Standardized command-line tools support one-click service deployment across multiple hardware platforms, significantly reducing adaptation costs in heterogeneous environments.
+ - **Intelligent Parallel Optimization:** Automatically generates optimal distributed parallel strategies based on chip computing characteristics, achieving dynamic load balancing of computation/communication resources.
+ - **Seamless Operator Switching:** Deep integration with the FlagGems operator library allows high-performance operators to be invoked via environment variables without modifying model code (see the sketch after the FlagGems section below).
+
+ ## **FlagGems Universal Large-Model Operator Library**
+
+ FlagGems is a Triton-based, cross-architecture operator library collaboratively developed with industry partners. Its core strengths include:
+
+ - **Full-stack Coverage**: Over 100 operators, with a broader range of operator types than competing libraries.
+ - **Ecosystem Compatibility**: Supports 7 accelerator backends. Ongoing optimizations have significantly improved performance.
+ - **High Efficiency**: Employs unique code generation and runtime optimization techniques for faster secondary development and better runtime performance compared to alternatives.
+
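+ As a rough illustration of operator switching, the sketch below patches supported PyTorch operators with FlagGems' Triton kernels from Python; it assumes the `flag_gems` package and a CUDA-capable PyTorch build are installed, and uses `flag_gems.enable()`, the library's global switch. The tensor workload is an arbitrary example, not part of the FlagOS release.
+
+ ```python
+ import torch
+ import flag_gems  # assumed installed alongside the FlagOS stack
+
+ # Route supported torch ops (matmul, softmax, ...) to FlagGems' Triton kernels.
+ flag_gems.enable()
+
+ x = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
+ y = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
+ print((x @ y).sum())  # this matmul now dispatches to the FlagGems implementation
+ ```
+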
+ ## **FlagEval Evaluation Framework**
+
+ **FlagEval (Libra)** is a comprehensive evaluation system and open platform for large models launched in 2023. It aims to establish scientific, fair, and open benchmarks, methodologies, and tools to help researchers assess model and training algorithm performance. It features:
+ - **Multi-dimensional Evaluation**: Supports 800+ model evaluations across NLP, CV, Audio, and Multimodal fields, covering 20+ downstream tasks including language understanding and image-text generation.
+ - **Industry-Grade Use Cases**: Has completed horizontal evaluations of mainstream large models, providing authoritative benchmarks for chip-model performance validation.
+
+ # Evaluation Results
+
+ ## Benchmark Results
+
+ | Metric | Qwen3-Omni-30B-A3B-Instruct-H100-CUDA | Qwen3-Omni-30B-A3B-Instruct-FlagOS |
+ |-------------------|--------------------------|-----------------------------|
+ | CMMMU | 50.440 | 50.890 |
+ | MMMU | 60.780 | 59.330 |
+ | MMMU_Pro_standard | 44.650 | 43.610 |
+ | MMMU_Pro_vision | 39.770 | 38.730 |
+ | MM-Vet v2 | 59.225 | 61.919 |
+ | OCRBench | 78.700 | 78.500 |
+ | MathVision | 57.530 | 57.570 |
+ | CII-Bench | 60.920 | 60.260 |
+ | Blink | 57.710 | 57.440 |
+ | AIME_0fewshot_@avg1 | 76.667 | 83.333 |
+ | GPQA_0fewshot_@avg1 | 60.403 | 61.829 |
+ | LiveBench-0fewshot_@avg1 | 68.926 | 69.598 |
+ | MMLU_5fewshot_@avg1 | 78.383 | 78.449 |
+ | MUSR_0fewshot_@avg | 59.392 | 60.185 |
+
+ # User Guide
+
+ **Environment Setup**
+
+ | Item | Version |
+ | ------------- | ------------------------------------------------------------ |
+ | Docker Version | Docker version 28.1.0, build 4d8c241 |
+ | Operating System | Ubuntu 22.04.5 LTS |
+ | FlagScale | 0.8.0 |
+ | FlagGems | 3.0 |
+
+ ## Operation Steps
+
+ ### Download Open-source Model Weights
+
+ ```bash
+ pip install modelscope
+ modelscope download --model Qwen/Qwen3-Omni-30B-A3B-Instruct --local_dir /share/Qwen3-Omni-30B-A3B-Instruct
+ ```
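+
+ As an optional sanity check, you can confirm that all 15 `model-*.safetensors` shards listed in this repository landed in the target directory; the path below is the one used in the download command above.
+
+ ```python
+ from pathlib import Path
+
+ weights_dir = Path("/share/Qwen3-Omni-30B-A3B-Instruct")
+ shards = sorted(weights_dir.glob("model-*-of-00015.safetensors"))
+ print(f"found {len(shards)} of 15 shards")
+ assert len(shards) == 15, "incomplete download; re-run the modelscope command"
+ ```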
+
+ ### Download FlagOS Image
+
+ ```bash
+ docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease_nvidia_qwen3o
+ ```
+
+ ### Start the Inference Service
+
+ ```bash
+ # Container startup
+ docker run --rm --init --detach --net=host --uts=host --ipc=host --security-opt=seccomp=unconfined --privileged=true --ulimit stack=67108864 --ulimit memlock=-1 --ulimit nofile=1048576:1048576 --shm-size=32G -v /share:/share --gpus all --name flagos harbor.baai.ac.cn/flagrelease-public/flagrelease_nvidia_qwen3o sleep infinity
+ ```
+
+ ### Serve
+
+ ```bash
+ flagscale serve qwen3_o
+ ```
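+
+ Once the service is up, a quick liveness check can save debugging time. The sketch below assumes the server exposes the standard OpenAI-compatible REST routes on port 9010, the same endpoint used by the invocation script in the next section; replace <server_ip> with the host running the flagos container.
+
+ ```python
+ import requests  # pip install requests
+
+ resp = requests.get("http://<server_ip>:9010/v1/models", timeout=10)
+ resp.raise_for_status()
+ print(resp.json())  # should list Qwen3-Omni-30B-A3B-Instruct-nvidia-flagos
+ ```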
+
+
+ ## Service Invocation
+
+ ### API-based Invocation Script
+
+ ```python
+ import openai
+
+ openai.api_key = "EMPTY"
+ openai.base_url = "http://<server_ip>:9010/v1/"
+ model = "Qwen3-Omni-30B-A3B-Instruct-nvidia-flagos"
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "What's the weather like today?"}
+ ]
+ response = openai.chat.completions.create(
+     model=model,
+     messages=messages,
+     stream=False,
+ )
+ # With stream=False the call returns a single ChatCompletion object.
+ print(response.choices[0].message.content)
+ ```
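+
+ If you want tokens as they are generated, the same endpoint can be called with `stream=True`; this is a minimal variant of the script above, assuming the openai Python package v1.x.
+
+ ```python
+ stream = openai.chat.completions.create(
+     model=model,
+     messages=messages,
+     stream=True,
+ )
+ for chunk in stream:
+     # Each chunk carries an incremental delta of the assistant message.
+     if chunk.choices and chunk.choices[0].delta.content:
+         print(chunk.choices[0].delta.content, end="", flush=True)
+ print()
+ ```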
+
+ ### AnythingLLM Integration Guide
+
+ #### 1. Download & Install
+
+ - Visit the official site: https://anythingllm.com/
+ - Choose the appropriate version for your OS (Windows/macOS/Linux)
+ - Follow the installation wizard to complete the setup
+
+ #### 2. Configuration
+
+ - Launch AnythingLLM
+ - Open settings (bottom left, fourth tab)
+ - Configure the core LLM parameters, e.g. point a generic OpenAI-compatible provider at the service endpoint used above (http://<server_ip>:9010/v1/) with model Qwen3-Omni-30B-A3B-Instruct-nvidia-flagos
+ - Click "Save Settings" to apply changes
+
+ #### 3. Model Interaction
+
+ - After model loading is complete:
+   - Click **"New Conversation"**
+   - Enter your question (e.g., "Explain the basics of quantum computing")
+   - Click the send button to get a response
+
+ # Contributing
+
+ We warmly welcome global developers to join us:
+
+ 1. Submit Issues to report problems
+ 2. Create Pull Requests to contribute code
+ 3. Improve technical documentation
+ 4. Expand hardware adaptation support
+
+ # License
+
+ The weights of this model are derived from Qwen/Qwen3-Omni-30B-A3B-Instruct and are released under the Apache 2.0 license (https://www.apache.org/licenses/LICENSE-2.0.txt).
+
chat_template.json ADDED
@@ -0,0 +1 @@
+ {"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {%- if messages[0].content is string %}\n {{- messages[0].content }}\n {%- else %}\n {%- for content in messages[0].content %}\n {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}\n {{- \"<|vision_start|><|image_pad|><|vision_end|>\" }}\n {%- elif content.type == 'audio' or 'audio' in content or 'audio_url' in content %}\n {{- \"<|audio_start|><|audio_pad|><|audio_end|>\" }}\n {%- elif content.type == 'video' or 'video' in content %}\n {{- \"<|vision_start|><|video_pad|><|vision_end|>\" }}\n {%- elif content.type == 'text' %}\n {{- content.text }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {%- endif %}\n {{- '\\n\\n' }}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {%- if messages[0].content is string %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- else %}\n {%- for content in messages[0].content %}\n {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}\n {{- '<|im_start|>system\\n' +\"<|vision_start|><|image_pad|><|vision_end|>\"+ '<|im_end|>\\n' }}\n {%- elif content.type == 'audio' or 'audio' in content or 'audio_url' in content %}\n {{- '<|im_start|>system\\n' +\"<|audio_start|><|audio_pad|><|audio_end|>\"+ '<|im_end|>\\n' }}\n {%- elif content.type == 'video' or 'video' in content %}\n {{- '<|im_start|>system\\n' +\"<|vision_start|><|video_pad|><|vision_end|>\"+ '<|im_end|>\\n' }}\n {%- elif content.type == 'text' %}\n {{- '<|im_start|>system\\n' +content.text+ '<|im_end|>\\n' }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = namespace(text=\"\") %}\n {%- for mcontent in message.content %}\n {%- if mcontent.type == 'image' or 'image' in mcontent or 'image_url' in mcontent %}\n {%- set content.text = content.text~\"<|vision_start|><|image_pad|><|vision_end|>\" %}\n {%- elif mcontent.type == 'audio' or 'audio' in mcontent or 'audio_url' in mcontent %}\n {%- set content.text = content.text~\"<|audio_start|><|audio_pad|><|audio_end|>\" %}\n {%- elif mcontent.type == 'video' or 'video' in mcontent %}\n {%- set content.text = content.text~\"<|vision_start|><|video_pad|><|vision_end|>\" %}\n {%- elif mcontent.type == 'text' %}\n {%- set content.text = content.text~mcontent.text %}\n {%- endif %}\n {%- endfor 
%}\n {%- set content = content.text %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = \"\" %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip(\"\\n\") + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}{{- '\\n' }}{%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}{{- '<|im_start|>user' }}{%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}{{- '<|im_end|>\\n' }}{%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}{{- '<think>\\n\\n</think>\\n\\n' }}{%- endif %}\n{%- endif %}"}
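
The single-line JSON above carries the Jinja chat template. It is consumed through the standard `apply_chat_template` API; a minimal sketch, assuming a sufficiently recent `transformers` and the local weights path used in the README:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("/share/Qwen3-Omni-30B-A3B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# Render the template into the <|im_start|>...-delimited prompt string.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
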
config.json ADDED
@@ -0,0 +1,485 @@
+ {
+ "architectures": [
+ "Qwen3OmniMoeForConditionalGeneration"
+ ],
+ "assistant_token_id": 77091,
+ "code2wav_config": {
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "codebook_dim": 512,
+ "codebook_size": 2048,
+ "decoder_dim": 1536,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "intermediate_size": 3072,
+ "layer_scale_initial_scale": 0.01,
+ "max_position_embeddings": 8000,
+ "model_type": "",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 8,
+ "num_key_value_heads": 16,
+ "num_quantizers": 16,
+ "num_semantic_quantizers": 1,
+ "rms_norm_eps": 1e-05,
+ "rope_theta": 10000,
+ "semantic_codebook_size": 4096,
+ "sliding_window": 72,
+ "upsample_rates": [
+ 8,
+ 5,
+ 4,
+ 3
+ ],
+ "upsampling_ratios": [
+ 2,
+ 2
+ ],
+ "vector_quantization_hidden_dimension": 512
+ },
+ "dtype": "bfloat16",
+ "enable_audio_output": true,
+ "im_end_token_id": 151645,
+ "im_start_token_id": 151644,
+ "model_type": "qwen3_omni_moe",
+ "system_token_id": 8948,
+ "talker_config": {
+ "text_config":{
+ "attention_bias": false,
+ "attention_dropout": 0,
+ "decoder_sparse_step": 1,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 2048,
+ "max_position_embeddings": 65536,
+ "mlp_only_layers": [],
+ "moe_intermediate_size": 384,
+ "norm_topk_prob": true,
+ "num_attention_heads": 16,
+ "num_experts": 128,
+ "num_experts_per_tok": 6,
+ "num_hidden_layers": 20,
+ "num_key_value_heads": 2,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": {
+ "interleaved": true,
+ "mrope_section": [
+ 24,
+ 20,
+ 20
+ ],
+ "rope_type": "default",
+ "type": "default"
+ },
+ "rope_theta": 1000000,
+ "router_aux_loss_coef": 0.001,
+ "shared_expert_intermediate_size": 768,
+ "sliding_window": null,
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 3072
+ },
+ "accept_hidden_layer": 24,
+ "audio_end_token_id": 151670,
+ "audio_start_token_id": 151669,
+ "audio_token_id": 151675,
+ "code_predictor_config": {
+ "_name_or_path": "",
+ "add_cross_attention": false,
+ "architectures": null,
+ "attention_bias": false,
+ "attention_dropout": 0,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": null,
+ "chunk_size_feed_forward": 0,
+ "cross_attention_hidden_size": null,
+ "decoder_start_token_id": null,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "dtype": null,
+ "early_stopping": false,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": null,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "layer_types": [
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention"
+ ],
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "max_position_embeddings": 32768,
+ "max_window_layers": 28,
+ "min_length": 0,
+ "model_type": "qwen3_omni_moe_talker_code_predictor",
+ "no_repeat_ngram_size": 0,
+ "num_attention_heads": 16,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_code_groups": 16,
+ "num_hidden_layers": 5,
+ "num_key_value_heads": 8,
+ "num_return_sequences": 1,
+ "output_attentions": false,
+ "output_hidden_states": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": null,
+ "rope_theta": 1000000,
+ "sep_token_id": null,
+ "sliding_window": null,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": false,
+ "tokenizer_class": null,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false,
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 2048
+ },
+ "codec_bos_id": 2149,
+ "codec_eos_token_id": 2150,
+ "codec_nothink_id": 2155,
+ "codec_pad_id": 2148,
+ "codec_think_bos_id": 2156,
+ "codec_think_eos_id": 2157,
+ "image_token_id": 151655,
+ "model_type": "qwen3_omni_moe_talker",
+ "num_code_groups": 16,
+ "output_router_logits": false,
+ "position_id_per_seconds": 13,
+ "seconds_per_chunk": 2,
+ "spatial_merge_size": 2,
+ "speaker_id": {
+ "chelsie": 2301,
+ "ethan": 2302,
+ "aiden": 2303
+ },
+ "thinker_hidden_size": 2048,
+ "video_token_id": 151656,
+ "vision_start_token_id": 151652
+ },
+ "thinker_config": {
+ "audio_config": {
+ "_name_or_path": "",
+ "activation_dropout": 0,
+ "activation_function": "gelu",
+ "add_cross_attention": false,
+ "architectures": null,
+ "attention_dropout": 0,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": null,
+ "chunk_size_feed_forward": 0,
+ "conv_chunksize": 500,
+ "cross_attention_hidden_size": null,
+ "d_model": 1280,
+ "decoder_start_token_id": null,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "downsample_hidden_size":480,
+ "dropout": 0,
+ "dtype": null,
+ "early_stopping": false,
+ "encoder_attention_heads": 20,
+ "encoder_ffn_dim": 5120,
+ "encoder_layers": 32,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": null,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "initializer_range": 0.02,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "max_source_positions": 1500,
+ "min_length": 0,
+ "model_type": "qwen3_omni_moe_audio_encoder",
+ "n_window": 50,
+ "n_window_infer": 800,
+ "no_repeat_ngram_size": 0,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_hidden_layers": 32,
+ "num_mel_bins": 128,
+ "num_return_sequences": 1,
+ "output_attentions": false,
+ "output_dim": 2048,
+ "output_hidden_states": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "scale_embedding": false,
+ "sep_token_id": null,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": true,
+ "tokenizer_class": null,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false
+ },
+ "audio_end_token_id": 151670,
+ "audio_start_token_id": 151669,
+ "audio_token_id": 151675,
+ "dtype": "bfloat16",
+ "image_token_id": 151655,
+ "initializer_range": 0.02,
+ "model_type": "qwen3_omni_moe_thinker",
+ "position_id_per_seconds": 13,
+ "seconds_per_chunk": 2,
+ "text_config": {
+ "_name_or_path": "",
+ "add_cross_attention": false,
+ "architectures": null,
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": null,
+ "chunk_size_feed_forward": 0,
+ "cross_attention_hidden_size": null,
+ "decoder_sparse_step": 1,
+ "decoder_start_token_id": null,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "dtype": null,
+ "early_stopping": false,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": null,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 2048,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 768,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "max_position_embeddings": 65536,
+ "min_length": 0,
+ "mlp_only_layers": [],
+ "model_type": "qwen3_omni_moe_text",
+ "moe_intermediate_size": 768,
+ "no_repeat_ngram_size": 0,
+ "norm_topk_prob": true,
+ "num_attention_heads": 32,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_experts": 128,
+ "num_experts_per_tok": 8,
+ "num_hidden_layers": 48,
+ "num_key_value_heads": 4,
+ "num_return_sequences": 1,
+ "output_attentions": false,
+ "output_hidden_states": false,
+ "output_router_logits": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": {
+ "interleaved": true,
+ "mrope_interleaved": true,
+ "mrope_section": [
+ 24,
+ 20,
+ 20
+ ],
+ "rope_type": "default",
+ "type": "default"
+ },
+ "rope_theta": 1000000,
+ "router_aux_loss_coef": 0.001,
+ "sep_token_id": null,
+ "shared_expert_intermediate_size": 0,
+ "sliding_window": null,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": false,
+ "tokenizer_class": null,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false,
+ "use_cache": true,
+ "use_qk_norm": true,
+ "use_sliding_window": false,
+ "vocab_size": 152064
+ },
+ "user_token_id": 872,
+ "video_token_id": 151656,
+ "vision_config": {
+ "_name_or_path": "",
+ "add_cross_attention": false,
+ "apply_vit_abs_pos_embed": true,
+ "architectures": null,
+ "bad_words_ids": null,
+ "begin_suppress_tokens": null,
+ "bos_token_id": null,
+ "chunk_size_feed_forward": 0,
+ "cross_attention_hidden_size": null,
+ "decoder_start_token_id": null,
+ "deepstack_visual_indexes": [
+ 8,
+ 16,
+ 24
+ ],
+ "depth": 27,
+ "diversity_penalty": 0.0,
+ "do_sample": false,
+ "dtype": null,
+ "early_stopping": false,
+ "encoder_no_repeat_ngram_size": 0,
+ "eos_token_id": null,
+ "exponential_decay_length_penalty": null,
+ "finetuning_task": null,
+ "forced_bos_token_id": null,
+ "forced_eos_token_id": null,
+ "hidden_act": "gelu_pytorch_tanh",
+ "hidden_size": 1152,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1"
+ },
+ "image_size": 768,
+ "in_channels": 3,
+ "in_chans": 3,
+ "initializer_range": 0.02,
+ "intermediate_size": 4304,
+ "is_decoder": false,
+ "is_encoder_decoder": false,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1
+ },
+ "length_penalty": 1.0,
+ "max_length": 20,
+ "min_length": 0,
+ "model_type": "qwen3_omni_moe_vision_encoder",
+ "no_repeat_ngram_size": 0,
+ "num_beam_groups": 1,
+ "num_beams": 1,
+ "num_heads": 16,
+ "num_return_sequences": 1,
+ "out_hidden_size": 2048,
+ "output_attentions": false,
+ "output_hidden_states": false,
+ "output_scores": false,
+ "pad_token_id": null,
+ "patch_size": 16,
+ "prefix": null,
+ "problem_type": null,
+ "pruned_heads": {},
+ "remove_invalid_values": false,
+ "repetition_penalty": 1.0,
+ "return_dict": true,
+ "return_dict_in_generate": false,
+ "sep_token_id": null,
+ "spatial_merge_size": 2,
+ "spatial_patch_size": 16,
+ "suppress_tokens": null,
+ "task_specific_params": null,
+ "temperature": 1.0,
+ "temporal_patch_size": 2,
+ "tf_legacy_loss": false,
+ "tie_encoder_decoder": false,
+ "tie_word_embeddings": true,
+ "tokenizer_class": null,
+ "tokens_per_second": 2,
+ "top_k": 50,
+ "top_p": 1.0,
+ "torchscript": false,
+ "typical_p": 1.0,
+ "use_bfloat16": false
+ },
+ "vision_end_token_id": 151653,
+ "vision_start_token_id": 151652
+ },
+ "transformers_version": "4.57.0.dev0",
+ "tts_bos_token_id": 151672,
+ "tts_eos_token_id": 151673,
+ "tts_pad_token_id": 151671,
+ "user_token_id": 872
+ }
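
The nesting above mirrors the model's thinker/talker/code2wav decomposition. A small sketch that pulls out a few of the fields shown, using only the standard library:

```python
import json

with open("/share/Qwen3-Omni-30B-A3B-Instruct/config.json") as f:
    cfg = json.load(f)

print(cfg["model_type"])  # qwen3_omni_moe
thinker_text = cfg["thinker_config"]["text_config"]
print(thinker_text["num_hidden_layers"])    # 48 layers in the thinker LLM
print(thinker_text["num_experts"],          # 128 experts,
      thinker_text["num_experts_per_tok"])  # 8 active per token
print(cfg["talker_config"]["text_config"]["num_hidden_layers"])  # 20
```
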
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"any-to-any"}
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "talker_max_new_tokens": 4096,
+ "talker_repetition_penalty": 1.05,
+ "talker_temperature": 0.9,
+ "talker_top_k": 50,
+ "talker_top_p": 1.0
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f95d142b727fbd0698f0ed9478e222d4f29467df855e21862e46bd1401049065
+ size 4997899632
model-00002-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:320c1f31f7ddc4efd2501891590eca2c481ce4eca3db33d70ebbb637dae85764
+ size 4997754216
model-00003-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:306f06b05ef032f1fa51eb62e72c3432324e7a084fa0d15ed17ff856adffcf33
+ size 4997754216
model-00004-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4cb551f433bfb4d83a5843e433e74688d71a96eeb1b62217b37d719026d559b5
+ size 4997755648
model-00005-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8aec5e8cc0e0b179a81f690af31a1149238f4b1d18ac60e3f690020fdf7d915f
+ size 4997755792
model-00006-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ff40a3d9e9c59a35a10257c1c5b216f472e6a102a4d6059bd31183eee03cb4c
+ size 4997755792
model-00007-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f6d84cc1d47178850f9ba1fe9c01f45ee63d9a8b9975f92821ee2b4a0387860b
+ size 4997755792
model-00008-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f369ccdc9d2d2a55df794005b23cdbce44494a72e4a0069a150ae471cfea720
+ size 4997755792
model-00009-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:980f754b21c4b20f36e02d184b792f5625f7c9d24b8a5c9da94287cf3cfde11b
+ size 4997755792
model-00010-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c5a798376321d74df2eb3f235e1c8c68669f09b0c753b5698a16e681204d808
+ size 4997755792
model-00011-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f7c051bd391c89279b43e7342ff4a243ad2ba2144f4aa62233fd2b02d72fce0
+ size 4997755792
model-00012-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8b09a9367cd31ebf928e87b9eaf002492379d9d21d352b480854c2fbe6d9978
+ size 4997755792
model-00013-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d9b7441c3a33e066d109cd6338e74ed680de9aa407d4df66eba5cb3c15c6d738
+ size 4999771808
model-00014-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73f09f187ff7b469a8937b39b8ac772e539e0d61311526c345558c5e6a6eb09a
+ size 4996618552
model-00015-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68880d566a50643807a4e73f5227c4f468bb31532925ed5fae78a51bdd39492a
+ size 553698794
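
The `.safetensors` entries above are Git LFS pointer stubs rather than the weights themselves; each records the SHA-256 and byte size of one shard. A standard-library sketch that checks a downloaded shard against its pointer, using shard 15 as the example:

```python
import hashlib
from pathlib import Path

path = Path("/share/Qwen3-Omni-30B-A3B-Instruct/model-00015-of-00015.safetensors")
expected_oid = "68880d566a50643807a4e73f5227c4f468bb31532925ed5fae78a51bdd39492a"
expected_size = 553698794

assert path.stat().st_size == expected_size, "size mismatch"
sha = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash 1 MiB at a time
        sha.update(chunk)
assert sha.hexdigest() == expected_oid, "hash mismatch"
print("shard 15 OK")
```
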
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
preprocessor_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "dither": 0.0,
+ "feature_extractor_type": "WhisperFeatureExtractor",
+ "feature_size": 128,
+ "hop_length": 160,
+ "image_mean": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "image_processor_type": "Qwen2VLImageProcessor",
+ "image_std": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "max_pixels": 12845056,
+ "merge_size": 2,
+ "min_pixels": 3136,
+ "n_fft": 400,
+ "n_samples": 4800000,
+ "nb_max_frames": 30000,
+ "padding_side": "right",
+ "padding_value": 0.0,
+ "patch_size": 16,
+ "processor_class": "Qwen3OmniMoeProcessor",
+ "return_attention_mask": true,
+ "sampling_rate": 16000,
+ "temporal_patch_size": 2
+ }
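
Two practical limits follow directly from the numbers above: on the audio side, `n_samples / sampling_rate` gives the clip duration the Whisper-style feature extractor is configured around, and `min_pixels`/`max_pixels` bound the per-image resolution budget. A quick check:

```python
n_samples, sampling_rate = 4_800_000, 16_000
print(n_samples / sampling_rate, "seconds of audio")  # 300.0
min_pixels, max_pixels = 3_136, 12_845_056
print(min_pixels, "to", max_pixels, "pixels per image")
```
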
tokenizer_config.json ADDED
@@ -0,0 +1,316 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151665": {
+ "content": "<tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151666": {
+ "content": "</tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151667": {
+ "content": "<think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151668": {
+ "content": "</think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151669": {
+ "content": "<|audio_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151670": {
+ "content": "<|audio_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151671": {
+ "content": "<tts_pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151672": {
+ "content": "<tts_text_bos>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151673": {
+ "content": "<tts_text_eod>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151674": {
+ "content": "<tts_text_bos_single>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151675": {
+ "content": "<|audio_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>",
+ "<|audio_start|>",
+ "<|audio_end|>",
+ "<tts_pad>",
+ "<tts_text_bos>",
+ "<tts_text_bos_single>",
+ "<|audio_pad|>"
+ ],
+ "extra_special_tokens": {
+ "image_token": "<|image_pad|>",
+ "audio_token": "<|audio_pad|>",
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>",
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>"
+ },
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null,
+ "image_token": "<|image_pad|>",
+ "audio_token": "<|audio_pad|>",
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>",
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>"
+ }
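
The special-token layout above can be spot-checked once the weights are local; a minimal sketch, assuming a `transformers` version recent enough for this repository:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/share/Qwen3-Omni-30B-A3B-Instruct")
print(tok.eos_token, tok.pad_token)  # <|im_end|> <|endoftext|>
# Special tokens map to the fixed ids listed above, e.g. <|im_end|> -> 151645.
print(tok.convert_tokens_to_ids("<|im_end|>"))
```
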
vocab.json ADDED
The diff for this file is too large to render. See raw diff