Is config.json not needed?

#18
by Xiakj - opened

docker run --rm \
  --gpus all \
  --network host \
  -v /home/xiakj/huggingface/mistralai/Voxtral-Mini-4B-Realtime-2602:/models/Voxtral-Mini-4B-Realtime-2602 \
  vllm-voxtral:latest \
  /models/Voxtral-Mini-4B-Realtime-2602 \
  --host 0.0.0.0 \
  --port 9009 \
  --compilation_config '{"cudagraph_mode": "PIECEWISE"}'
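Note (my assumption, not something stated in this thread): Mistral releases ship in the consolidated checkpoint layout (params.json plus a tekken tokenizer) rather than the Hugging Face layout (config.json), which is why the AutoConfig fallback in the log below complains. vLLM has CLI flags to select its Mistral-format loaders explicitly; a sketch of the same invocation with those flags added (whether they are required here, or whether the vLLM build auto-detects the format, is an assumption):

```shell
# Same command, with vLLM's Mistral-format flags passed explicitly.
# These flag names exist in vLLM's serve CLI; needing them for this
# particular model/build is an assumption on my part.
docker run --rm \
  --gpus all \
  --network host \
  -v /home/xiakj/huggingface/mistralai/Voxtral-Mini-4B-Realtime-2602:/models/Voxtral-Mini-4B-Realtime-2602 \
  vllm-voxtral:latest \
  /models/Voxtral-Mini-4B-Realtime-2602 \
  --host 0.0.0.0 \
  --port 9009 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --compilation_config '{"cudagraph_mode": "PIECEWISE"}'
```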
(APIServer pid=1) INFO 02-12 01:18:18 [utils.py:314]
(APIServer pid=1) INFO 02-12 01:18:18 [utils.py:314] β–ˆ β–ˆ β–ˆβ–„ β–„β–ˆ
(APIServer pid=1) INFO 02-12 01:18:18 [utils.py:314] β–„β–„ β–„β–ˆ β–ˆ β–ˆ β–ˆ β–€β–„β–€ β–ˆ version 0.16.0rc1.dev78+gb6bb2842c
(APIServer pid=1) INFO 02-12 01:18:18 [utils.py:314] β–ˆβ–„β–ˆβ–€ β–ˆ β–ˆ β–ˆ β–ˆ model /models/Voxtral-Mini-4B-Realtime-2602
(APIServer pid=1) INFO 02-12 01:18:18 [utils.py:314] β–€β–€ β–€β–€β–€β–€β–€ β–€β–€β–€β–€β–€ β–€ β–€
(APIServer pid=1) INFO 02-12 01:18:18 [utils.py:314]
(APIServer pid=1) INFO 02-12 01:18:18 [utils.py:250] non-default args: {'model_tag': '/models/Voxtral-Mini-4B-Realtime-2602', 'api_server_count': 1, 'host': '0.0.0.0', 'port': 9009, 'model': '/models/Voxtral-Mini-4B-Realtime-2602', 'compilation_config': {'level': None, 'mode': None, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': [], 'splitting_ops': None, 'compile_mm_encoder': False, 'compile_sizes': None, 'compile_ranges_split_points': None, 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.PIECEWISE: 1>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': None, 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': None, 'pass_config': {}, 'max_cudagraph_capture_size': None, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'static_all_moe_layers': []}}
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] The params.json file is missing 'max_position_embeddings' and could not get a value from the HF config. Defaulting to 128000
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] Traceback (most recent call last):
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 1149, in _maybe_retrieve_max_pos_from_hf
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] hf_config = get_config(
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] ^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 634, in get_config
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] config_dict, config = config_parser.parse(
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 187, in parse
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] raise e
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 166, in parse
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] config = AutoConfig.from_pretrained(
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1380, in from_pretrained
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] raise ValueError(
(APIServer pid=1) WARNING 02-12 01:18:18 [config.py:1158] ValueError: Unrecognized model in /models/Voxtral-Mini-4B-Realtime-2602. Should have a model_type key in its config.json, or contain one of the following strings in its name: aimv2, aimv2_vision_model, albert, align, altclip, [... several hundred supported model_type strings truncated for readability; the list does include voxtral and voxtral_encoder ...], zamba, zamba2, zoedepth
(APIServer pid=1) INFO 02-12 01:18:22 [model.py:532] Resolved architecture: VoxtralRealtimeGeneration
(APIServer pid=1) ERROR 02-12 01:18:22 [repo_utils.py:47] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/models/Voxtral-Mini-4B-Realtime-2602'. Use repo_type argument if needed., retrying 1 of 2
(APIServer pid=1) ERROR 02-12 01:18:24 [repo_utils.py:45] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/models/Voxtral-Mini-4B-Realtime-2602'. Use repo_type argument if needed.
(APIServer pid=1) INFO 02-12 01:18:24 [model.py:1865] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=1) INFO 02-12 01:18:24 [model.py:1543] Using max model len 131072
(APIServer pid=1) INFO 02-12 01:18:24 [scheduler.py:226] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=1) INFO 02-12 01:18:24 [vllm.py:669] Asynchronous scheduling is enabled.
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] The params.json file is missing 'max_position_embeddings' and could not get a value from the HF config. Defaulting to 128000
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] Traceback (most recent call last):
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 1021, in try_get_generation_config
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] return GenerationConfig.from_pretrained(
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/transformers/generation/configuration_utils.py", line 903, in from_pretrained
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] resolved_config_file = cached_file(
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] ^^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 322, in cached_file
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 437, in cached_files
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] raise OSError(
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] OSError: /models/Voxtral-Mini-4B-Realtime-2602 does not appear to have a file named generation_config.json. Checkout 'https://huggingface.co//models/Voxtral-Mini-4B-Realtime-2602/tree/main' for available files.
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158]
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] During handling of the above exception, another exception occurred:
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158]
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] Traceback (most recent call last):
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 1149, in _maybe_retrieve_max_pos_from_hf
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] hf_config = get_config(
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] ^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 634, in get_config
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] config_dict, config = config_parser.parse(
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 187, in parse
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] raise e
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 166, in parse
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] config = AutoConfig.from_pretrained(
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) WARNING 02-12 01:18:24 [config.py:1158] File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1380, in from_pretrained

Suggestion: check that your vLLM version supports this model.
