Error when running on AMD with Docker

#9
by wetchoc - opened

Has anyone successfully run this on AMD with Docker?

docker run -it --rm \
    --group-add=video \
    --ipc=host \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -e HF_TOKEN="" \
    -e VLLM_DISABLE_COMPILE_CACHE=1 \
    -p 8000:8000 \
    vllm/vllm-openai-rocm:latest \
    mistralai/Voxtral-Mini-4B-Realtime-2602 \
    --compilation_config '{"cudagraph_mode": "PIECEWISE"}'
(EngineCore_DP0 pid=70) INFO 02-05 23:15:08 [gpu_model_runner.py:4033] Starting to load model mistralai/Voxtral-Mini-4B-Realtime-2602...

(EngineCore_DP0 pid=70) INFO 02-05 23:15:08 [vllm.py:624] Asynchronous scheduling is enabled.

(EngineCore_DP0 pid=70) INFO 02-05 23:15:08 [rocm.py:338] Using Triton Attention backend.

(EngineCore_DP0 pid=70) WARNING 02-05 23:15:08 [compilation.py:1078] Op 'sparse_attn_indexer' not present in model, enabling with '+sparse_attn_indexer' has no effect

(EngineCore_DP0 pid=70) INFO 02-05 23:15:08 [rocm.py:338] Using Triton Attention backend.

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] EngineCore failed to start.

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] Traceback (most recent call last):

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 937, in run_engine_core

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 691, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     super().__init__(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 105, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.model_executor = executor_class(vllm_config)

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self._init_executor()

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.driver_worker.load_model()

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 275, in load_model

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.model_runner.load_model(eep_scale_up=eep_scale_up)

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4052, in load_model

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.model = model_loader.load_model(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                  ^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     model = initialize_model(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]             ^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 48, in initialize_model

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     return model_class(vllm_config=vllm_config, prefix=prefix)

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/voxtral_streaming.py", line 136, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     super().__init__(vllm_config=vllm_config, prefix=prefix)

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/voxtral.py", line 381, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.whisper_encoder = VoxtralEncoderModel(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                            ^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/voxtral.py", line 767, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.whisper_encoder = WhisperEncoderCls(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                            ^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 433, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.start_layer, self.end_layer, self.layers = make_layers(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                                                     ^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 707, in make_layers

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 435, in <lambda>

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     lambda prefix: WhisperCausalEncoderLayer(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 375, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.self_attn = WhisperCausalAttention(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                      ^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 301, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     self.attn = WhisperCausalAttentionWithBlockPooling(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 222, in __init__

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     attn_backend = create_whisper_attention_backend_with_block_pooling(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 155, in create_whisper_attention_backend_with_block_pooling

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946]     raise NotImplementedError(

(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] NotImplementedError: <class 'vllm.v1.attention.backends.triton_attn.TritonAttentionBackend'> is not yet supported.Contributions to support more backends are much appreciated.

(EngineCore_DP0 pid=70) Process EngineCore_DP0:

(EngineCore_DP0 pid=70) Traceback (most recent call last):

(EngineCore_DP0 pid=70)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap

(EngineCore_DP0 pid=70)     self.run()

(EngineCore_DP0 pid=70)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run 

Same here. Is there any solution?

Mistral AI_ org

Can you open an issue directly on the vLLM GitHub repo?
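
For anyone filing that issue, it may help to quote the final exception rather than the whole traceback. The snippet below is only a rough sketch: it assumes every log line carries the same prefix as the output pasted above (process name, pid, level, date, time, source file), and the function name `find_exceptions` is made up for this example and is not part of vLLM.

```python
import re

# Prefix seen on each line of the pasted log, e.g.
# "(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] "
# Assumption: other vLLM versions may format this differently.
LOG_PREFIX = re.compile(
    r"^\((?P<proc>\S+) pid=(?P<pid>\d+)\)\s+"
    r"(?P<level>[A-Z]+)\s+\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+"
    r"\[(?P<source>[^\]]+)\]\s?"
)

# A Python exception summary line: "SomeError: message".
EXCEPTION_LINE = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.*)$"
)


def find_exceptions(log_text):
    """Return (exception_type, message) pairs found on ERROR log lines."""
    found = []
    for raw in log_text.splitlines():
        line = raw.strip()
        prefix = LOG_PREFIX.match(line)
        if prefix is None or prefix.group("level") != "ERROR":
            continue
        # Drop the log prefix; indented traceback frames will not match below.
        body = line[prefix.end():]
        exc = EXCEPTION_LINE.match(body)
        if exc:
            found.append((exc.group("type"), exc.group("msg")))
    return found


if __name__ == "__main__":
    import sys

    # Usage: docker run ... 2>&1 | python find_exceptions.py
    for exc_type, message in find_exceptions(sys.stdin.read()):
        print(f"{exc_type}: {message}")
```

Run against the log above, this should pick out the `NotImplementedError` line naming `TritonAttentionBackend`, which is the part the vLLM maintainers will most likely need.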
