Error when running on AMD with Docker
#9
by wetchoc - opened
Has anyone successfully run this on AMD with Docker?
docker run -it --rm \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e HF_TOKEN="" \
-e VLLM_DISABLE_COMPILE_CACHE=1 \
-p 8000:8000 \
vllm/vllm-openai-rocm:latest \
mistralai/Voxtral-Mini-4B-Realtime-2602 \
--compilation_config '{"cudagraph_mode": "PIECEWISE"}'
(EngineCore_DP0 pid=70) INFO 02-05 23:15:08 [gpu_model_runner.py:4033] Starting to load model mistralai/Voxtral-Mini-4B-Realtime-2602...
(EngineCore_DP0 pid=70) INFO 02-05 23:15:08 [vllm.py:624] Asynchronous scheduling is enabled.
(EngineCore_DP0 pid=70) INFO 02-05 23:15:08 [rocm.py:338] Using Triton Attention backend.
(EngineCore_DP0 pid=70) WARNING 02-05 23:15:08 [compilation.py:1078] Op 'sparse_attn_indexer' not present in model, enabling with '+sparse_attn_indexer' has no effect
(EngineCore_DP0 pid=70) INFO 02-05 23:15:08 [rocm.py:338] Using Triton Attention backend.
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] super().__init__(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self._init_executor()
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.driver_worker.load_model()
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 275, in load_model
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4052, in load_model
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.model = model_loader.load_model(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] model = initialize_model(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 48, in initialize_model
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/voxtral_streaming.py", line 136, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] super().__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/voxtral.py", line 381, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.whisper_encoder = VoxtralEncoderModel(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/voxtral.py", line 767, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.whisper_encoder = WhisperEncoderCls(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 433, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.start_layer, self.end_layer, self.layers = make_layers(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 707, in make_layers
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 435, in <lambda>
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] lambda prefix: WhisperCausalEncoderLayer(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 375, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.self_attn = WhisperCausalAttention(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 301, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] self.attn = WhisperCausalAttentionWithBlockPooling(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 222, in __init__
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] attn_backend = create_whisper_attention_backend_with_block_pooling(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper_causal.py", line 155, in create_whisper_attention_backend_with_block_pooling
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] raise NotImplementedError(
(EngineCore_DP0 pid=70) ERROR 02-05 23:15:09 [core.py:946] NotImplementedError: <class 'vllm.v1.attention.backends.triton_attn.TritonAttentionBackend'> is not yet supported.Contributions to support more backends are much appreciated.
(EngineCore_DP0 pid=70) Process EngineCore_DP0:
(EngineCore_DP0 pid=70) Traceback (most recent call last):
(EngineCore_DP0 pid=70) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=70) self.run()
(EngineCore_DP0 pid=70) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
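The traceback shows the failure comes from `create_whisper_attention_backend_with_block_pooling` rejecting the `TritonAttentionBackend` that the ROCm platform selects by default. A possible thing to try is pinning a different attention backend via vLLM's `VLLM_ATTENTION_BACKEND` environment variable; note that the backend name `FLASH_ATTN` below is an assumption, and which backends are actually available depends on the ROCm image:

```shell
# Same invocation as above, but explicitly pinning the attention backend
# so vLLM does not fall back to the (unsupported) Triton backend.
# ASSUMPTION: "FLASH_ATTN" may not be available in the ROCm build;
# check the backends your vLLM version lists on startup.
docker run -it --rm \
  --group-add=video \
  --ipc=host \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --device /dev/kfd \
  --device /dev/dri \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN="" \
  -e VLLM_DISABLE_COMPILE_CACHE=1 \
  -e VLLM_ATTENTION_BACKEND=FLASH_ATTN \
  -p 8000:8000 \
  vllm/vllm-openai-rocm:latest \
  mistralai/Voxtral-Mini-4B-Realtime-2602 \
  --compilation_config '{"cudagraph_mode": "PIECEWISE"}'
```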
Same here, any solution?
Can you open an issue directly on the vLLM GitHub repo?