AssertionError: rope can only used in combination with a sliding window

#15
by andrews-llms - opened

Hi,
I followed the installation instructions, but when I try to start the server I get this error:

"""
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] super().__init__(
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self._init_executor()
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self.driver_worker.load_model()
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 275, in load_model
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4052, in load_model
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self.model = model_loader.load_model(
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] model = initialize_model(
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 48, in initialize_model
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/models/voxtral_streaming.py", line 136, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] super().__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/models/voxtral.py", line 381, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self.whisper_encoder = VoxtralEncoderModel(
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/models/voxtral.py", line 767, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self.whisper_encoder = WhisperEncoderCls(
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/models/whisper_causal.py", line 433, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self.start_layer, self.end_layer, self.layers = make_layers(
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 707, in make_layers
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/models/whisper_causal.py", line 435, in <lambda>
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] lambda prefix: WhisperCausalEncoderLayer(
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/models/whisper_causal.py", line 375, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] self.self_attn = WhisperCausalAttention(
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/model_executor/models/whisper_causal.py", line 314, in __init__
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] assert per_layer_sliding_window is not None, (
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=237317) ERROR 02-08 23:36:43 [core.py:946] AssertionError: rope can only used in combination with a sliding window
[rank0]:[W208 23:36:44.444766787 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=237114) Traceback (most recent call last):
(APIServer pid=237114) File "/home/rrr/dev/.venv/bin/vllm", line 10, in <module>
(APIServer pid=237114) sys.exit(main())
(APIServer pid=237114) ^^^^^^
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=237114) args.dispatch_function(args)
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 111, in cmd
(APIServer pid=237114) uvloop.run(run_server(args))
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=237114) return __asyncio.run(
(APIServer pid=237114) ^^^^^^^^^^^^^^
(APIServer pid=237114) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=237114) return runner.run(main)
(APIServer pid=237114) ^^^^^^^^^^^^^^^^
(APIServer pid=237114) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=237114) return self._loop.run_until_complete(task)
(APIServer pid=237114) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=237114) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=237114) return await main
(APIServer pid=237114) ^^^^^^^^^^
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 919, in run_server
(APIServer pid=237114) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 938, in run_server_worker
(APIServer pid=237114) async with build_async_engine_client(
(APIServer pid=237114) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=237114) return await anext(self.gen)
(APIServer pid=237114) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 147, in build_async_engine_client
(APIServer pid=237114) async with build_async_engine_client_from_engine_args(
(APIServer pid=237114) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=237114) return await anext(self.gen)
(APIServer pid=237114) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 188, in build_async_engine_client_from_engine_args
(APIServer pid=237114) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=237114) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 228, in from_vllm_config
(APIServer pid=237114) return cls(
(APIServer pid=237114) ^^^^
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 155, in __init__
(APIServer pid=237114) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=237114) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 122, in make_async_mp_client
(APIServer pid=237114) return AsyncMPClient(*client_args)
(APIServer pid=237114) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 819, in __init__
(APIServer pid=237114) super().__init__(
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 479, in __init__
(APIServer pid=237114) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=237114) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=237114) next(self.gen)
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 933, in launch_core_engines
(APIServer pid=237114) wait_for_engine_startup(
(APIServer pid=237114) File "/home/rrr/dev/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 992, in wait_for_engine_startup
(APIServer pid=237114) raise RuntimeError(
(APIServer pid=237114) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
"""

I am on Ubuntu 24 with an RTX 5090, driver version 590.48.01, CUDA version 13.0, vLLM 0.15.1, torch 2.9.1+cu130, and mistral_common 1.9.0.

sameeeeee :(

I have the same issue, trying to run in Docker image

Fixed it. Really, really make sure you're pulling the nightly version of vLLM. Here's what my current setup looks like right now:

Base image: nvidia/cuda:12.9.0-devel-ubuntu22.04
uv pip install vllm[audio] --torch-backend=cu129 --extra-index-url https://wheels.vllm.ai/nightly/cu129
uv pip install mistral_common[audio] soxr librosa soundfile

Note: the commands above might have some redundancies, but they do the job.
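To double-check that you actually got a nightly rather than the 0.15.1 release, a small version-comparison sketch may help (the helper name and version-string formats are my own assumptions; compare against whatever `uv pip show vllm` reports):

```python
# Hypothetical helper: parse dotted version strings so you can confirm the
# installed vLLM is newer than the 0.15.1 release that still shows the bug.
def parse_version(v: str) -> tuple:
    core = v.split("+")[0]          # drop local tags like "+cu130"
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break               # stop at suffixes like "dev0"
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

# A nightly such as 0.16.0.dev0 sorts after the 0.15.1 release:
print(parse_version("0.16.0.dev0") > parse_version("0.15.1"))  # True
```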

@xjyk Yes, the nightly version is working πŸ˜€
My solution was:
Create a Dockerfile:

FROM vllm/vllm-openai:nightly
RUN uv pip install --system mistral-common[soundfile]
ENV VLLM_DISABLE_COMPILE_CACHE=1

Build a local image from the nightly to fix some missing requirements:
docker build -t vllm-voxtral:latest .

Then run the local image with:
docker run --gpus all -p 8000:8000 vllm-voxtral:latest mistralai/Voxtral-Mini-4B-Realtime-2602 --compilation_config '{"cudagraph_mode": "PIECEWISE"}'
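Once the container is up, a quick smoke test is to list the served models from the OpenAI-compatible endpoint. A small sketch (my own, assuming the default port 8000 and the standard `/v1/models` route of the vLLM OpenAI server):

```python
# Hypothetical smoke test: list served model ids from a running vLLM
# OpenAI-compatible server; returns [] if the server is not reachable.
import json
import urllib.error
import urllib.request

def list_models(base_url: str = "http://localhost:8000") -> list:
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as r:
            return [m["id"] for m in json.load(r)["data"]]
    except (urllib.error.URLError, OSError):
        return []  # server not reachable, e.g. still starting up

print(list_models())
```

If the container started cleanly, the model id you passed to `docker run` should appear in the list.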

This is the nightly build that I used:
https://hub.docker.com/layers/vllm/vllm-openai/nightly/images/sha256-bae2fdec45747ec14d9d759ab2ce4a1be33ae076243d4cfc5e661db9d99e577a
DIGEST
sha256:4406bb6b7b972dd9c05f5b5706fe9771c76ecbb53e4c80eda5ddaf31afe0b530
IMAGE ID
c4b49fa03a77

>>> nvidia-smi

NVIDIA-SMI 575.51.03              
Driver Version: 575.51.03      
CUDA Version: 12.9

same problem, even with nightly.

Yeah, same issue with nightly. Anyone have a fix or workaround yet?
