Duplicate files

#3
by darkstar3537 - opened

Are the consolidated safetensors files the ones needed to run the model, or the other ones?

Yes, that is confusing...

@cpatonn do you know the answer?

cyankiwi org

Yes, only the consolidated safetensors files are needed to run the model. I included the split model safetensors files as well because the original Mistral model repo also provides them.
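
If you want to avoid pulling the duplicate shards, you can filter the download to the consolidated weights. A minimal sketch using huggingface-cli; the include patterns are assumptions about the repo's file names, so check them against the actual file list:

    # Sketch: fetch only the consolidated weights plus config/tokenizer files,
    # skipping the duplicate split model-*.safetensors shards.
    huggingface-cli download cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit \
          --include "consolidated*.safetensors" "*.json" "tokenizer*"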

I receive the following error when deploying on vLLM:

devstral-1  | (APIServer pid=1)   Value error, Quantization method specified in the model config (compressed-tensors) does not match the quantization method specified in the `quantization` argument (awq). [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]

The serve command:

vllm serve cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit --tool-call-parser mistral --enable-auto-tool-choice \
      --tensor-parallel-size 2 \
      --gpu-memory-utilization 0.35 \
      --quantization awq --dtype half

@dr-e You can leave the --quantization flag unset and vLLM will automatically select the correct type. I ran into the same issue with another AWQ model that was created with llmcompressor: its config declares compressed-tensors, so passing awq conflicts with it. Explicitly setting compressed-tensors as the quantization type threw another error for me as well, since the kernel it tried to use is not supported on RDNA4.
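
If you want to verify this yourself, the quantization method the repo declares can be checked without downloading the weights. A minimal sketch, assuming curl and jq are installed and that the method sits under quantization_config.quant_method (the usual key in a Hugging Face config.json):

    # Sketch: print the quantization method declared by the model config.
    # Per the error above, this should show "compressed-tensors", not "awq".
    curl -s https://huggingface.co/cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit/raw/main/config.json \
      | jq '.quantization_config.quant_method'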

Drop the --quantization awq argument and it should just work.
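
For reference, a working invocation would be the same command with that flag removed (a sketch reusing the flags from the command above; adjust the parallelism and memory settings for your own setup):

    vllm serve cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit --tool-call-parser mistral --enable-auto-tool-choice \
          --tensor-parallel-size 2 \
          --gpu-memory-utilization 0.35 \
          --dtype half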

Does it work with the PRO 6000 Blackwell?

Can we delete the consolidated safetensors, or should we delete the model ones?
