Duplicate files

#3
by darkstar3537 - opened

Are the consolidated safetensors files the ones needed to run the model, or the other ones?

Yes, that is confusing...

@cpatonn do you know the answer?

cyankiwi org

Yes, only the consolidated safetensors files are needed to run the model. I included the split model safetensors files as well because the original Mistral model repo also provides them.
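
If you want to avoid pulling the duplicate shards, you can filter the download to the consolidated weights. A minimal sketch using huggingface-cli; the include patterns are assumptions about the repo's file names, so check them against the actual file list:

    # Sketch: fetch only the consolidated weights plus config/tokenizer files,
    # skipping the duplicate split model-*.safetensors shards.
    huggingface-cli download cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit \
          --include "consolidated*.safetensors" "*.json" "tokenizer*"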

I receive the following error when deploying on vLLM:

devstral-1  | (APIServer pid=1)   Value error, Quantization method specified in the model config (compressed-tensors) does not match the quantization method specified in the `quantization` argument (awq). [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]

The serve command:

vllm serve cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit --tool-call-parser mistral --enable-auto-tool-choice \
      --tensor-parallel-size 2 \
      --gpu-memory-utilization 0.35 \
      --quantization awq --dtype half

@dr-e You can leave the --quantization flag unset and vLLM will automatically select the correct type. I ran into the same issue with another AWQ model that was created with llmcompressor: its config declares compressed-tensors, so passing awq conflicts with it. Explicitly setting compressed-tensors as the quantization type threw another error for me as well, since the kernel it tried to use is not supported on RDNA4.
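
If you want to verify this yourself, the quantization method the repo declares can be checked without downloading the weights. A minimal sketch, assuming curl and jq are installed and that the method sits under quantization_config.quant_method (the usual key in a Hugging Face config.json):

    # Sketch: print the quantization method declared by the model config.
    # Per the error above, this should show "compressed-tensors", not "awq".
    curl -s https://huggingface.co/cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit/raw/main/config.json \
      | jq '.quantization_config.quant_method'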

Drop the --quantization awq argument and it should just work.
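
For reference, a working invocation would be the same command with that flag removed (a sketch reusing the flags from the command above; adjust the parallelism and memory settings for your own setup):

    vllm serve cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit --tool-call-parser mistral --enable-auto-tool-choice \
          --tensor-parallel-size 2 \
          --gpu-memory-utilization 0.35 \
          --dtype half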

Does it work with the PRO 6000 Blackwell?

Can we delete the consolidated safetensors, or should we delete the model ones?
