Qwen3-4B-Instruct-2507-GPTQ-Int4

Languages: multilingual support for 104 languages, including the languages of the CIS countries.

Model Description

This is a quantized version of Qwen/Qwen3-4B-Instruct-2507. The model was quantized with llmcompressor using the GPTQ method (W4A16: 4-bit weights, 16-bit activations); a sketch of the quantization recipe follows the list below.

  • Quantization: GPTQ 4-bit (weights), 16-bit (activations)
  • Format: compressed-tensors (native vLLM support)
  • Group Size: 128
  • Act Order: Static
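
For reference, the following is a minimal sketch of how a checkpoint like this can be produced with llmcompressor. The calibration dataset, sample count, and sequence length are illustrative assumptions, not the settings actually used for this model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

model_id = "Qwen/Qwen3-4B-Instruct-2507"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ with 4-bit weights / 16-bit activations; the W4A16 scheme uses group size 128
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",       # assumed calibration set
    recipe=recipe,
    max_seq_length=2048,           # assumed
    num_calibration_samples=512,   # assumed
)

# Save in the compressed-tensors format that vLLM loads natively
model.save_pretrained("Qwen3-4B-Instruct-2507-GPTQ-Int4", save_compressed=True)
tokenizer.save_pretrained("Qwen3-4B-Instruct-2507-GPTQ-Int4")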

How to Run

vLLM (Recommended)

The compressed-tensors format loads natively in vLLM; no conversion step is needed.

vllm serve "superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4" --quantization compressed-tensors --dtype bfloat16
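
The server exposes an OpenAI-compatible API (port 8000 by default). A quick smoke test with the openai client; the prompt and sampling settings are illustrative:

from openai import OpenAI

# vLLM does not check the API key by default, so any placeholder works
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,
    top_p=0.8,
)
print(response.choices[0].message.content)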

Python (using vLLM)

from vllm import LLM, SamplingParams

# Load the quantized checkpoint; vLLM detects the compressed-tensors format
llm = LLM(model="superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4", quantization="compressed-tensors")
prompts = ["Hello, how are you?"]
sampling_params = SamplingParams(temperature=0.7, top_p=0.8)

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    generated_text = output.outputs[0].text
    print(f"Generated text: {generated_text!r}")
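
Since this is an instruct-tuned model, vLLM's chat API applies the model's chat template automatically. A minimal sketch, reusing the llm and sampling_params objects from the block above:

messages = [{"role": "user", "content": "Hello, how are you?"}]
chat_outputs = llm.chat(messages, sampling_params)
print(chat_outputs[0].outputs[0].text)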

License

Apache 2.0 (same as the original model).

