Qwen3-4B-Instruct-2507-GPTQ-Int4

Languages: multilingual support for 104 languages, including the languages of the CIS countries.

Model Description

This is a quantized version of Qwen/Qwen3-4B-Instruct-2507. The model was quantized with llmcompressor using the GPTQ method (W4A16: 4-bit weights, 16-bit activations); a sketch of the quantization recipe follows the list below.

  • Quantization: GPTQ 4-bit (weights), 16-bit (activations)
  • Format: compressed-tensors (native vLLM support)
  • Group Size: 128
  • Act Order: Static
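
For reference, the following is a minimal sketch of how a checkpoint like this can be produced with llmcompressor. The calibration dataset, sample count, and sequence length are illustrative assumptions, not the settings actually used for this model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

model_id = "Qwen/Qwen3-4B-Instruct-2507"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ with 4-bit weights / 16-bit activations; the W4A16 scheme uses group size 128
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",       # assumed calibration set
    recipe=recipe,
    max_seq_length=2048,           # assumed
    num_calibration_samples=512,   # assumed
)

# Save in the compressed-tensors format that vLLM loads natively
model.save_pretrained("Qwen3-4B-Instruct-2507-GPTQ-Int4", save_compressed=True)
tokenizer.save_pretrained("Qwen3-4B-Instruct-2507-GPTQ-Int4")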

How to Run

vLLM (Recommended)

The compressed-tensors format loads natively in vLLM; no conversion step is needed.

vllm serve "superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4" --quantization compressed-tensors --dtype bfloat16
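
The server exposes an OpenAI-compatible API (port 8000 by default). A quick smoke test with the openai client; the prompt and sampling settings are illustrative:

from openai import OpenAI

# vLLM does not check the API key by default, so any placeholder works
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,
    top_p=0.8,
)
print(response.choices[0].message.content)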

Python (using vLLM)

from vllm import LLM, SamplingParams

# Load the quantized checkpoint; vLLM detects the compressed-tensors format
llm = LLM(model="superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4", quantization="compressed-tensors")
prompts = ["Hello, how are you?"]
sampling_params = SamplingParams(temperature=0.7, top_p=0.8)

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    generated_text = output.outputs[0].text
    print(f"Generated text: {generated_text!r}")
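
Since this is an instruct-tuned model, vLLM's chat API applies the model's chat template automatically. A minimal sketch, reusing the llm and sampling_params objects from the block above:

messages = [{"role": "user", "content": "Hello, how are you?"}]
chat_outputs = llm.chat(messages, sampling_params)
print(chat_outputs[0].outputs[0].text)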

License

Apache 2.0 (same as the original model).

