Qwen3-4B-Instruct-2507-GPTQ-Int4
Languages: Multilingual support (104 languages, including languages of the CIS countries).
Model Description
This is a quantized version of Qwen/Qwen3-4B-Instruct-2507.
The model was quantized with llmcompressor using the GPTQ method (W4A16); a sketch of the recipe follows the parameter list below.
- Quantization: GPTQ 4-bit (weights), 16-bit (activations)
- Format: compressed-tensors (native vLLM support)
- Group Size: 128
- Act Order: Static
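For reference, a W4A16 GPTQ run like this can be expressed with llmcompressor's GPTQModifier. This is a minimal sketch under stated assumptions, not the authors' exact script: the calibration dataset, sample count, and sequence length shown here are illustrative, and the `oneshot` import path varies across llmcompressor versions.

```python
# Minimal sketch of a W4A16 GPTQ quantization run with llmcompressor.
# Assumptions: the calibration dataset, sample count, and sequence length
# are illustrative; the authors' exact recipe is not published.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",    # quantize all Linear layers...
    ignore=["lm_head"],  # ...except the output head
    scheme="W4A16",      # 4-bit weights, 16-bit activations (group size 128 by default)
)

oneshot(
    model="Qwen/Qwen3-4B-Instruct-2507",
    dataset="open_platypus",  # assumed calibration set
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```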
How to Run
vLLM (Recommended)
This format is optimized for vLLM.
vllm serve "superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4" --quantization compressed-tensors --dtype bfloat16
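`vllm serve` exposes an OpenAI-compatible HTTP API (on port 8000 by default). A minimal sketch of querying it from Python, assuming the server is running locally on the default host and port:

```python
# Minimal sketch: query the OpenAI-compatible endpoint started by `vllm serve`.
# Assumes the server is running locally on the default port 8000.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "temperature": 0.7,
        "top_p": 0.8,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```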
Python (using vLLM)
from vllm import LLM, SamplingParams

# Load the quantized checkpoint with vLLM's compressed-tensors backend.
llm = LLM(model="superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4", quantization="compressed-tensors")

prompts = ["Hello, how are you?"]
sampling_params = SamplingParams(temperature=0.7, top_p=0.8)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    generated_text = output.outputs[0].text
    print(f"Generated text: {generated_text!r}")
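Since this is an instruct-tuned model, vLLM's chat interface is usually preferable to raw `generate` prompts because it applies the model's chat template automatically. A minimal sketch:

```python
# Minimal sketch: llm.chat applies the model's chat template automatically.
from vllm import LLM, SamplingParams

llm = LLM(model="superjob/Qwen3-4B-Instruct-2507-GPTQ-Int4", quantization="compressed-tensors")
messages = [{"role": "user", "content": "Hello, how are you?"}]
outputs = llm.chat(messages, SamplingParams(temperature=0.7, top_p=0.8))
print(outputs[0].outputs[0].text)
```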
License
Apache 2.0 (same as the original model).