---
license: apache-2.0
tags:
  - gptq
  - qwen
  - text-generation
  - causal-lm
  - transformers
---

# Qwen2.5-3B GPTQ

This is the **Qwen2.5-3B** model quantized using **GPTQ** for efficient inference with `AutoGPTQ`.

## 🔍 Model Details

- Base: [Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B)
- Quantization: GPTQ
- Format: `safetensors` (safe and fast loading)
- Purpose: fast inference for chat and code tasks

---

## 🧪 Example Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "STiFLeR7/Qwen2.5-3B-GPTQ"

# Load the tokenizer and the GPTQ-quantized model.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

prompt = "Q: Who is Alan Turing?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
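
Note that loading a GPTQ checkpoint through `transformers` requires a GPTQ backend to be installed alongside it (for example `pip install optimum auto-gptq`); without one, `from_pretrained` will fail with an import error. For chat-style use, the sketch below builds the prompt with the tokenizer's chat template instead of a raw string. It assumes the tokenizer ships a chat template (Qwen2.5 tokenizers generally do); the message contents and sampling parameters are illustrative, not tuned for this checkpoint:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "STiFLeR7/Qwen2.5-3B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

# Build a chat prompt with the tokenizer's built-in chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Illustrative sampling settings; adjust for your workload.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.8)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```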