---
license: apache-2.0
tags:
- gptq
- qwen
- text-generation
- causal-lm
- transformers
---
# Qwen2.5-3B GPTQ

This is the **Qwen2.5-3B** model quantized using **GPTQ** for efficient inference with `AutoGPTQ`.

## 🔍 Model Details

- Base model: [Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B)
- Quantization: GPTQ (see the AutoGPTQ loading sketch below)
- Format: `safetensors` (safe and fast loading)
- Purpose: fast inference for chat and code tasks
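
The checkpoint can also be loaded natively with AutoGPTQ. This is a minimal sketch, assuming `auto-gptq` and `transformers` are installed and a CUDA device is available:

```python
# pip install auto-gptq transformers   (assumed environment, not pinned by this card)
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "STiFLeR7/Qwen2.5-3B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# from_quantized reads the GPTQ weights from the safetensors checkpoint
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,
)
```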

---

## 🧪 Example Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "STiFLeR7/Qwen2.5-3B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# device_map="auto" places the quantized weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

prompt = "Q: Who is Alan Turing?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
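
For interactive use, tokens can be streamed as they are generated. A small sketch using the `TextStreamer` helper from `transformers`, reusing the `tokenizer` and `model` objects from the example above (the sampling settings are illustrative, not tuned for this model):

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they arrive; skip_prompt hides the echoed input
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Explain GPTQ quantization in one sentence.", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7, streamer=streamer)
```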