---
license: apache-2.0
tags:
- gptq
- qwen
- text-generation
- causal-lm
- transformers
---
# Qwen2.5-3B GPTQ

This is the **Qwen2.5-3B** model quantized using **GPTQ** for efficient inference with `AutoGPTQ`.

## 🔍 Model Details

- Base model: [Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B)
- Quantization: GPTQ (see the AutoGPTQ loading sketch below)
- Format: `safetensors` (safe and fast loading)
- Purpose: fast inference for chat and code tasks
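
The checkpoint can also be loaded natively with AutoGPTQ. This is a minimal sketch, assuming `auto-gptq` and `transformers` are installed and a CUDA device is available:

```python
# pip install auto-gptq transformers   (assumed environment, not pinned by this card)
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "STiFLeR7/Qwen2.5-3B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# from_quantized reads the GPTQ weights from the safetensors checkpoint
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,
)
```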

---

## 🧪 Example Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "STiFLeR7/Qwen2.5-3B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# device_map="auto" places the quantized weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

prompt = "Q: Who is Alan Turing?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
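
For interactive use, tokens can be streamed as they are generated. A small sketch using the `TextStreamer` helper from `transformers`, reusing the `tokenizer` and `model` objects from the example above (the sampling settings are illustrative, not tuned for this model):

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they arrive; skip_prompt hides the echoed input
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Explain GPTQ quantization in one sentence.", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7, streamer=streamer)
```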