# Qwen2.5-3B-SFT-UltraChat-GGUF
GGUF quantized versions of Qwen2.5-3B-SFT-UltraChat for efficient CPU and mixed CPU/GPU inference.
## Available Quantizations
| Quantization | Size | Description |
|---|---|---|
| Q4_K_M | ~40% of original | Best balance of quality and size |
| Q5_K_M | ~50% of original | Higher quality, larger size |
| Q8_0 | ~80% of original | Highest quality quantization |
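To fetch only one of the quantizations from the table, `huggingface-cli download` accepts glob filters. A minimal sketch, assuming the filenames follow the `qwen2.5_3b_sft_ultrachat-<quant>.gguf` scheme used in the Quick Start below:

```bash
# Download just the Q5_K_M file; the glob pattern is an assumption based
# on the repo's filename scheme -- adjust it if the actual name differs
huggingface-cli download ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF \
  --include "*q5_k_m*" --local-dir ./models
```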
## Quick Start

### Using Ollama
```bash
# Pull and run directly from Hugging Face
ollama pull hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M "Hello!"
```
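Once pulled, the model can also be queried through Ollama's local REST API. A minimal sketch, assuming the Ollama server is running on its default port 11434:

```bash
# One-shot generation via Ollama's HTTP API; "stream": false returns
# the full completion as a single JSON object
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M",
  "prompt": "Hello!",
  "stream": false
}'
```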
### Using llama.cpp
```bash
# Download the GGUF file
huggingface-cli download ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF \
  qwen2.5_3b_sft_ultrachat-q4_k_m.gguf --local-dir ./models

# Run inference
./llama-cli -m ./models/qwen2.5_3b_sft_ultrachat-q4_k_m.gguf \
  -p "What is the capital of France?" -n 128
```
### Using LM Studio
- Download the GGUF file from this repository
- Import into LM Studio
- Select the model and start chatting
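LM Studio can also serve the model through its OpenAI-compatible local server (enabled from the app's server/developer tab). A minimal sketch, assuming the default port 1234; the `model` identifier below is a placeholder and must match whatever LM Studio displays for the loaded GGUF:

```bash
# Chat completion against LM Studio's local server; the model id is
# hypothetical -- use the identifier LM Studio shows for this model
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-3b-sft-ultrachat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```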
## Source Model

This is a quantized version of Qwen2.5-3B-SFT-UltraChat, a model based on Qwen/Qwen2.5-3B and fine-tuned on the HuggingFaceH4/ultrachat_200k dataset with supervised fine-tuning (SFT).
See the source model card for full training details and usage examples.
## License
CC-BY-NC-4.0 (same as source model)
## Acknowledgments
- Quantization performed using llama.cpp
- Source model by ermiaazarkhalili