Qwen2.5-3B-SFT-UltraChat-GGUF

GGUF quantized versions of Qwen2.5-3B-SFT-UltraChat for efficient CPU and mixed CPU/GPU inference.

Available Quantizations

Quantization   Size               Description
Q4_K_M         ~40% of original   Best balance of quality and size
Q5_K_M         ~50% of original   Higher quality, larger size
Q8_0           ~80% of original   Highest quality quantization
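As a rough sanity check, the percentages above can be turned into approximate file sizes. A minimal sketch, assuming an FP16 baseline of about 6.2 GB for a 3B-parameter model (an assumption, not stated in this card; check the actual file sizes on the Hub):

```python
# Rough size estimates for each quantization, relative to an assumed
# ~6.2 GB FP16 baseline (assumption; verify against the real files).
FP16_SIZE_GB = 6.2

QUANT_RATIOS = {
    "Q4_K_M": 0.40,  # ~40% of original
    "Q5_K_M": 0.50,  # ~50% of original
    "Q8_0": 0.80,    # ~80% of original
}

def estimated_size_gb(quant: str) -> float:
    """Return the estimated GGUF file size in GB for a given quantization."""
    return round(FP16_SIZE_GB * QUANT_RATIOS[quant], 2)

for q in QUANT_RATIOS:
    print(f"{q}: ~{estimated_size_gb(q)} GB")
```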

Quick Start

Using Ollama

# Pull and run directly from HuggingFace
ollama pull hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M "Hello!"
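Beyond the CLI, a running Ollama server also exposes a local HTTP API (by default at http://localhost:11434). A minimal sketch using only the standard library; the model tag mirrors the pull command above, and the `generate` helper name is ours, not part of Ollama:

```python
import json
from urllib import request

MODEL = "hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M"

def build_generate_request(prompt: str) -> dict:
    """Build a payload for Ollama's /api/generate endpoint."""
    return {"model": MODEL, "prompt": prompt, "stream": False}

def generate(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST the request to a locally running Ollama server."""
    payload = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = request.Request(f"{host}/api/generate", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Usage (requires `ollama serve` and the pulled model):
#   print(generate("Hello!"))
print(json.dumps(build_generate_request("Hello!")))
```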

Using llama.cpp

# Download the GGUF file
huggingface-cli download ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF \
    qwen2.5_3b_sft_ultrachat-q4_k_m.gguf --local-dir ./models

# Run inference
./llama-cli -m ./models/qwen2.5_3b_sft_ultrachat-q4_k_m.gguf \
    -p "What is the capital of France?" -n 128

Using LM Studio

  1. Download the GGUF file from this repository
  2. Import into LM Studio
  3. Select the model and start chatting
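Besides the chat UI, LM Studio can serve the loaded model over an OpenAI-compatible local API. A minimal sketch assuming the server is started in LM Studio and listening on its default port 1234; the helper names are ours:

```python
import json
from urllib import request

def build_chat_request(user_msg: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style chat completion payload.

    LM Studio substitutes the currently loaded model, so the model
    name here is a placeholder (an assumption of this sketch).
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": 128,
    }

def chat(user_msg: str, host: str = "http://localhost:1234/v1") -> str:
    """POST a chat completion request to a running LM Studio server."""
    payload = json.dumps(build_chat_request(user_msg)).encode("utf-8")
    req = request.Request(f"{host}/chat/completions", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires the LM Studio local server to be running):
#   print(chat("Hello!"))
```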

Source Model

This is a quantized version of Qwen2.5-3B-SFT-UltraChat, a Qwen/Qwen2.5-3B model fine-tuned on the HuggingFaceH4/ultrachat_200k dataset via supervised fine-tuning (SFT).

See the source model card for full training details and usage examples.

License

CC-BY-NC-4.0 (same as source model)
