Qwen2.5-3B-SFT-UltraChat-GGUF

GGUF quantized versions of Qwen2.5-3B-SFT-UltraChat for efficient CPU and mixed CPU/GPU inference.

Available Quantizations

Quantization   Size               Description
Q4_K_M         ~40% of original   Best balance of quality and size
Q5_K_M         ~50% of original   Higher quality, larger size
Q8_0           ~80% of original   Highest quality quantization
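As a rough sanity check, the percentages above can be turned into approximate file sizes. A minimal sketch, assuming an FP16 baseline of about 6.2 GB for a 3B-parameter model (an assumption, not stated in this card; check the actual file sizes on the Hub):

```python
# Rough size estimates for each quantization, relative to an assumed
# ~6.2 GB FP16 baseline (assumption; verify against the real files).
FP16_SIZE_GB = 6.2

QUANT_RATIOS = {
    "Q4_K_M": 0.40,  # ~40% of original
    "Q5_K_M": 0.50,  # ~50% of original
    "Q8_0": 0.80,    # ~80% of original
}

def estimated_size_gb(quant: str) -> float:
    """Return the estimated GGUF file size in GB for a given quantization."""
    return round(FP16_SIZE_GB * QUANT_RATIOS[quant], 2)

for q in QUANT_RATIOS:
    print(f"{q}: ~{estimated_size_gb(q)} GB")
```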

Quick Start

Using Ollama

# Pull and run directly from HuggingFace
ollama pull hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M "Hello!"
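Beyond the CLI, a running Ollama server also exposes a local HTTP API (by default at http://localhost:11434). A minimal sketch using only the standard library; the model tag mirrors the pull command above, and the `generate` helper name is ours, not part of Ollama:

```python
import json
from urllib import request

MODEL = "hf.co/ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF:Q4_K_M"

def build_generate_request(prompt: str) -> dict:
    """Build a payload for Ollama's /api/generate endpoint."""
    return {"model": MODEL, "prompt": prompt, "stream": False}

def generate(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST the request to a locally running Ollama server."""
    payload = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = request.Request(f"{host}/api/generate", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Usage (requires `ollama serve` and the pulled model):
#   print(generate("Hello!"))
print(json.dumps(build_generate_request("Hello!")))
```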

Using llama.cpp

# Download the GGUF file
huggingface-cli download ermiaazarkhalili/Qwen2.5-3B-SFT-UltraChat-GGUF \
    qwen2.5_3b_sft_ultrachat-q4_k_m.gguf --local-dir ./models

# Run inference
./llama-cli -m ./models/qwen2.5_3b_sft_ultrachat-q4_k_m.gguf \
    -p "What is the capital of France?" -n 128

Using LM Studio

  1. Download the GGUF file from this repository
  2. Import into LM Studio
  3. Select the model and start chatting
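Besides the chat UI, LM Studio can serve the loaded model over an OpenAI-compatible local API. A minimal sketch assuming the server is started in LM Studio and listening on its default port 1234; the helper names are ours:

```python
import json
from urllib import request

def build_chat_request(user_msg: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style chat completion payload.

    LM Studio substitutes the currently loaded model, so the model
    name here is a placeholder (an assumption of this sketch).
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": 128,
    }

def chat(user_msg: str, host: str = "http://localhost:1234/v1") -> str:
    """POST a chat completion request to a running LM Studio server."""
    payload = json.dumps(build_chat_request(user_msg)).encode("utf-8")
    req = request.Request(f"{host}/chat/completions", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires the LM Studio local server to be running):
#   print(chat("Hello!"))
```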

Source Model

This is a quantized version of Qwen2.5-3B-SFT-UltraChat, a Qwen/Qwen2.5-3B model fine-tuned on the HuggingFaceH4/ultrachat_200k dataset via supervised fine-tuning (SFT).

See the source model card for full training details and usage examples.

License

CC-BY-NC-4.0 (same as source model)
