🧠 Quem-v2-4b GGUFs

Quantized version of: rodrigomt/quem-V2-4b


📦 Available GGUFs

| Format | Description |
|--------|-------------|
| F16 | Full precision (16-bit), better quality, larger size ⚖️ |
| Q8_K_XL | Quantized (8-bit XL variant, based on the quantization table of the Unsloth Qwen3-4B-Thinking-2507 model), smaller size, faster inference ⚡ |
| Q4_K_XL | Quantized (4-bit XL variant, based on the quantization table of the Unsloth Qwen3-4B-Thinking-2507 model), smaller size, faster inference ⚡ |

🚀 Usage

Example with llama.cpp (the binary is named `llama-cli` in newer builds):

```bash
./main -m ./gguf-file-name.gguf -p "Hello world!"
```
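
You can also load the GGUF from Python. Below is a minimal sketch using the llama-cpp-python bindings; the model path is a placeholder, and values such as `n_ctx` and `n_gpu_layers` are illustrative assumptions, not settings recommended by this repository.

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder; context size and GPU offload below are
# illustrative assumptions, not repository recommendations.
from llama_cpp import Llama

llm = Llama(
    model_path="./gguf-file-name.gguf",  # e.g. the Q4_K_XL or Q8_K_XL file
    n_ctx=4096,          # context window (assumption)
    n_gpu_layers=-1,     # offload all layers to GPU if available (assumption)
)

out = llm("Hello world!", max_tokens=128)
print(out["choices"][0]["text"])
```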