DeepSeek-V3.2 MXFP4 GGUF

This is an MXFP4 quantized version of DeepSeek-V3.2, optimized for Apple Silicon and other hardware with MXFP4 support.

Model Details

  • Base Model: DeepSeek-V3.2-Exp-BF16
  • Parameters: 672B (architecture: deepseek3_2)
  • Quantization: MXFP4_MOE (4-bit mixed precision for MoE layers, Q8_0 for standard layers)
  • Original Size: 1.2 TB (BF16)
  • Quantized Size: 341 GB (72% reduction)
  • Quantization Tool: llama.cpp (December 2024 build)
  • Quantization Time: ~10 minutes on Mac Studio M3 Ultra

Performance

  • Hardware: Optimized for Apple Silicon (Metal), NVIDIA GPUs (CUDA), and other accelerators
  • Expected Speedup: 2-3x faster inference than Q4_K_M on hardware with native MXFP4 support (see the benchmark sketch below)
  • Context Length: 128K tokens
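
To check the speedup claim on your own machine, llama.cpp ships a llama-bench tool; the invocation below is a minimal sketch (prompt and generation lengths are illustrative):

# Measure prompt-processing and token-generation throughput for the merged file
./llama-bench -m DeepSeek-V3.2-MXFP4.gguf -p 512 -n 128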

File Structure

This model is split into 18 chunks to stay under Hugging Face's per-file size limit (a one-command download example follows the chunk list):

DeepSeek-V3.2-MXFP4-chunk-001-of-018.gguf
DeepSeek-V3.2-MXFP4-chunk-002-of-018.gguf
...
DeepSeek-V3.2-MXFP4-chunk-018-of-018.gguf
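
If you use the Hugging Face CLI, all chunks can be fetched in one command; the repository id below is a placeholder and must be replaced with this repo's actual path:

# Placeholder repo id - substitute the actual namespace/name of this repository
huggingface-cli download your-namespace/DeepSeek-V3.2-MXFP4-GGUF --include "DeepSeek-V3.2-MXFP4-chunk-*.gguf" --local-dir .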

Reassembly Instructions

macOS/Linux

cat DeepSeek-V3.2-MXFP4-chunk-*.gguf > DeepSeek-V3.2-MXFP4.gguf

Windows (PowerShell)

# Concatenate the chunks in name order; cmd's copy /b treats the data as binary
cmd /c copy /b DeepSeek-V3.2-MXFP4-chunk-*.gguf DeepSeek-V3.2-MXFP4.gguf

Python

import glob
import shutil

output_file = "DeepSeek-V3.2-MXFP4.gguf"
chunk_files = sorted(glob.glob("DeepSeek-V3.2-MXFP4-chunk-*.gguf"))

if not chunk_files:
    raise SystemExit("No chunk files found in the current directory")

with open(output_file, "wb") as outfile:
    for chunk_file in chunk_files:
        print(f"Merging {chunk_file}...")
        with open(chunk_file, "rb") as infile:
            # Stream block-by-block instead of loading an entire ~19 GB chunk into RAM
            shutil.copyfileobj(infile, outfile)

print(f"Reassembly complete: {output_file}")

Usage with Ollama

After reassembling the model:

# Create Modelfile
cat > Modelfile.deepseek << 'EOF'
FROM ./DeepSeek-V3.2-MXFP4.gguf

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range $i, $_ := .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER num_ctx 128000
PARAMETER temperature 0.7
EOF

# Build model
ollama create deepseek-v3.2:mxfp4 -f Modelfile.deepseek

# Test
ollama run deepseek-v3.2:mxfp4 "Explain quantum computing"
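
The model can also be run with llama.cpp directly instead of Ollama; the commands below are a sketch, and values such as the context size and -ngl (number of layers to offload) should be tuned to your hardware:

# Interactive chat
./llama-cli -m DeepSeek-V3.2-MXFP4.gguf -c 8192 -ngl 99 -cnv

# Or expose an OpenAI-compatible HTTP API
./llama-server -m DeepSeek-V3.2-MXFP4.gguf -c 8192 -ngl 99 --port 8080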

Quantization Details

  • MoE Expert Layers: MXFP4_MOE (4-bit mixed precision)
  • Standard Layers: Q8_0 (8-bit)
  • Final Model Size: 349,286 MiB (≈341 GiB)
  • Quantization Script: built with a Metal-enabled llama.cpp on Mac Studio (a reproduction sketch follows below)
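
For reference, the conversion can in principle be reproduced with llama.cpp's own tools; this is a sketch that assumes a build whose llama-quantize lists an MXFP4_MOE target (check llama-quantize --help) and uses illustrative paths:

# 1) Convert the original BF16 checkpoint to a BF16 GGUF
python convert_hf_to_gguf.py /path/to/DeepSeek-V3.2-Exp --outtype bf16 --outfile DeepSeek-V3.2-BF16.gguf

# 2) Quantize: MoE expert layers to MXFP4, remaining layers to Q8_0
./llama-quantize DeepSeek-V3.2-BF16.gguf DeepSeek-V3.2-MXFP4.gguf MXFP4_MOE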

System Requirements

  • Disk Space: 341 GB for the merged model (roughly double that during reassembly, while the chunks and the merged file coexist)
  • RAM: Minimum 64 GB (128+ GB recommended for full context; see the partial-offload sketch after this list)
  • Recommended Hardware:
    • Apple Silicon (M1/M2/M3 series with Metal)
    • NVIDIA GPU with CUDA support
    • Other accelerators with MXFP4 support
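
If the model does not fit entirely in GPU or unified memory, llama.cpp can offload only part of the layers and leave the rest memory-mapped on disk; a minimal sketch with an illustrative layer count:

# Offload a subset of layers; raise or lower -ngl until it fits in memory
./llama-cli -m DeepSeek-V3.2-MXFP4.gguf -c 4096 -ngl 24 -cnv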

Credits

  • DeepSeek for the original DeepSeek-V3.2-Exp model
  • The llama.cpp project for the GGUF format and quantization tooling

License

Apache 2.0 (inherits from original DeepSeek-V3.2 license)
