DeepSeek-V3.2 MXFP4 GGUF
This is an MXFP4 quantized version of DeepSeek-V3.2, optimized for Apple Silicon and other hardware with MXFP4 support.
Model Details
- Base Model: DeepSeek-V3.2-Exp-BF16
- Quantization: MXFP4_MOE (4-bit mixed precision for MoE layers, Q8_0 for standard layers)
- Original Size: 1.2 TB (BF16)
- Quantized Size: 341 GB (72% reduction)
- Quantization Tool: llama.cpp (December 2024 build)
- Quantization Time: ~10 minutes on Mac Studio M3 Ultra
Performance
- Hardware: Optimized for Apple Silicon (Metal), NVIDIA GPUs (CUDA), and other accelerators
- Expected Speedup: 2-3x faster inference vs Q4_K_M with hardware acceleration
- Context Length: 128K tokens
File Structure
This model is split into 18 chunks to stay within Hugging Face's per-file size limits:
DeepSeek-V3.2-MXFP4-chunk-001-of-018.gguf
DeepSeek-V3.2-MXFP4-chunk-002-of-018.gguf
...
DeepSeek-V3.2-MXFP4-chunk-018-of-018.gguf
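If you prefer to fetch the chunks programmatically, a minimal sketch using the huggingface_hub client is shown below (the repo id is a placeholder; substitute the actual repository path):

from huggingface_hub import snapshot_download  # pip install huggingface_hub

# NOTE: "your-username/DeepSeek-V3.2-MXFP4-GGUF" is a placeholder repo id, not the real one.
snapshot_download(
    repo_id="your-username/DeepSeek-V3.2-MXFP4-GGUF",
    allow_patterns=["DeepSeek-V3.2-MXFP4-chunk-*.gguf"],
    local_dir=".",
)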
Reassembly Instructions
macOS/Linux
cat DeepSeek-V3.2-MXFP4-chunk-*.gguf > DeepSeek-V3.2-MXFP4.gguf
Windows (PowerShell)
cmd /c copy /b DeepSeek-V3.2-MXFP4-chunk-*.gguf DeepSeek-V3.2-MXFP4.gguf
(copy /b concatenates the chunks in name order; Get-Content -Encoding Byte is impractically slow for files this size and was removed in PowerShell 7.)
Python
import glob
import shutil

output_file = "DeepSeek-V3.2-MXFP4.gguf"
# Chunk names are zero-padded, so a lexicographic sort gives the correct order.
chunk_files = sorted(glob.glob("DeepSeek-V3.2-MXFP4-chunk-*.gguf"))

with open(output_file, "wb") as outfile:
    for chunk_file in chunk_files:
        print(f"Merging {chunk_file}...")
        with open(chunk_file, "rb") as infile:
            # Stream in blocks rather than loading a ~19 GB chunk into memory at once.
            shutil.copyfileobj(infile, outfile, length=64 * 1024 * 1024)

print(f"Reassembly complete: {output_file}")
Usage with Ollama
After reassembling the model:
# Create Modelfile
cat > Modelfile.deepseek << 'EOF'
FROM ./DeepSeek-V3.2-MXFP4.gguf
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range $i, $_ := .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER num_ctx 128000
PARAMETER temperature 0.7
EOF
# Build model
ollama create deepseek-v3.2:mxfp4 -f Modelfile.deepseek
# Test
ollama run deepseek-v3.2:mxfp4 "Explain quantum computing"
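Once the model is created, you can also call it through Ollama's local HTTP API (listening on port 11434 by default). A minimal sketch using the requests library:

import requests

# Ollama's generate endpoint; stream=False returns a single JSON object.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-v3.2:mxfp4",
        "prompt": "Explain quantum computing",
        "stream": False,
    },
    timeout=600,
)
print(response.json()["response"])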
Quantization Details
- MoE Expert Layers: MXFP4_MOE (4-bit mixed precision)
- Standard Layers: Q8_0 (8-bit)
- Final Model Size: 349,286 MB (341 GB)
- Quantization Build: llama.cpp compiled with Metal support on a Mac Studio
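To confirm which tensors ended up in which format, the gguf Python package (the gguf-py library that ships with llama.cpp) can list the quantization type of every tensor; a rough sketch:

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("DeepSeek-V3.2-MXFP4.gguf")

# Print each tensor's name and quantization type
# (expected: MXFP4 for MoE expert weights, Q8_0 for the remaining layers).
for tensor in reader.tensors:
    print(tensor.name, tensor.tensor_type.name)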
System Requirements
- Disk Space: 341 GB for the merged model (roughly double that during reassembly, while the chunks and the merged file coexist)
- RAM: 64 GB minimum (weights are memory-mapped and paged from disk, which is slow); for full-speed inference you need enough RAM or unified memory to hold the entire 341 GB model, plus headroom for the KV cache at long context
- Recommended Hardware:
- Apple Silicon (M1/M2/M3 series with Metal)
- NVIDIA GPU with CUDA support
- Other accelerators with MXFP4 support
Credits
- Original Model: DeepSeek AI
- Quantization: steve.aitken@intelligentplant.com
- Tooling: llama.cpp
License
Apache 2.0 for the quantization work; see the original DeepSeek-V3.2 repository for the upstream model license terms.