# Olmo-3.1-32B-Think-MLX-4bit
This is a 4-bit quantized MLX version of Olmo-3.1-32B-Think for Apple Silicon Macs.
## Model Details
| Property | Value |
|---|---|
| Original Model | allenai/Olmo-3.1-32B-Think |
| Parameters | 32B |
| Quantization | 4-bit (group size 64) |
| Size | ~17 GB |
| Framework | MLX |
| Context Length | 32K tokens |
## Features
- **Extended reasoning**: chain-of-thought wrapped in `<think>` tags
- **Strong on math/coding**: trained with RLVR (reinforcement learning with verifiable rewards)
- **Open weights**: Apache 2.0 license from the Allen Institute for AI (Ai2)
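Because the chain of thought arrives inline, downstream code often needs the trace separated from the final answer. A minimal sketch, assuming the output wraps reasoning in `<think>...</think>` as described above (the helper is illustrative, not part of mlx-lm):

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer), assuming <think>...</think> framing."""
    if "</think>" in text:
        reasoning, _, answer = text.partition("</think>")
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()  # no closing tag; treat the whole output as the answer
```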
## Requirements
- macOS 15.0+ (Sequoia)
- Apple Silicon Mac (M1/M2/M3/M4)
- 24 GB+ unified memory recommended (~18 GB peak usage)
- Python 3.10+
## Installation
```bash
pip install mlx-lm
```
## Usage
```python
from mlx_lm import load, generate

# Load model
model, tokenizer = load("hunterbown/Olmo-3.1-32B-Think-MLX-4bit")

# Generate with reasoning
prompt = "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?"
output = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(output)
```
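Thinking models are usually trained on a chat format, so the `<think>` framing triggers most reliably when the prompt goes through the tokenizer's chat template. Continuing from the snippet above, using the standard mlx-lm pattern (whether this checkpoint strictly requires the template is an assumption; the raw-string call also works):

```python
# Wrap the question in a chat turn and apply the model's chat template.
# apply_chat_template returns token ids, which generate() accepts directly.
messages = [{"role": "user", "content": prompt}]
chat_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
output = generate(model, tokenizer, prompt=chat_prompt, max_tokens=500)
print(output)
```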
## Performance
Benchmarked on a Mac Studio (M4) with 36 GB unified memory:
- Generation speed: ~19 tokens/sec
- Peak memory: ~18 GB
- Prompt processing: ~5 tokens/sec
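These figures vary with prompt length and hardware. A rough way to reproduce the generation-speed number on your own machine (token counting via the tokenizer is approximate; passing `verbose=True` to `generate` also prints mlx-lm's own timing stats):

```python
import time
from mlx_lm import load, generate

model, tokenizer = load("hunterbown/Olmo-3.1-32B-Think-MLX-4bit")

start = time.perf_counter()
output = generate(model, tokenizer, prompt="Count from 1 to 50.", max_tokens=200)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(output))  # approximate output token count
print(f"~{n_tokens / elapsed:.1f} tokens/sec (rough; includes prompt processing)")
```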
## Conversion Details
Converted using mlx-lm:

```bash
mlx_lm.convert \
  --hf-path allenai/Olmo-3.1-32B-Think \
  --mlx-path ./olmo-3.1-32b-think-4bit \
  -q --q-bits 4 --q-group-size 64
```
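The same conversion can be run from Python; the keyword arguments below mirror the CLI flags and follow the current mlx-lm `convert` API (verify against your installed version):

```python
from mlx_lm import convert

convert(
    hf_path="allenai/Olmo-3.1-32B-Think",
    mlx_path="./olmo-3.1-32b-think-4bit",
    quantize=True,     # -q
    q_bits=4,          # --q-bits
    q_group_size=64,   # --q-group-size
)
```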
## License
Apache 2.0 (same as original model)
## Credits
- Original model by Allen Institute for AI (Ai2)
- MLX conversion using mlx-lm
- Quantized by @hunterbown