# Olmo-3.1-32B-Think-MLX-4bit

This is a 4-bit quantized MLX version of Olmo-3.1-32B-Think for Apple Silicon Macs.

## Model Details

| Property | Value |
|---|---|
| Original Model | allenai/Olmo-3.1-32B-Think |
| Parameters | 32B |
| Quantization | 4-bit (group size 64) |
| Size | ~17 GB |
| Framework | MLX |
| Context Length | 32K tokens |
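
As a rough sanity check on the size figure: 4-bit weights with one scale and bias per 64-weight group cost about 4.5 bits per parameter (assuming an fp16 scale and bias per group, MLX's affine quantization layout), which lines up with the ~17 GB above.

```python
# Back-of-the-envelope size estimate (assumption: fp16 scale + bias per 64-weight group)
params = 32e9
bits_per_weight = 4 + (16 + 16) / 64  # 4-bit weights + amortized scale/bias = 4.5 bits
size_gib = params * bits_per_weight / 8 / 2**30
print(f"~{size_gib:.1f} GiB")  # ~16.8 GiB, i.e. the ~17 GB listed above
```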

## Features

- **Extended reasoning:** Chain-of-thought with `<think>` tags (see the parsing sketch after this list)
- **Strong on math/coding:** Trained with RLVR for verifiable reasoning
- **Open weights:** Apache 2.0 license from Allen AI
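
Because the chain of thought is wrapped in `<think>` tags, downstream code often wants the final answer separated from the reasoning. A minimal parsing sketch (assumes a single well-formed `<think>...</think>` block; exact tag emission can vary with the chat template version):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer), assuming <think>...</think> framing."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()  # no think block found; treat everything as the answer
    return match.group(1).strip(), output[match.end():].strip()
```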

## Requirements

- macOS 15.0+ (Sequoia)
- Apple Silicon Mac (M1/M2/M3/M4)
- 24 GB+ unified memory recommended (uses ~18 GB peak; see the headroom check after this list)
- Python 3.10+
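
If you want to verify headroom before loading, here is a small check using `psutil` (a third-party package, not an mlx-lm dependency; the 18 GB figure comes from the peak-memory measurement below):

```python
import psutil  # pip install psutil

PEAK_GB = 18  # observed peak for this model (see Performance)
total_gb = psutil.virtual_memory().total / 1e9
if total_gb < PEAK_GB + 6:  # leave ~6 GB headroom for macOS and other apps
    print(f"Warning: {total_gb:.0f} GB unified memory may be tight for this model.")
```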

## Installation

```bash
pip install mlx-lm
```
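
You can smoke-test the install directly from the command line with the `mlx_lm.generate` CLI that ships with mlx-lm:

```bash
mlx_lm.generate \
  --model hunterbown/Olmo-3.1-32B-Think-MLX-4bit \
  --prompt "What is 17 * 23?" \
  --max-tokens 256
```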

## Usage

```python
from mlx_lm import load, generate

# Load the quantized model and tokenizer
model, tokenizer = load("hunterbown/Olmo-3.1-32B-Think-MLX-4bit")

# Generate with reasoning; the chat template triggers the <think> formatting
question = "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?"
messages = [{"role": "user", "content": question}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

output = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(output)
```
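
For interactive use, mlx-lm also exposes `stream_generate`, which yields tokens as they are produced. A minimal sketch (in recent mlx-lm releases each response object carries the decoded chunk in `.text`; check your installed version):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("hunterbown/Olmo-3.1-32B-Think-MLX-4bit")

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print each chunk as it arrives instead of waiting for the full completion
for response in stream_generate(model, tokenizer, prompt, max_tokens=1000):
    print(response.text, end="", flush=True)
print()
```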

## Performance

Benchmarked on a Mac Studio (M4) with 36 GB unified memory:

- Generation speed: ~19 tokens/sec
- Prompt processing: ~5 tokens/sec
- Peak memory: ~18 GB
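
To reproduce these numbers on your own hardware, pass `verbose=True` to `generate`; it reports prompt and generation throughput plus peak memory after each call (the exact report format varies across mlx-lm versions):

```python
from mlx_lm import load, generate

model, tokenizer = load("hunterbown/Olmo-3.1-32B-Think-MLX-4bit")

# verbose=True prints prompt tok/s, generation tok/s, and peak memory
generate(model, tokenizer, prompt="Explain the Pythagorean theorem.",
         max_tokens=200, verbose=True)
```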

## Conversion Details

Converted using `mlx-lm`:

```bash
mlx_lm.convert \
  --hf-path allenai/Olmo-3.1-32B-Think \
  --mlx-path ./olmo-3.1-32b-think-4bit \
  -q --q-bits 4 --q-group-size 64
```
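
The same conversion can be scripted from Python via `mlx_lm.convert`; the keyword arguments mirror the CLI flags above (confirm the exact signature against your installed mlx-lm version):

```python
from mlx_lm import convert

# Python equivalent of the CLI call above: 4-bit quantization, group size 64
convert(
    hf_path="allenai/Olmo-3.1-32B-Think",
    mlx_path="./olmo-3.1-32b-think-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```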

## License

Apache 2.0, the same license as the original model.

## Credits

- Original model: allenai/Olmo-3.1-32B-Think by Allen AI
- Conversion and quantization: `mlx-lm`
