Zenith-32B-p300
Tenstorrent p300a-optimized 32B-parameter model based on DeepSeek-R1-Distill-Qwen-32B.
cd Zenith/V1-Tenstorrent-Blackhole-p300/32B
pip install -r requirements.txt
# LoRA fine-tuning (recommended)
python train.py \
--base_model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
--train_data ./data/train.json \
--use_lora \
--lora_r 16 \
--lora_alpha 32 \
--epochs 3 \
--batch_size 4 \
--gradient_accumulation_steps 8 \
--learning_rate 1e-4 \
--use_ring_attention \
--max_seq_length 32768 \
--tensor_parallel_size 8 \
--pipeline_parallel_size 4 \
--use_noc_optimization \
--mixed_precision bf16
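The `--lora_r 16 --lora_alpha 32` flags above control the size and scaling of the low-rank update. A minimal stdlib sketch of the LoRA forward pass with toy dimensions (illustrative only, not the training script's actual implementation):

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, r=16, alpha=32):
    """y = W x + (alpha / r) * B (A x): frozen base weight plus low-rank update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))      # B: d_out x r, A: r x d_in
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy example: d_in = d_out = 4, rank r = 2.
random.seed(0)
d, r = 4, 2
W = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]
A = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]         # B is zero-initialized, so training starts at the base model
x = [1.0, 2.0, 3.0, 4.0]

assert lora_forward(W, A, B, x, r=r, alpha=32) == matvec(W, x)  # zero B => output unchanged
```

During training only A and B receive gradients (roughly r · (d_in + d_out) parameters per adapted matrix) while W stays frozen, which is where the memory savings in the table further below come from.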
# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final
# Single prompt
python inference.py \
--checkpoint ./outputs/checkpoint-final \
--prompt "Write a Python function to implement quicksort" \
--max_new_tokens 1024
ollama create zenith-32b-p300 -f Modelfile
ollama run zenith-32b-p300 "Explain the difference between supervised and unsupervised learning"
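The `ollama create` command above reads a Modelfile from the current directory. A minimal sketch (the FROM path and parameter values here are assumptions; point FROM at your exported weights):

```
FROM ./outputs/checkpoint-final
PARAMETER num_ctx 32768
PARAMETER temperature 0.7
```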
from configs.zenith_config import get_32b_config
config = get_32b_config()
Key Parameters:
hidden_size: 4096
num_layers: 40
num_heads: 32
num_experts: 8 (configurable)
moe_top_k: 2
max_seq_len: 32768
use_ring_attention: True
ring_attention_chunk_size: 8192
ring_attention_overlap: 2048

from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig
ot_config = OpenThoughtsConfig(
dataset_name="open-thoughts/OpenThoughts3-1.2M",
streaming=True,
max_seq_length=32768,
quality_filtering=True,
curriculum_learning=True,
tokenizer=tokenizer
)
processor = OpenThoughtsProcessor(ot_config)
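The `quality_filtering` and `curriculum_learning` options suggest the processor scores samples and orders them easy-to-hard. A hypothetical stdlib sketch of that idea (the heuristics and weights here are purely illustrative, not the processor's actual logic):

```python
def quality_score(sample):
    """Toy multi-dimensional score: length, reasoning markers, code presence."""
    text = sample["text"]
    length = min(len(text) / 2000.0, 1.0)                          # prefer substantive samples
    reasoning = 1.0 if "because" in text or "therefore" in text else 0.0
    has_code = 1.0 if "def " in text or "```" in text else 0.0
    return 0.5 * length + 0.3 * reasoning + 0.2 * has_code

def prepare(samples, threshold=0.3):
    """Drop low-scoring samples, then order easy-to-hard (curriculum)."""
    kept = [s for s in samples if quality_score(s) >= threshold]
    return sorted(kept, key=lambda s: len(s["text"]))              # length as a difficulty proxy

samples = [
    {"text": "hi"},
    {"text": "We use quicksort because it is in-place. def quicksort(a): ..." * 10},
]
print([len(s["text"]) for s in prepare(samples)])
```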
Multi-dimensional scoring:
--use_moe --num_experts 8 --moe_top_k 2
--use_eq_adapter --eq_loss_weight 0.05
--use_ring_attention --ring_chunk_size 8192 --ring_overlap 2048
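The ring-attention flags above split the 32K sequence into overlapping chunks. A small sketch of how window boundaries with `--ring_chunk_size 8192 --ring_overlap 2048` could be laid out (illustrative, not the training script's code):

```python
def ring_chunks(seq_len, chunk_size=8192, overlap=2048):
    """Yield (start, end) windows; consecutive windows share `overlap` tokens."""
    stride = chunk_size - overlap
    chunks, start = [], 0
    while start < seq_len:
        end = min(start + chunk_size, seq_len)
        chunks.append((start, end))
        if end == seq_len:
            break
        start += stride
    return chunks

chunks = ring_chunks(32768)
print(chunks)   # five overlapping windows covering tokens 0..32768
```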
python test_model.py
Tests cover:
python -m evaluation.benchmark \
--model_path ./outputs/checkpoint-final \
--benchmarks humaneval mbpp gsm8k math truthfulqa
ollama create zenith-32b-p300 -f Modelfile
ollama run zenith-32b-p300 "Your prompt here"
python -m vllm.entrypoints.openai.api_server \
--model ./outputs/checkpoint-final \
--tensor-parallel-size 2 \
--max-model-len 32768 \
--port 8000
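The vLLM server above exposes an OpenAI-compatible API on port 8000. A minimal client sketch using only the stdlib (the payload follows the OpenAI chat-completions format; `send` requires the server to be running):

```python
import json
import urllib.request

def build_request(prompt, base_url="http://localhost:8000"):
    """Build an OpenAI-style chat-completions request for the vLLM server."""
    payload = {
        "model": "./outputs/checkpoint-final",   # must match --model passed to the server
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def send(req):
    """POST the request and return the generated text (server must be up)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Usage: `send(build_request("Write a Python function to implement quicksort"))`.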
| Configuration | Memory | Speed | Quality (vs. full FT) |
|---|---|---|---|
| Full FT, 2K | ~58GB | 50-80 | Baseline |
| LoRA r=16, 2K | ~18GB | 80-120 | 98% |
| QLoRA r=8, 2K | ~10GB | 100-150 | 95% |
| Ring 32K | +20% | 30-50 | Enables long context |
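The LoRA memory savings in the table follow from how few parameters the adapters add. A back-of-the-envelope calculation using the config values above (hidden_size 4096, 40 layers), assuming LoRA on the four attention projections only (an assumption; the script's target modules are not stated):

```python
hidden_size = 4096
num_layers = 40
r = 16
n_proj = 4                      # q, k, v, o projections, each hidden_size x hidden_size

# Each adapted d_out x d_in matrix gains r * (d_in + d_out) trainable parameters.
per_matrix = r * (hidden_size + hidden_size)
total = per_matrix * n_proj * num_layers
print(total)                    # ~21M trainable parameters
print(f"{total / 32e9:.4%}")    # a tiny fraction of the 32B base weights
```

Only these adapter weights (plus optimizer state for them) need gradients, which is why LoRA fits in ~18GB where full fine-tuning needs ~58GB.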
@misc{zenith-32b-p300-2025,
title={Zenith-32B-p300: A Tenstorrent-Optimized Reasoning Model},
author={Zenith Project},
year={2025}
}
License: [Specify]
README.md
FINETUNE_GUIDE.md
configs/zenith_config.py

Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B