Zenith V1 Collection
All V1 models of the Zenith series (4 items).
Flagship 70B-parameter model optimized for Tenstorrent p300a hardware, based on DeepSeek-R1-Distill-Llama-70B.
```shell
cd Zenith/V1-Tenstorrent-Blackhole-p300/70B
pip install -r requirements.txt
```
IMPORTANT: 70B is extremely large. Use LoRA or QLoRA exclusively.
```shell
# QLoRA (4-bit) - Recommended
python train.py \
  --base_model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --train_data ./data/train.json \
  --use_qlora \
  --use_lora \
  --lora_r 8 \
  --lora_alpha 16 \
  --epochs 1 \
  --batch_size 1 \
  --gradient_accumulation_steps 32 \
  --learning_rate 5e-6 \
  --use_ring_attention \
  --max_seq_length 32768 \
  --tensor_parallel_size 8 \
  --pipeline_parallel_size 4 \
  --use_noc_optimization \
  --mixed_precision bf16 \
  --use_quality_filter \
  --use_curriculum
```
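The `--lora_r 8` and `--lora_alpha 16` flags above define a low-rank update W' = W + (alpha/r)·B·A that is trained in place of the full weight matrix. A minimal pure-Python sketch of the merged update (illustrative only; `lora_merge` and the tiny matrices here are made up for this example, not part of train.py):

```python
# LoRA merge sketch: W' = W + (alpha / r) * (B @ A).
# Plain nested lists stand in for tensors; real implementations
# (e.g. the peft library) operate on framework tensors.

def matmul(B, A):
    """Multiply a (d x r) matrix by an (r x k) matrix."""
    d, r, k = len(B), len(A), len(A[0])
    return [[sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
            for i in range(d)]

def lora_merge(W, A, B, r, alpha):
    """Return W + (alpha / r) * (B @ A), the merged LoRA weight."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Tiny example: d = k = 2, rank r = 1, alpha = 2 (so scale = 2.0).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]           # r x k
B = [[0.5], [0.25]]        # d x r
print(lora_merge(W, A, B, r=1, alpha=2))  # [[2.0, 1.0], [0.5, 1.5]]
```

With r=8 on a 70B model the trainable adapter is a tiny fraction of the base weights, which is why LoRA/QLoRA fits where full fine-tuning does not.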
```shell
# LoRA (8-bit) - Alternative
python train.py \
  --base_model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --train_data ./data/train.json \
  --use_lora \
  --lora_r 8 \
  --lora_alpha 16 \
  --epochs 1 \
  --batch_size 1 \
  --gradient_accumulation_steps 32 \
  --learning_rate 5e-6 \
  ...
```
Do NOT attempt full fine-tuning unless you have specialized hardware beyond the p300.
```shell
# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt (long context)
python inference.py \
  --checkpoint ./outputs/checkpoint-final \
  --prompt "Analyze this 30K document and extract key insights..." \
  --max_new_tokens 2048 \
  --temperature 0.55
```
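`--temperature 0.55` divides the logits by 0.55 before the softmax, sharpening the output distribution relative to T=1. A generic sketch of temperature scaling (the function name is ours, not inference.py internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/T, then softmax. T < 1 sharpens the
    distribution toward the top token; T > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
for t in (1.0, 0.55):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T=0.55 the highest-logit token receives noticeably more probability mass than at T=1, which suits factual long-document analysis better than high-temperature sampling.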
```shell
# Build model (requires ~140GB disk space for 70B 4-bit quantized)
ollama create zenith-70b-p300 -f Modelfile

# Run
ollama run zenith-70b-p300 "Explain the implications of Gödel's incompleteness theorems"

# Long context
ollama run zenith-70b-p300 "Read this 32K document and provide a comprehensive summary: [paste text]"
```
```python
from configs.zenith_config import get_70b_config

config = get_70b_config()
print(config)
```
Key Parameters:
- hidden_size: 8192
- num_layers: 64
- num_heads: 64
- num_experts: 12 (configurable)
- moe_top_k: 2
- max_seq_len: 32768
- use_ring_attention: True
- ring_attention_chunk_size: 8192
- ring_attention_overlap: 2048

```python
config.num_experts = 12
config.moe_top_k = 2
config.moe_load_balancing_weight = 0.01
config.moe_capacity_factor = 1.0
config.use_eq_adapter = True
config.eq_adapter_hidden_size = 64
config.eq_loss_weight = 0.03
```
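With `num_experts = 12` and `moe_top_k = 2`, the router sends each token to its two highest-scoring experts and renormalizes their gate weights. One common top-k gating scheme, sketched in pure Python (a hypothetical `top_k_route` helper, not the model's actual router):

```python
import math

def top_k_route(logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate
    weights with a softmax over just those k (one common MoE scheme)."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)
    exps = {i: math.exp(logits[i] - m) for i in idx}
    total = sum(exps.values())
    return {i: exps[i] / total for i in idx}

# Router scores for 12 experts; the token is dispatched to the top 2.
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3, 0.9, -2.0, 0.2, 0.4]
gates = top_k_route(scores, k=2)
print(gates)  # experts 1 and 4, gate weights summing to 1
```

The `moe_load_balancing_weight` term then penalizes routers that overload a few experts, and `moe_capacity_factor` caps how many tokens each expert can absorb per batch.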
```python
from transformers import AutoTokenizer

from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

# Load the base model's tokenizer (assumed here; any compatible tokenizer works)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")

ot_config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    max_seq_length=32768,
    quality_filtering=True,
    curriculum_learning=True,
    tokenizer=tokenizer,
)
processor = OpenThoughtsProcessor(ot_config)
```
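`quality_filtering` and `curriculum_learning` amount to a filter-then-sort pass over the dataset. A schematic sketch with made-up scoring heuristics (the real OpenThoughtsProcessor criteria may differ):

```python
def quality_score(example):
    """Hypothetical heuristic: longer, non-empty answers score higher."""
    answer = example.get("answer", "")
    return min(len(answer) / 100.0, 1.0)

def difficulty(example):
    """Hypothetical proxy: longer prompts are treated as harder."""
    return len(example.get("prompt", ""))

def prepare(examples, min_quality=0.2):
    """Drop low-quality examples, then order easy -> hard (curriculum)."""
    kept = [e for e in examples if quality_score(e) >= min_quality]
    return sorted(kept, key=difficulty)

data = [
    {"prompt": "x" * 50, "answer": "y" * 80},
    {"prompt": "x" * 10, "answer": "y" * 90},
    {"prompt": "x" * 30, "answer": "y" * 5},   # filtered: quality 0.05
]
ordered = prepare(data)
print([len(e["prompt"]) for e in ordered])  # [10, 50]
```

With `streaming=True` the same logic would run per shard rather than over the full 1.2M-example dataset at once.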
```shell
--use_ring_attention \
--ring_chunk_size 8192 \
--ring_overlap 2048
```

Enables 32K context on limited memory.
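Ring attention is possible because softmax attention can be accumulated chunk by chunk with the online-softmax trick, so only one chunk of keys/values needs to be resident at a time. A single-query, single-process sketch of that accumulation (illustrative only; the real implementation shards chunks across devices):

```python
import math

def chunked_attention(q, keys, values, chunk_size):
    """Attend one query over K/V processed chunk by chunk using the
    online-softmax trick, so no full score vector is materialized."""
    dim = len(q)
    m = float("-inf")          # running max of attention scores
    denom = 0.0                # running softmax denominator
    out = [0.0] * len(values[0])
    for start in range(0, len(keys), chunk_size):
        for k_vec, v_vec in zip(keys[start:start + chunk_size],
                                values[start:start + chunk_size]):
            s = sum(a * b for a, b in zip(q, k_vec)) / math.sqrt(dim)
            new_m = max(m, s)
            scale = math.exp(m - new_m)   # rescale old accumulators
            w = math.exp(s - new_m)
            denom = denom * scale + w
            out = [o * scale + w * v for o, v in zip(out, v_vec)]
            m = new_m
    return [o / denom for o in out]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
print(chunked_attention(q, keys, values, chunk_size=2))
```

The result is identical to full attention over all keys at once; the chunk size (8192 above) only trades memory for extra passes.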
```shell
export MASTER_ADDR=localhost
export MASTER_PORT=29500
export WORLD_SIZE=2

torchrun --nproc_per_node=2 --nnodes=1 train.py ...
```
```shell
--mixed_precision bf16
--gradient_checkpointing
```

Reduces memory by ~60%.
```shell
python test_model.py
```
Tests:
```shell
python -m evaluation.benchmark \
  --model_path ./outputs/checkpoint-final \
  --benchmarks humaneval mbpp gsm8k math truthfulqa
```
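HumanEval and MBPP results are conventionally reported as pass@k, estimated from n samples per task with c correct via the unbiased formula pass@k = 1 - C(n-c, k)/C(n, k). A small helper, assuming the benchmark harness follows this convention:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: probability that at least one of k
    samples drawn from n generations (c of them correct) passes."""
    if n - c < k:
        return 1.0          # fewer failures than draws: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=5, k=1))   # 0.25 (= 5/20)
```

For k=1 the estimator reduces to the plain success rate c/n; larger k rewards models that succeed at least occasionally across many samples.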
```shell
ollama create zenith-70b-p300 -f Modelfile
ollama run zenith-70b-p300 "Your prompt here"
```
```shell
python -m vllm.entrypoints.openai.api_server \
  --model ./outputs/checkpoint-final \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --port 8000
```
```shell
docker run --gpus all -p 8080:80 \
  -v ./outputs/checkpoint-final:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data \
  --max-input-length 32768 \
  --max-total-tokens 36864
```
| Configuration | Memory | Speed (tokens/s) | Quality |
|---|---|---|---|
| QLoRA r=8, 2K | ~10GB | 80-120 | 95% |
| LoRA r=8, 2K | ~16GB | 60-90 | 98% |
| Ring 32K | +20% | 25-45 | Enables long context |
Note: Full 70B requires ~140GB VRAM, not feasible on p300 without quantization.
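That figure follows from weights-only arithmetic: 70B parameters at 2 bytes each (bf16) is ~140 GB before any KV cache or activations, while 4-bit quantization cuts the weights to ~35 GB. A back-of-envelope helper:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Weights-only memory footprint in GB (1 GB = 1e9 bytes).
    KV cache, activations, and optimizer state are extra."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS_70B = 70e9
for name, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gb(PARAMS_70B, bits):.0f} GB")
# bf16: 140 GB, int8: 70 GB, 4-bit: 35 GB
```

This is why the table above lists QLoRA (4-bit base weights) as the only configuration that fits comfortably on p300-class memory.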
```bibtex
@misc{zenith-70b-p300-2025,
  title={Zenith-70B-p300: A Tenstorrent-Optimized 70B Model with Ring Attention and MoE},
  author={Zenith Project},
  year={2025}
}
```
[Specify]
- README.md for quick reference
- FINETUNE_GUIDE.md for detailed instructions
- configs/zenith_config.py for configuration

Base model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B