Zenith-32B-p300 (V1-Tenstorrent-Blackhole-p300)

A 32B-parameter model based on DeepSeek-R1-Distill-Qwen-32B, tuned for Tenstorrent p300a hardware.

Features

  • 32B Parameters: Based on DeepSeek-R1-Distill-Qwen-32B
  • p300a Optimized: Specifically tuned for Tenstorrent p300a hardware
  • Ring Attention: 32K context window with efficient chunked attention
  • MoE Support: Mixture of Experts for sparse activation
  • EQ Adapter: Emotional intelligence capabilities
  • Reasoning & Code: Strong performance on reasoning and coding tasks
  • Tensor/Pipeline Parallelism: Optimized for distributed training
  • NoC Optimization: Efficient chip-to-chip communication
  • Ollama Compatible: Ready for deployment

Hardware Requirements

Training

  • Tenstorrent p300a: 2 chips (64 RISC-V cores)
  • Memory: 64GB GDDR6
  • Storage: 2TB+ NVMe SSD

Inference

  • p300a: Full 32K context supported
  • Standard GPU: 64GB+ VRAM (e.g., A100 80GB, H100 80GB)
  • Consumer GPUs: Use QLoRA or reduce context length

Quick Start

Installation

cd Zenith/V1-Tenstorrent-Blackhole-p300/32B
pip install -r requirements.txt

Training

# LoRA fine-tuning (recommended)
python train.py \
  --base_model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --train_data ./data/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --epochs 3 \
  --batch_size 4 \
  --gradient_accumulation_steps 8 \
  --learning_rate 1e-4 \
  --use_ring_attention \
  --max_seq_length 32768 \
  --tensor_parallel_size 8 \
  --pipeline_parallel_size 4 \
  --use_noc_optimization \
  --mixed_precision bf16
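As a quick sanity check on the command above, the effective batch size follows directly from `--batch_size` and `--gradient_accumulation_steps`:

```python
# Effective tokens per optimizer step for the training command above.
batch_size = 4
grad_accum = 8
max_seq_length = 32768

effective_batch = batch_size * grad_accum           # 32 sequences per step
tokens_per_step = effective_batch * max_seq_length  # 1,048,576 tokens per step
print(effective_batch, tokens_per_step)             # 32 1048576
```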

Inference

# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt
python inference.py \
  --checkpoint ./outputs/checkpoint-final \
  --prompt "Write a Python function to implement quicksort" \
  --max_new_tokens 1024

Ollama

ollama create zenith-32b-p300 -f Modelfile
ollama run zenith-32b-p300 "Explain the difference between supervised and unsupervised learning"

Architecture

Model Configuration

from configs.zenith_config import get_32b_config

config = get_32b_config()

Key Parameters:

  • hidden_size: 4096
  • num_layers: 40
  • num_heads: 32
  • num_experts: 8 (configurable)
  • moe_top_k: 2
  • max_seq_len: 32768
  • use_ring_attention: True
  • ring_attention_chunk_size: 8192
  • ring_attention_overlap: 2048
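The ring-attention parameters above imply a fixed chunk tiling. Assuming chunks advance by `chunk_size - overlap` (a common overlapped-chunking scheme; the actual stride rule may differ), a 32K sequence is covered by five chunks:

```python
import math

max_seq_len = 32768
chunk_size = 8192
overlap = 2048

stride = chunk_size - overlap                       # 6144 new tokens per chunk
num_chunks = 1 + math.ceil((max_seq_len - chunk_size) / stride)
starts = [i * stride for i in range(num_chunks)]

print(num_chunks)                                   # 5
print(starts)                                       # [0, 6144, 12288, 18432, 24576]
print(starts[-1] + chunk_size >= max_seq_len)       # True: last chunk reaches the end
```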

p300 Optimizations

  • Tensor Parallelism (TP=8): Across 8 cores per chip
  • Pipeline Parallelism (PP=4): 4 stages per chip
  • NoC Optimization: Efficient inter-core communication
  • Ring Attention: 32K context without OOM
  • Mixed Precision: BF16 native support

Data Processing

OpenThoughts Integration

from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

ot_config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    max_seq_length=32768,
    quality_filtering=True,
    curriculum_learning=True,
    tokenizer=tokenizer
)
processor = OpenThoughtsProcessor(ot_config)

Curriculum Stages

  1. Foundation: High-quality samples (score > 0.8)
  2. Reasoning: Chain-of-thought examples
  3. Code: Programming tasks
  4. Full: Complete dataset
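The stage progression can be sketched as a simple schedule keyed on training progress (the thresholds here are illustrative, not the processor's actual values):

```python
# Hypothetical progress thresholds; OpenThoughtsProcessor's real schedule may differ.
STAGES = [
    (0.10, "foundation"),  # high-quality samples (score > 0.8)
    (0.35, "reasoning"),   # chain-of-thought examples
    (0.60, "code"),        # programming tasks
    (1.00, "full"),        # complete dataset
]

def stage_for_progress(progress: float) -> str:
    """Return the curriculum stage for a training-progress fraction in [0, 1]."""
    for threshold, name in STAGES:
        if progress <= threshold:
            return name
    return STAGES[-1][1]

print(stage_for_progress(0.05))  # foundation
print(stage_for_progress(0.50))  # code
```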

Quality Filtering

Multi-dimensional scoring:

  • Length: 512-32000 tokens
  • Language: English
  • Repetition: < 15%
  • Coherence: > 0.7
  • Structure: Valid formatting
  • Thought quality: CoT depth > 3 steps
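Two of those dimensions, length and repetition, can be checked in a few lines; the language, coherence, structure, and CoT-depth scores are omitted in this sketch (the real processor's scoring is more involved):

```python
def repetition_ratio(text: str, n: int = 4) -> float:
    """Share of repeated n-word shingles; a proxy for the < 15% repetition filter."""
    words = text.split()
    shingles = [" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 0))]
    if not shingles:
        return 0.0
    return 1.0 - len(set(shingles)) / len(shingles)

def passes_filters(text: str, token_count: int) -> bool:
    """Length window (512-32000 tokens) and repetition (< 15%) checks only."""
    return 512 <= token_count <= 32000 and repetition_ratio(text) < 0.15

varied = " ".join(str(i) for i in range(700))             # 700 distinct "tokens"
print(passes_filters(varied, token_count=700))            # True
print(passes_filters("x y z w " * 200, token_count=800))  # False: ~99% repetition
```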

Advanced Features

MoE

--use_moe --num_experts 8 --moe_top_k 2
  • Top-2 routing
  • Load balancing loss
  • MoE applied to ~60% of the middle layers
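The top-2 routing rule can be sketched directly (this shows the routing math only, not the model's actual gating network or load-balancing loss):

```python
import numpy as np

def top2_route(logits: np.ndarray):
    """Pick the two highest-scoring experts per token and renormalize
    their softmax weights so each token mixes exactly two experts."""
    top2 = np.argsort(logits, axis=-1)[:, -2:]            # indices of the best 2
    picked = np.take_along_axis(logits, top2, axis=-1)
    w = np.exp(picked - picked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return top2, w

logits = np.array([[0.1, 2.0, -1.0, 1.5]])  # one token, 4 experts
experts, weights = top2_route(logits)
print(sorted(int(e) for e in experts[0]))   # [1, 3]
```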

EQ Adapter

--use_eq_adapter --eq_loss_weight 0.05
  • Frustration detection
  • 8-emotion classification
  • Fused with attention

Ring Attention

--use_ring_attention --ring_chunk_size 8192 --ring_overlap 2048
  • Enables 32K context
  • Memory: O(seq_len × chunk_size)
  • Chunked processing
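The memory saving comes from processing keys/values chunk by chunk with an online softmax, so no full seq_len × seq_len score matrix is ever materialized. A single-process NumPy sketch (no device ring, no overlap handling) that reproduces full attention:

```python
import numpy as np

def chunked_attention(q, k, v, chunk=4):
    """Online-softmax attention over key/value chunks: the running max,
    denominator, and numerator are rescaled as each chunk arrives."""
    out = np.empty((q.shape[0], v.shape[1]))
    for s in range(0, q.shape[0], chunk):          # query chunks
        qc = q[s:s + chunk]
        m = np.full(qc.shape[0], -np.inf)          # running row max
        denom = np.zeros(qc.shape[0])              # running softmax denominator
        acc = np.zeros((qc.shape[0], v.shape[1]))  # running weighted-value sum
        for t in range(0, k.shape[0], chunk):      # key/value chunks
            scores = qc @ k[t:t + chunk].T
            m_new = np.maximum(m, scores.max(axis=1))
            scale = np.exp(m - m_new)              # rescale old statistics
            p = np.exp(scores - m_new[:, None])
            denom = denom * scale + p.sum(axis=1)
            acc = acc * scale[:, None] + p @ v[t:t + chunk]
            m = m_new
        out[s:s + chunk] = acc / denom[:, None]
    return out

rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 8, 16))          # seq_len 8, head_dim 16
s = q @ k.T
w = np.exp(s - s.max(axis=1, keepdims=True))
reference = (w / w.sum(axis=1, keepdims=True)) @ v
assert np.allclose(chunked_attention(q, k, v), reference)
```

Ring attention distributes exactly this inner loop across devices, passing each device's key/value chunk around the ring.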

Testing

python test_model.py

Tests cover:

  • Model creation
  • Forward pass
  • p300 optimizations
  • MoE configuration
  • Ring attention
  • EQ adapter
  • Generation
  • Gradient flow

Evaluation

python -m evaluation.benchmark \
  --model_path ./outputs/checkpoint-final \
  --benchmarks humaneval mbpp gsm8k math truthfulqa

Deployment

Ollama

ollama create zenith-32b-p300 -f Modelfile
ollama run zenith-32b-p300 "Your prompt here"

vLLM

python -m vllm.entrypoints.openai.api_server \
  --model ./outputs/checkpoint-final \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --port 8000

Troubleshooting

Memory Issues

  • Reduce batch size
  • Use gradient accumulation
  • Enable LoRA/QLoRA
  • Reduce sequence length
  • Enable gradient checkpointing

Slow Training

  • Increase batch size
  • Reduce gradient accumulation
  • Use mixed precision
  • Optimize data loading
  • Enable NoC optimization

Poor Quality

  • Use curriculum learning
  • Apply quality filtering
  • Train more epochs
  • Adjust learning rate
  • Use more high-quality data

Performance

| Configuration | Memory | Speed | Quality |
|---|---|---|---|
| Full fine-tune, 2K context | ~58GB | 50-80 | Baseline |
| LoRA r=16, 2K context | ~18GB | 80-120 | 98% |
| QLoRA r=8, 2K context | ~10GB | 100-150 | 95% |
| Ring attention, 32K context | +20% | 30-50 | Enables long context |

Citation

@misc{zenith-32b-p300-2025,
  title={Zenith-32B-p300: A Tenstorrent-Optimized Reasoning Model},
  author={Zenith Project},
  year={2025}
}

License

[Specify]

Support

  • Documentation: README.md
  • Fine-tuning: FINETUNE_GUIDE.md
  • Config: configs/zenith_config.py
  • Issues: Open with detailed logs