vit-beans-v3

Geometric Deep Learning with Cantor Multihead Fusion + AdamW Warm Restarts

This repository contains multiple training runs of the Cantor fusion architecture, which combines pentachoron structures and geometric routing. Training uses CosineAnnealingWarmRestarts for automatic exploration cycles.

Training Strategy: AdamW + Warm Restarts

This model uses AdamW with Cosine Annealing Warm Restarts (SGDR):

Drop phase: LR decays from 0.0003 → 1e-07 over 40 epochs
Restart phase: LR jumps back to 0.0003 to explore new regions
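Cycle multiplier: Each cycle is 1.5x longer than the previous one (40 epochs, then 60, and so on)
Benefits: Alternates exploration (high LR right after a restart) with exploitation (low LR at the end of a cycle), which can help the optimizer leave poor minima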
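
The cycle described above can be sketched as a plain function of the epoch. This is a minimal illustration, not the repository's scheduler: `sgdr_lr` and its parameter names are hypothetical, with defaults taken from the values listed above. PyTorch's built-in CosineAnnealingWarmRestarts only accepts an integer T_mult, so a 1.5x multiplier presumably relies on custom logic along these lines.

```python
import math


def sgdr_lr(epoch, base_lr=3e-4, eta_min=1e-7, t0=40, t_mult=1.5):
    """LR at a (possibly fractional) epoch for cosine annealing with warm restarts.

    Hypothetical helper: the defaults mirror the values in this README,
    but the actual training code may compute the schedule differently.
    """
    start, length = 0.0, float(t0)
    # Walk forward through cycles until we reach the one containing `epoch`.
    while epoch >= start + length:
        start += length
        length *= t_mult
    # 0 at a restart, approaching 1 at the end of the cycle.
    progress = (epoch - start) / length
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))
```

Under these assumptions the LR sits at base_lr at epochs 0, 40, and 100 (the restarts) and approaches eta_min just before each restart.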

🚀 LR Boost at Restarts (NEW!)

This run uses restart_lr_mult = 1.25x:

Normal restart: 3e-4 → 1e-7 → restart at 3e-4
Boosted restart: 3e-4 → 1e-7 → restart at 3.75e-04 (1.25x!)
Creates wider exploration curves to escape solidified local minima
Each restart provides a progressively stronger exploration boost

Restart Schedule

Epochs 0-40:   LR: 0.0003 → 1e-07 (first cycle)
Epoch 40:      LR: RESTART to 0.000375 🔄
Epochs 40-100: LR: 0.000375 → 1e-07 (longer cycle)
...
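
The schedule above can be extended to later cycles with a short sketch. `restart_schedule` is a hypothetical helper, and it assumes the boost compounds (cycle k peaks at base_lr * restart_lr_mult^k), which is one reading of "progressively stronger". Only the first two cycles are confirmed by the schedule above; later rows are an extrapolation under that assumption.

```python
def restart_schedule(n_cycles, base_lr=3e-4, t0=40, t_mult=1.5, restart_lr_mult=1.25):
    """Return a list of (start_epoch, end_epoch, peak_lr), one entry per cycle.

    Assumption (not confirmed by the README): the boost compounds,
    so cycle k starts at base_lr * restart_lr_mult ** k.
    """
    schedule = []
    start, length = 0.0, float(t0)
    for k in range(n_cycles):
        peak_lr = base_lr * restart_lr_mult ** k
        schedule.append((start, start + length, peak_lr))
        start += length       # the next cycle begins where this one ends
        length *= t_mult      # and lasts t_mult times longer
    return schedule


for start, end, peak in restart_schedule(3):
    print(f"epochs {start:g}-{end:g}: peak LR {peak:.3e}")
```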

Current Run

Latest: cifar100_weighted_ADAMW_WarmRestart_boost1.25x_20251121_203807

Dataset: CIFAR100
Fusion Mode: weighted
Optimizer: AdamW (adaptive moments)
Scheduler: CosineAnnealingWarmRestarts
Restart LR Mult: 1.25x
Architecture: 16 blocks, 8 heads
Simplex: 4-simplex (5 vertices)
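
For readers unfamiliar with the term, a 4-simplex (pentachoron) has 5 vertices and 10 edges, and in its regular form every pair of vertices is the same distance apart. The sketch below only illustrates that geometry, using the standard basis of R^5 as one convenient embedding. `pentachoron_vertices` is a hypothetical name, and nothing here reflects how the model actually builds or uses its simplices.

```python
from itertools import combinations
import math


def pentachoron_vertices():
    """Five vertices of a regular 4-simplex, embedded as the standard basis of R^5."""
    return [tuple(1.0 if i == j else 0.0 for j in range(5)) for i in range(5)]


vertices = pentachoron_vertices()
# Every pair of vertices is joined by an edge: C(5, 2) = 10 edges.
edges = list(combinations(range(5), 2))
# In this embedding each edge has length sqrt(2).
edge_lengths = [math.dist(vertices[a], vertices[b]) for a, b in edges]
```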

Architecture

The Cantor Fusion architecture uses:

Geometric Routing: Pentachoron (4-simplex) structures for token routing
Cantor Multihead Fusion: Multiple fusion heads with geometric attention
Beatrix Consciousness Routing: Optional consciousness-aware token fusion
SafeTensors Format: All model weights use SafeTensors (not pickle)

Usage

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

model_path = hf_hub_download(
    repo_id="AbstractPhil/vit-beans-v3",
    filename="runs/YOUR_RUN_NAME/checkpoints/best_model.safetensors"
)

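# Load the weights into a model built with the matching architecture.
# `model` must already be constructed; its definition is not part of this snippet.
state_dict = load_file(model_path)
model.load_state_dict(state_dict)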
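
Because load_state_dict needs a model whose parameter names and shapes match the checkpoint, it can help to inspect the checkpoint before loading it. A minimal sketch follows. `summarize_state_dict` is a hypothetical helper, not part of this repository, and it only assumes each value exposes a `.shape` attribute, as torch tensors do.

```python
from math import prod


def summarize_state_dict(state_dict, top=5):
    """Return (total_params, largest) for a name -> tensor mapping.

    `largest` lists the `top` entries as (name, element_count),
    sorted from biggest to smallest.
    """
    # prod(()) == 1, so scalar tensors count as one element.
    sizes = {name: prod(tensor.shape) for name, tensor in state_dict.items()}
    largest = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return sum(sizes.values()), largest
```

With the state_dict loaded above, `total, largest = summarize_state_dict(state_dict)` gives the parameter count and the biggest tensors, which can be compared against the model before calling load_state_dict.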

Citation

@misc{vit_beans_v3,
  author = {AbstractPhil},
  title = {vit-beans-v3: Geometric Deep Learning with Warm Restarts},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/AbstractPhil/vit-beans-v3}
}

Repository maintained by: @AbstractPhil

Latest update: 2025-11-21 20:38:10
