vit-beans-v3
Geometric Deep Learning with Cantor Multihead Fusion + AdamW Warm Restarts
This repository contains multiple training runs of the Cantor fusion architecture, which combines pentachoron structures, geometric routing, and CosineAnnealingWarmRestarts for automatic exploration cycles.
Training Strategy: AdamW + Warm Restarts
This model uses AdamW with Cosine Annealing Warm Restarts (SGDR):
- Drop phase: LR decays from 0.0003 → 1e-07 over 40 epochs
- Restart phase: LR jumps back to 0.0003 to explore new regions
- Cycle multiplier: Each cycle is 1.5x longer than the previous one
- Benefits: Alternates exploration and exploitation automatically, which can help find better minima and make training more robust
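The drop phase above follows the standard SGDR cosine formula. The snippet below is a minimal sketch of that formula using the values listed (0.0003, 1e-07, 40 epochs). The function name `cosine_annealed_lr` and its parameter names are illustrative and are not taken from this repository's training code.

```python
import math


def cosine_annealed_lr(t_cur: float, t_i: float,
                       lr_max: float = 3e-4, lr_min: float = 1e-7) -> float:
    """LR at position t_cur (in epochs) inside a cycle of length t_i."""
    # The cosine factor goes from 1 at the cycle start to 0 at the cycle end.
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * t_cur / t_i))
    return lr_min + (lr_max - lr_min) * cos_factor


# Sample the first 40-epoch cycle at a few points.
for epoch in (0, 10, 20, 30, 40):
    print(f"epoch {epoch:2d}: lr = {cosine_annealed_lr(epoch, 40):.3e}")
```

The LR starts at `lr_max`, passes through the midpoint of the two bounds halfway through the cycle, and reaches `lr_min` at the end, just before the next restart.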
LR Boost at Restarts (NEW!)
This run uses restart_lr_mult = 1.25x:
- Normal restart: 3e-4 → 1e-7 → restart at 3e-4
- Boosted restart: 3e-4 → 1e-7 → restart at 3.75e-04 (1.25x!)
- Creates wider exploration curves to escape solidified local minima
- Each restart provides a progressively stronger exploration boost
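As a rough illustration of the boost, the sketch below computes the peak LR at the start of each cycle. It assumes the multiplier compounds from one restart to the next, which is one reading of "progressively stronger". If the boost is instead applied once relative to the base LR, every restart would peak at 3.75e-04. The function name `restart_peak_lr` is hypothetical and not part of this repository.

```python
def restart_peak_lr(cycle_index: int, base_lr: float = 3e-4,
                    restart_lr_mult: float = 1.25) -> float:
    """Peak LR at the start of a cycle (cycle 0 is the initial, unboosted cycle)."""
    # Assumption: the boost compounds, so cycle k starts at base_lr * mult**k.
    return base_lr * restart_lr_mult ** cycle_index


for k in range(4):
    print(f"cycle {k}: peak lr = {restart_peak_lr(k):.4e}")
```

Under this assumption the first restart lands at 3.75e-04, matching the value listed above.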
Restart Schedule
Epochs 0-40: LR: 0.0003 → 1e-07 (first cycle)
Epoch 40: LR: RESTART to 3.75e-04
Epochs 40-100: LR: 3.75e-04 → 1e-07 (longer cycle)
...
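The cycle boundaries in this schedule follow from the 40-epoch first cycle and the 1.5x cycle multiplier. The sketch below reproduces them. The helper name `cycle_boundaries` is illustrative only. Note that PyTorch's built-in CosineAnnealingWarmRestarts accepts only an integer T_mult, so a 1.5x multiplier presumably relies on custom scheduling logic that is not shown here, and how that logic rounds fractional cycle lengths is unknown.

```python
def cycle_boundaries(num_cycles: int, first_cycle: float = 40.0,
                     cycle_mult: float = 1.5):
    """Return a (start_epoch, end_epoch) pair for each cycle."""
    boundaries = []
    start, length = 0.0, first_cycle
    for _ in range(num_cycles):
        boundaries.append((start, start + length))
        start += length        # the next cycle begins where this one ends
        length *= cycle_mult   # and is cycle_mult times longer
    return boundaries


for i, (start, end) in enumerate(cycle_boundaries(4)):
    print(f"cycle {i}: epochs {start:g}-{end:g}")
```

With these defaults the first two cycles span epochs 0-40 and 40-100, as in the schedule above.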
Current Run
Latest: cifar100_weighted_ADAMW_WarmRestart_boost1.25x_20251121_203807
- Dataset: CIFAR100
- Fusion Mode: weighted
- Optimizer: AdamW (adaptive moments)
- Scheduler: CosineAnnealingWarmRestarts
- Restart LR Mult: 1.25x
- Architecture: 16 blocks, 8 heads
- Simplex: 4-simplex (5 vertices)
Architecture
The Cantor Fusion architecture uses:
- Geometric Routing: Pentachoron (4-simplex, 5 vertices) structures for token routing
- Cantor Multihead Fusion: Multiple fusion heads with geometric attention
- Beatrix Consciousness Routing: Optional consciousness-aware token fusion
- SafeTensors Format: All model weights use SafeTensors (not pickle)
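For readers unfamiliar with the pentachoron, the snippet below counts the faces of an n-simplex to show what a 4-simplex contains. It illustrates only the geometry and says nothing about how the routing itself is implemented. The function name `simplex_face_counts` is illustrative.

```python
from math import comb


def simplex_face_counts(n: int) -> dict:
    """Number of k-dimensional faces of an n-simplex, for k = 0..n."""
    # A k-face is determined by choosing k + 1 of the n + 1 vertices.
    return {k: comb(n + 1, k + 1) for k in range(n + 1)}


# Pentachoron (4-simplex): vertices, edges, triangles, tetrahedra, the cell itself.
print(simplex_face_counts(4))
```

For n = 4 this gives 5 vertices, 10 edges, 10 triangular faces, and 5 tetrahedral cells.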
Usage
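from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download the checkpoint; replace YOUR_RUN_NAME with a run directory from this repo.
model_path = hf_hub_download(
    repo_id="AbstractPhil/vit-beans-v3",
    filename="runs/YOUR_RUN_NAME/checkpoints/best_model.safetensors"
)

# Load the weights from the SafeTensors file into a plain state dict.
state_dict = load_file(model_path)

# `model` must already be an instance of the matching Cantor fusion architecture.
model.load_state_dict(state_dict)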
Citation
@misc{vit_beans_v3,
  author = {AbstractPhil},
  title = {vit-beans-v3: Geometric Deep Learning with Warm Restarts},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/AbstractPhil/vit-beans-v3}
}
Repository maintained by: @AbstractPhil
Latest update: 2025-11-21 20:38:10