vit-beans-v3

Geometric Deep Learning with Cantor Multihead Fusion + AdamW Warm Restarts

This repository contains multiple training runs of the Cantor Fusion architecture, which combines pentachoron structures and geometric routing, trained with CosineAnnealingWarmRestarts for automatic exploration cycles.

Training Strategy: AdamW + Warm Restarts

This model uses AdamW with Cosine Annealing Warm Restarts (SGDR):

  • Drop phase: LR decays from 3e-4 → 1e-7 over the first 40 epochs
  • Restart phase: LR jumps back to 3e-4 to explore new regions
  • Cycle multiplier: Each cycle is 1.5x as long as the previous one (40, 60, 90, ... epochs)
  • Intended benefits: alternates exploration (high LR) with exploitation (low LR), which can help the optimizer reach better minima

🚀 LR Boost at Restarts (NEW!)

This run uses restart_lr_mult = 1.25x:

  • Normal restart: 3e-4 → 1e-7 → restart at 3e-4
  • Boosted restart: 3e-4 → 1e-7 → restart at 3.75e-4 (1.25x!)
  • Creates wider exploration curves to escape solidified local minima
  • Each restart provides a progressively stronger exploration boost

Restart Schedule

Epochs 0-40:    LR: 3e-4 → 1e-7 (first cycle)
Epoch 40:       LR: RESTART to 3.75e-4 🔄
Epochs 40-100:  LR: 3.75e-4 → 1e-7 (longer cycle, 60 epochs)
...
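
The schedule above can be reproduced with a few lines of plain Python. This is a minimal sketch, not the training code from this repository: `sgdr_lr` and its `compound` flag are hypothetical names, and whether the 1.25x boost compounds on later restarts is an assumption left switchable here. Plain Python is used because, as far as I know, PyTorch's built-in `CosineAnnealingWarmRestarts` only accepts an integer `T_mult`, so a 1.5x cycle multiplier needs custom handling.

```python
import math


def sgdr_lr(epoch, base_lr=3e-4, eta_min=1e-7, t_0=40, t_mult=1.5,
            restart_lr_mult=1.25, compound=False):
    """Learning rate at a (possibly fractional) epoch.

    Cosine annealing with warm restarts, where every restart peak is
    scaled by `restart_lr_mult`. `compound=True` is an assumption that
    the boost stacks (base_lr * mult**k at the k-th restart).
    """
    cycle, start, length = 0, 0.0, float(t_0)
    # Walk forward until we reach the cycle that contains `epoch`.
    while epoch >= start + length:
        start += length
        length *= t_mult
        cycle += 1

    if cycle == 0:
        peak = base_lr                              # first cycle: no boost
    elif compound:
        peak = base_lr * restart_lr_mult ** cycle   # assumed stacking boost
    else:
        peak = base_lr * restart_lr_mult            # same boost at every restart

    progress = (epoch - start) / length             # 0.0 at restart, approaches 1.0
    return eta_min + 0.5 * (peak - eta_min) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 20, 39, 40, 70, 99, 100):
        print(f"epoch {e:>3}: lr = {sgdr_lr(e):.3e}")
```

With the defaults, the cycle boundaries fall at epochs 40, 100, and 190, matching the 40 and 60 epoch cycles listed above.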

Current Run

Latest: cifar100_weighted_ADAMW_WarmRestart_boost1.25x_20251121_203807

  • Dataset: CIFAR100
  • Fusion Mode: weighted
  • Optimizer: AdamW (adaptive moments)
  • Scheduler: CosineAnnealingWarmRestarts
  • Restart LR Mult: 1.25x
  • Architecture: 16 blocks, 8 heads
  • Simplex: 4-simplex (5 vertices)

Architecture

The Cantor Fusion architecture uses:

  • Geometric Routing: Pentachoron (4-simplex, 5 vertices) structures for token routing
  • Cantor Multihead Fusion: Multiple fusion heads with geometric attention
  • Beatrix Consciousness Routing: Optional consciousness-aware token fusion
  • SafeTensors Format: All model weights use SafeTensors (not pickle)

Usage

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

model_path = hf_hub_download(
    repo_id="AbstractPhil/vit-beans-v3",
    filename="runs/YOUR_RUN_NAME/checkpoints/best_model.safetensors"
)

state_dict = load_file(model_path)
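# `model` must already be an instance of the matching architecture;
# its class definition is not included in this snippet.
model.load_state_dict(state_dict)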
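
The `YOUR_RUN_NAME` placeholder can also be filled in programmatically. The sketch below assumes that the directory under `runs/` is named exactly like the run listed under Current Run; `checkpoint_filename` and `load_run_weights` are hypothetical helper names, not part of this repository.

```python
REPO_ID = "AbstractPhil/vit-beans-v3"


def checkpoint_filename(run_name: str) -> str:
    """Build the in-repo path of a run's best checkpoint."""
    return f"runs/{run_name}/checkpoints/best_model.safetensors"


def load_run_weights(run_name: str, repo_id: str = REPO_ID) -> dict:
    """Download a run's checkpoint and return its tensors as a dict."""
    # Imported lazily so the path helper works without these packages.
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    path = hf_hub_download(repo_id=repo_id,
                           filename=checkpoint_filename(run_name))
    return load_file(path)
```

Calling `load_run_weights(...)` with a run name returns the state dict, so its keys and tensor shapes can be inspected before passing it to `model.load_state_dict`.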

Citation

@misc{vit_beans_v3,
  author = {AbstractPhil},
  title = {vit-beans-v3: Geometric Deep Learning with Warm Restarts},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/AbstractPhil/vit-beans-v3}
}

Repository maintained by: @AbstractPhil

Latest update: 2025-11-21 20:38:10
