XPOS ViT - ImageNet-100
This model was trained using the vit-analysis framework.
Model Details
- Model Type: XPOS Vision Transformer
- Dataset: imagenet100
- Best Accuracy: 72.30%
- Image Size: 224
- Patch Size: 16
- Hidden Dim: 192
- Depth: 12
- Num Heads: 3
- MLP Dim: 768
- Num Classes: 100
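The hyperparameters above fix a few derived shapes that are useful to know when re-creating the model. The following is a small arithmetic sketch, not code from the vit-analysis framework; the variable names are illustrative, and the RGB (3-channel) input is an assumption.

```python
# Shape arithmetic derived from the hyperparameters listed in Model Details.
image_size = 224
patch_size = 16
hidden_dim = 192
num_heads = 3
mlp_dim = 768

patches_per_side = image_size // patch_size      # patches along one image edge
num_patches = patches_per_side ** 2              # patch tokens per image
head_dim = hidden_dim // num_heads               # channels per attention head
mlp_ratio = mlp_dim // hidden_dim                # MLP expansion factor
patch_input_dim = 3 * patch_size * patch_size    # flattened patch size, assuming RGB input

print(f"patch tokens per image: {num_patches}")
print(f"per-head dimension:     {head_dim}")
print(f"MLP expansion ratio:    {mlp_ratio}")
print(f"flattened patch size:   {patch_input_dim}")
```

Each image therefore yields 196 patch tokens, plus a class token if the implementation uses one, which the details above do not state.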
Training Configuration
- Epochs: 120
- Batch Size: 512
- Learning Rate: 0.004
- Weight Decay: 0.05
- Label Smoothing: 0.1
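To make the label smoothing value concrete, here is a minimal pure-Python sketch of what a smoothing factor of 0.1 does to the target distribution over 100 classes. It follows the convention used by PyTorch's `CrossEntropyLoss(label_smoothing=...)`, which mixes the one-hot target with a uniform distribution over all classes; whether the vit-analysis framework uses exactly this convention is an assumption, and the helper names are hypothetical.

```python
import math

def smoothed_targets(true_class, num_classes=100, smoothing=0.1):
    """Mix a one-hot target with a uniform distribution over all classes."""
    off_value = smoothing / num_classes
    targets = [off_value] * num_classes
    targets[true_class] += 1.0 - smoothing
    return targets

def cross_entropy(logits, targets):
    """Cross-entropy between soft targets and softmax(logits), computed stably."""
    max_logit = max(logits)
    log_z = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(targets, logits))

targets = smoothed_targets(true_class=7)
uniform_loss = cross_entropy([0.0] * 100, targets)
print(f"true-class target:  {targets[7]:.3f}")
print(f"other-class target: {targets[0]:.3f}")
print(f"loss for uniform predictions: {uniform_loss:.4f}")
```

Under this convention the true class receives a target of 0.901 and every other class 0.001, so the model is never pushed toward fully saturated predictions.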
Usage
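import torch
from models import XPOSSimpleVisionTransformer
# Load the checkpoint onto the CPU so this also works on machines without a GPU
checkpoint = torch.load('xpos_vit_imagenet100_best.pth', map_location='cpu')
model = ... # Initialize XPOSSimpleVisionTransformer with the same config as in Model Details
model.load_state_dict(checkpoint['state_dict'])
model.eval() # Switch to inference mode before evaluating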
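If `load_state_dict` reports unexpected or missing keys, one common cause is that the weights were saved from a model wrapped in `torch.nn.DataParallel` or `DistributedDataParallel`, which prefixes every parameter name with `module.`. Whether this checkpoint has that prefix is not stated here, so the helper below is a hypothetical, defensive sketch: it pulls out the `state_dict` entry when present and strips the prefix if it finds one.

```python
def extract_state_dict(checkpoint, key="state_dict", prefix="module."):
    """Return a parameter dict with any wrapper prefix removed.

    `checkpoint` is the object returned by torch.load. If it has no
    `key` entry, it is assumed to already be a plain state dict.
    """
    state = checkpoint[key] if key in checkpoint else checkpoint
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in state.items()
    }

# Stand-in values instead of tensors, so the sketch runs without PyTorch.
# The parameter names here are made up for illustration.
example = {"state_dict": {"module.head.weight": 1, "head.bias": 2}}
cleaned = extract_state_dict(example)
print(cleaned)
```

With a real checkpoint, the result can be passed straight to the model: `model.load_state_dict(extract_state_dict(checkpoint))`.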