MoE-CNN

Model Description

  • Model type: Image Classification
  • License: MIT

How to Get Started with the Model

Use the code below to get started with the model.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# MixtureOfExperts is the model class provided in this repository
model = MixtureOfExperts(num_experts=10).to(device)

# Load the released checkpoint
checkpoint_path = "FP_ML_MOE_SIMPLE_99_75.pth"
checkpoint = torch.load(checkpoint_path, map_location=device)

model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print(f"Validation Accuracy: {checkpoint['val_accuracy']:.2f}")

# Run a prediction on a dummy 1x28x28 input
input_data = torch.randn(1, 1, 28, 28)
results = model.predict(input_data.to(device))
print("Results:", results)

Training Details

Training Data

https://huggingface.co/datasets/ylecun/mnist

Training Procedure

Data Augmentation

  • RandomRotation(10)
  • RandomAffine(0, shear=10)
  • RandomAffine(0, translate=(0.1, 0.1))
  • RandomResizedCrop(28, scale=(0.8, 1.0))
  • RandomPerspective(distortion_scale=0.2, p=0.5)
  • Resize((28, 28))
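
A sketch of this augmentation pipeline using torchvision transforms, composed in the order listed above; the final ToTensor step is an assumption, since the card does not state how images are converted to tensors or normalized.

from torchvision import transforms

# Augmentations listed above, composed in order; ToTensor is an assumption.
train_transform = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomAffine(0, shear=10),
    transforms.RandomAffine(0, translate=(0.1, 0.1)),
    transforms.RandomResizedCrop(28, scale=(0.8, 1.0)),
    transforms.RandomPerspective(distortion_scale=0.2, p=0.5),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
])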

Training Hyperparameters

  • Adam with a learning rate of 0.001 for fast initial convergence
  • SGD with a learning rate of 0.01, decayed to 0.001
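
A minimal sketch of this two-stage optimizer setup, assuming a switch from Adam to SGD partway through training; the switch point and the step-decay schedule are assumptions, only the learning rates come from this card.

import torch

model = MixtureOfExperts(num_experts=10)

# Stage 1: Adam for fast initial convergence (lr=0.001)
adam = torch.optim.Adam(model.parameters(), lr=0.001)

# Stage 2: SGD starting at lr=0.01, decayed toward 0.001
# (the StepLR schedule below is an assumption, not stated in the card)
sgd = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(sgd, step_size=10, gamma=0.5)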

Size

2,247,151 total parameters, of which 674,145 are effective (active) parameters
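
A quick check of the total parameter count, reusing the model from the Get Started snippet above; the effective count of 674,145 depends on the expert-routing logic and cannot be read off this sum directly.

# Total trainable parameters; expected to print 2,247,151 for this model.
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")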

Evaluation

Testing Data, Factors & Metrics

Testing Data

https://huggingface.co/datasets/ylecun/mnist

Metrics

  • Accuracy: 99.75%
  • Error rate: 0.25%
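
A minimal evaluation sketch on the MNIST test split, reusing the model and device from the Get Started snippet above; the ToTensor-only preprocessing and the use of the model's forward output as class logits are assumptions, so results may differ slightly from the reported accuracy.

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

test_set = datasets.MNIST("./data", train=False, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(test_set, batch_size=256)

correct = 0
model.eval()
with torch.no_grad():
    for images, labels in loader:
        logits = model(images.to(device))  # assumes forward returns class logits
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()

print(f"Accuracy: {100.0 * correct / len(test_set):.2f}%")  # card reports 99.75%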

Technical Specifications

Model Architecture

Mixture-of-Experts (MoE) architecture in which each expert is a simple CNN.
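
A generic, hedged sketch of an MoE with simple CNN experts and a softmax gating network; the actual MixtureOfExperts implementation in this repository may differ (expert depth, gating input, and routing strategy are not specified in this card).

import torch
import torch.nn as nn

class SimpleCNNExpert(nn.Module):
    # Hypothetical expert: two conv blocks followed by a linear classifier
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class MixtureOfExperts(nn.Module):
    # Dense soft gating over all experts; the repository may instead use sparse routing
    def __init__(self, num_experts=10, num_classes=10):
        super().__init__()
        self.experts = nn.ModuleList(SimpleCNNExpert(num_classes) for _ in range(num_experts))
        self.gate = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, num_experts))

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=1)             # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], 1)   # (B, E, C)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)      # (B, C)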
