Lipschitz-Bounded Neural Networks: Figure Data and Model Checkpoints
This repository contains the figure data and model checkpoints supporting our paper on training Lipschitz-bounded neural networks. It provides reproducible data for key figures, along with pre-trained models that demonstrate the effectiveness of various constraint methods.
Repository Structure
```
├── figures/                 # Figure data and reproduction scripts
│   ├── figure_2/            # Data for Figure 2
│   ├── figure_3/            # Data for Figure 3
│   ├── figure_4/            # Data for Figure 4
│   ├── reproduce_figures.py # Script to reproduce figures from CSV data
│   ├── requirements.txt     # Python dependencies
│   └── README.md            # Detailed usage instructions
└── models/                  # Pre-trained model checkpoints
    ├── MLPs/                # MLP models trained on CIFAR-10
    └── transformers/        # Transformer models trained on Shakespeare
```
Figure Data
The figures/ directory contains CSV files with the processed data used to create each figure in our paper, along with scripts to reproduce the exact plots. This enables full reproducibility of our experimental results.
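As an example, each figure's data can be loaded and plotted with standard tools. The sketch below is illustrative only: the file name `data.csv` and the column names are hypothetical placeholders; see `figures/README.md` for the actual file names and schemas.

```python
# Minimal sketch of loading and plotting one figure's CSV data.
# NOTE: the path and column names below are hypothetical placeholders;
# consult figures/README.md for the real file names and schemas.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("figures/figure_2/data.csv")  # hypothetical file name
print(df.head())  # inspect the available columns first

fig, ax = plt.subplots()
ax.plot(df["lipschitz_bound"], df["val_accuracy"])  # hypothetical columns
ax.set_xlabel("Lipschitz bound")
ax.set_ylabel("Validation accuracy")
plt.show()
```

For the exact plots from the paper, use `reproduce_figures.py` (see Usage below), which handles the correct files and styling.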
Model Checkpoints
MLP Models (models/MLPs/)
Two MLP models trained on CIFAR-10 that were used to generate Figure 3 in our paper.
Training Details:
- Dataset: CIFAR-10
- Selection: These are the models we used to generate Figure 3
- Architecture: Multi-layer perceptrons
Transformer Models (models/transformers/)
The best-performing transformer model trained on the word-level Shakespeare dataset, saved at the checkpoint with the highest validation accuracy.
Training Details:
- Dataset: Shakespeare word-level dataset
- Selection: Best validation accuracy checkpoint
- Architecture: Transformer with Lipschitz constraints
Usage
Reproducing Figures
- Install dependencies:

  ```bash
  pip install -r figures/requirements.txt
  ```

- Run the reproduction script:

  ```bash
  python figures/reproduce_figures.py
  ```
Loading Model Checkpoints
The model checkpoints can be loaded using the main codebase. For detailed instructions on model loading and usage, please refer to the main project repository: https://github.com/Arongil/lipschitz-transformers
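For a quick look without the full codebase, the checkpoints can be inspected directly, assuming they are standard PyTorch state dicts (the file name below is a hypothetical placeholder). The sketch also computes the generic product-of-spectral-norms upper bound on an MLP's Lipschitz constant, which holds for compositions of linear layers with 1-Lipschitz activations; this is the standard composition bound, not necessarily the exact quantity reported in the paper.

```python
# Minimal sketch of inspecting a checkpoint, assuming it is a standard
# PyTorch state dict. The file name is a hypothetical placeholder; see
# the models/ directory for the actual checkpoint names.
import torch

state = torch.load("models/MLPs/mlp_checkpoint.pt", map_location="cpu")

# If the checkpoint wraps the state dict in a larger dictionary
# (e.g. alongside optimizer state), unwrap it first.
if isinstance(state, dict) and "model" in state:
    state = state["model"]

for name, tensor in state.items():
    print(f"{name}: {tuple(tensor.shape)}")

# Generic Lipschitz upper bound for an MLP with 1-Lipschitz activations:
# the product of per-layer spectral norms.
bound = 1.0
for name, tensor in state.items():
    if tensor.ndim == 2:  # treat 2-D tensors as weight matrices
        bound *= torch.linalg.matrix_norm(tensor.float(), ord=2).item()
print(f"Product-of-spectral-norms Lipschitz upper bound: {bound:.3g}")
```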
Paper Abstract
Neural networks are often highly sensitive to input and weight perturbations. This sensitivity has been linked to pathologies such as vulnerability to adversarial examples, divergent training, and overfitting. To combat these problems, past research has looked at building neural networks entirely from Lipschitz components. However, these techniques have not matured to the point where researchers have trained a modern architecture such as a transformer with a Lipschitz certificate enforced beyond initialization. To explore this gap, we begin by developing and benchmarking novel, computationally-efficient tools for maintaining norm-constrained weight matrices. Applying these tools, we are able to train transformer models with Lipschitz bounds enforced throughout training. We find that optimizer dynamics matter: switching from AdamW to Muon improves standard methods (weight decay and spectral normalization), allowing models to reach equal performance with a lower Lipschitz bound. Inspired by Muon's update having a fixed spectral norm, we co-design a weight constraint method that improves the Lipschitz vs. performance tradeoff on MLPs and 2M parameter transformers. Our 2-Lipschitz transformer on Shakespeare text reaches validation accuracy 60%. Scaling to 145M parameters, our 10-Lipschitz transformer reaches 21% accuracy on internet text. However, to match the NanoGPT baseline validation accuracy of 39.4%, our Lipschitz upper bound increases to 10^264. Nonetheless, our Lipschitz transformers train without stability measures such as layer norm, QK norm, and logit tanh softcapping.
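To make the idea of "maintaining norm-constrained weight matrices" concrete, here is a simple and deliberately naive illustration: estimate a weight matrix's top singular value by power iteration and rescale whenever it exceeds a target spectral norm. This is not the paper's method, which develops more efficient tools; it only shows the kind of constraint being enforced.

```python
# Illustrative only: a naive spectral-norm cap via power iteration.
# This is NOT the paper's constraint method; it demonstrates the kind
# of norm constraint that the paper maintains more efficiently.
import torch

def spectral_cap(W: torch.Tensor, sigma_max: float = 1.0,
                 n_iters: int = 20) -> torch.Tensor:
    """Rescale W so its spectral norm is at most sigma_max."""
    v = torch.randn(W.shape[1])
    for _ in range(n_iters):  # power iteration on W
        u = W @ v
        u = u / (u.norm() + 1e-12)
        v = W.T @ u
        v = v / (v.norm() + 1e-12)
    sigma = torch.dot(u, W @ v)  # estimated top singular value
    if sigma > sigma_max:
        W = W * (sigma_max / sigma)
    return W

W = spectral_cap(torch.randn(64, 32), sigma_max=1.0)
print(torch.linalg.matrix_norm(W, ord=2))  # approximately <= 1.0
```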
Related Resources
- Main Codebase: https://github.com/Arongil/lipschitz-transformers - Full implementation and training scripts
- Paper: [TO BE ADDED] - Detailed methodology and theoretical analysis
Citation
If you use this data or these models in your research, please cite our paper:
```bibtex
@article{newhouse2025lipschitztransformers,
  title={Training Transformers with Enforced Lipschitz Constants},
  author={Laker Newhouse and R. Preston Hess and Franz Cesista and Andrii Zahorodnii and Jeremy Bernstein and Phillip Isola},
  journal={arXiv},
  year={2025}
}
```
License
This repository is released under the MIT License. The data and models are provided for research purposes.