Lipschitz-Bounded Neural Networks: Figure Data and Model Checkpoints

This repository contains the figure data and model checkpoints supporting our paper on training Lipschitz-bounded neural networks: reproducible data for the key figures, plus pre-trained models that demonstrate the effectiveness of the constraint methods we study.

Repository Structure

├── figures/              # Figure data and reproduction scripts
│   ├── figure_2/        # Data for Figure 2
│   ├── figure_3/        # Data for Figure 3
│   ├── figure_4/        # Data for Figure 4
│   ├── reproduce_figures.py   # Script to reproduce figures from CSV data
│   ├── requirements.txt       # Python dependencies
│   └── README.md             # Detailed usage instructions
└── models/              # Pre-trained model checkpoints
    ├── MLPs/            # MLP models trained on CIFAR-10
    └── transformers/    # Transformer models trained on Shakespeare

Figure Data

The figures/ directory contains CSV files with the processed data used to create each figure in our paper, along with scripts to reproduce the exact plots. This enables full reproducibility of our experimental results.
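For quick orientation, here is a minimal sketch of loading one figure's CSV and plotting it directly; the file path and column names ("step", "accuracy") are placeholders, so check the actual headers in each figure_*/ directory, or simply run reproduce_figures.py for the exact plots.

```python
# Minimal sketch: plot one figure's data straight from its CSV.
# The path and column names below are placeholders; inspect the CSVs
# in figures/figure_2/ (or use reproduce_figures.py) for the real ones.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("figures/figure_2/data.csv")  # hypothetical path
print(df.columns.tolist())                     # discover the real column names

plt.plot(df["step"], df["accuracy"])           # placeholder columns
plt.xlabel("step")
plt.ylabel("accuracy")
plt.savefig("figure_2_reproduced.png", dpi=200)
```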

Model Checkpoints

MLP Models (models/MLPs/)

Two MLP models trained on CIFAR-10 that were used to generate Figure 3 in our paper.

Training Details:

  • Dataset: CIFAR-10
  • Selection: The exact checkpoints used to produce Figure 3
  • Architecture: Multi-layer perceptrons

Transformer Models (models/transformers/)

The best-performing transformer model trained on the word-level Shakespeare dataset, saved at the checkpoint with the highest validation accuracy.

Training Details:

  • Dataset: Shakespeare word-level dataset
  • Selection: Best validation accuracy checkpoint
  • Architecture: Transformer with Lipschitz constraints

Usage

Reproducing Figures

  1. Install dependencies:

     pip install -r figures/requirements.txt

  2. Run the reproduction script:

     python figures/reproduce_figures.py

Loading Model Checkpoints

The model checkpoints can be loaded using the main codebase. For detailed instructions on model loading and usage, please refer to the main project repository: https://github.com/Arongil/lipschitz-transformers
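If you just want to inspect a checkpoint's contents before wiring up the full codebase, a plain torch.load is enough. The path below is a placeholder, and whether the file holds a raw state dict or a wrapper dict is an assumption to verify against the main repository.

```python
# Minimal sketch: peek inside a checkpoint with PyTorch.
# The path is hypothetical, and the checkpoint layout (raw state_dict
# vs. a wrapper dict with a "state_dict" key) is an assumption.
import torch

ckpt = torch.load("models/transformers/checkpoint.pt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

for name, value in list(state_dict.items())[:10]:  # peek at the first entries
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```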

Paper Abstract

Neural networks are often highly sensitive to input and weight perturbations. This sensitivity has been linked to pathologies such as vulnerability to adversarial examples, divergent training, and overfitting. To combat these problems, past research has looked at building neural networks entirely from Lipschitz components. However, these techniques have not matured to the point where researchers have trained a modern architecture such as a transformer with a Lipschitz certificate enforced beyond initialization. To explore this gap, we begin by developing and benchmarking novel, computationally-efficient tools for maintaining norm-constrained weight matrices. Applying these tools, we are able to train transformer models with Lipschitz bounds enforced throughout training. We find that optimizer dynamics matter: switching from AdamW to Muon improves standard methods (weight decay and spectral normalization), allowing models to reach equal performance with a lower Lipschitz bound. Inspired by Muon's update having a fixed spectral norm, we co-design a weight constraint method that improves the Lipschitz vs. performance tradeoff on MLPs and 2M parameter transformers. Our 2-Lipschitz transformer on Shakespeare text reaches 60% validation accuracy. Scaling to 145M parameters, our 10-Lipschitz transformer reaches 21% accuracy on internet text. However, to match the NanoGPT baseline validation accuracy of 39.4%, our Lipschitz upper bound increases to 10^264. Nonetheless, our Lipschitz transformers train without stability measures such as layer norm, QK norm, and logit tanh softcapping.
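As background on the spectral-normalization baseline mentioned in the abstract, the sketch below caps a weight matrix's spectral norm at 1 via power iteration. This is a generic illustration of the idea only, not the paper's co-designed constraint method; see the main repository for the actual implementation.

```python
# Illustrative sketch (not the paper's method): cap a matrix's spectral
# norm at 1 using power iteration, the core idea of spectral normalization.
import torch

def project_spectral_norm(W: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Rescale W so its largest singular value is at most 1."""
    u = torch.randn(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v = v / v.norm()
        u = W @ v
        u = u / u.norm()
    sigma = u @ W @ v  # Rayleigh-quotient estimate of the top singular value
    return W / sigma if sigma > 1 else W

W = torch.randn(256, 128)
W_constrained = project_spectral_norm(W)
print(torch.linalg.matrix_norm(W_constrained, ord=2))  # at most ~1.0
```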

Citation

If you use this data or these models in your research, please cite our paper:

@article{newhouse2025lipschitztransformers,
  title={Training Transformers with Enforced Lipschitz Constants},
  author={Newhouse, Laker and Hess, R. Preston and Cesista, Franz and Zahorodnii, Andrii and Bernstein, Jeremy and Isola, Phillip},
  journal={arXiv},
  year={2025}
}

License

This repository is released under the MIT License. The data and models are provided for research purposes.
