A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity (arXiv:2401.01967, published Jan 3, 2024)
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls (arXiv:2510.00184, published Sep 30, 2025)