On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width Paper • 2312.12226 • Published Dec 19, 2023 • 1
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks Paper • 2508.18672 • Published Aug 26, 2025 • 10