Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published Sep 1 • 24
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published Apr 29, 2024 • 30
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Paper • 2403.00818 • Published Feb 26, 2024 • 19