Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published Sep 1, 2025 • 24
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published Apr 29, 2024 • 30
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Paper • 2403.00818 • Published Feb 26, 2024 • 19
Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets Paper • 2010.14819 • Published Oct 28, 2020
GhostNetV2: Enhance Cheap Operation with Long-Range Attention Paper • 2211.12905 • Published Nov 23, 2022
Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation Paper • 2303.11579 • Published Mar 21, 2023
GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? Paper • 2306.00693 • Published Jun 1, 2023
Masked Image Modeling with Local Multi-Scale Reconstruction Paper • 2303.05251 • Published Mar 9, 2023
Boosting Semantic Segmentation from the Perspective of Explicit Class Embeddings Paper • 2308.12894 • Published Aug 24, 2023