ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference Paper • 2511.10645 • Published 10 days ago • 3
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference Paper • 2511.10645 • Published 10 days ago • 3
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published 27 days ago • 172
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 88
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Paper • 2506.16500 • Published Jun 19 • 17
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Paper • 2506.16500 • Published Jun 19 • 17
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer Paper • 2303.17605 • Published Mar 30, 2023
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing Paper • 2005.14187 • Published May 28, 2020 • 2
MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models Paper • 2308.12963 • Published Aug 24, 2023
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 52
AMC: AutoML for Model Compression and Acceleration on Mobile Devices Paper • 1802.03494 • Published Feb 10, 2018
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy Paper • 2006.08509 • Published Jun 15, 2020