- GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319, published Feb 23, 2024)
- FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving (arXiv:2501.01005, published Jan 2, 2025)