Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training • Paper • 2509.25758 • Published Sep 30 • 22
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models • Paper • 2506.19697 • Published Jun 24 • 44
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains • Paper • 2410.09870 • Published Oct 13, 2024 • 8