Artificial Hippocampus Networks for Efficient Long-Context Modeling Paper • 2510.07318 • Published Oct 8, 2025 • 30
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Paper • 2505.06708 • Published May 10, 2025 • 9
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 180
Should We Still Pretrain Encoders with Masked Language Modeling? Paper • 2507.00994 • Published Jul 1, 2025 • 80
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 188