Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published Sep 15, 2025
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges Paper • 2508.18076 • Published Aug 25, 2025
AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions Paper • 2508.16402 • Published Aug 22, 2025
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning Paper • 2508.03501 • Published Aug 5, 2025
Tool-integrated Reinforcement Learning for Repo Deep Search Paper • 2508.03012 • Published Aug 5, 2025
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities Paper • 2507.19766 • Published Jul 26, 2025
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published May 5, 2025
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28, 2025
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published May 24, 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14, 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024
Common 7B Language Models Already Possess Strong Math Capabilities Paper • 2403.04706 • Published Mar 7, 2024