Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs • arXiv:2511.07419 • Published Nov 2025
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory • arXiv:2509.14662 • Published Sep 18, 2025
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test • arXiv:2506.21551 • Published Jun 26, 2025
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • arXiv:2506.01939 • Published Jun 2, 2025
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors • arXiv:2505.23001 • Published May 29, 2025
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO • arXiv:2505.22453 • Published May 28, 2025
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start • arXiv:2505.22334 • Published May 28, 2025
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes • arXiv:2504.11544 • Published Apr 15, 2025
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness • arXiv:2504.10514 • Published Apr 10, 2025
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients • arXiv:2504.10766 • Published Apr 14, 2025
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing • arXiv:2504.07964 • Published Apr 10, 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? • arXiv:2504.06514 • Published Apr 9, 2025
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective • arXiv:2410.23743 • Published Oct 31, 2024
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free • arXiv:2410.10814 • Published Oct 14, 2024
WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents • arXiv:2410.07484 • Published Oct 9, 2024