Guided Self-Evolving LLMs with Minimal Human Supervision • Paper • 2512.02472 • Published 8 days ago • 48
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search • Paper • 2509.25454 • Published Sep 29 • 140
Large Reasoning Models Learn Better Alignment from Flawed Thinking • Paper • 2510.00938 • Published Oct 1 • 58
LiveTradeBench: Seeking Real-World Alpha with Large Language Models • Paper • 2511.03628 • Published Nov 5 • 11
PromptBridge: Cross-Model Prompt Transfer for Large Language Models • Paper • 2512.01420 • Published 9 days ago • 8
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents • Paper • 2510.09577 • Published Oct 10 • 7
Diversity Has Always Been There in Your Visual Autoregressive Models • Paper • 2511.17074 • Published 19 days ago • 7
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance • Paper • 2511.13254 • Published 23 days ago • 134
Search Self-play: Pushing the Frontier of Agent Capability without Supervision • Paper • 2510.18821 • Published Oct 21 • 17
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning • Paper • 2510.03259 • Published Sep 26 • 57
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning • Paper • 2510.19338 • Published Oct 22 • 114
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning • Paper • 2511.16043 • Published 20 days ago • 105
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models • Paper • 2510.03561 • Published Oct 3 • 24
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning • Paper • 2509.08755 • Published Sep 10 • 56
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique • Paper • 2511.09067 • Published 28 days ago • 2
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning • Paper • 2510.23038 • Published Oct 27
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning • Paper • 2511.06805 • Published about 1 month ago • 12
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation • Paper • 2511.15958 • Published 21 days ago • 1
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering • Paper • 2511.19899 • Published 15 days ago
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows • Paper • 2512.05150 • Published 7 days ago • 59