-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 216 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
Collections
Discover the best community collections!
Collections including paper arXiv:2507.19849
-
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published • 14 -
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper • 2507.21802 • Published • 15 -
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
Paper • 2507.21848 • Published • 8 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 156
-
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 156 -
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Paper • 2507.22448 • Published • 65 -
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Paper • 2508.18265 • Published • 202 -
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Paper • 2508.21113 • Published • 109
-
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper • 2507.19457 • Published • 28 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 156 -
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 307 -
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper • 2510.03215 • Published • 96
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 216 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published • 14 -
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper • 2507.21802 • Published • 15 -
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
Paper • 2507.21848 • Published • 8 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 156
-
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper • 2507.19457 • Published • 28 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 156 -
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 307 -
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper • 2510.03215 • Published • 96
-
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 156 -
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Paper • 2507.22448 • Published • 65 -
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Paper • 2508.18265 • Published • 202 -
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Paper • 2508.21113 • Published • 109