-
TTRL: Test-Time Reinforcement Learning
Paper • 2504.16084 • Published • 120 -
Describe Anything: Detailed Localized Image and Video Captioning
Paper • 2504.16072 • Published • 63 -
Reinforcing General Reasoning without Verifiers
Paper • 2505.21493 • Published • 26 -
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Paper • 2505.24760 • Published • 74
Collections
Collections including paper arxiv:2504.16084
-
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Paper • 2504.08672 • Published • 55 -
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
Paper • 2504.12322 • Published • 28 -
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88 -
TTRL: Test-Time Reinforcement Learning
Paper • 2504.16084 • Published • 120
-
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Paper • 2504.06261 • Published • 110 -
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Paper • 2504.05303 • Published • 5 -
TTRL: Test-Time Reinforcement Learning
Paper • 2504.16084 • Published • 120 -
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation
Paper • 2505.13215 • Published • 29
-
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper • 2503.10615 • Published • 17 -
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Paper • 2503.10630 • Published • 6 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 88
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 62 -
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper • 2503.14456 • Published • 154 -
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Paper • 2503.15265 • Published • 46 -
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 5.87k • 1.22k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 119 • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
Paper • 2504.08791 • Published • 137 -
TTRL: Test-Time Reinforcement Learning
Paper • 2504.16084 • Published • 120 -
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper • 2504.17192 • Published • 120 -
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 126
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 141
-
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
Paper • 2502.04404 • Published • 25 -
Learning Adaptive Parallel Reasoning with Language Models
Paper • 2504.15466 • Published • 43 -
TTRL: Test-Time Reinforcement Learning
Paper • 2504.16084 • Published • 120 -
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Paper • 2504.13367 • Published • 26
-
Scaling LLM Inference with Optimized Sample Compute Allocation
Paper • 2410.22480 • Published -
Test-time Computing: from System-1 Thinking to System-2 Thinking
Paper • 2501.02497 • Published • 46 -
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Paper • 2412.14135 • Published -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper • 2501.04682 • Published • 99