Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.16084

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22 • 63
Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Paper • 2505.24760 • Published May 30 • 74

RL_Papers in general

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Paper • 2504.08672 • Published Apr 11 • 55
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

Paper • 2504.12322 • Published Apr 11 • 28
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 88
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Paper • 2504.06261 • Published Apr 8 • 110
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Paper • 2504.05303 • Published Apr 7 • 5
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation

Paper • 2505.13215 • Published May 19 • 29

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 88

Research Papers/Reviews/Literature

Daily Research papers and review including older relevant content.

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 62
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18 • 154
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Paper • 2503.15265 • Published Mar 19 • 46
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18 • 50

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 5.87k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 119 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 137
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24 • 120
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126

To Read collection

interesting papers to read

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31 • 62
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published Mar 24 • 119
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 141

Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Paper • 2502.04404 • Published Feb 6 • 25
Learning Adaptive Parallel Reasoning with Language Models

Paper • 2504.15466 • Published Apr 21 • 43
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Paper • 2504.13367 • Published Apr 17 • 26

Test-Time Compute/Optimal Scaling

Scaling LLM Inference with Optimized Sample Compute Allocation

Paper • 2410.22480 • Published Oct 29, 2024
Test-time Computing: from System-1 Thinking to System-2 Thinking

Paper • 2501.02497 • Published Jan 5 • 46
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Paper • 2412.14135 • Published Dec 18, 2024
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though

Paper • 2501.04682 • Published Jan 8 • 99

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22 • 63
Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Paper • 2505.24760 • Published May 30 • 74

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 5.87k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 119 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

RL_Papers in general

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Paper • 2504.08672 • Published Apr 11 • 55
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

Paper • 2504.12322 • Published Apr 11 • 28
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 88
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 137
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24 • 120
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Paper • 2504.06261 • Published Apr 8 • 110
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Paper • 2504.05303 • Published Apr 7 • 5
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation

Paper • 2505.13215 • Published May 19 • 29

To Read collection

interesting papers to read

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31 • 62
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published Mar 24 • 119
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 141

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 88

Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Paper • 2502.04404 • Published Feb 6 • 25
Learning Adaptive Parallel Reasoning with Language Models

Paper • 2504.15466 • Published Apr 21 • 43
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Paper • 2504.13367 • Published Apr 17 • 26

Research Papers/Reviews/Literature

Daily Research papers and review including older relevant content.

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 62
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18 • 154
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Paper • 2503.15265 • Published Mar 19 • 46
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18 • 50

Test-Time Compute/Optimal Scaling

Scaling LLM Inference with Optimized Sample Compute Allocation

Paper • 2410.22480 • Published Oct 29, 2024
Test-time Computing: from System-1 Thinking to System-2 Thinking

Paper • 2501.02497 • Published Jan 5 • 46
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Paper • 2412.14135 • Published Dec 18, 2024
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though

Paper • 2501.04682 • Published Jan 8 • 99

Previous
1
2
3
4
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs