Collections
Discover the best community collections!
Collections including paper arxiv:2505.23621
-
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 141 -
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 26 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 31 -
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 19
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
Paper • 2505.18125 • Published • 112 -
On-Policy RL with Optimal Reward Baseline
Paper • 2505.23585 • Published • 14 -
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
Paper • 2505.23604 • Published • 23 -
Are Reasoning Models More Prone to Hallucination?
Paper • 2505.23646 • Published • 24
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 62 -
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper • 2503.14456 • Published • 153 -
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Paper • 2503.15265 • Published • 46 -
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
Paper • 2505.18125 • Published • 112 -
On-Policy RL with Optimal Reward Baseline
Paper • 2505.23585 • Published • 14 -
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
Paper • 2505.23604 • Published • 23 -
Are Reasoning Models More Prone to Hallucination?
Paper • 2505.23646 • Published • 24
-
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 141 -
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 26 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 31 -
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 19
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 62 -
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper • 2503.14456 • Published • 153 -
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Paper • 2503.15265 • Published • 46 -
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69