-
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
Paper • 2506.04207 • Published • 48 -
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • 2504.11468 • Published • 30 -
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper • 2506.18254 • Published • 31 -
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Paper • 2507.05255 • Published • 74
Collections
Discover the best community collections!
Collections including paper arxiv:2504.11468
-
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models
Paper • 2504.07951 • Published • 29 -
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Paper • 2504.08003 • Published • 49 -
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • 2504.11468 • Published • 30 -
Towards Learning to Complete Anything in Lidar
Paper • 2504.12264 • Published • 9
-
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper • 2503.10615 • Published • 17 -
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Paper • 2503.10630 • Published • 6 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 88
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
UCSC-VLAA/VLAA-Thinker-Qwen2.5VL-3B
Image-Text-to-Text • 4B • Updated • 447 • 5 -
UCSC-VLAA/VLAA-Thinker-Qwen2.5VL-7B
Image-Text-to-Text • 8B • Updated • 8.64k • 2 -
UCSC-VLAA/VLAA-Thinker-Qwen2VL-2B
Image-Text-to-Text • 2B • Updated • 158 • 1 -
UCSC-VLAA/VLAA-Thinker-Qwen2VL-7B
Image-Text-to-Text • 8B • Updated • 95
-
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
Paper • 2502.14669 • Published • 15 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 24
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 93 -
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper • 2404.10667 • Published • 23 -
Instruction-tuned Language Models are Better Knowledge Learners
Paper • 2402.12847 • Published • 26 -
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper • 2402.09353 • Published • 30
-
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
Paper • 2506.04207 • Published • 48 -
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • 2504.11468 • Published • 30 -
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper • 2506.18254 • Published • 31 -
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Paper • 2507.05255 • Published • 74
-
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models
Paper • 2504.07951 • Published • 29 -
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Paper • 2504.08003 • Published • 49 -
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • 2504.11468 • Published • 30 -
Towards Learning to Complete Anything in Lidar
Paper • 2504.12264 • Published • 9
-
UCSC-VLAA/VLAA-Thinker-Qwen2.5VL-3B
Image-Text-to-Text • 4B • Updated • 447 • 5 -
UCSC-VLAA/VLAA-Thinker-Qwen2.5VL-7B
Image-Text-to-Text • 8B • Updated • 8.64k • 2 -
UCSC-VLAA/VLAA-Thinker-Qwen2VL-2B
Image-Text-to-Text • 2B • Updated • 158 • 1 -
UCSC-VLAA/VLAA-Thinker-Qwen2VL-7B
Image-Text-to-Text • 8B • Updated • 95
-
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper • 2503.10615 • Published • 17 -
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Paper • 2503.10630 • Published • 6 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 88
-
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
Paper • 2502.14669 • Published • 15 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 24
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 93 -
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper • 2404.10667 • Published • 23 -
Instruction-tuned Language Models are Better Knowledge Learners
Paper • 2402.12847 • Published • 26 -
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper • 2402.09353 • Published • 30