Anupam G
mapuna
AI & ML interests
None yet
Recent Activity
updated
a collection
about 2 months ago
Multimodal
upvoted
a
paper
about 2 months ago
Qwen3-Omni Technical Report
upvoted
a
paper
about 2 months ago
A Vision-Language-Action-Critic Model for Robotic Real-World
Reinforcement Learning
Organizations
Agents
Robotics
Search
Thinking
-
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Paper • 2503.19855 • Published • 29 -
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Paper • 2503.22675 • Published • 36 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Paper • 2505.08617 • Published • 41
Vision
-
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper • 2503.19325 • Published • 73 -
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Paper • 2503.18931 • Published • 30 -
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 59 -
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
Paper • 2507.15852 • Published • 38
Neurosymbolic
Reasoning
-
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
Paper • 2503.21380 • Published • 38 -
Video-R1: Reinforcing Video Reasoning in MLLMs
Paper • 2503.21776 • Published • 79 -
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Paper • 2503.21696 • Published • 23 -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15
Basics
Multimodal
-
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
Paper • 2503.20756 • Published • 7 -
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper • 2505.09568 • Published • 98 -
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Paper • 2508.18265 • Published • 204 -
Qwen3-Omni Technical Report
Paper • 2509.17765 • Published • 135
DeepFake
GenAI
RL
Neurosymbolic
Agents
Reasoning
-
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
Paper • 2503.21380 • Published • 38 -
Video-R1: Reinforcing Video Reasoning in MLLMs
Paper • 2503.21776 • Published • 79 -
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Paper • 2503.21696 • Published • 23 -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15
Robotics
Basics
Search
Multimodal
-
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
Paper • 2503.20756 • Published • 7 -
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper • 2505.09568 • Published • 98 -
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Paper • 2508.18265 • Published • 204 -
Qwen3-Omni Technical Report
Paper • 2509.17765 • Published • 135
Thinking
-
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Paper • 2503.19855 • Published • 29 -
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Paper • 2503.22675 • Published • 36 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Paper • 2505.08617 • Published • 41
DeepFake
Vision
-
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper • 2503.19325 • Published • 73 -
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Paper • 2503.18931 • Published • 30 -
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 59 -
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
Paper • 2507.15852 • Published • 38
GenAI