Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models Paper • 2511.01618 • Published 12 days ago • 9
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models Paper • 2511.01618 • Published 12 days ago • 9
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models Paper • 2511.01618 • Published 12 days ago • 9 • 1
CompBench: Benchmarking Complex Instruction-guided Image Editing Paper • 2505.12200 • Published May 18
Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback Paper • 2507.20766 • Published Jul 28 • 1
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video? Paper • 2509.24709 • Published Sep 29 • 5
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models Paper • 2510.01304 • Published Oct 1 • 10
Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning? Paper • 2510.06036 • Published Oct 7 • 6
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models Paper • 2510.01304 • Published Oct 1 • 10