Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling Paper • 2508.03404 • Published Aug 5 • 4
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning Paper • 2508.06259 • Published Aug 8 • 2
DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing Paper • 2510.02253 • Published Oct 2 • 13
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow Paper • 2509.21789 • Published Sep 26 • 9
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Paper • 2510.18632 • Published 29 days ago • 21
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Paper • 2510.18632 • Published 29 days ago • 21
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow Paper • 2509.21789 • Published Sep 26 • 9
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning Paper • 2509.23768 • Published Sep 28 • 48
DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing Paper • 2510.02253 • Published Oct 2 • 13
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling Paper • 2508.03404 • Published Aug 5 • 4 • 2
CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation Paper • 2506.23121 • Published Jun 29 • 2 • 1
ICHPro: Intracerebral Hemorrhage Prognosis Classification Via Joint-attention Fusion-based 3d Cross-modal Network Paper • 2402.11307 • Published Feb 17, 2024
CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation Paper • 2506.23121 • Published Jun 29 • 2
CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation Paper • 2506.23121 • Published Jun 29 • 2