Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published 26 days ago • 54
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published 28 days ago • 35
SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement Paper • 2507.01643 • Published Jul 2 • 1
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published Jul 10 • 49
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Paper • 2504.10462 • Published Apr 14 • 15
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 271
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment Paper • 2405.17871 • Published May 28, 2024 • 1
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering Paper • 2409.20424 • Published Sep 30, 2024
Unveiling the Tapestry of Consistency in Large Vision-Language Models Paper • 2405.14156 • Published May 23, 2024