Contamination Detection for VLMs using Multi-Modal Semantic Perturbation • Paper • 2511.03774 • Published 4 days ago • 12
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm • Paper • 2511.04570 • Published 3 days ago • 159
TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models • Paper • 2511.02802 • Published 5 days ago • 12
Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning • Paper • 2511.02818 • Published 5 days ago • 13
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions • Paper • 2511.03334 • Published 4 days ago • 47
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models • Paper • 2511.02650 • Published 5 days ago • 8
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer • Paper • 2510.25976 • Published 11 days ago • 9
When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs • Paper • 2511.02243 • Published 5 days ago • 23
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation • Paper • 2511.02778 • Published 5 days ago • 95
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization • Paper • 2510.25616 • Published 11 days ago • 88
The Denario project: Deep knowledge AI agents for scientific discovery • Paper • 2510.26887 • Published 10 days ago • 6
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model • Paper • 2510.27607 • Published 9 days ago • 8
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens • Paper • 2510.24940 • Published 12 days ago • 15