Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation Paper • 2509.26555 • Published Sep 30
Getting it Right: Improving Spatial Consistency in Text-to-Image Models Paper • 2404.01197 • Published Apr 1, 2024 • 31
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models Paper • 2404.03118 • Published Apr 3, 2024 • 26
FastRM: An efficient and automatic explainability framework for multimodal generative models Paper • 2412.01487 • Published Dec 2, 2024 • 1
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models Paper • 2408.02231 • Published Aug 5, 2024 • 2
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation Paper • 2404.08540 • Published Apr 12, 2024 • 12
DataComp: In search of the next generation of multimodal datasets Paper • 2304.14108 • Published Apr 27, 2023 • 2