zxw

hitsz-zxw

upvoted 13 papers 6 months ago

MSDF: A General Open-Domain Multi-Skill Dialog Framework

Paper • 2206.08626 • Published Jun 17, 2022 • 2

A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text

Paper • 2305.02265 • Published May 3, 2023 • 2

LMEye: An Interactive Perception Network for Large Language Models

Paper • 2305.03701 • Published May 5, 2023 • 2

A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering

Paper • 2311.07536 • Published Nov 13, 2023 • 3

LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs

Paper • 2402.13546 • Published Feb 21, 2024 • 3

Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

Paper • 2402.13561 • Published Feb 21, 2024 • 1

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

Paper • 2402.13587 • Published Feb 21, 2024 • 2

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Paper • 2405.11273 • Published May 18, 2024 • 19

VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

Paper • 2406.11303 • Published Jun 17, 2024 • 3

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Paper • 2408.09787 • Published Aug 19, 2024 • 10

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published Jan 21 • 65

VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

Paper • 2504.17821 • Published Apr 23 • 24

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 185