Hao Fei

scofield7419

http://haofei.vip/

AI & ML interests

Multimodal Learning, Large Language Model, Vision and Language, Natural Language Processing, Structural Modeling

Recent Activity

upvoted a paper about 8 hours ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

commented on a paper about 8 hours ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

authored a paper about 4 hours ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published 3 days ago • 22

authored 3 papers 1 day ago

Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding

Paper • 2509.11866 • Published Sep 15 • 1

MuSLR: Multimodal Symbolic Logical Reasoning

Paper • 2509.25851 • Published Sep 30 • 12

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models

Paper • 2508.12081 • Published Aug 16

authored a paper 6 months ago

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7 • 82

authored 2 papers 7 months ago

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Paper • 2504.13122 • Published Apr 17 • 20

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Paper • 2503.23377 • Published Mar 30 • 57

authored 2 papers 8 months ago

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Paper • 2503.24379 • Published Mar 31 • 76

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16 • 35

authored a paper 11 months ago

Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Paper • 2412.19806 • Published Oct 8, 2024 • 2

authored 10 papers 12 months ago

Transfer Visual Prompt Generator across LLMs

Paper • 2305.01278 • Published May 2, 2023

Reasoning Implicit Sentiment with Chain-of-Thought Prompting

Paper • 2305.11255 • Published May 18, 2023 • 2

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

Paper • 2310.12798 • Published Oct 19, 2023 • 4

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

Paper • 2311.18651 • Published Nov 30, 2023

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54

PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Paper • 2408.09481 • Published Aug 18, 2024 • 1

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

Paper • 2402.11435 • Published Feb 18, 2024
