Instruct-Imagen: Image Generation with Multi-modal Instruction • Paper • 2401.01952 • Published Jan 3, 2024 • 32
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing • Paper • 2508.14033 • Published Aug 19 • 1
A Survey of Context Engineering for Large Language Models • Paper • 2507.13334 • Published Jul 17 • 258
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models • Paper • 2505.04921 • Published May 8 • 185