Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy • Paper • 2511.21579 • Published 8 days ago • 21
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation • Paper • 2507.08441 • Published Jul 11 • 61