LTX-2: Efficient Joint Audio-Visual Foundation Model • Paper • 2601.03233 • Published 17 days ago • 133
baidu/ERNIE-4.5-VL-28B-A3B-Thinking • Image-Text-to-Text • 30B • Updated about 1 month ago • 1.12k • 516
ERNIE-4.5-VL-28B-A3B-Thinking Demo • Running • Featured • 58 • Compact model, powerful multimodal reasoning.
Instruct-Imagen: Image Generation with Multi-modal Instruction • Paper • 2401.01952 • Published Jan 3, 2024 • 32