ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models Paper • 2509.21991 • Published Sep 26 • 5
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9 • 27
meta-llama/Llama-4-Maverick-17B-128E-Instruct Image-Text-to-Text • 402B • Updated May 22 • 21.3k • 425
meta-llama/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • 109B • Updated May 22 • 204k • 1.14k
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published Apr 1 • 15