LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token • Paper • 2501.03895 • Published Jan 7