On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 178
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers Paper • 2507.04404 • Published Jul 6 • 21
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10 • 88
meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 257k • • 1.54k