SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment Paper • 2512.02807 • Published 14 days ago • 7