Ruizhe Li

rzdiversity

https://www.ruizhe.space/

AI & ML interests

Mechanistic Interpretability, Multimodal LLMs

Recent Activity

authored a paper 5 days ago

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

submitted a paper 5 days ago

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

Organizations

None yet

upvoted a paper 5 days ago

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

Paper • 2601.11061 • Published 9 days ago • 7

upvoted a paper 8 months ago

Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation

Paper • 2505.16415 • Published May 22, 2025 • 1

upvoted a paper over 1 year ago

Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions

Paper • 2405.03205 • Published May 6, 2024 • 1

upvoted a collection over 1 year ago

🔍 Interpretability & Analysis of LMs

Outstanding research in LM interpretability and evaluation, summarized • 135 items • Updated Dec 18, 2025 • 118