Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs • arXiv:2511.07419 • Published Nov 2025
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory • arXiv:2509.14662 • Published Sep 18, 2025
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test • arXiv:2506.21551 • Published Jun 26, 2025
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • arXiv:2506.01939 • Published Jun 2, 2025
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors • arXiv:2505.23001 • Published May 29, 2025
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO • arXiv:2505.22453 • Published May 28, 2025
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start • arXiv:2505.22334 • Published May 28, 2025
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes • arXiv:2504.11544 • Published Apr 15, 2025
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness • arXiv:2504.10514 • Published Apr 10, 2025
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients • arXiv:2504.10766 • Published Apr 14, 2025
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing • arXiv:2504.07964 • Published Apr 10, 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? • arXiv:2504.06514 • Published Apr 9, 2025
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective • arXiv:2410.23743 • Published Oct 31, 2024
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free • arXiv:2410.10814 • Published Oct 14, 2024
WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents • arXiv:2410.07484 • Published Oct 9, 2024