TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar Paper • 2510.14972 • Published 24 days ago • 30
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning Paper • 2509.22824 • Published Sep 26 • 20
VideoScore2: Think before You Score in Generative Video Evaluation Paper • 2509.22799 • Published Sep 26 • 24
Interactive Training: Feedback-Driven Neural Network Optimization Paper • 2510.02297 • Published Oct 2 • 42
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls Paper • 2510.00184 • Published Sep 30 • 16
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9 • 99
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 73
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27 • 84
IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance Paper • 2502.08395 • Published Feb 12
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models Paper • 2507.08800 • Published Jul 11 • 79
POSS: Position Specialist Generates Better Draft for Speculative Decoding Paper • 2506.03566 • Published Jun 4 • 6
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs Paper • 2505.20139 • Published May 26 • 19
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Paper • 2505.16175 • Published May 22 • 41
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Paper • 2505.15612 • Published May 21 • 34
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20 • 24
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 71