RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation Paper • 2601.08430 • Published 15 days ago • 57
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis Paper • 2601.05808 • Published 19 days ago • 36
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning Paper • 2512.23412 • Published about 1 month ago • 39
MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision Paper • 2508.08177 • Published Aug 11, 2025
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 23 days ago • 102
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 23 days ago • 102
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 23 days ago • 102
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving Paper • 2505.20665 • Published May 27, 2025 • 2
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving Paper • 2505.20665 • Published May 27, 2025 • 2
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 46