SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents Paper • 2512.22322 • Published 14 days ago • 38
AlphaResearch: Accelerating New Algorithm Discovery with Language Models Paper • 2511.08522 • Published Nov 11, 2025 • 17
AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser Paper • 2511.16397 • Published Nov 20, 2025 • 8
RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published Nov 27, 2025 • 14
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning Paper • 2512.02425 • Published Dec 2, 2025 • 24
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published Dec 2, 2025 • 32
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025 • 36
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 22 days ago • 111
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity Paper • 2511.15593 • Published Nov 19, 2025 • 57
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 211
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Paper • 2510.23603 • Published Oct 27, 2025 • 22
LimRank: Less is More for Reasoning-Intensive Information Reranking Paper • 2510.23544 • Published Oct 27, 2025 • 8
E^2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker Paper • 2510.22733 • Published Oct 26, 2025 • 31
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain Paper • 2510.15232 • Published Oct 17, 2025 • 5
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 106
Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research Paper • 2510.06056 • Published Oct 7, 2025 • 5
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models Paper • 2510.08559 • Published Oct 9, 2025 • 8