TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning Paper • 2510.06217 • Published Oct 7 • 63
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum Paper • 2510.00526 • Published Oct 1 • 8
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26 • 133
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use Paper • 2505.19255 • Published May 25 • 5
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published Jun 6 • 73
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark Paper • 2406.06647 • Published Jun 10, 2024
Ask, and it shall be given: Turing completeness of prompting Paper • 2411.01992 • Published Nov 4, 2024
BACKTIME: Backdoor Attacks on Multivariate Time Series Forecasting Paper • 2410.02195 • Published Oct 3, 2024