- Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
  Paper • 2506.06395 • Published • 132
- Magistral
  Paper • 2506.10910 • Published • 65
- Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
  Paper • 2506.07240 • Published • 7
- Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
  Paper • 2506.09991 • Published • 55

Collections
Collections including paper arxiv:2508.16153

- PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
  Paper • 2504.08791 • Published • 137
- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 120
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 120
- Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
  Paper • 2506.16406 • Published • 126

- Search-o1: Agentic Search-Enhanced Large Reasoning Models
  Paper • 2501.05366 • Published • 102
- Chain-of-Retrieval Augmented Generation
  Paper • 2501.14342 • Published • 58
- MemOS: A Memory OS for AI System
  Paper • 2507.03724 • Published • 155
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
  Paper • 2508.16153 • Published • 154

- FLAME: Factuality-Aware Alignment for Large Language Models
  Paper • 2405.01525 • Published • 28
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 41
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 54
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12

- SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
  Paper • 2506.04180 • Published • 33
- AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
  Paper • 2506.10540 • Published • 37
- AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
  Paper • 2506.10974 • Published • 19
- SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
  Paper • 2507.15245 • Published • 11

- Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
  Paper • 2410.22304 • Published • 18
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
  Paper • 2410.19609 • Published • 18
- Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
  Paper • 2411.00412 • Published • 10
- Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning
  Paper • 2410.02052 • Published • 9

- Specialized Language Models with Cheap Inference from Limited Domain Data
  Paper • 2402.01093 • Published • 47
- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 92
- General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
  Paper • 2409.01704 • Published • 83
- jina-embeddings-v3: Multilingual Embeddings With Task LoRA
  Paper • 2409.10173 • Published • 34