papers
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper
•
2503.14734
•
Published
•
4
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper
•
2401.02117
•
Published
•
33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper
•
2506.01844
•
Published
•
141
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper
•
2506.16035
•
Published
•
88
Deep Researcher with Test-Time Diffusion
Paper
•
2507.16075
•
Published
•
64
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Paper
•
2507.18553
•
Published
•
40
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
Paper
•
2507.19478
•
Published
•
30
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy
Paper
•
2507.18392
•
Published
•
19
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
Paper
•
2507.17596
•
Published
•
5
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement
Paper
•
2507.18742
•
Published
•
5
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI
Paper
•
2507.10510
•
Published
•
4
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper
•
2507.19457
•
Published
•
28
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
Paper
•
2507.16534
•
Published
•
7
A Survey of Context Engineering for Large Language Models
Paper
•
2507.13334
•
Published
•
258
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper
•
2507.01006
•
Published
•
237
Group Sequence Policy Optimization
Paper
•
2507.18071
•
Published
•
307
Scaling RL to Long Videos
Paper
•
2507.07966
•
Published
•
157
MemOS: A Memory OS for AI System
Paper
•
2507.03724
•
Published
•
154
Kwai Keye-VL Technical Report
Paper
•
2507.01949
•
Published
•
130
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper
•
2507.15846
•
Published
•
132
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
•
2507.16784
•
Published
•
120
T-LoRA: Single Image Diffusion Model Customization Without Overfitting
Paper
•
2507.05964
•
Published
•
118
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Paper
•
2507.14683
•
Published
•
131
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Paper
•
2410.10813
•
Published
•
12
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Paper
•
2506.11928
•
Published
•
23
Defeating Prompt Injections by Design
Paper
•
2503.18813
•
Published
•
22
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
Paper
•
2505.22954
•
Published
•
14
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
Paper
•
2505.11581
•
Published
•
3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper
•
2408.06292
•
Published
•
126
Evaluating Large Language Models Trained on Code
Paper
•
2107.03374
•
Published
•
8
Self-Refine: Iterative Refinement with Self-Feedback
Paper
•
2303.17651
•
Published
•
2
Gorilla: Large Language Model Connected with Massive APIs
Paper
•
2305.15334
•
Published
•
5
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
Paper
•
2303.17580
•
Published
•
14
Communicative Agents for Software Development
Paper
•
2307.07924
•
Published
•
6
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper
•
2308.08155
•
Published
•
10
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Paper
•
2509.09677
•
Published
•
34
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
•
2510.05592
•
Published
•
101
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
•
2505.03335
•
Published
•
185
Inference-Time Scaling for Generalist Reward Modeling
Paper
•
2504.02495
•
Published
•
56
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues
Paper
•
2501.10836
•
Published
•
1
Executable Code Actions Elicit Better LLM Agents
Paper
•
2402.01030
•
Published
•
177
DynaSaur: Large Language Agents Beyond Predefined Actions
Paper
•
2411.01747
•
Published
•
37
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Paper
•
2401.00812
•
Published
•
11
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Paper
•
2510.24702
•
Published
•
24
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
Paper
•
2509.18058
•
Published
•
12
Speculative Safety-Aware Decoding
Paper
•
2508.17739
•
Published
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs
Paper
•
2508.10029
•
Published
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
Paper
•
2508.10031
•
Published
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Paper
•
2508.20333
•
Published
Mitigating Jailbreaks with Intent-Aware LLMs
Paper
•
2508.12072
•
Published
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
Paper
•
2509.17938
•
Published
•
3
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
Paper
•
2509.14297
•
Published
Less is More: Recursive Reasoning with Tiny Networks
Paper
•
2510.04871
•
Published
•
470
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Paper
•
2412.21199
•
Published
•
14
Solving Inequality Proofs with Large Language Models
Paper
•
2506.07927
•
Published
•
20