papers - a passagereptile455 Collection

papers

updated 10 days ago

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Paper • 2503.14734 • Published Mar 18 • 4
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 141
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19 • 88
Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21 • 64
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published Jul 24 • 40
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25 • 30
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

Paper • 2507.18392 • Published Jul 24 • 19
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

Paper • 2507.17596 • Published Jul 23 • 5
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement

Paper • 2507.18742 • Published Jul 24 • 5
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

Paper • 2507.10510 • Published Jul 14 • 4
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Paper • 2507.19457 • Published Jul 25 • 28
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Paper • 2507.16534 • Published Jul 22 • 7
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 258
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 237
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 307
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 157
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 154
Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2 • 130
GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 132
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 120
T-LoRA: Single Image Diffusion Model Customization Without Overfitting

Paper • 2507.05964 • Published Jul 8 • 118
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Paper • 2507.14683 • Published Jul 19 • 131
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Paper • 2410.10813 • Published Oct 14, 2024 • 12
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Paper • 2506.11928 • Published Jun 13 • 23
Defeating Prompt Injections by Design

Paper • 2503.18813 • Published Mar 24 • 22
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

Paper • 2505.22954 • Published May 29 • 14
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis

Paper • 2505.11581 • Published May 16 • 3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12, 2024 • 126
Evaluating Large Language Models Trained on Code

Paper • 2107.03374 • Published Jul 7, 2021 • 8
Self-Refine: Iterative Refinement with Self-Feedback

Paper • 2303.17651 • Published Mar 30, 2023 • 2
Gorilla: Large Language Model Connected with Massive APIs

Paper • 2305.15334 • Published May 24, 2023 • 5
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace

Paper • 2303.17580 • Published Mar 30, 2023 • 14
Communicative Agents for Software Development

Paper • 2307.07924 • Published Jul 16, 2023 • 6
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework

Paper • 2308.08155 • Published Aug 16, 2023 • 10
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Paper • 2509.09677 • Published Sep 11 • 34
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 101
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 185
Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published Apr 3 • 56
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues

Paper • 2501.10836 • Published Jan 18 • 1
Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 177
DynaSaur: Large Language Agents Beyond Predefined Actions

Paper • 2411.01747 • Published Nov 4, 2024 • 37
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

Paper • 2401.00812 • Published Jan 1, 2024 • 11
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published 14 days ago • 24
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Paper • 2509.18058 • Published Sep 22 • 12
Speculative Safety-Aware Decoding

Paper • 2508.17739 • Published Aug 25
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs

Paper • 2508.10029 • Published Aug 8
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs

Paper • 2508.10031 • Published Aug 9
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs

Paper • 2508.20333 • Published Aug 28
Mitigating Jailbreaks with Intent-Aware LLMs

Paper • 2508.12072 • Published Aug 16
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models

Paper • 2509.17938 • Published Sep 22 • 3
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness

Paper • 2509.14297 • Published Sep 17
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 470
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Paper • 2412.21199 • Published Dec 30, 2024 • 14
Solving Inequality Proofs with Large Language Models

Paper • 2506.07927 • Published Jun 9 • 20
