arXiv:2601.18226

Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks

Published on Jan 26 · Submitted by Haotian Li on Jan 27
Abstract

AI-generated summary: Agents that evolve tools through continuous interaction and feedback can adapt to dynamic environments and transfer knowledge across domains more effectively than traditional systems.

Conventional agent systems often struggle in open-ended environments where task distributions continuously drift and external supervision is scarce. Their reliance on static toolsets or offline training lags behind these dynamics, leaving the system's capability boundaries rigid and unknown. To address this, we propose the In-Situ Self-Evolving paradigm. This approach treats sequential task interactions as a continuous stream of experience, enabling the system to distill short-term execution feedback into long-term, reusable capabilities without access to ground-truth labels. Within this framework, we identify tool evolution, which yields verifiable, binary feedback signals, as the critical pathway for capability expansion, and we develop Yunjue Agent, a system that iteratively synthesizes, optimizes, and reuses tools to navigate emerging challenges. To improve evolutionary efficiency, we further introduce a Parallel Batch Evolution strategy. Empirical evaluations across five diverse benchmarks under a zero-start setting demonstrate significant performance gains over proprietary baselines. Additionally, complementary warm-start evaluations confirm that the accumulated general knowledge can be seamlessly transferred to novel domains. Finally, we propose a novel metric to monitor evolution convergence, serving a function analogous to training loss in conventional optimization. We open-source our codebase, system traces, and evolved tools to facilitate future research in resilient, self-evolving intelligence.

Community


(Figure: system overview.)

What is Yunjue Agent?

Yunjue Agent is a fully reproducible, zero-start, in-situ self-evolving agent system for open-ended tasks: it treats the incoming task stream as a continuous source of experience and iteratively synthesizes, optimizes, and reuses tools to expand its own capabilities without ground-truth supervision. The abstract above gives the full summary.
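
To make the paradigm concrete, below is a minimal, illustrative sketch of how an in-situ self-evolving loop with parallel batch evolution might be structured. This is only our reading of the abstract, not the released implementation: every name (`ToolLibrary`, `synthesize_tool`, `verify_by_execution`, `parallel_batch_evolution`) is hypothetical, and the LLM calls are replaced by placeholders.

```python
"""Illustrative sketch (not the paper's code) of an in-situ self-evolving loop.

Tasks arrive as a stream and are processed in batches; tools whose generated
code executes successfully are merged into a shared library for later reuse.
"""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    code: str  # Python source defining a `run(x)` entry point


@dataclass
class ToolLibrary:
    tools: dict[str, Tool] = field(default_factory=dict)

    def merge(self, new_tools: list[Tool]) -> None:
        # Naive merge: keep the first tool seen under each name.
        for tool in new_tools:
            self.tools.setdefault(tool.name, tool)


def synthesize_tool(task: str) -> Tool:
    # Placeholder for an LLM call that writes a task-specific tool.
    return Tool(name=f"tool_{abs(hash(task)) % 1000}",
                code="def run(x):\n    return x")


def verify_by_execution(tool: Tool) -> bool:
    # Binary feedback signal: does the generated tool run without error?
    try:
        namespace: dict = {}
        exec(tool.code, namespace)
        namespace["run"]("smoke test")
        return True
    except Exception:
        return False


def evolve_on_task(task: str, library: ToolLibrary) -> Tool | None:
    # A real system would first try to solve the task with existing tools;
    # here we only skip synthesis when an identically named tool already exists.
    candidate = synthesize_tool(task)
    if candidate.name in library.tools:
        return None  # reuse path: nothing new to add
    return candidate if verify_by_execution(candidate) else None


def parallel_batch_evolution(task_stream: list[str], library: ToolLibrary,
                             batch_size: int = 4) -> None:
    # Process the stream batch by batch, evolving tools in parallel and
    # merging the verified candidates back into the shared library.
    for start in range(0, len(task_stream), batch_size):
        batch = task_stream[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            candidates = list(pool.map(lambda t: evolve_on_task(t, library), batch))
        library.merge([t for t in candidates if t is not None])


if __name__ == "__main__":
    library = ToolLibrary()  # zero-start: the tool library begins empty
    parallel_batch_evolution([f"task-{i}" for i in range(8)], library)
    print(f"library now holds {len(library.tools)} evolved tools")
```

The point the abstract emphasizes is that the only supervision in this loop is whether a synthesized tool's code executes successfully, so no ground-truth labels are required.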

🌟 Highlights

  • 🧬 In-Situ Self-Evolving Paradigm

    We introduce a novel agentic learning framework that bridges the gap between static capabilities and on-the-fly evolution. By reframing discrete interactions as a continuous stream of experience, the system distills short-term execution feedback into long-term capabilities via internal feedback loops. This enables real-time adaptation and exploration in open-ended environments without additional supervision signals.

  • 🚀 SOTA Performance from "Tabula Rasa"

    Starting with an empty tool library (Zero-Start), our system achieves State-of-the-Art performance by relying solely on inference-time generation, verification, and induction. It demonstrates significant gains over backend models (e.g., +17.4% on DeepSearchQA over Gemini 3 Pro) and secures 2nd place on the HLE leaderboard, proving the feasibility of bootstrapping general capabilities from scratch.

  • 🛠️ "Tool-First" Evolutionary Principle

    We prioritize tool evolution over memory or workflow evolution as the primary driver of capability expansion. Tools provide objective binary feedback (via code-execution success or failure), which serves as a reliable internal supervision signal in the absence of human annotation. This mitigates hallucination risks and prevents strategy bias, ensuring stable accumulation of general primitives; a minimal sketch of this feedback loop follows the list below.

  • ๐Ÿ” Fully Reproducible & Open Traces

    We release a comprehensive open-asset suite, including end-to-end code, benchmark scripts, versioned tool artifacts, and full interaction traces. This transforms "black-box" agent results into transparent, auditable research, enabling granular analysis of tool convergence, evolution efficiency, and merging strategies.
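
To illustrate the binary-feedback idea from the "Tool-First" bullet, here is a small, hypothetical sketch of turning code-execution success or failure into a label-free repair signal. `propose_fix` stands in for an LLM call, and none of this is taken from the released codebase.

```python
"""Illustrative only: execution success/failure as a label-free supervision signal."""
import traceback


def run_candidate(code: str) -> tuple[bool, str]:
    """Execute a candidate tool definition and return (passed, error_text)."""
    try:
        namespace: dict = {}
        exec(code, namespace)
        namespace["run"]("smoke test")  # minimal self-check of the entry point
        return True, ""
    except Exception:
        return False, traceback.format_exc()


def propose_fix(code: str, error_text: str) -> str:
    # Placeholder for an LLM repair step conditioned on the traceback.
    return code.replace("raise NotImplementedError", "return x")


def optimize_tool(code: str, max_rounds: int = 3) -> str | None:
    """Keep a tool only if it eventually executes cleanly (binary feedback)."""
    for _ in range(max_rounds):
        passed, error_text = run_candidate(code)
        if passed:
            return code
        code = propose_fix(code, error_text)
    return None


if __name__ == "__main__":
    broken = "def run(x):\n    raise NotImplementedError"
    print(optimize_tool(broken))  # repaired source, or None if it never passes
```

Because the signal is simply "ran" or "failed", it does not depend on the model grading its own answers, which is what makes it usable as internal supervision when no human annotation is available.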

📈 Performance on Benchmarks

We evaluate Yunjue Agent on five benchmarks: HLE, DeepSearchQA, FinSearchComp (T2 & T3), xbench-ScienceQA, and xbench-DeepSearch, and achieve SOTA results.

(Figure: main results across the five benchmarks.)
