arxiv:2601.13572

Behavior Knowledge Merge in Reinforced Agentic Models

Published on Jan 20 · Submitted by Xiangchi Yuan on Jan 22

AI-generated summary

Reinforced Agent Merging (RAM) addresses the limitations of traditional merging methods for reinforcement learning-trained agents by distinguishing shared and task-specific parameters to preserve critical behaviors during model integration.

Abstract

Reinforcement learning (RL) is central to post-training, particularly for agentic models that require specialized reasoning behaviors. In this setting, model merging offers a practical mechanism for integrating multiple RL-trained agents from different tasks into a single generalist model. However, existing merging methods are designed for supervised fine-tuning (SFT), and they are suboptimal at preserving the task-specific capabilities of RL-trained agentic models. The root cause is a task-vector mismatch between RL and SFT: on-policy RL induces task vectors that are highly sparse and heterogeneous, whereas SFT-style merging implicitly assumes dense and globally comparable task vectors. When standard global averaging is applied under this mismatch, the non-overlapping RL task vectors that encode critical task-specific behaviors are shrunk and parameter updates are diluted. To address this issue, we propose Reinforced Agent Merging (RAM), a distribution-aware merging framework explicitly designed for RL-trained agentic models. RAM disentangles shared and task-specific unique parameter updates, averaging shared components while selectively preserving and rescaling unique ones to counteract parameter-update dilution. Experiments across multiple agent domains and model architectures demonstrate that RAM not only surpasses merging baselines but also unlocks synergistic potential among agents, achieving performance superior to that of the specialized agents in their own domains.
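
To make the dilution effect concrete, here is a minimal toy sketch (our illustration, not code from the paper): two sparse RL task vectors barely overlap, so uniform averaging over N models scales each task's unique update by 1/N.

```python
# Toy illustration (assumption: a task vector is the flattened difference
# between finetuned and base weights). Sparse RL updates from two tasks
# barely overlap, so global averaging halves every unique update.
import torch

base = torch.zeros(8)                                    # stand-in base weights
tv_a = torch.tensor([0., 2., 0., 0., 1., 0., 0., 0.])    # sparse update, task A
tv_b = torch.tensor([0., 0., 0., 3., 0., 0., 0., 2.])    # sparse update, task B

merged = base + (tv_a + tv_b) / 2                        # SFT-style global averaging
print(merged)
# tensor([0., 1., 0., 1.5, 0.5, 0., 0., 1.]) -- each unique update halved
```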

Community

Paper author · Paper submitter

🚀 TL;DR

We introduce RAM (Reinforced Agent Merging), a method for merging RL-trained agents into a single generalist model without retraining; the merged model outperforms the original specialized agents in their own domains.

[Figure: teaser]

💡 Key Insights

  • The Problem: Standard merging methods (e.g., TIES, DARE) are built for SFT models. We find they fail on RL-trained models because RL updates are extremely sparse and heterogeneous, so averaging causes "signal dilution" and performance drops.
[Figures: motivation]
  • The Solution: RAM explicitly disentangles "shared" vs. "unique" parameter updates. It preserves the full magnitude of unique task vectors to prevent dilution while averaging shared knowledge (see the sketch after this list).
[Figure: RAM method overview]
  • Performance: RAM outperforms all existing merging baselines on agentic benchmarks (CURE, BFCL, MemAgent).
[Figures: results]
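
Below is a minimal, hedged sketch of the merging rule described above, assuming task vectors are flattened weight deltas and that "unique" means a parameter updated by only one task. The non-zero mask, per-parameter averaging, and keeping unique updates at full magnitude (in place of the paper's exact rescaling rule) are our assumptions for illustration, not RAM's actual implementation.

```python
# Sketch of a RAM-style merge: average updates where tasks overlap,
# keep each task's unique updates at full scale instead of diluting
# them by 1/N as global averaging would.
import torch

def ram_style_merge(base: torch.Tensor,
                    task_vectors: list[torch.Tensor],
                    eps: float = 0.0) -> torch.Tensor:
    stacked = torch.stack(task_vectors)        # [num_tasks, num_params]
    touched = stacked.abs() > eps              # where each task actually updated
    count = touched.sum(dim=0).clamp(min=1)    # tasks touching each parameter
    shared = count > 1                         # overlapping (shared) positions
    summed = stacked.sum(dim=0)
    merged = torch.where(shared, summed / count,  # average shared components
                         summed)                  # preserve unique ones fully
    return base + merged

# Usage with toy sparse task vectors:
base = torch.zeros(6)
tv_a = torch.tensor([1., 0., 2., 0., 0., 0.])   # task A update
tv_b = torch.tensor([1., 0., 0., 0., 3., 0.])   # task B update
print(ram_style_merge(base, [tv_a, tv_b]))
# tensor([1., 0., 2., 0., 3., 0.]) -- shared coord averaged, unique coords intact
```

On the toy input, the shared coordinate is averaged while each task's unique coordinates survive at full scale, which is the qualitative behavior RAM uses to counteract parameter-update dilution.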

