Introduction

Model Description

We present MUA-RL-8B, a model trained with MUA-RL (Multi-turn User-interacting Agent Reinforcement Learning) for agentic tool use. It is designed for multi-turn conversation scenarios in which an agent must maintain context across turns while effectively using tools to complete complex tasks. MUA-RL is the first framework to integrate LLM-simulated users into the reinforcement learning loop for agentic tool use, enabling models to autonomously learn to communicate with users efficiently and to use various tools to solve practical problems in dynamic multi-turn interactions.
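
To make this training loop concrete, here is a minimal sketch of a single rollout as described above: an LLM-simulated user drives the conversation while the policy alternates between tool calls and user-facing replies, and the episode is scored at the end. All interfaces here (`rollout`, `user_simulator`, `env.execute`, `env.score`) are hypothetical placeholders for illustration, not the actual MUA-RL code.

```python
# Hypothetical sketch of one MUA-RL rollout; names are illustrative
# placeholders, not the released MUA-RL implementation.

def rollout(policy, user_simulator, env, task, max_turns=20):
    """Run one multi-turn episode between the agent and a simulated user."""
    messages = [{"role": "system", "content": task.instructions}]
    for _ in range(max_turns):
        # The LLM-simulated user (e.g. GPT-4o) produces the next user turn.
        user_msg = user_simulator.next_message(messages, task)
        messages.append({"role": "user", "content": user_msg})
        if task.is_finished(user_msg):
            break
        # The policy may emit several tool calls before answering the user.
        while True:
            reply = policy.generate(messages)
            if reply.tool_calls:
                messages.append({"role": "assistant", "tool_calls": reply.tool_calls})
                for call in reply.tool_calls:
                    result = env.execute(call)  # run the tool in the environment
                    messages.append({"role": "tool", "content": result})
            else:
                messages.append({"role": "assistant", "content": reply.text})
                break
    # Outcome-based scoring: did the agent complete the task correctly?
    return messages, env.score(task)
```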

Performance

MUA-RL achieves competitive performance across multiple multi-turn tool-using benchmarks:

| Model | TAU2 Retail | TAU2 Airline | TAU2 Telecom | BFCL-V3 Multi Turn | ACEBench Agent |
|---|---|---|---|---|---|
| GPT-4.1 | 70.2 | 53.0 | 38.9 | 40.5 | 86.7 |
| DeepSeek-V3-0324 | 64.7 | 37.0 | 32.9 | 29.8 | 74.2 |
| Qwen3-235B-A22B Non-thinking | 64.9 | 36.0 | 24.6 | 30.0 | 71.7 |
| MUA-RL-32B Non-thinking | 67.3 | 45.4 | 28.3 | 28.4 | 82.5 |
| Qwen3-32B Non-thinking | 50.2 | 23.5 | 24.8 | 19.6 | 72.5 |
| MUA-RL-14B Non-thinking | 66.0 | 38.0 | 33.4 | 25.3 | 78.3 |
| Qwen3-14B Non-thinking | 43.1 | 14.8 | 29.9 | 17.6 | 60.0 |
| MUA-RL-8B Non-thinking | 49.8 | 19.0 | 21.8 | 14.6 | 53.3 |
| Qwen3-8B Non-thinking | 41.0 | 12.5 | 19.1 | 11.8 | 39.2 |

In the non-thinking setting, the MUA-RL models outperform or match larger open-source models such as DeepSeek-V3-0324 and Qwen3-235B-A22B.

Training Details

Architecture

  • Model Size: 8B parameters
  • Train Context Length: 32K tokens
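
For reference, here is a minimal inference sketch using the Hugging Face transformers library. The tool schema and user message are toy examples, and `enable_thinking=False` mirrors the non-thinking setting reported in the benchmarks above (this flag is supported by Qwen3-style chat templates).

```python
# Minimal usage sketch with Hugging Face transformers; the tool schema
# below is a toy example, not part of the released checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zzwkk/MUA-RL-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool for illustration
        "description": "Look up the status of a retail order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Where is my order #A1234?"}]
# Benchmarks above use the non-thinking setting, so disable thinking here.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```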

Training Process

  • Reinforcement Learning: Group Relative Policy Optimization (GRPO); see the advantage sketch after this list
  • User Simulation: LLM-simulated users integrated into the RL loop (GPT-4o-2024-11-20 plays the user)
  • Environment Management: a fresh environment is instantiated for each rollout
  • Tool Integration: tool calls and tool responses are handled within the multi-turn conversation flow
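
The sketch below illustrates the group-relative advantage at the heart of GRPO, assuming one scalar outcome reward per rollout. It shows the standard GRPO formula rather than code from the MUA-RL repository.

```python
# Minimal sketch of GRPO's group-relative advantage; standard formula,
# not code from the MUA-RL repository.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: shape (G,), one scalar reward per rollout in a group
    sampled from the same task. Returns per-rollout advantages."""
    mean = rewards.mean()
    std = rewards.std()
    # Each rollout is scored relative to its own group, which removes
    # the need for a learned value (critic) model.
    return (rewards - mean) / (std + eps)

# Example: 4 rollouts of the same task, two succeed (reward 1.0).
adv = grpo_advantages(torch.tensor([1.0, 0.0, 1.0, 0.0]))
print(adv)  # positive for successful rollouts, negative for failures
```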

Citation

If you use MUA-RL in your research, please cite our paper:

@misc{zhao2025mua,
  title={MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for Agentic Tool Use},
  author={Weikang Zhao and Xili Wang and Chengdi Ma and Lingbin Kong and Zhaohua Yang and Mingxiang Tuo and Xiaowei Shi and Yitao Zhai and Xunliang Cai},
  year={2025},
  eprint={2508.18669},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2508.18669}
}