# katanemo/Plano-Orchestrator-30B-A3B

## Overview
Plano-Orchestrator is a family of state-of-the-art routing and orchestration models that decide which agent(s) or LLM(s) should handle each request, and in what sequence. Built for multi-agent orchestration systems, Plano-Orchestrator excels at analyzing user intent and conversation context to make precise routing and orchestration decisions. Designed for real-world deployments, it delivers strong performance across general conversations, coding tasks, and long-context multi-turn conversations, while remaining efficient enough for low-latency production environments.
## Key capabilities
- Multi-turn Context Understanding: Makes routing decisions based on full conversation history, maintaining contextual awareness across extended dialogues with evolving user needs.
- Multi-intent Detection: Identifies when a single user message requires multiple agents simultaneously, enabling parallel/sequential routing to fulfill complex requests.
- Context-dependent Routing: Correctly interprets ambiguous or referential messages by leveraging prior conversation context for accurate routing decisions.
- Conversational Flow Handling: Understands diverse interaction patterns including follow-ups, clarifications, confirmations, and corrections within ongoing conversations.
- Negative Case Detection: Recognizes when no specialized routing is needed, avoiding unnecessary LLM or agent calls for casual conversation.
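As an illustration of the last two capabilities, the model's routing decisions come back as a JSON object whose `route` list may name several agents (multi-intent) or be empty (negative case). The message texts and agent names below are hypothetical examples, not outputs from the model:

```python
import json

# A multi-intent message ("What's the weather, and add a reminder?") could
# fan out to several agents at once; agent names here are illustrative.
multi_intent = json.loads('{"route": ["WeatherAgent", "CalendarAgent"]}')

# Casual conversation ("thanks, that's all!") needs no specialized routing,
# so the model returns an empty route list instead of calling any agent.
negative_case = json.loads('{"route": []}')

print(multi_intent["route"])  # ['WeatherAgent', 'CalendarAgent']
print(negative_case["route"])  # []
```

An empty `route` list is the signal for the orchestrator to answer directly (or stay silent) rather than dispatch to an agent.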
## Benchmark
We evaluate on 1,958 user messages across 605 multi-turn conversations with more than 130 different agents, covering three scenarios:
- General (1,438 messages): Everyday conversational queries spanning diverse topics and agent types
- Coding (285 messages): Development-focused conversations including debugging, code generation, and technical assistance
- Long-context (235 messages): Extended conversations requiring understanding of extensive prior context
Each message is annotated with routing-relevant attributes, including but not limited to intent multiplicity, context dependency, and continuation type. The evaluation results are shown below.
Note that all models were evaluated with minimal reasoning, to keep routing efficient.
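A minimal sketch of how per-scenario routing accuracy could be computed over such annotated messages. The field names (`scenario`, `predicted_routes`, `gold_routes`) are assumptions for illustration, not the benchmark's actual schema:

```python
from collections import defaultdict

def routing_accuracy(messages):
    """Exact-match accuracy per scenario: a prediction counts as correct
    only if the predicted route set equals the annotated route set."""
    correct, total = defaultdict(int), defaultdict(int)
    for msg in messages:
        scenario = msg["scenario"]  # e.g. "general", "coding", "long-context"
        total[scenario] += 1
        if set(msg["predicted_routes"]) == set(msg["gold_routes"]):
            correct[scenario] += 1
    return {s: correct[s] / total[s] for s in total}

# Toy example with one correct and one missed routing decision:
sample = [
    {"scenario": "general", "predicted_routes": ["WeatherAgent"],
     "gold_routes": ["WeatherAgent"]},
    {"scenario": "coding", "predicted_routes": [],
     "gold_routes": ["CodeAgent"]},
]
print(routing_accuracy(sample))  # {'general': 1.0, 'coding': 0.0}
```

Exact set match is a strict criterion: predicting a superset or subset of the annotated agents counts as an error, which matches the multi-intent setting described above.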
## Example
```python
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

ORCHESTRATION_PROMPT = (
    "You are a helpful assistant that selects the most suitable routes based on user intent.\n"
    "You are provided with a list of available routes enclosed within <routes></routes> XML tags:\n"
    "<routes>\n{routes}\n</routes>\n\n"
    "You are also given the conversation context enclosed within <conversation></conversation> XML tags:\n"
    "<conversation>\n{conversation}\n</conversation>\n\n"
    "## Instructions\n"
    "1. Analyze the latest user intent from the conversation.\n"
    "2. Compare it against the available routes to find which routes can help fulfill the request.\n"
    "3. Respond only with the exact route names from <routes>.\n"
    "4. If no routes can help or the intent is already fulfilled, return an empty list.\n\n"
    "## Response Format\n"
    "Return your answer strictly in JSON as follows:\n"
    '{{"route": ["route_name_1", "route_name_2", "..."]}}\n'
    "If no routes are needed, return an empty list for `route`."
)

def convert_agents_to_routes(agents):
    # Serialize each agent as a one-line JSON route description
    tools = [
        {
            "name": agent["name"],
            "description": agent["description"],
        }
        for agent in agents
    ]
    return "\n".join(json.dumps(tool, ensure_ascii=False) for tool in tools)

def build_messages(available_agents, conversation):
    # Fill the orchestration prompt with the routes and conversation history
    routes = convert_agents_to_routes(available_agents)
    conversation_str = json.dumps(conversation, indent=4, ensure_ascii=False)
    prompt = ORCHESTRATION_PROMPT.format(routes=routes, conversation=conversation_str)
    return [{"role": "user", "content": prompt}]

# Load model
model_name = "katanemo/Plano-Orchestrator-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Define available agents
available_agents = [
    {"name": "WeatherAgent", "description": "Provides weather forecasts and current conditions for any location"},
    {"name": "CodeAgent", "description": "Generates, debugs, explains, and reviews code in multiple programming languages"},
]

# Conversation history
conversation = [
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "I can help you with that. Could you tell me your location?"},
    {"role": "user", "content": "San Francisco"},
]

# Build messages and generate
messages = build_messages(available_agents, conversation)
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
# Output: {"route": ["WeatherAgent"]}
```
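In production, the model's response should be parsed defensively before dispatching to agents. A small helper along these lines (our own convention, not part of the model release) extracts the route list and falls back to no routing when the output is malformed:

```python
import json

def parse_routes(response: str) -> list[str]:
    """Extract the route list from the model's JSON response,
    treating unparseable output as 'no routing needed'."""
    try:
        # Tolerate stray text around the JSON object itself
        start, end = response.index("{"), response.rindex("}") + 1
        routes = json.loads(response[start:end]).get("route", [])
        return [r for r in routes if isinstance(r, str)]
    except (ValueError, json.JSONDecodeError):
        return []

print(parse_routes('{"route": ["WeatherAgent"]}'))  # ['WeatherAgent']
print(parse_routes("no json here"))                 # []
```

Falling back to an empty route list mirrors the model's own negative-case convention, so a garbled generation degrades to "answer directly" rather than a crashed dispatch.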
## License
The Plano-Orchestrator collection is distributed under the Katanemo license.
## Model tree
Base model: Qwen/Qwen3-30B-A3B-Instruct-2507