katanemo/Plano-Orchestrator-30B-A3B

Overview

Plano-Orchestrator is a family of state-of-the-art routing and orchestration models that decide which agent(s) or LLM(s) should handle each request, and in what sequence. Built for multi-agent orchestration systems, Plano-Orchestrator excels at analyzing user intent and conversation context to make precise routing and orchestration decisions. Designed for real-world deployments, it delivers strong performance across general conversations, coding tasks, and long-context multi-turn conversations, while remaining efficient enough for low-latency production environments.

Key capabilities

  • Multi-turn Context Understanding: Makes routing decisions based on full conversation history, maintaining contextual awareness across extended dialogues with evolving user needs.
  • Multi-intent Detection: Identifies when a single user message requires multiple agents simultaneously, enabling parallel/sequential routing to fulfill complex requests.
  • Context-dependent Routing: Correctly interprets ambiguous or referential messages by leveraging prior conversation context for accurate routing decisions.
  • Conversational Flow Handling: Understands diverse interaction patterns including follow-ups, clarifications, confirmations, and corrections within ongoing conversations.
  • Negative Case Detection: Recognizes when no specialized routing is needed, avoiding unnecessary LLM or agent calls for casual conversation.
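The capabilities above all surface in the model's output as a single JSON route list: one name for a single intent, several names for multi-intent requests, and an empty list for the negative case. A minimal sketch of how a dispatcher might consume these three shapes (the route names and the `dispatch to:` wording are illustrative assumptions, not part of the model card):

```python
import json

# Hypothetical model outputs illustrating the three decision shapes;
# the agent names here are examples, not prescribed by the model.
single_intent = '{"route": ["WeatherAgent"]}'
multi_intent = '{"route": ["WeatherAgent", "CalendarAgent"]}'  # one message, two agents
negative_case = '{"route": []}'  # casual chat: no agent call needed

for raw in (single_intent, multi_intent, negative_case):
    routes = json.loads(raw)["route"]
    if routes:
        print("dispatch to:", ", ".join(routes))
    else:
        print("no routing needed; answer directly")
```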

Benchmark

We evaluate on 1,958 user messages across 605 multi-turn conversations with more than 130 different agents, covering three scenarios:

  • General (1,438 messages): Everyday conversational queries spanning diverse topics and agent types
  • Coding (285 messages): Development-focused conversations including debugging, code generation, and technical assistance
  • Long-context (235 messages): Extended conversations requiring understanding of extensive prior context

Each message is annotated with routing-relevant attributes, including but not limited to intent multiplicity, context dependency, and continuation type. The evaluation results are shown below.

Note that all models were evaluated with minimal reasoning to keep routing latency low.
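One annotated benchmark message might look like the sketch below. Only the three attribute categories named above come from this card; the exact field names and values are assumptions for illustration:

```python
# Hypothetical annotation record for a single benchmark message.
# Field names beyond "intent multiplicity", "context dependency", and
# "continuation type" are illustrative assumptions, not the actual schema.
annotation = {
    "message": "San Francisco",
    "intent_multiplicity": "single",       # one vs. multiple agents required
    "context_dependent": True,             # meaning relies on prior turns
    "continuation_type": "clarification",  # e.g. follow-up, confirmation, correction
    "expected_route": ["WeatherAgent"],
}
print(annotation["expected_route"])
```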

Example

import json
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM


ORCHESTRATION_PROMPT = (
    "You are a helpful assistant that selects the most suitable routes based on user intent.\n"
    "You are provided with a list of available routes enclosed within <routes></routes> XML tags:\n"
    "<routes>\n{routes}\n</routes>\n\n"
    "You are also given the conversation context enclosed within <conversation></conversation> XML tags:\n"
    "<conversation>\n{conversation}\n</conversation>\n\n"
    "## Instructions\n"
    "1. Analyze the latest user intent from the conversation.\n"
    "2. Compare it against the available routes to find which routes can help fulfill the request.\n"
    "3. Respond only with the exact route names from <routes>.\n"
    "4. If no routes can help or the intent is already fulfilled, return an empty list.\n\n"
    "## Response Format\n"
    "Return your answer strictly in JSON as follows:\n"
    '{{"route": ["route_name_1", "route_name_2", "..."]}}\n'
    "If no routes are needed, return an empty list for `route`."
)

def convert_agents_to_routes(agents):
    tools = [
        {
            "name": agent["name"],
            "description": agent["description"],
        }
        for agent in agents
    ]
    return "\n".join([json.dumps(tool, ensure_ascii=False) for tool in tools])

def build_messages(available_agents, conversation):
    routes = convert_agents_to_routes(available_agents)
    conversation_str = json.dumps(conversation, indent=4, ensure_ascii=False)
    prompt = ORCHESTRATION_PROMPT.format(routes=routes, conversation=conversation_str)
    return [{"role": "user", "content": prompt}]

# Load model
model_name = "katanemo/Plano-Orchestrator-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Define available agents
available_agents = [
    {"name": "WeatherAgent", "description": "Provides weather forecasts and current conditions for any location"},
    {"name": "CodeAgent", "description": "Generates, debugs, explains, and reviews code in multiple programming languages"}
]

# Conversation history
conversation = [
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "I can help you with that. Could you tell me your location?"},
    {"role": "user", "content": "San Francisco"},
]

# Build messages and generate
messages = build_messages(available_agents, conversation)
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
# Output: {"route": ["WeatherAgent"]}
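In production, the decoded response should not be trusted to be well-formed JSON on every generation. A defensive parsing helper (an assumption for illustration, not part of the model card) might look like:

```python
import json

def parse_routes(response: str) -> list[str]:
    """Extract the route list from the model's JSON response.

    Hypothetical helper: falls back to an empty list (the "no routing
    needed" case) if the output is not JSON of the expected shape.
    """
    try:
        routes = json.loads(response).get("route", [])
    except (json.JSONDecodeError, AttributeError):
        return []
    # Keep only string route names, dropping any malformed entries.
    return [r for r in routes if isinstance(r, str)]

print(parse_routes('{"route": ["WeatherAgent"]}'))  # ['WeatherAgent']
print(parse_routes("not json"))                     # []
```

Treating a malformed reply as the empty-route case keeps the orchestrator from crashing and simply falls back to answering directly.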

License

The Plano-Orchestrator collection is distributed under the Katanemo license.
