Neuro-Orchestrator-8B

The Autonomous Adaptive Research Engine
Logic of HiPO + Structure of Nemotron + Skills of MiroThinker

🧠 Model Overview

Neuro-Orchestrator-8B is a state-of-the-art agentic merge built on the Qwen architecture. It is designed to solve the "always-on" reasoning problem through a hybrid gating mechanism derived from merging three distinct Qwen-based fine-tunes. At inference time, the model:

  1. Analyzes the complexity of the user request first (HiPO Influence).
  2. Decides whether to answer immediately or enter a deep reasoning loop.
  3. Plans a structured response using orchestration weights (Nemotron Influence).
  4. Executes using high-fidelity coding and logic capabilities (MiroThinker Influence).

It was created with the TIES-Merging method from the following constituent models:

  • Base: miromind-ai/MiroThinker-v1.0-8B (Code, Logic, Tool Use)
  • Orchestrator: nvidia/Nemotron-Orchestrator-8B (Planning, Structure)
  • Controller: Kwaipilot/HiPO-8B (Reasoning Gating, Efficiency)
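
A TIES merge of this kind is usually expressed as a mergekit YAML config. The exact densities and weights used for this merge are not published, so the values below are illustrative placeholders only, arranged around the constituent models listed above:

# merge_config.yaml -- illustrative sketch; densities/weights are assumptions
merge_method: ties
base_model: miromind-ai/MiroThinker-v1.0-8B
models:
  - model: nvidia/Nemotron-Orchestrator-8B
    parameters:
      density: 0.5   # fraction of delta weights kept (placeholder value)
      weight: 0.5    # contribution to the merge (placeholder value)
  - model: Kwaipilot/HiPO-8B
    parameters:
      density: 0.5   # placeholder value
      weight: 0.5    # placeholder value
parameters:
  normalize: true
dtype: bfloat16

A config like this is run with mergekit's standard CLI, e.g. mergekit-yaml merge_config.yaml ./output-model-directory.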

💻 Usage Code (Bfloat16)

This model is optimized to run in bfloat16 precision and uses the ChatML prompt template native to Qwen models.

Installation

pip install torch transformers accelerate

Inference Script

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# --- CONFIGURATION ---
MODEL_PATH = "yasserrmd/Neuro-Orchestrator-8B"

# Load Tokenizer & Model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
print("Model loaded successfully!")

# --- INFERENCE FUNCTION ---
def run_neuro_agent(prompt):
    # Qwen/ChatML Format triggers the Orchestrator personality
    full_prompt = (
        f"<|im_start|>system\n"
        f"You are Neuro-Orchestrator. Analyze the request complexity, plan, and execute.<|im_end|>\n"
        f"<|im_start|>user\n"
        f"{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    
    # Follow the placement chosen by device_map instead of hard-coding "cuda"
    inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=400,
            do_sample=True,
            temperature=0.6,
            repetition_penalty=1.15,
            pad_token_id=tokenizer.eos_token_id
        )
        
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    return response

# --- EXAMPLE USAGE ---
print(run_neuro_agent("Plan a 3-day trip to Tokyo for a couple with a $2000 budget."))
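
Since the ChatML format above is the same template registered in Qwen tokenizer configs, you can also let the tokenizer build the prompt and stream tokens as they are generated. This is a minimal sketch, assuming the merged tokenizer ships a chat template; it reuses model and tokenizer from the script above:

from transformers import TextStreamer

def run_neuro_agent_streaming(prompt):
    # Build the ChatML prompt from the tokenizer's own chat template
    messages = [
        {"role": "system", "content": "You are Neuro-Orchestrator. Analyze the request complexity, plan, and execute."},
        {"role": "user", "content": prompt},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    # Print tokens as they arrive instead of waiting for the full completion
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    with torch.no_grad():
        model.generate(
            inputs,
            max_new_tokens=400,
            do_sample=True,
            temperature=0.6,
            repetition_penalty=1.15,
            pad_token_id=tokenizer.eos_token_id,
            streamer=streamer,
        )

run_neuro_agent_streaming("Summarize the trade-offs of TIES merging in two sentences.")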

📊 Capabilities Showcase

The following examples demonstrate the Elastic Reasoning capability: the model is concise on simple tasks but structured and deep on complex tasks.
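
One quick way to observe this yourself is to compare a trivial query against a complex one, reusing run_neuro_agent from the script above. Whether the reasoning spans appear verbatim depends on sampling, but the transcripts in this card show them as <think>...</think> tags:

# Probe elastic reasoning: simple vs. complex prompts (illustrative check)
for q in [
    "What is the capital of Australia?",
    "Create a 4-week launch roadmap for a cooking YouTube channel.",
]:
    out = run_neuro_agent(q)
    print(f"{len(out)} chars, reasoning block: {'<think>' in out} -- {q}")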

1. Planning & Orchestration (Nemotron Mode)

The model breaks down complex requests into phased execution plans.

Example: 4-Week YouTube Launch Plan

Prompt: "I want to start a YouTube channel about cooking. Create a 4-week launch roadmap."

Response: The user wants a structured plan for launching a cooking YouTube channel... <think> ...For Week 1 focus on setup essentials... Week 2 shifts to refining... Week 3 is critical... </think>

Launch Roadmap:

  • Week 1: Foundation & Setup
    • Define niche and branding (logo, channel art).
    • Secure equipment (camera, lighting, audio).
  • Week 2: Content Batching
    • Film first 3-5 videos to have a buffer.
    • Edit initial trailer.
  • Week 3: Soft Launch & SEO
    • Upload first video with optimized titles/tags.
    • Promote on social media.
  • Week 4: Review & Iterate
    • Analyze analytics (CTR, retention).
    • Engage with first comments.
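
As the transcript above shows, deep-reasoning answers may arrive wrapped in <think>...</think> spans. If you only want the final plan, a small post-processing helper can strip them; this is a sketch based on the tag format shown in this card, not an official utility:

import re

def strip_think(text):
    # Remove any <think>...</think> reasoning spans (format seen in this card)
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    return cleaned.strip()

print(strip_think(run_neuro_agent("Create a 4-week launch roadmap for a cooking channel.")))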

2. High-Fidelity Coding (MiroThinker Mode)

Generates functional, clean code with correct imports.

Example: Python Image Resizing

Prompt: "Write a Python script that resizes all images in a folder to 50% dimensions using Pillow."

Response:

from PIL import Image
import os

def resize_images(input_folder):
    # Ensure the input folder exists
    if not os.path.exists(input_folder):
        print(f"The folder {input_folder} does not exist.")
        return
    
    # List all files in the input folder
    files = os.listdir(input_folder)
    
    # Process each file
    for filename in files:
        file_path = os.path.join(input_folder, filename)
        
        # Check if file is an image
        if os.path.isfile(file_path) and filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            with Image.open(file_path) as img:
                # Calculate new dimensions
                new_width = int(img.width * 0.5)
                new_height = int(img.height * 0.5)
                
                # Resize and save
                resized_img = img.resize((new_width, new_height))
                resized_img.save(file_path)
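
Note that the script saves each resized image back to its original path, overwriting the source files. A quick usage example (the folder name is hypothetical):

# Halve the dimensions of every PNG/JPEG in ./photos (path is illustrative)
resize_images("photos")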

3. Adaptive Gating (HiPO Mode)

Detects simple queries and skips unnecessary reasoning.

Example: Fact Retrieval

Prompt: "What is the capital of Australia?"

Response: The user's query asks for a straightforward factual answer... This task doesn't demand complex thought.

Answer: The capital of Australia is Canberra.

4. Logic & Reasoning

Handles trick questions by analyzing the premise.

Example: The Lead vs. Feathers Trap

Prompt: "Which is heavier: a pound of feathers or a pound of lead? Explain why people get this wrong."

Response: Both a pound of feathers and a pound of lead weigh exactly the same... However, many people might think that a pound of lead is heavier due to a misconception related to density and volume. Lead is much denser, so it occupies less volume, leading to the intuitive bias that "small and heavy" equals "heavier."


📜 License

This model is a merge of Qwen-based models. Users should comply with the Apache 2.0 license and with the specific terms of the constituent models listed above.
