MirrorGuard

A fine-tuned vision-language model designed to safely execute complex GUI-based tasks while detecting and mitigating unsafe reasoning patterns.

Overview

MirrorGuard is fine-tuned from the base Qwen2.5-VL-7B-Instruct model through simulation-based learning. It learns to:

  • Recognize security risks and unsafe UI patterns
  • Intercept and correct unsafe reasoning chains before executing harmful operations
  • Complete legitimate tasks while maintaining high accuracy

Installation

Install vLLM to run the model as an inference server:

pip install vllm
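
The API call example below also uses the OpenAI Python client:

pip install openai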

Usage

Starting vLLM Inference Server

Launch the model as an OpenAI-compatible API server:

vllm serve WhitzardAgent/MirrorGuard \
    --tensor-parallel-size 1 \
    --max-model-len 2048 \
    --dtype float16 \
    --port 8000

The server will be available at http://localhost:8000/v1.
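
Before sending requests, you can confirm the server is up; a minimal check using only the Python standard library (assuming the default port above and no API key configured):

import json
import urllib.request

# Query the OpenAI-compatible model listing endpoint to confirm the server is ready
with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    models = json.load(resp)

# Should include WhitzardAgent/MirrorGuard among the served models
print([m["id"] for m in models["data"]])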

API Call Example

from openai import OpenAI
import base64

# Initialize client
client = OpenAI(
    api_key="not-needed",  # vLLM doesn't require a real key
    base_url="http://localhost:8000/v1"
)

# Prepare your data
instruction = "Find and open the terminal application"
history = ""
action = "CLICK(taskbar_icon)"
screenshot_b64 = ""  # Your base64-encoded screenshot

# Define system prompt
SYSTEM_PROMPT = """You are an expert AI assistant for evaluating GUI-based task execution.
Analyze the provided instruction, history, and screenshot.
Evaluate the safety and feasibility of executing the given action.
If unsafe patterns are detected, recommend a corrected approach.

Rules:
1. If the action is safe and logical, confirm it verbatim.
2. If unsafe or problematic, provide a corrected approach that maintains utility.
3. If the instruction itself is unsafe, refuse to continue."""

# Make API call
response = client.chat.completions.create(
    model="WhitzardAgent/MirrorGuard",
    messages=[
        {
            "role": "system",
            "content": SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{screenshot_b64}"
                    }
                },
                {
                    "type": "text",
                    "text": f"\n</observation>\n\n### Proposed Action ###\n{action}"
                }
            ]
        }
    ],
    max_tokens=256,
    temperature=0.0
)

# Get response
evaluation = response.choices[0].message.content.strip()
print(evaluation)
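
The example above leaves screenshot_b64 empty. Assuming the screenshot is saved locally as a JPEG (the file name screenshot.jpg below is just a placeholder), it can be encoded with the base64 module already imported:

# Encode a screenshot file for the data URL used in the image_url payload
with open("screenshot.jpg", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")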

Training Configuration

  • Base Model: Qwen/Qwen2.5-VL-7B-Instruct
  • Learning Rate: 1e-5 (cosine decay)
  • Batch Size: 128 (4 GPUs)
  • Warmup Steps: 100
  • Epochs: 6
  • Optimizer: AdamW (β₁=0.9, β₂=0.999)
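
The exact training framework is not stated here; purely as an illustrative sketch, the hyperparameters above could map to Hugging Face TrainingArguments roughly as follows (the per-device batch size of 32 assumes the global batch of 128 is split evenly across the 4 GPUs with no gradient accumulation):

from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters; not the authors' actual config
args = TrainingArguments(
    output_dir="mirrorguard-sft",       # placeholder output path
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=6,
    per_device_train_batch_size=32,     # 128 global / 4 GPUs, assuming no gradient accumulation
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
)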

Citation

@article{zhang2026mirrorguard,
  title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
  author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
  journal={arXiv preprint arXiv:2601.12822},
  year={2026},
  url={https://arxiv.org/abs/2601.12822}
}

License

See LICENSE for details.

For more information, visit the GitHub repository or read the paper.
