
Zen Guard

Multilingual Safety Moderation for AI Systems


Introduction

Zen Guard is a comprehensive safety moderation solution for AI systems, offering multilingual content filtering and classification. Built upon the Qwen3Guard architecture with Zen identity fine-tuning, it provides:

🛡️ Comprehensive Protection: Robust safety assessment for prompts and responses with real-time detection optimized for streaming scenarios.

🚦 Three-Tiered Severity Classification: Categorizes outputs into safe, controversial, and unsafe severity levels, supporting diverse deployment scenarios.

🌍 Extensive Multilingual Support: Supports 119 languages and dialects, ensuring robust performance in global applications.

🏆 State-of-the-Art Performance: Achieves leading performance on various safety benchmarks across English, Chinese, and multilingual tasks.
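The three severity levels can be mapped onto a deployment policy. The following is a minimal sketch, assuming the deployer decides whether "controversial" content is blocked (strict) or allowed (permissive). The `Severity` enum, the `is_allowed` function, and the `strict` flag are hypothetical names used for illustration and are not part of Zen Guard.

```python
from enum import Enum


class Severity(Enum):
    """The three severity levels named in this card."""
    SAFE = "Safe"
    CONTROVERSIAL = "Controversial"
    UNSAFE = "Unsafe"


def is_allowed(severity: Severity, strict: bool = True) -> bool:
    """Return True if content at this severity should pass.

    Hypothetical policy: Safe always passes, Unsafe never passes, and
    Controversial passes only when the deployment is not strict.
    """
    if severity is Severity.SAFE:
        return True
    if severity is Severity.UNSAFE:
        return False
    return not strict


print(is_allowed(Severity.CONTROVERSIAL, strict=True))   # False
print(is_allowed(Severity.CONTROVERSIAL, strict=False))  # True
```

Keeping the policy separate from the classifier means the same model output can serve both a strict and a permissive deployment.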

Model Family

Model	Type	Parameters	Use Case
zen-guard	Base	4B	General safety classification
zen-guard-gen	Generative	8B	Full prompt/response moderation
zen-guard-stream	Streaming	4B	Real-time token-level monitoring
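
Token-level monitoring, the use case listed for zen-guard-stream, can be pictured as checking the running text after every new token. The sketch below is illustrative only: `guarded_stream` and `classify_prefix` are hypothetical names, and the toy classifier stands in for the real streaming model, whose API is not described in this card.

```python
from typing import Callable, Iterable, Iterator


def guarded_stream(
    tokens: Iterable[str],
    classify_prefix: Callable[[str], str],
) -> Iterator[str]:
    """Yield tokens until the running text is classified as "Unsafe".

    `classify_prefix` is a hypothetical callback standing in for a
    streaming guard model. It receives the text generated so far and
    returns "Safe", "Controversial", or "Unsafe".
    """
    text = ""
    for token in tokens:
        text += token
        if classify_prefix(text) == "Unsafe":
            return  # stop before emitting the offending token
        yield token


def toy_classifier(text: str) -> str:
    """Demonstration only: flags any text containing the word 'forbidden'."""
    return "Unsafe" if "forbidden" in text else "Safe"


print(list(guarded_stream(["This ", "is ", "forbidden ", "text"], toy_classifier)))
# ['This ', 'is ']
```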

Safety Categories

Zen Guard classifies content across 9 primary categories:

Violent - Violence instructions, methods, or depictions
Non-violent Illegal Acts - Hacking, unauthorized activities
Sexual Content - Sexual imagery or descriptions
PII - Personally identifiable information disclosure
Suicide & Self-Harm - Self-harm encouragement
Unethical Acts - Bias, discrimination, hate speech
Politically Sensitive - False political information
Copyright Violation - Unauthorized copyrighted material
Jailbreak - System prompt override attempts

Quick Start

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "zenlm/zen-guard"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

def classify_safety(content):
    """Extract the severity label and any category names from the model's text output."""
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive|Copyright Violation|Jailbreak|None)"
    safe_match = re.search(safe_pattern, content)
    # label is None when the output contains no "Safety: ..." verdict
    label = safe_match.group(1) if safe_match else None
    categories = re.findall(category_pattern, content)
    return label, categories

# Moderate a prompt
prompt = "How can I learn about cybersecurity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the verdict, then decode only the new tokens (skipping the echoed prompt)
outputs = model.generate(**inputs, max_new_tokens=128)
result = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
label, categories = classify_safety(result)
print(f"Safety: {label}, Categories: {categories}")
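
The parsing step can be checked without loading the model by running it on sample strings. The samples below assume the model emits a "Safety: ..." line followed by a "Categories: ..." line, which is what the regular expressions in the usage example expect. The exact output format is an assumption and should be verified against real model output.

```python
import re

SAFETY_PATTERN = re.compile(r"Safety: (Safe|Unsafe|Controversial)")
CATEGORY_PATTERN = re.compile(
    r"(Violent|Non-violent Illegal Acts|Sexual Content|PII|Suicide & Self-Harm|"
    r"Unethical Acts|Politically Sensitive|Copyright Violation|Jailbreak|None)"
)


def classify_safety(content):
    """Return (label, categories) parsed from the guard model's text output."""
    match = SAFETY_PATTERN.search(content)
    label = match.group(1) if match else None
    categories = CATEGORY_PATTERN.findall(content)
    return label, categories


# Assumed output shapes, for illustration only.
samples = [
    "Safety: Unsafe\nCategories: Non-violent Illegal Acts",
    "Safety: Safe\nCategories: None",
    "",  # empty or unparseable output
]
for sample in samples:
    print(classify_safety(sample))
```

Note that a "None" category is returned as the string "None" in the list, and that an unparseable output yields a label of `None`, so callers may want to treat a missing label as a failure rather than as safe.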

Deployment

Deploy with SGLang or vLLM for production:

# SGLang
python -m sglang.launch_server --model-path zenlm/zen-guard --port 30000

# vLLM
vllm serve zenlm/zen-guard --port 8000 --max-model-len 32768
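
Once the vLLM server is running, it can be queried over its OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library. The host, port, and `max_tokens` value mirror the commands and usage example above, while the function names `build_moderation_request` and `moderate` are hypothetical.

```python
import json
import urllib.request


def build_moderation_request(
    prompt,
    base_url="http://localhost:8000",
    model="zenlm/zen-guard",
):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.0,  # deterministic verdicts
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def moderate(prompt):
    """Send the request and return the raw guard output (needs a running server)."""
    with urllib.request.urlopen(build_moderation_request(prompt), timeout=30) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Building the request does not touch the network.
request = build_moderation_request("How can I learn about cybersecurity?")
print(request.full_url)
```

The text returned by `moderate` can then be passed to the same parsing function used in the basic usage example.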

Performance

Metric	Zen Guard	Industry Avg
Accuracy	96.8%	92.1%
F1 Score	94.2%	89.5%
False Positive Rate	2.1%	5.3%
Latency	120ms	200ms
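
For readers reproducing these metrics on their own data, the standard definitions can be computed from a confusion matrix. This sketch assumes "unsafe" is the positive class. The card does not state how its figures were computed, and the counts below are illustrative, not Zen Guard results.

```python
def moderation_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1, and false positive rate from confusion-matrix counts.

    tp: unsafe items flagged unsafe    fp: safe items flagged unsafe
    tn: safe items passed as safe      fn: unsafe items passed as safe
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
    }


# Illustrative counts only.
metrics = moderation_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```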

Multilingual Performance

English: 97.2% accuracy
Chinese: 96.5% accuracy
Spanish: 96.1% accuracy
Other languages: 95.8% average

Resource Requirements

Model	VRAM (FP16)	VRAM (INT8)	Throughput
zen-guard	8GB	4GB	1000+ req/s
zen-guard-gen	16GB	8GB	500+ req/s
zen-guard-stream	8GB	4GB	Real-time
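
The VRAM figures above are consistent with a weights-only estimate of parameter count times bytes per parameter. The sketch below reproduces that arithmetic, assuming 2 bytes per parameter for FP16 and 1 byte for INT8. Real usage will be higher once activations and the KV cache are included. The `weight_memory_gb` helper is a hypothetical name.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Estimate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts taken from the Model Family table.
models = [("zen-guard", 4e9), ("zen-guard-gen", 8e9), ("zen-guard-stream", 4e9)]
for name, params in models:
    fp16 = weight_memory_gb(params, 2)  # FP16: 2 bytes per parameter
    int8 = weight_memory_gb(params, 1)  # INT8: 1 byte per parameter
    print(f"{name}: FP16 ~{fp16:.0f} GB, INT8 ~{int8:.0f} GB")
```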

Integration with Hanzo Guard

For production deployments, combine Zen Guard (ML classification) with Hanzo Guard (Rust runtime sanitization) for comprehensive protection:

┌─────────────┐     ┌──────────────┐     ┌────────────┐     ┌─────────────┐
│ Application │ ──► │ Hanzo Guard  │ ──► │ Zen Guard  │ ──► │ LLM Provider│
└─────────────┘     │ (Rust)       │     │ (ML Model) │     └─────────────┘
                    │              │     │            │
                    │ • PII Redact │     │ • Content  │
                    │ • Rate Limit │     │   Classify │
                    │ • Injection  │     │ • Severity │
                    │   Detect     │     │   Levels   │
                    │ • Audit Log  │     │ • Category │
                    └──────────────┘     └────────────┘

Hanzo Guard (Rust, <1ms):

PII detection and redaction (SSN, credit cards, emails, phones, API keys)
Prompt injection detection (jailbreak patterns)
Rate limiting per user
Audit logging for compliance

Zen Guard (ML, ~120ms):

Deep content classification via neural network
Three-tier severity (safe/controversial/unsafe)
9 safety categories
Support for 119 languages and dialects
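
The division of labor between the two layers can be sketched in Python: a fast rule-based pass redacts PII, then the cleaned text goes to the classifier. This is an illustration of the layering only. The regular expressions, `redact_pii`, and `guard_pipeline` are hypothetical stand-ins and do not reflect the actual Hanzo Guard rules or API.

```python
import re
from typing import Callable, Tuple

# Illustrative patterns only; the real Hanzo Guard rules are not shown in this card.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_pii(text: str) -> str:
    """Layer 1: cheap rule-based redaction before any model call."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


def guard_pipeline(
    text: str,
    classify: Callable[[str], str],
    strict: bool = True,
) -> Tuple[str, bool]:
    """Run redaction, then classification; return (cleaned_text, allowed).

    `classify` is a hypothetical callback standing in for a Zen Guard call
    that returns "Safe", "Controversial", or "Unsafe".
    """
    cleaned = redact_pii(text)
    label = classify(cleaned)  # Layer 2: ML classification on redacted text
    blocked = label == "Unsafe" or (strict and label == "Controversial")
    return cleaned, not blocked


print(guard_pipeline("Mail me at a@b.com, SSN 123-45-6789", lambda t: "Safe"))
```

Running redaction first means the classifier, and anything downstream of it, never sees the raw PII.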

// Example: Stacking both guards
use hanzo_guard::{Guard, GuardConfig};

let hanzo = Guard::builder()
    .with_zen_guard_api_key("your-api-key")  // Enables Zen Guard API calls
    .build();

// Single call sanitizes through both layers.
// `.await?` requires an async function that returns a Result.
let result = hanzo.sanitize_input("User message here").await?;

Install Hanzo Guard: cargo add hanzo-guard (crates.io)

License

Apache 2.0

Citation

@misc{zenguard2025,
    title={Zen Guard: Multilingual Safety Moderation for AI Systems},
    author={Hanzo AI and Zoo Labs Foundation},
    year={2025},
    publisher={HuggingFace},
    howpublished={\url{https://huggingface.co/zenlm/zen-guard}}
}

Based On

Zen Guard is built upon Qwen3Guard with Zen identity fine-tuning.

Upstream Source

Repository: https://github.com/QwenLM/Qwen3Guard
Base Model: Qwen3-4B
License: Apache 2.0

Zen LM Enhancements

Zen AI identity and branding
Integration with Zen Gym training framework
Enhanced documentation and examples
Additional deployment configurations

Please cite both the original Qwen3Guard work and Zen Guard in publications.

Zen AI - Clarity Through Intelligence
zenlm.org
