Zen Guard
Multilingual Safety Moderation for AI Systems
Website • Hugging Face • Paper • Documentation
Introduction
Zen Guard is a comprehensive safety moderation solution for AI systems, offering multilingual content filtering and classification. Built upon the Qwen3Guard architecture with Zen identity fine-tuning, it provides:
- Comprehensive Protection: Robust safety assessment for prompts and responses, with real-time detection optimized for streaming scenarios.
- Three-Tiered Severity Classification: Categorizes outputs into safe, controversial, and unsafe severity levels, supporting diverse deployment scenarios.
- Extensive Multilingual Support: Supports 119 languages and dialects, ensuring robust performance in global applications.
- State-of-the-Art Performance: Achieves leading performance on various safety benchmarks across English, Chinese, and multilingual tasks.
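The three severity levels lend themselves to a simple policy mapping. The sketch below is illustrative only: the `Action` names and the `strict` flag are assumptions, not part of Zen Guard. It shows one way a deployment might treat `Controversial` content differently depending on how strict it needs to be.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    # Hypothetical actions; Zen Guard itself only returns a label.
    ALLOW = "allow"
    REVIEW = "review"
    BLOCK = "block"


def decide(label: Optional[str], strict: bool = False) -> Action:
    """Map a severity label to an action.

    Unknown or missing labels fail closed (BLOCK).
    """
    if label == "Safe":
        return Action.ALLOW
    if label == "Controversial":
        # Strict deployments block borderline content;
        # lenient ones queue it for human review.
        return Action.BLOCK if strict else Action.REVIEW
    return Action.BLOCK
```

Failing closed on an unparsed label is a design choice for this sketch; a lenient deployment might prefer to route such cases to review instead.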
Model Family
| Model | Type | Parameters | Use Case |
|---|---|---|---|
| zen-guard | Base | 4B | General safety classification |
| zen-guard-gen | Generative | 8B | Full prompt/response moderation |
| zen-guard-stream | Streaming | 4B | Real-time token-level monitoring |
Safety Categories
Zen Guard classifies content across 9 primary categories:
- Violent - Violence instructions, methods, or depictions
- Non-violent Illegal Acts - Hacking, unauthorized activities
- Sexual Content - Sexual imagery or descriptions
- PII - Personally identifiable information disclosure
- Suicide & Self-Harm - Self-harm encouragement
- Unethical Acts - Bias, discrimination, hate speech
- Politically Sensitive - False political information
- Copyright Violation - Unauthorized copyrighted material
- Jailbreak - System prompt override attempts
Quick Start
Installation
pip install transformers torch accelerate
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "zenlm/zen-guard"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

def classify_safety(content):
    """Extract the severity label and category names from the model's text output."""
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive|Copyright Violation|Jailbreak|None)"
    safe_match = re.search(safe_pattern, content)
    label = safe_match.group(1) if safe_match else None
    categories = re.findall(category_pattern, content)
    return label, categories

# Moderate a prompt
prompt = "How can I learn about cybersecurity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the echoed prompt
result = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
label, categories = classify_safety(result)
print(f"Safety: {label}, Categories: {categories}")
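The category regex in `classify_safety` matches anywhere in the output, so a stray "None" or category name in other text would be picked up. A slightly stricter variant, sketched below, reads categories only from a `Categories:` line. The output layout (`Safety: ...` followed by `Categories: ...`) is an assumption inferred from the patterns above, and `Verdict` and `parse_verdict` are hypothetical names, so check this against real model output before relying on it.

```python
import re
from typing import List, NamedTuple, Optional

CATEGORY_NAMES = [
    "Violent", "Non-violent Illegal Acts", "Sexual Content", "PII",
    "Suicide & Self-Harm", "Unethical Acts", "Politically Sensitive",
    "Copyright Violation", "Jailbreak",
]


class Verdict(NamedTuple):
    label: Optional[str]
    categories: List[str]


def parse_verdict(content: str) -> Verdict:
    """Parse the assumed 'Safety: X' / 'Categories: Y' output layout."""
    label_match = re.search(r"Safety:\s*(Safe|Unsafe|Controversial)", content)
    label = label_match.group(1) if label_match else None

    categories: List[str] = []
    cat_line = re.search(r"Categories:\s*(.*)", content)
    if cat_line:
        # Case-sensitive substring checks, restricted to the Categories line.
        # "None" is not a category, so it yields an empty list here.
        categories = [name for name in CATEGORY_NAMES if name in cat_line.group(1)]
    return Verdict(label, categories)
```

Unlike `classify_safety`, this variant returns an empty list rather than `["None"]` when no category applies, which tends to be easier to branch on.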
Deployment
Deploy with SGLang or vLLM for production:
# SGLang
python -m sglang.launch_server --model-path zenlm/zen-guard --port 30000
# vLLM
vllm serve zenlm/zen-guard --port 8000 --max-model-len 32768
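Once a server is running, it can be queried over HTTP. vLLM exposes an OpenAI-compatible `/v1/chat/completions` endpoint; the sketch below builds such a request using only the standard library. The host, port, and `max_tokens` value are assumptions carried over from the commands and example above, and the function names are hypothetical. Whether the served response carries the same `Safety: ...` text as the local example should be verified against a live server.

```python
import json
import urllib.request


def build_moderation_request(
    prompt: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a chat-completions request for the served guard model."""
    payload = {
        "model": "zenlm/zen-guard",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def moderate(prompt: str) -> str:
    """Send the request and return the raw verdict text (needs a running server)."""
    with urllib.request.urlopen(build_moderation_request(prompt), timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The returned text can then be passed to `classify_safety` from the Basic Usage example.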
Performance
| Metric | Zen Guard | Industry Avg |
|---|---|---|
| Accuracy | 96.8% | 92.1% |
| F1 Score | 94.2% | 89.5% |
| False Positive Rate | 2.1% | 5.3% |
| Latency | 120ms | 200ms |
Multilingual Performance
- English: 97.2% accuracy
- Chinese: 96.5% accuracy
- Spanish: 96.1% accuracy
- Other languages: 95.8% average
Resource Requirements
| Model | VRAM (FP16) | VRAM (INT8) | Throughput |
|---|---|---|---|
| zen-guard | 8GB | 4GB | 1000+ req/s |
| zen-guard-gen | 16GB | 8GB | 500+ req/s |
| zen-guard-stream | 8GB | 4GB | Real-time |
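The VRAM figures in the table line up with a weights-only estimate: parameter count times bytes per parameter. The sketch below reproduces that arithmetic using the parameter counts from the Model Family table. It deliberately ignores KV cache, activations, and framework overhead, so real usage will be higher; treat it as a lower bound rather than a sizing guide.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Weights-only memory in GB (1 GB taken as 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


for name, size in [("zen-guard", 4), ("zen-guard-gen", 8), ("zen-guard-stream", 4)]:
    print(name, weight_memory_gb(size, "fp16"), weight_memory_gb(size, "int8"))
```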
Integration with Hanzo Guard
For production deployments, combine Zen Guard (ML classification) with Hanzo Guard (Rust runtime sanitization) for comprehensive protection:
┌─────────────┐     ┌──────────────┐     ┌────────────┐     ┌─────────────┐
│ Application │ ──► │ Hanzo Guard  │ ──► │ Zen Guard  │ ──► │ LLM Provider│
└─────────────┘     │ (Rust)       │     │ (ML Model) │     └─────────────┘
                    │              │     │            │
                    │ • PII Redact │     │ • Content  │
                    │ • Rate Limit │     │   Classify │
                    │ • Injection  │     │ • Severity │
                    │   Detect     │     │   Levels   │
                    │ • Audit Log  │     │ • Category │
                    └──────────────┘     └────────────┘
Hanzo Guard (Rust, <1ms):
- PII detection and redaction (SSN, credit cards, emails, phones, API keys)
- Prompt injection detection (jailbreak patterns)
- Rate limiting per user
- Audit logging for compliance
Zen Guard (ML, ~120ms):
- Deep content classification via neural network
- Three-tier severity (safe/controversial/unsafe)
- 9 safety categories
- Support for 119 languages
// Example: Stacking both guards
use hanzo_guard::{Guard, GuardConfig};

let hanzo = Guard::builder()
    .with_zen_guard_api_key("your-api-key") // Enables Zen Guard API calls
    .build();

// Single call sanitizes through both layers
let result = hanzo.sanitize_input("User message here").await?;
Install Hanzo Guard: cargo add hanzo-guard (crates.io)
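The layering described in this section can also be illustrated independently of either library. Here is a minimal Python sketch of the same two-stage idea: a cheap rule-based pass runs first, and only its sanitized output reaches the slower classifier. Everything in it is hypothetical (the `redact_emails` rule, the `classify` callback, the `Result` type); it does not reflect the actual Hanzo Guard or Zen Guard APIs.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Result:
    text: str       # sanitized text that would be forwarded to the LLM
    label: str      # severity label returned by the classifier
    allowed: bool   # True only when the classifier says "Safe"


def redact_emails(text: str) -> str:
    """Stage 1: fast, rule-based sanitization."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def guard_pipeline(text: str, classify: Callable[[str], str]) -> Result:
    """Run the rule-based pass, then hand the sanitized text to the classifier."""
    sanitized = redact_emails(text)
    label = classify(sanitized)  # Stage 2: ML classification
    return Result(text=sanitized, label=label, allowed=(label == "Safe"))


# Usage with a stub classifier standing in for a real model call.
result = guard_pipeline("Contact me at jane@example.com", lambda _: "Safe")
print(result)
```

Running the cheap stage first means the classifier never sees the raw PII, which is the main reason to order the layers this way.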
License
Apache 2.0
Citation
@misc{zenguard2025,
title={Zen Guard: Multilingual Safety Moderation for AI Systems},
author={Hanzo AI and Zoo Labs Foundation},
year={2025},
publisher={HuggingFace},
howpublished={\url{https://huggingface.co/zenlm/zen-guard}}
}
Based On
Zen Guard is built upon Qwen3Guard with Zen identity fine-tuning.
Upstream Source
- Repository: https://github.com/QwenLM/Qwen3Guard
- Base Model: Qwen3-4B
- License: Apache 2.0
Zen LM Enhancements
- Zen AI identity and branding
- Integration with Zen Gym training framework
- Enhanced documentation and examples
- Additional deployment configurations
Please cite both the original Qwen3Guard work and Zen Guard in publications.
Zen AI - Clarity Through Intelligence
zenlm.org