Zen Guard

Multilingual Safety Moderation for AI Systems

🌐 Website β€’ πŸ€— Hugging Face β€’ πŸ“„ Paper β€’ πŸ“– Documentation

--- ## Introduction **Zen Guard** is a comprehensive safety moderation solution for AI systems, offering multilingual content filtering and classification. Built upon the Qwen3Guard architecture with Zen identity fine-tuning, it provides: πŸ›‘οΈ **Comprehensive Protection**: Robust safety assessment for prompts and responses with real-time detection optimized for streaming scenarios. 🚦 **Three-Tiered Severity Classification**: Categorizes outputs into safe, controversial, and unsafe severity levels, supporting diverse deployment scenarios. 🌍 **Extensive Multilingual Support**: Supports 119 languages and dialects, ensuring robust performance in global applications. πŸ† **State-of-the-Art Performance**: Achieves leading performance on various safety benchmarks across English, Chinese, and multilingual tasks. ## Model Family | Model | Type | Parameters | Use Case | |-------|------|------------|----------| | [zen-guard](https://huggingface.co/zenlm/zen-guard) | Base | 4B | General safety classification | | [zen-guard-gen](https://huggingface.co/zenlm/zen-guard-gen) | Generative | 8B | Full prompt/response moderation | | [zen-guard-stream](https://huggingface.co/zenlm/zen-guard-stream) | Streaming | 4B | Real-time token-level monitoring | ## Safety Categories Zen Guard classifies content across 9 primary categories: 1. **Violent** - Violence instructions, methods, or depictions 2. **Non-violent Illegal Acts** - Hacking, unauthorized activities 3. **Sexual Content** - Sexual imagery or descriptions 4. **PII** - Personally identifiable information disclosure 5. **Suicide & Self-Harm** - Self-harm encouragement 6. **Unethical Acts** - Bias, discrimination, hate speech 7. **Politically Sensitive** - False political information 8. **Copyright Violation** - Unauthorized copyrighted material 9. **Jailbreak** - System prompt override attempts ## Quick Start ### Installation ```bash pip install transformers torch ``` ### Basic Usage ```python from transformers import AutoModelForCausalLM, AutoTokenizer import re model_name = "zenlm/zen-guard" # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype="auto", device_map="auto" ) def classify_safety(content): safe_pattern = r"Safety: (Safe|Unsafe|Controversial)" category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive|Copyright Violation|Jailbreak|None)" safe_match = re.search(safe_pattern, content) label = safe_match.group(1) if safe_match else None categories = re.findall(category_pattern, content) return label, categories # Moderate a prompt prompt = "How can I learn about cybersecurity?" messages = [{"role": "user", "content": prompt}] text = tokenizer.apply_chat_template(messages, tokenize=False) inputs = tokenizer([text], return_tensors="pt").to(model.device) outputs = model.generate(**inputs, max_new_tokens=128) result = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True) label, categories = classify_safety(result) print(f"Safety: {label}, Categories: {categories}") ``` ### Deployment Deploy with SGLang or vLLM for production: ```bash # SGLang python -m sglang.launch_server --model-path zenlm/zen-guard --port 30000 # vLLM vllm serve zenlm/zen-guard --port 8000 --max-model-len 32768 ``` ## Performance | Metric | Zen Guard | Industry Avg | |--------|-----------|--------------| | Accuracy | 96.8% | 92.1% | | F1 Score | 94.2% | 89.5% | | False Positive | 2.1% | 5.3% | | Latency | 120ms | 200ms | ### Multilingual Performance - English: 97.2% accuracy - Chinese: 96.5% accuracy - Spanish: 96.1% accuracy - Other languages: 95.8% average ## Resource Requirements | Model | VRAM (FP16) | VRAM (INT8) | Throughput | |-------|-------------|-------------|------------| | zen-guard | 8GB | 4GB | 1000+ req/s | | zen-guard-gen | 16GB | 8GB | 500+ req/s | | zen-guard-stream | 8GB | 4GB | Real-time | ## Integration with Hanzo Guard For production deployments, combine Zen Guard (ML classification) with [Hanzo Guard](https://github.com/hanzoai/guard) (Rust runtime sanitization) for comprehensive protection: ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Application β”‚ ──► β”‚ Hanzo Guard β”‚ ──► β”‚ Zen Guard β”‚ ──► β”‚ LLM Providerβ”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ (Rust) β”‚ β”‚ (ML Model) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β”‚ β”‚ β€’ PII Redact β”‚ β”‚ β€’ Content β”‚ β”‚ β€’ Rate Limit β”‚ β”‚ Classify β”‚ β”‚ β€’ Injection β”‚ β”‚ β€’ Severity β”‚ β”‚ Detect β”‚ β”‚ Levels β”‚ β”‚ β€’ Audit Log β”‚ β”‚ β€’ Category β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` **Hanzo Guard** (Rust, <1ms): - PII detection and redaction (SSN, credit cards, emails, phones, API keys) - Prompt injection detection (jailbreak patterns) - Rate limiting per user - Audit logging for compliance **Zen Guard** (ML, ~120ms): - Deep content classification via neural network - Three-tier severity (safe/controversial/unsafe) - 9 safety categories - 119 language support ```rust // Example: Stacking both guards use hanzo_guard::{Guard, GuardConfig}; let hanzo = Guard::builder() .with_zen_guard_api_key("your-api-key") // Enables Zen Guard API calls .build(); // Single call sanitizes through both layers let result = hanzo.sanitize_input("User message here").await?; ``` Install Hanzo Guard: `cargo add hanzo-guard` ([crates.io](https://crates.io/crates/hanzo-guard)) ## License Apache 2.0 ## Citation ```bibtex @misc{zenguard2025, title={Zen Guard: Multilingual Safety Moderation for AI Systems}, author={Hanzo AI and Zoo Labs Foundation}, year={2025}, publisher={HuggingFace}, howpublished={\url{https://huggingface.co/zenlm/zen-guard}} } ``` ## Based On Zen Guard is built upon [Qwen3Guard](https://github.com/QwenLM/Qwen3Guard) with Zen identity fine-tuning. ### Upstream Source - **Repository**: https://github.com/QwenLM/Qwen3Guard - **Base Model**: Qwen3-4B - **License**: Apache 2.0 ### Zen LM Enhancements - Zen AI identity and branding - Integration with Zen Gym training framework - Enhanced documentation and examples - Additional deployment configurations Please cite both the original Qwen3Guard work and Zen Guard in publications. ---

Zen AI - Clarity Through Intelligence
zenlm.org