# Zen Guard Stream

**4B Real-Time Streaming Safety Moderation**

Website • Hugging Face • Paper • Documentation
## Introduction

Zen Guard Stream is a 4B-parameter, token-level streaming classifier that evaluates each generated token as it is produced, enabling real-time content moderation during LLM response generation.
## Features

- **5ms/token Latency**: Ultra-fast real-time classification
- **Token-Level Analysis**: Continuous stream monitoring
- **119 Languages**: Multilingual support
- **Three-Tier Classification**: Safe, Controversial, Unsafe
- **Early Detection**: Stop unsafe content mid-generation (see the sketch below)
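As a rough sketch, the three tiers might map to moderation actions like this (the handling of `Controversial` is an assumption of this example, not something the model prescribes):

```python
# Hypothetical policy: map the model's three risk tiers to actions.
# "Controversial" handling is an assumption; tune it to your deployment.
def moderation_action(risk_level: str, strict: bool = False) -> str:
    if risk_level == "Unsafe":
        return "block"   # stop generation immediately
    if risk_level == "Controversial":
        return "block" if strict else "flag"   # escalate or log for review
    return "allow"       # "Safe": pass the token through
```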
## How It Works

```
User Prompt → Zen Guard Stream (safety check)
        ↓
   If Safe, continue
        ↓
LLM Response → Token by Token → Real-time Classification
        ↓
Immediate intervention if unsafe
```
## Model Specifications
| Specification | Value |
|---|---|
| Parameters | 4B |
| Type | Streaming Classifier |
| Base Model | Qwen3-4B |
| Context Length | 32,768 tokens |
| Languages | 119 |
| Latency | 5ms/token |
| VRAM (FP16) | 8GB |
| VRAM (INT8) | 4GB |
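The INT8 VRAM figure suggests 8-bit loading; a minimal sketch using the `bitsandbytes` integration in `transformers` (assuming the remote-code model supports 8-bit loading, which this card does not explicitly state):

```python
from transformers import AutoModel, BitsAndBytesConfig

# Assumption: 8-bit quantization via bitsandbytes to reach the ~4GB VRAM target.
model = AutoModel.from_pretrained(
    "zenlm/zen-guard-stream",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True,
).eval()
```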
## Quick Start

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "zenlm/zen-guard-stream"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

# Simulate streaming moderation
user_message = "Hello, how are you?"
assistant_message = "I'm doing well, thank you for asking!"
messages = [
    {"role": "user", "content": user_message},
    {"role": "assistant", "content": assistant_message},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
inputs = tokenizer(text, return_tensors="pt")
token_ids = inputs.input_ids[0]

# Find where the assistant turn starts by tokenizing the user-only prefix,
# so roles are assigned by token position rather than by character count.
user_prefix = tokenizer.apply_chat_template(messages[:1], tokenize=False, add_generation_prompt=True)
user_token_count = len(tokenizer(user_prefix).input_ids)

# Initialize stream state
stream_state = None

# Process tokens one by one (simulating streaming)
for i, token_id in enumerate(token_ids):
    result, stream_state = model.stream_moderate_from_ids(
        token_id,
        role="assistant" if i >= user_token_count else "user",
        stream_state=stream_state,
    )
    token_str = tokenizer.decode([token_id])
    risk = result["risk_level"][-1]
    print(f"Token: {token_str!r} → Risk: {risk}")

model.close_stream(stream_state)
```
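To act on the Early Detection feature, you can break out of the token loop the moment a token is flagged; a sketch extending the loop above:

```python
# Inside the token loop above: stop decoding once the stream flags unsafe content.
if risk == "Unsafe":
    model.close_stream(stream_state)
    print("[Content moderated]")
    break
```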
## Integration Pattern

### With Streaming LLM
```python
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
import threading

# Load both models
llm = AutoModelForCausalLM.from_pretrained("zenlm/zen-next")
guard = AutoModel.from_pretrained("zenlm/zen-guard-stream", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-next")

def safe_generate(prompt):
    # Initial prompt check
    result, state = guard.stream_moderate_from_ids(
        tokenizer.encode(prompt, return_tensors="pt")[0],
        role="user",
        stream_state=None,
    )
    if result["risk_level"][-1] != "Safe":
        guard.close_stream(state)
        return "I cannot process this request."

    # Stream generation with real-time moderation
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    thread = threading.Thread(target=llm.generate, kwargs={**inputs, "streamer": streamer})
    thread.start()

    output = []
    for token in streamer:
        # A streamed chunk may decode to several tokens; moderate each one.
        for token_id in tokenizer.encode(token, add_special_tokens=False):
            result, state = guard.stream_moderate_from_ids(
                token_id, role="assistant", stream_state=state
            )
            if result["risk_level"][-1] == "Unsafe":
                thread.join()  # let generation finish; remaining output is discarded
                guard.close_stream(state)
                return "".join(output) + " [Content moderated]"
        output.append(token)

    thread.join()
    guard.close_stream(state)
    return "".join(output)
```
## Performance
| Metric | Zen Guard Stream |
|---|---|
| Accuracy | 95.2% |
| F1 Score | 93.1% |
| False Positive Rate | 2.8% |
| Latency | 5ms/token |
## Related Models
- zen-guard - 4B base model
- zen-guard-gen - 8B generative model
## License
Apache 2.0
## Citation

```bibtex
@misc{zenguardstream2025,
  title={Zen Guard Stream: Real-Time Token-Level Safety Moderation},
  author={Hanzo AI and Zoo Labs Foundation},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/zenlm/zen-guard-stream}}
}
```
## Based On
Built upon Qwen3Guard-Stream-4B.
Zen AI - Clarity Through Intelligence
zenlm.org