ERNIE-4.5-0.3B Fine-tuned on Aegis AI Content Safety

Model Description

This is a fine-tuned version of Baidu's ERNIE-4.5-0.3B-Base, optimized for content safety applications. ERNIE (Enhanced Representation through kNowledge IntEgration) is particularly strong at context-aware, knowledge-enhanced language understanding, which makes it well suited for nuanced safety detection.

This model was fine-tuned using LoRA (Low-Rank Adaptation) on the NVIDIA Aegis AI Content Safety Dataset 2.0, which contains diverse examples of safe and unsafe content across multiple categories.

Model Details

  • Base Model: baidu/ERNIE-4.5-0.3B-Base-PT
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: NVIDIA Aegis AI Content Safety Dataset 2.0
  • Training Samples: 2,000 carefully selected examples
  • Language: English
  • License: Apache 2.0

Capabilities

  • Context-aware safety detection
  • Knowledge-enhanced content analysis
  • Cultural and linguistic sensitivity
  • Fine-grained threat classification
  • Efficient content safety screening

Intended Use Cases

  • Cross-cultural content moderation
  • Educational platform safety
  • Lightweight mobile applications
  • Edge device content filtering
  • Multi-lingual safety systems

Training Configuration

LoRA Parameters

  • Rank (r): 16
  • Alpha: 32
  • Dropout: 0.05
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
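
A minimal sketch of how the LoRA settings listed above could be expressed with the PEFT library; the target module names come from the list, while the task_type value is an assumption based on the causal-LM usage shown later in this card:

from peft import LoraConfig

# LoRA configuration matching the parameters above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",  # assumption: causal language modeling
)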

Training Hyperparameters

  • Learning Rate: 2e-4
  • Batch Size: 4 (per device)
  • Gradient Accumulation Steps: 4
  • Effective Batch Size: 16
  • Epochs: 3
  • Optimizer: AdamW (8-bit paged)
  • LR Scheduler: Cosine with warmup
  • Warmup Ratio: 0.1
  • FP16 Training: Yes
  • Max Sequence Length: 512
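
For reference, these hyperparameters map to a Transformers TrainingArguments object roughly as follows; this is a sketch, with output_dir as a placeholder, and the 8-bit paged AdamW optimizer additionally requires bitsandbytes:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./ernie-aegis-lora",   # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch size 4 x 4 = 16
    num_train_epochs=3,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
)
# The 512-token limit is applied at tokenization time,
# e.g. tokenizer(..., truncation=True, max_length=512).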

Usage

Installation

pip install transformers torch peft

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ahczhg/ernie-4.5-0.3b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example: Content safety check
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
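
Since this repository is a LoRA adapter, an alternative is to load the base model listed under Model Details and attach the adapter explicitly with PEFT. A minimal sketch (depending on your transformers version, the ERNIE base model may require trust_remote_code=True):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_name = "baidu/ERNIE-4.5-0.3B-Base-PT"
adapter_name = "ahczhg/ernie-4.5-0.3b-aegis-safety-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Attach the fine-tuned LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, adapter_name)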

Advanced Usage with Pipeline

from transformers import pipeline
import torch

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="ahczhg/ernie-4.5-0.3b-aegis-safety-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate safety analysis
result = generator(
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True
)

print(result[0]['generated_text'])

Evaluation

The model was evaluated on a held-out test set from the Aegis AI Content Safety Dataset. Key metrics include:

  • Perplexity on validation set
  • Content safety classification accuracy
  • False positive/negative rates for harmful content detection
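
As a reference point for the first metric, validation perplexity is the exponential of the average cross-entropy loss over held-out examples. A minimal sketch (the example texts are placeholders, not the actual Aegis test split):

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ahczhg/ernie-4.5-0.3b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Placeholder held-out texts; substitute the real validation split.
texts = ["Hello, how are you?", "Analyze this content for safety."]

losses = []
for text in texts:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # causal-LM loss
    losses.append(out.loss.item())

# Simple per-example average (not token-weighted); perplexity = exp(mean loss)
print("perplexity:", math.exp(sum(losses) / len(losses)))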

Limitations

  • The model is primarily trained on English language content
  • Performance may vary on domain-specific or highly technical content
  • Should be used as part of a comprehensive content moderation system, not as the sole decision-maker
  • May require fine-tuning for specific use cases or content domains
  • The model's outputs should be reviewed by human moderators for critical applications

Ethical Considerations

  • This model is designed to assist in content safety and moderation tasks
  • It should not be used to censor legitimate speech or suppress diverse viewpoints
  • Decisions about content moderation should involve human oversight
  • The model may reflect biases present in the training data
  • Users should implement appropriate safeguards and appeal processes

Training Data

The model was fine-tuned on the NVIDIA Aegis AI Content Safety Dataset 2.0, which includes:

  • Diverse examples of safe and unsafe content
  • Multiple categories of potentially harmful content
  • Balanced representation of safe content
  • Real-world scenarios and edge cases

Citation

If you use this model in your research or applications, please cite:

@misc{ernie_0.3b_aegis_safety,
  author = {ahczhg},
  title = {ERNIE-4.5-0.3B Fine-tuned on Aegis AI Content Safety},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ahczhg/ernie-4.5-0.3b-aegis-safety-lora}},
  note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}

Acknowledgments

  • Base model by Baidu
  • Dataset provided by NVIDIA
  • Fine-tuning performed using HuggingFace Transformers and PEFT libraries

Contact

For questions, issues, or feedback, please visit the model repository.

Model Card Authors

  • ahczhg

Model Card Contact

Support me on Ko-fi
