ERNIE-4.5-0.3B Fine-tuned on Aegis AI Content Safety

Model Description

This is a fine-tuned version of Baidu's ERNIE-4.5-0.3B-Base, optimized for content safety applications. ERNIE (Enhanced Representation through kNowledge IntEgration) is particularly strong at context-aware, knowledge-enhanced language understanding, which makes it well suited for nuanced safety detection.

This model was fine-tuned using LoRA (Low-Rank Adaptation) on the NVIDIA Aegis AI Content Safety Dataset 2.0, which contains diverse examples of safe and unsafe content across multiple categories.

Model Details

  • Base Model: baidu/ERNIE-4.5-0.3B-Base-PT
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: NVIDIA Aegis AI Content Safety Dataset 2.0
  • Training Samples: 2,000 carefully selected examples
  • Language: English
  • License: Apache 2.0

Capabilities

  • Context-aware safety detection
  • Knowledge-enhanced content analysis
  • Cultural and linguistic sensitivity
  • Fine-grained threat classification
  • Efficient content safety screening

Intended Use Cases

  • Cross-cultural content moderation
  • Educational platform safety
  • Lightweight mobile applications
  • Edge device content filtering
  • Multi-lingual safety systems

Training Configuration

LoRA Parameters

  • Rank (r): 16
  • Alpha: 32
  • Dropout: 0.05
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
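
A minimal sketch of how the LoRA settings listed above could be expressed with the PEFT library; the target module names come from the list, while the task_type value is an assumption based on the causal-LM usage shown later in this card:

from peft import LoraConfig

# LoRA configuration matching the parameters above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",  # assumption: causal language modeling
)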

Training Hyperparameters

  • Learning Rate: 2e-4
  • Batch Size: 4 (per device)
  • Gradient Accumulation Steps: 4
  • Effective Batch Size: 16
  • Epochs: 3
  • Optimizer: AdamW (8-bit paged)
  • LR Scheduler: Cosine with warmup
  • Warmup Ratio: 0.1
  • FP16 Training: Yes
  • Max Sequence Length: 512
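
For reference, these hyperparameters map to a Transformers TrainingArguments object roughly as follows; this is a sketch, with output_dir as a placeholder, and the 8-bit paged AdamW optimizer additionally requires bitsandbytes:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./ernie-aegis-lora",   # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch size 4 x 4 = 16
    num_train_epochs=3,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
)
# The 512-token limit is applied at tokenization time,
# e.g. tokenizer(..., truncation=True, max_length=512).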

Usage

Installation

pip install transformers torch peft

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ahczhg/ernie-4.5-0.3b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example: Content safety check
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
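
Since this repository is a LoRA adapter, an alternative is to load the base model listed under Model Details and attach the adapter explicitly with PEFT. A minimal sketch (depending on your transformers version, the ERNIE base model may require trust_remote_code=True):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_name = "baidu/ERNIE-4.5-0.3B-Base-PT"
adapter_name = "ahczhg/ernie-4.5-0.3b-aegis-safety-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Attach the fine-tuned LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, adapter_name)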

Advanced Usage with Pipeline

from transformers import pipeline
import torch

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="ahczhg/ernie-4.5-0.3b-aegis-safety-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate safety analysis
result = generator(
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True
)

print(result[0]['generated_text'])

Evaluation

The model was evaluated on a held-out test set from the Aegis AI Content Safety Dataset. Key metrics include:

  • Perplexity on validation set
  • Content safety classification accuracy
  • False positive/negative rates for harmful content detection
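
As a reference point for the first metric, validation perplexity is the exponential of the average cross-entropy loss over held-out examples. A minimal sketch (the example texts are placeholders, not the actual Aegis test split):

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ahczhg/ernie-4.5-0.3b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Placeholder held-out texts; substitute the real validation split.
texts = ["Hello, how are you?", "Analyze this content for safety."]

losses = []
for text in texts:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # causal-LM loss
    losses.append(out.loss.item())

# Simple per-example average (not token-weighted); perplexity = exp(mean loss)
print("perplexity:", math.exp(sum(losses) / len(losses)))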

Limitations

  • The model is primarily trained on English language content
  • Performance may vary on domain-specific or highly technical content
  • Should be used as part of a comprehensive content moderation system, not as the sole decision-maker
  • May require fine-tuning for specific use cases or content domains
  • The model's outputs should be reviewed by human moderators for critical applications

Ethical Considerations

  • This model is designed to assist in content safety and moderation tasks
  • It should not be used to censor legitimate speech or suppress diverse viewpoints
  • Decisions about content moderation should involve human oversight
  • The model may reflect biases present in the training data
  • Users should implement appropriate safeguards and appeal processes

Training Data

The model was fine-tuned on the NVIDIA Aegis AI Content Safety Dataset 2.0, which includes:

  • Diverse examples of safe and unsafe content
  • Multiple categories of potentially harmful content
  • Balanced representation of safe content
  • Real-world scenarios and edge cases

Citation

If you use this model in your research or applications, please cite:

@misc{ernie_0.3b_aegis_safety,
  author = {ahczhg},
  title = {ERNIE-4.5-0.3B Fine-tuned on Aegis AI Content Safety},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ahczhg/ernie-4.5-0.3b-aegis-safety-lora}},
  note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}

Acknowledgments

  • Base model by Baidu
  • Dataset provided by NVIDIA
  • Fine-tuning performed using HuggingFace Transformers and PEFT libraries

Contact

For questions, issues, or feedback, please visit the model repository.

Model Card Authors

  • ahczhg

Model Card Contact

Support me on Ko-fi
