# My Awesome Model

A fine-tuned BERT model that classifies child-protection-related text as harmful or safe.
## Model Description

This model is a fine-tuned 🤗 Transformers model for text classification.
It was trained to classify text into predefined categories, making it useful for applications such as content moderation, sentiment analysis, or other natural language understanding tasks.
- Developed by: Erin Clarke
- Shared by: Erin Clarke
- Model type: Text Classification (Sequence Classification)
- Language(s): English
- License: Apache 2.0
- Finetuned from model: [e.g., `bert-base-uncased`, `distilroberta-base`]
## Intended Uses
This model is intended for research and practical use in classifying text into categories. Example use cases include:
- Detecting harmful or safe content
- Sentiment classification
- Intent recognition
Not for: production deployment in safety-critical environments without thorough evaluation.
## Training Data
- Dataset: Custom dataset of online chat and text messages labeled for child protection, with two categories: `harmful` vs. `safe`.
- Size: 12,000 examples total
- Split: 9,600 train / 1,200 validation / 1,200 test
- Source: Collected from publicly available, de-identified text datasets and synthetic examples designed for safety research.
- Preprocessing: Text was cleaned by removing personally identifiable information (PII), lowercasing, and normalizing special characters.
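The cleaning scripts themselves are not published with this card; the sketch below illustrates the preprocessing steps listed above, assuming regex-based masking of emails and phone numbers for the PII removal and Unicode NFKC for the character normalization (both are assumptions, not the documented pipeline):

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    """Clean one message: scrub PII, normalize characters, lowercase."""
    # Assumed PII scrubbing: mask email addresses and phone-like digit runs
    text = re.sub(r"\S+@\S+", "[EMAIL]", text)
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", "[PHONE]", text)
    # Assumed normalization: NFKC folds compatibility/special characters
    text = unicodedata.normalize("NFKC", text)
    return text.lower()
```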
## Training Procedure

- Framework: 🤗 Transformers (Trainer API)
- Optimizer: AdamW
- Batch size: 16
- Learning rate: 5e-5
- Epochs: 3
- Hardware: 1× NVIDIA Tesla T4 GPU (Google Colab environment)
- Loss Function: CrossEntropyLoss (binary classification)
- Early Stopping: Not used (fixed epochs)
- Gradient Clipping: Applied at 1.0 to prevent exploding gradients
- Mixed Precision: FP16 training enabled for efficiency
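These settings map directly onto the 🤗 Trainer API. A minimal sketch of the configuration, assuming `model`, `train_ds`, and `val_ds` are already prepared, and with `output_dir` as a placeholder:

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",               # placeholder
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    max_grad_norm=1.0,              # gradient clipping at 1.0
    fp16=True,                      # mixed-precision training
)

# AdamW and cross-entropy loss are the Trainer defaults for sequence
# classification, so they need no explicit configuration here.
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
```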
## Evaluation Results
The model was evaluated on the held-out test set of 1,200 examples.
Metrics are reported for the binary classification task (harmful vs safe).
| Metric | Score |
|---|---|
| Accuracy | 0.87 |
| Precision | 0.85 |
| Recall | 0.84 |
| F1 Score | 0.84 |
| ROC AUC | 0.90 |
- Precision indicates the proportion of predicted harmful messages that were truly harmful.
- Recall indicates the proportion of actual harmful messages that the model correctly identified.
- F1 Score balances precision and recall for overall effectiveness.
- ROC AUC reflects how well the model separates harmful from safe text across all thresholds; 0.90 indicates strong discrimination.
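For reference, these metrics can be computed with scikit-learn. In the sketch below, `y_true` (gold labels) and `y_prob` (predicted harmful-class probabilities) are illustrative names, and the 0.5 decision threshold is an assumption rather than a documented choice:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# y_true: gold labels (assuming 0 = safe, 1 = harmful)
# y_prob: model-assigned probability of the harmful class per example
y_pred = [int(p >= 0.5) for p in y_prob]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("roc_auc  :", roc_auc_score(y_true, y_prob))  # uses scores, not hard labels
```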
## Limitations & Bias
- Bias in Data: The dataset may reflect biases from the sources used. Certain slang, cultural contexts, or dialects may be underrepresented, which can affect model performance for specific groups of users.
- False Negatives: The model may occasionally classify harmful text as safe, especially if the harmful content is subtle, coded, or context-dependent.
- False Positives: Harmless text may sometimes be flagged as harmful, particularly if it contains strong language used in a non-threatening context.
- Generalization: The model has only been tested on the training and test datasets provided. Performance on live, real-world child protection data may differ.
- Sensitive Use Case: This model is not a replacement for human review. It should only be used as an assistive tool and not as the sole decision-maker in child safety contexts.
- Ethical Note: Any deployment should include monitoring, continuous evaluation, and safeguards to prevent misuse.
## How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("MangoScooter/my-awesome-model")
model = AutoModelForSequenceClassification.from_pretrained("MangoScooter/my-awesome-model")

inputs = tokenizer("Your text here", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Convert logits to probabilities and take the most likely class
predicted_class = torch.softmax(outputs.logits, dim=-1).argmax(dim=-1).item()
```
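The human-readable name for each class index is available via `model.config.id2label` (which falls back to generic `LABEL_0`/`LABEL_1` names if custom labels were not set).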
## Citation
If you use this model, please cite:
```bibtex
@misc{my-awesome-model,
  author       = {Erin Clarke},
  title        = {My Awesome Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MangoScooter/my-awesome-model}}
}
```