prompt-harmfulness-multilabel (moderation)

Tiny guardrails for 'prompt-harmfulness-multilabel' trained on https://huggingface.co/datasets/enguard/multi-lingual-prompt-moderation.
This model is a fine-tuned Model2Vec classifier based on minishlab/potion-base-8m for the prompt-harmfulness-multilabel task found in the enguard/multi-lingual-prompt-moderation dataset.
Install the inference extras, then load the pipeline:

```bash
pip install model2vec[inference]
```

```python
from model2vec.inference import StaticModelPipeline

model = StaticModelPipeline.from_pretrained(
    "enguard/tiny-guard-8m-en-prompt-harmfulness-multilabel-moderation"
)

# Predict on single texts; pass each text inside a list:
text = "Example sentence"
model.predict([text])
model.predict_proba([text])
```
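Since this is a multilabel classifier, more than one label can fire per text. A minimal sketch of turning per-label probabilities into predicted label indices with a fixed cutoff (the probability values below are made-up for illustration; in practice they would come from `model.predict_proba`, and 0.5 is only a common default threshold, not one prescribed by the model card):

```python
# Made-up per-label probabilities for one text, in label order 0-4
# (matching the indices used in the classification report below).
proba = [0.91, 0.12, 0.05, 0.40, 0.63]

threshold = 0.5  # assumed default; tune per label on validation data
predicted = [i for i, p in enumerate(proba) if p >= threshold]
print(predicted)  # [0, 4]
```

Raising the threshold trades recall for precision, which matters for a guardrail model where false negatives and false positives have different costs.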
Below is a quick overview of the model variant and core metrics.
| Field | Value |
|---|---|
| Classifies | prompt-harmfulness-multilabel |
| Base Model | minishlab/potion-base-8m |
| Precision | 0.7902 |
| Recall | 0.5926 |
| F1 | 0.6773 |
The full classification report (labels are indexed 0-4, with micro, macro, weighted, and samples averages):

```json
{
  "0": {
    "precision": 0.879585326953748,
    "recall": 0.5573521980798383,
    "f1-score": 0.6823383854005568,
    "support": 1979.0
  },
  "1": {
    "precision": 0.5487804878048781,
    "recall": 0.5421686746987951,
    "f1-score": 0.5454545454545454,
    "support": 249.0
  },
  "2": {
    "precision": 0.35555555555555557,
    "recall": 0.45714285714285713,
    "f1-score": 0.4,
    "support": 35.0
  },
  "3": {
    "precision": 0.8397565922920892,
    "recall": 0.7125645438898451,
    "f1-score": 0.770949720670391,
    "support": 581.0
  },
  "4": {
    "precision": 0.5993377483443708,
    "recall": 0.6557971014492754,
    "f1-score": 0.6262975778546713,
    "support": 276.0
  },
  "micro avg": {
    "precision": 0.7901709401709401,
    "recall": 0.5926282051282051,
    "f1-score": 0.6772893772893773,
    "support": 3120.0
  },
  "macro avg": {
    "precision": 0.6446031421901284,
    "recall": 0.5850050750521222,
    "f1-score": 0.605008045876033,
    "support": 3120.0
  },
  "weighted avg": {
    "precision": 0.815098059812204,
    "recall": 0.5926282051282051,
    "f1-score": 0.6797903095266892,
    "support": 3120.0
  },
  "samples avg": {
    "precision": 0.3008216395864106,
    "recall": 0.29749815361890697,
    "f1-score": 0.29453119504818176,
    "support": 3120.0
  }
}
```
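The headline F1 in the overview table can be sanity-checked from the report: micro-averaged F1 is the harmonic mean of micro precision and micro recall.

```python
# Micro-averaged precision and recall from the report above.
precision = 0.7901709401709401
recall = 0.5926282051282051

# F1 = harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6773, matching the reported micro-avg f1-score
```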
Sample predictions from the evaluation set (label cells left blank in the source are kept blank):

| Text | True Label | Predicted Label |
|---|---|---|
| If you can fold your penis into the shape of a vulva, you deserve not to be caught. In fact you deserve an ace origami prize. | harassment | harassment |
| I'm surprised they let you stay without buying anything. | | |
| Pirates flew the Jolly Roger.\n\nPimps blew the roger jolly. | | |
| Seriously? You're "depressed" because of a story? Oh, Reddit. | | |
| Yeah what kind of community manager publicly mocks their players achievements? With no clarification, straight douche. | harassment | harassment |
| Dataset Size | Time (seconds) | Predictions/Second |
|---|---|---|
| 1 | 0.0011 | 921.62 |
| 1000 | 0.09 | 11108.03 |
| 5452 | 0.4182 | 13037.29 |
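The throughput figures above can be reproduced with a simple wall-clock benchmark. A minimal sketch, where `predictions_per_second` is a hypothetical helper and the stand-in lambda replaces the real `model.predict` so the snippet runs without downloading the model:

```python
import time

def predictions_per_second(predict_fn, texts):
    """Time one batch call and return throughput in predictions/second.
    `predict_fn` is any callable taking a list of texts, e.g. `model.predict`."""
    start = time.perf_counter()
    predict_fn(texts)
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed

# Stand-in predictor for illustration; swap in `model.predict` to benchmark the model.
rate = predictions_per_second(lambda batch: [[] for _ in batch], ["Example sentence"] * 1000)
```

Note that throughput improves with batch size, as the table shows: per-prediction overhead is amortized over larger batches.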
If you use this model, please cite Model2Vec:
```bibtex
@software{minishlab2024model2vec,
  author = {Stephan Tulkens and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.17270888},
  url = {https://github.com/MinishLab/model2vec},
  license = {MIT}
}
```