safety - a danielbis Collection

safety

updated about 1 month ago

agentlans/prompt-safety-classification

Viewer • Updated Aug 24 • 72.1k • 32

Jammies-io/safety-refusal

Viewer • Updated Aug 25 • 100 • 11

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

Paper • 2510.10390 • Published Oct 12 • 2

nvidia/Aegis-AI-Content-Safety-Dataset-2.0

Viewer • Updated Jun 9 • 33.4k • 4.49k • 66