DACTYL: AI-Generated Text Detector
DACTYL is a microsoft/deberta-v3-large classifier fine-tuned for AI-generated text (AIG) detection using Empirical X-Risk Minimization (EXM). It is trained on the DACTYL-train dataset, a challenging corpus of one-shot/few-shot and continued-pre-trained (CPT) model generations, and evaluated on a held-out test set containing models not seen during training, demonstrating robust out-of-distribution generalization.
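A minimal inference sketch is shown below. It assumes the released checkpoint uses a single-logit head with a sigmoid over the output (matching the `"apply_sigmoid": true` setting in the training configuration further down); if the published head uses two softmax logits instead, adjust accordingly.

```python
# Hypothetical usage sketch for the detector. The single-logit + sigmoid
# head is an assumption inferred from the training config, not a guarantee.
MODEL_ID = "ShantanuT01/dactyl-ai-text-detector"


def label_from_probability(prob: float, threshold: float = 0.5) -> str:
    """Map a detector score (probability of being AI-generated) to a label."""
    return "ai" if prob >= threshold else "human"


def ai_probabilities(texts, threshold=0.5):
    """Score texts with the detector; returns (probability, is_ai) pairs."""
    # Imports kept local so label_from_probability works without
    # torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()
    batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    probs = torch.sigmoid(logits.squeeze(-1))
    return [(p.item(), label_from_probability(p.item(), threshold)) for p in probs]
```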
Why EXM?
Standard binary cross-entropy (BCE) (or empirical risk minimization, ERM) training optimizes average performance, which can mask poor behavior at the low false-positive rates that matter most in real deployments. EXM instead directly optimizes X-risk metrics (such as partial AUROC) that emphasize performance at these critical operating points. As reported in the accompanying paper, while ERM-trained classifiers marginally outperform EXM classifiers on the DACTYL test set, EXM classifiers generalize significantly better to out-of-distribution data, by as much as 50 macro-F1 points in a student essay detection scenario.
Benchmark Results
DACTYL was benchmarked against 4 other open-source AI-generated text detectors across 9 datasets. Evaluation metrics are AUROC (threshold-independent) and F1 at a default threshold of 0.5.
The DACTYL test set is particularly challenging: it contains one-shot/few-shot generations (where LLMs are prompted with human text as an example) and outputs from domain-specific CPT models, generation conditions that most existing detectors were not designed for and struggle with.
Note on external datasets: The 8 external benchmark datasets consist predominantly of zero-shot generations, which are stylistically further from human writing than the one-shot/few-shot generations DACTYL was trained on. This represents a distribution shift in the opposite direction: DACTYL is optimized for harder, more human-like AI text, and these datasets are comparatively easier. DACTYL's competitive performance on them despite this mismatch is encouraging, but scores on the external datasets should not be taken as the primary measure of its intended capability.
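Both evaluation metrics can be reproduced with scikit-learn; the scores and labels below are illustrative placeholders, not DACTYL outputs.

```python
# Illustrative metric computation: AUROC is threshold-independent, while
# F1 here uses the benchmark's default 0.5 cutoff. Data is synthetic.
from sklearn.metrics import f1_score, roc_auc_score

# Placeholder detector scores (probability of "AI-generated") and gold labels
# (1 = AI-generated, 0 = human-written).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.6, 0.8, 0.9, 0.35, 0.7, 0.2]

auroc = roc_auc_score(y_true, y_score)
y_pred = [int(s >= 0.5) for s in y_score]  # binarize at the default threshold
f1 = f1_score(y_true, y_pred)
print(f"AUROC={auroc:.3f}  F1@0.5={f1:.3f}")
```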
AUROC
| Dataset | DACTYL | Desklib | Oxidane | SuperAnnotate | Fakespot |
|---|---|---|---|---|---|
| APT-Eval | 0.797 | 0.756 | 0.760 | 0.808 | 0.727 |
| BEEMO | 0.793 | 0.798 | 0.772 | 0.728 | 0.784 |
| CoCoNUTS | 0.976 | 0.984 | 0.879 | 0.989 | 0.981 |
| DACTYL | 0.995 | 0.930 | 0.894 | 0.842 | 0.871 |
| DetectRL | 0.909 | 0.937 | 0.893 | 0.887 | 0.940 |
| Dolly-15K Cosmopedia-20K | 0.996 | 0.999 | 0.980 | 0.948 | 0.997 |
| OriginalityAI | 0.849 | 0.809 | 0.607 | 0.780 | 0.898 |
| RealDet - ENG | 0.870 | 0.986 | 0.938 | 0.924 | 0.975 |
| UChicagoBooth | 0.912 | 0.938 | 0.930 | 0.916 | 0.946 |
| Mean | 0.900 | 0.904 | 0.850 | 0.869 | 0.902 |
F1 (threshold = 0.5)
| Dataset | DACTYL | Desklib | Oxidane | SuperAnnotate | Fakespot |
|---|---|---|---|---|---|
| APT-Eval | 0.381 | 0.370 | 0.439 | 0.432 | 0.386 |
| BEEMO | 0.652 | 0.631 | 0.517 | 0.598 | 0.616 |
| CoCoNUTS | 0.807 | 0.890 | 0.483 | 0.883 | 0.871 |
| DACTYL | 0.975 | 0.786 | 0.676 | 0.711 | 0.786 |
| DetectRL | 0.821 | 0.863 | 0.789 | 0.799 | 0.842 |
| Dolly-15K Cosmopedia-20K | 0.946 | 0.661 | 0.507 | 0.670 | 0.769 |
| OriginalityAI | 0.771 | 0.713 | 0.319 | 0.677 | 0.828 |
| RealDet - ENG | 0.792 | 0.889 | 0.652 | 0.770 | 0.839 |
| UChicagoBooth | 0.806 | 0.846 | 0.722 | 0.803 | 0.856 |
| Mean | 0.772 | 0.739 | 0.567 | 0.705 | 0.755 |
DACTYL ranks 1st in mean F1 and 3rd in mean AUROC among the five evaluated detectors.
Note: The APT-Eval dataset yields low F1 scores across all classifiers at the default threshold of 0.5. The relatively higher AUROC (~0.80) on this dataset suggests the classifier retains discriminative ability but may benefit from threshold calibration for this domain.
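One simple calibration recipe, sketched here with placeholder data: sweep candidate thresholds over a small labeled sample and keep the one that maximizes F1.

```python
# Threshold calibration sketch: choose the cutoff maximizing F1 on a small
# labeled calibration sample. Scores and labels below are illustrative only.
def best_f1_threshold(y_true, y_score):
    """Return (threshold, f1) maximizing F1 over the observed scores."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < t)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Example: the default 0.5 cutoff misses the positive scored 0.45,
# while calibrating recovers a cleaner separation.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.45, 0.6, 0.7, 0.1, 0.2, 0.3]
threshold, f1 = best_f1_threshold(labels, scores)
```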
Training Data
DACTYL is trained on the DACTYL-train dataset, which focuses on two underexplored generation conditions:
- One-shot/few-shot generations: AI text produced when the model is given a human-written example, making it stylistically closer to human writing and harder to detect
- CPT model generations: text from domain-specific continued-pre-trained small language models, fine-tuned with memory-efficient full-parameter optimization
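For illustration, a one-shot generation prompt of the kind described above might look like the following. The template is hypothetical, not the exact one used to build DACTYL-train.

```python
# Hypothetical one-shot prompt builder: conditioning the LLM on a
# human-written example nudges its output toward human-like style,
# which is what makes such generations harder to detect.
def one_shot_prompt(human_example: str, topic: str) -> str:
    return (
        "Here is an example passage written by a person:\n\n"
        f"{human_example}\n\n"
        f"Now write a passage in the same style about: {topic}"
    )
```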
Training Configuration
We provide full training details for this model.
```json
{
  "training_split": "training",
  "evaluation_split": "testing",
  "num_epochs": 1,
  "model_path": "microsoft/deberta-v3-large",
  "tokenizer": "microsoft/deberta-v3-large",
  "optimizer": "SOTAs",
  "optimizer_type": "libauc",
  "optimizer_args": {
    "lr": 1e-05
  },
  "loss_fn": "tpAUC_KL_Loss",
  "reset_classification_head": false,
  "loss_type": "libauc",
  "loss_fn_args": {
    "data_len": 466005
  },
  "needs_loss_fn_as_parameter": false,
  "training_args": {
    "batch_size": 16,
    "needs_sampler": true,
    "needs_index": true,
    "shuffle": false,
    "sampling_rate": 0.5,
    "apply_sigmoid": true
  }
}
```
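The configuration maps onto LibAUC components roughly as sketched below. This is an untested orientation sketch: constructor signatures vary across libauc releases, and `model`/`train_dataset` are assumed to exist.

```python
# Rough, untested mapping from the config above to LibAUC components.
# Requires `pip install libauc`; exact constructor arguments may differ
# between libauc versions, so treat this as orientation, not a recipe.
CONFIG = {
    "lr": 1e-05,
    "batch_size": 16,
    "data_len": 466005,
    "sampling_rate": 0.5,
}


def build_training_components(model, train_dataset):
    # Imports kept local so this module loads without libauc installed.
    from libauc.losses import tpAUC_KL_Loss
    from libauc.optimizers import SOTAs
    from libauc.sampler import DualSampler
    from torch.utils.data import DataLoader

    # "loss_fn": "tpAUC_KL_Loss" with "data_len" from "loss_fn_args".
    loss_fn = tpAUC_KL_Loss(data_len=CONFIG["data_len"])
    # "optimizer": "SOTAs" with the configured learning rate.
    optimizer = SOTAs(model.parameters(), mode="adam", lr=CONFIG["lr"])
    # "needs_sampler": true / "sampling_rate": 0.5 — DualSampler keeps a
    # fixed positive/negative ratio per batch, hence "shuffle": false.
    sampler = DualSampler(train_dataset, CONFIG["batch_size"],
                          sampling_rate=CONFIG["sampling_rate"])
    loader = DataLoader(train_dataset, batch_size=CONFIG["batch_size"],
                        sampler=sampler)
    # "needs_index": true — the X-risk loss consumes sample indices each
    # step, roughly: loss = loss_fn(torch.sigmoid(logits), labels, index)
    return loss_fn, optimizer, loader
```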
Intended Use
- Detecting AI-generated text in research and content moderation contexts
- Benchmarking against other open-source detection systems
Limitations
- Zero-shot distribution shift: Most existing AIG detection datasets use zero-shot generations, whereas DACTYL is trained on harder one-shot/few-shot and CPT-generated text. This mismatch means DACTYL-trained classifiers may slightly underperform on zero-shot-heavy benchmarks relative to detectors trained directly on those distributions; the external benchmark scores above should be interpreted with this in mind.
- EXM vs ERM tradeoff: EXM classifiers generalize better out-of-distribution but may underperform ERM classifiers on the DACTYL test set itself. If your deployment domain closely matches DACTYL-train, an ERM variant may yield higher in-distribution F1.
- CPT model degradation: As noted in the paper, continued pre-trained models are harder to detect even for DACTYL-trained classifiers β performance degrades relative to standard LLM outputs.
- Threshold sensitivity: F1 scores at the default threshold of 0.5 vary significantly across domains. Users are encouraged to calibrate the threshold for their specific use case using a small labeled sample.
- Future models: Like all detectors, DACTYL may degrade against future unseen model architectures.
Citation
@misc{thorat2025dactyl,
title = {DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models},
author = {Shantanu Thorat and Andrew Caines},
year = {2025},
eprint = {2508.00619},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2508.00619}
}