
DACTYL β€” AI-Generated Text Detector


DACTYL is a microsoft/deberta-v3-large classifier fine-tuned for AI-generated text (AIG) detection using Empirical X-Risk Minimization (EXM). It is trained on the DACTYL-train dataset β€” a challenging corpus of one-shot/few-shot and continued-pre-trained (CPT) model generations β€” and evaluated on a held-out test set containing unseen models not present during training, demonstrating robust out-of-distribution generalization.

Why EXM?

Standard binary cross-entropy (BCE) (or empirical risk minimization, ERM) training optimizes average performance, which can mask poor behavior at the low false-positive rates that matter most in real deployments. EXM instead directly optimizes X-risk metrics (such as partial AUROC) that emphasize performance at these critical operating points. As reported in the accompanying paper, while ERM-trained classifiers marginally outperform EXM classifiers on the DACTYL test set, EXM classifiers generalize significantly better to out-of-distribution data β€” by as much as 50 macro-F1 points in a student essay detection scenario.
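To make the objective concrete, here is a minimal sketch of a partial-AUROC metric of the kind EXM targets: the area under the ROC curve restricted to FPR ≤ α, rescaled to [0, 1] by dividing by α. This is an illustrative re-implementation, not the paper's training loss, and it assumes untied scores; libraries such as libauc optimize differentiable surrogates of this quantity rather than computing it directly.

```python
# Sketch: partial AUROC at low false-positive rates, the kind of
# X-risk metric EXM emphasizes. Illustrative only.

def partial_auroc(labels, scores, max_fpr=0.1):
    """Area under the ROC curve restricted to FPR <= max_fpr,
    rescaled to [0, 1] by dividing by max_fpr."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sweep thresholds from the highest score downward,
    # accumulating true/false positives to trace the ROC curve.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    # Trapezoidal integration, clipped at the FPR cutoff.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr
```

A perfectly-separating scorer reaches 1.0 even at a strict FPR cutoff, while a scorer that is merely good on average can score poorly here, which is exactly the behavior the metric is designed to expose.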

Benchmark Results

DACTYL was benchmarked against four other open-source AI-generated text detectors (Desklib, Oxidane, SuperAnnotate, and Fakespot) across nine datasets. Evaluation metrics are AUROC (threshold-independent) and F1 at a default decision threshold of 0.5.
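For reference, both metrics can be computed from raw detector scores as follows; this is a toy pure-Python sketch (AUROC via the Mann-Whitney rank statistic, which assumes no tied scores), not the evaluation harness used for the tables below.

```python
# Sketch of the two benchmark metrics: AUROC via the Mann-Whitney
# statistic, and F1 at a fixed decision threshold. Toy data only.

def auroc(labels, scores):
    # Probability that a random positive outscores a random negative.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1 for p in pos for n in neg if p > n)
    return wins / (len(pos) * len(neg))

def f1_at_threshold(labels, scores, threshold=0.5):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```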

The DACTYL test set is particularly challenging: it contains one-shot/few-shot generations (where LLMs are prompted with human text as an example) and outputs from domain-specific CPT models β€” generation conditions that most existing detectors were not designed for and struggle with.

Note on external datasets: The 8 external benchmark datasets consist predominantly of zero-shot generations, which are stylistically further from human writing than the one-shot/few-shot generations DACTYL was trained on. This represents a distribution shift in the opposite direction β€” DACTYL is optimized for harder, more human-like AI text, and these datasets are comparatively easier. DACTYL's competitive performance on them despite this mismatch is encouraging, but scores on the external datasets should not be taken as the primary measure of its intended capability.

Figure: mean AUROC and F1 score by classifier (Desklib 0.904 / 0.739, Fakespot 0.902 / 0.755, DACTYL 0.900 / 0.772, SuperAnnotate 0.869 / 0.705, Oxidane 0.850 / 0.567). The same values appear in the tables below.

AUROC

| Dataset | DACTYL | Desklib | Oxidane | SuperAnnotate | Fakespot |
| --- | --- | --- | --- | --- | --- |
| APT-Eval | 0.797 | 0.756 | 0.760 | 0.808 | 0.727 |
| BEEMO | 0.793 | 0.798 | 0.772 | 0.728 | 0.784 |
| CoCoNUTS | 0.976 | 0.984 | 0.879 | 0.989 | 0.981 |
| DACTYL | 0.995 | 0.930 | 0.894 | 0.842 | 0.871 |
| DetectRL | 0.909 | 0.937 | 0.893 | 0.887 | 0.940 |
| Dolly-15K Cosmopedia-20K | 0.996 | 0.999 | 0.980 | 0.948 | 0.997 |
| OriginalityAI | 0.849 | 0.809 | 0.607 | 0.780 | 0.898 |
| RealDet - ENG | 0.870 | 0.986 | 0.938 | 0.924 | 0.975 |
| UChicagoBooth | 0.912 | 0.938 | 0.930 | 0.916 | 0.946 |
| Mean | 0.900 | 0.904 | 0.850 | 0.869 | 0.902 |

F1 (threshold = 0.5)

| Dataset | DACTYL | Desklib | Oxidane | SuperAnnotate | Fakespot |
| --- | --- | --- | --- | --- | --- |
| APT-Eval | 0.381 | 0.370 | 0.439 | 0.432 | 0.386 |
| BEEMO | 0.652 | 0.631 | 0.517 | 0.598 | 0.616 |
| CoCoNUTS | 0.807 | 0.890 | 0.483 | 0.883 | 0.871 |
| DACTYL | 0.975 | 0.786 | 0.676 | 0.711 | 0.786 |
| DetectRL | 0.821 | 0.863 | 0.789 | 0.799 | 0.842 |
| Dolly-15K Cosmopedia-20K | 0.946 | 0.661 | 0.507 | 0.670 | 0.769 |
| OriginalityAI | 0.771 | 0.713 | 0.319 | 0.677 | 0.828 |
| RealDet - ENG | 0.792 | 0.889 | 0.652 | 0.770 | 0.839 |
| UChicagoBooth | 0.806 | 0.846 | 0.722 | 0.803 | 0.856 |
| Mean | 0.772 | 0.739 | 0.567 | 0.705 | 0.755 |

DACTYL ranks 1st in mean F1 and 3rd in mean AUROC among the five evaluated detectors.

Note: The APT-Eval dataset yields low F1 scores across all classifiers at the default threshold of 0.5. The relatively higher AUROC (~0.80) on this dataset suggests the classifiers retain discriminative ability but may benefit from threshold calibration for this domain.
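Such calibration can be done with a small labeled sample from the target domain. The sketch below picks the F1-maximizing cutoff by trying each observed score as a threshold; this is an illustrative approach, not a procedure from the paper.

```python
# Sketch: calibrating the decision threshold on a small labeled
# sample, for domains (like APT-Eval) where the default 0.5 cutoff
# is poorly suited. Illustrative only.

def f1(labels, preds):
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(labels, scores):
    # Try each observed score as a cutoff; keep the F1-maximizing one.
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        score = f1(labels, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```

Even a few dozen labeled examples are usually enough to move the cutoff into a sensible range, though the chosen threshold should be validated on held-out data.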

Training Data

DACTYL is trained on the DACTYL-train dataset, which focuses on two underexplored generation conditions:

  • One-shot/few-shot generations β€” AI text produced when the model is given a human-written example, making it stylistically closer to human writing and harder to detect
  • CPT model generations β€” text from domain-specific continued-pre-trained small language models, fine-tuned with memory-efficient full-parameter optimization

Training Configuration

We provide the full training configuration for this model:

```json
{
    "training_split": "training",
    "evaluation_split": "testing",
    "num_epochs": 1,
    "model_path": "microsoft/deberta-v3-large",
    "tokenizer": "microsoft/deberta-v3-large",
    "optimizer": "SOTAs",
    "optimizer_type": "libauc",
    "optimizer_args": {
        "lr": 1e-05
    },
    "loss_fn": "tpAUC_KL_Loss",
    "reset_classification_head": false,
    "loss_type": "libauc",
    "loss_fn_args": {
        "data_len": 466005
    },
    "needs_loss_fn_as_parameter": false,
    "training_args": {
        "batch_size": 16,
        "needs_sampler": true,
        "needs_index": true,
        "shuffle": false,
        "sampling_rate": 0.5,
        "apply_sigmoid": true
    }
}
```

Intended Use

  • Detecting AI-generated text in research and content moderation contexts
  • Benchmarking against other open-source detection systems
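A minimal inference sketch with Hugging Face transformers is shown below. The sigmoid scoring follows the `"apply_sigmoid": true` setting in the training config and assumes a single-logit classification head, and the positive class is assumed to mean "AI-generated"; check the model's `id2label` mapping before relying on either assumption.

```python
# Sketch: scoring texts with the detector via transformers.
# Assumptions (verify against the model config): single-logit head
# scored with a sigmoid, positive class = "AI-generated".
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classify(texts, threshold=0.5):
    # Imported lazily so the scoring helper above stays dependency-free.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "ShantanuT01/dactyl-ai-text-detector"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # squeeze(-1) assumes one logit per text; a two-logit head
        # would need a softmax over the class dimension instead.
        logits = model(**batch).logits.squeeze(-1)
    scores = [sigmoid(float(l)) for l in logits]
    return [(s, s >= threshold) for s in scores]

if __name__ == "__main__":
    for score, is_ai in classify(["An example passage to score."]):
        print(f"score={score:.3f} ai={is_ai}")
```

As noted under Limitations, the 0.5 default threshold is a starting point; calibrate it on a labeled sample from your domain.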

Limitations

  • Zero-shot distribution shift: Most existing AIG detection datasets use zero-shot generations, whereas DACTYL is trained on harder one-shot/few-shot and CPT-generated text. This mismatch means DACTYL-trained classifiers may slightly underperform on zero-shot-heavy benchmarks relative to detectors trained directly on those distributions β€” the external benchmark scores above should be interpreted with this in mind.
  • EXM vs ERM tradeoff: EXM classifiers generalize better out-of-distribution but may underperform ERM classifiers on the DACTYL test set itself. If your deployment domain closely matches DACTYL-train, an ERM variant may yield higher in-distribution F1.
  • CPT model degradation: As noted in the paper, continued pre-trained models are harder to detect even for DACTYL-trained classifiers β€” performance degrades relative to standard LLM outputs.
  • Threshold sensitivity: F1 scores at the default threshold of 0.5 vary significantly across domains. Users are encouraged to calibrate the threshold for their specific use case using a small labeled sample.
  • Future models: Like all detectors, DACTYL may degrade against future unseen model architectures.

Citation

@misc{thorat2025dactyl,
  title         = {DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models},
  author        = {Shantanu Thorat and Andrew Caines},
  year          = {2025},
  eprint        = {2508.00619},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2508.00619}
}
Model size: 0.4B parameters (Safetensors, F32)
