GLYPH — High-Accuracy AI Text Detector

GLYPH is a binary text classifier built on DeBERTa-v3-base that distinguishes human-written text from AI-generated text. It achieves 98.85% accuracy, 0.999 ROC-AUC, and 0.990 F1 on a held-out test set spanning 10 human writing domains and 14 AI model families — from GPT-2 (1.5B) through GPT-4 (~1T).

The model was trained on ~50K texts covering academic papers, news articles, blog posts, Reddit discussions, legal filings, Wikipedia, student essays, and technical Q&A on the human side, and outputs from 24 distinct AI model configurations across 10 model families on the AI side. It produces well-separated, high-confidence predictions (mean confidence 0.976) and remains accurate even at the strictest decision thresholds.

Key Results

| Metric | Value |
|---|---|
| Accuracy | 98.85% |
| F1 Score | 0.9901 |
| Precision | 98.51% |
| Recall | 99.52% |
| ROC-AUC | 0.9990 |
| Average Precision | 0.9993 |
| MCC | 0.9765 |
| Human Accuracy | 97.94% |
| AI Accuracy | 99.52% |
| Mean Confidence | 0.976 |
| F1 @ 0.95 threshold | 0.987 |

All metrics are evaluated on a held-out test set of 5,050 texts (2,136 human / 2,914 AI), with no source-text overlap with the training set (enforced via hash-based splits) and no temporal leakage.

Per-Source Performance

Human Text Sources

| Source | Domain | n | Accuracy | Mean Confidence |
|---|---|---|---|---|
| PubMed Abstracts | Biomedical research | 300 | 100.0% | 0.988 |
| Blog / Opinion | Personal blogs | 200 | 100.0% | 0.987 |
| Reddit Writing | Informal / social | 300 | 100.0% | 0.985 |
| Wikipedia | Encyclopedic | 500 | 99.8% | 0.987 |
| CC-News | Journalism | 392 | 99.5% | 0.981 |
| arXiv Abstracts | Academic / scientific | 444 | 90.8% | 0.948 |

arXiv abstracts are the hardest category — highly formulaic academic prose with structural similarity to AI output. Even so, detection accuracy is 90.8% with 94.8% mean confidence, and the remaining errors are concentrated in a small subset of unusually short or template-heavy abstracts.

AI Model Families

| Model | Family | Params | n | Accuracy | F1 |
|---|---|---|---|---|---|
| GPT-3.5-Turbo | OpenAI | 175B | 223 | 100.0% | 1.000 |
| GPT-4 | OpenAI | ~1T | 215 | 100.0% | 1.000 |
| Llama-2-70B-Chat | Meta | 70B | 191 | 100.0% | 1.000 |
| MPT-30B | MosaicML | 30B | 211 | 100.0% | 1.000 |
| MPT-30B-Chat | MosaicML | 30B | 191 | 100.0% | 1.000 |
| Mistral-7B-Instruct-v0.1 | Mistral AI | 7B | 194 | 100.0% | 1.000 |
| Mistral-7B-v0.1 | Mistral AI | 7B | 203 | 100.0% | 1.000 |
| Llama-3.1-8B-Instruct | Meta | 8B | 238 | 99.6% | 0.998 |
| Phi-3.5-Mini-Instruct | Microsoft | 3.8B | 238 | 99.6% | 0.998 |
| Command-Chat | Cohere | 52B | 198 | 99.5% | 0.997 |
| Text-Davinci-002 | OpenAI | 175B | 176 | 99.4% | 0.997 |
| Llama-3.2-3B-Instruct | Meta | 3B | 238 | 99.2% | 0.996 |
| GPT-2-XL | OpenAI | 1.5B | 198 | 98.5% | 0.992 |
| Cohere Command | Cohere | 52B | 200 | 97.5% | 0.987 |

Detection is robust across four generations of language models (GPT-2 through GPT-4), three access paradigms (open-weight, API-only, and proprietary), and parameter counts spanning three orders of magnitude (1.5B to ~1T).

Performance by Text Length

| Length Bucket | n | Accuracy | F1 |
|---|---|---|---|
| Very Long (>2000 words) | 103 | 100.0% | 1.000 |
| Long (500–2000 words) | 862 | 99.9% | 0.999 |
| Medium (150–500 words) | 1,634 | 98.8% | 0.989 |
| Short (50–150 words) | 1,976 | 98.5% | 0.989 |
| Very Short (<50 words) | 475 | 98.1% | 0.899 |

Performance degrades gracefully with shorter inputs. Even on texts under 50 words — where the model has minimal signal — accuracy remains above 98%.

Threshold Sensitivity

The model produces well-calibrated, high-confidence outputs. Performance holds across aggressive decision thresholds:

| P(AI) Threshold | F1 | Precision |
|---|---|---|
| 0.50 (default) | 0.990 | 0.985 |
| 0.60 | 0.991 | 0.987 |
| 0.70 | 0.992 | 0.990 |
| 0.80 | 0.992 | 0.992 |
| 0.90 | 0.991 | 0.993 |
| 0.95 | 0.987 | 0.996 |

At a 0.95 threshold, precision reaches 99.6% with only a 0.3% drop in F1 — suitable for high-stakes applications where false accusations of AI usage carry serious consequences.
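Applying a stricter threshold is a one-line post-processing step on the model's P(AI) output; a minimal sketch (the function name is illustrative):

```python
def classify(p_ai: float, threshold: float = 0.95) -> str:
    """Apply a decision threshold to the model's P(AI) score.

    With threshold=0.5 this reproduces the default argmax decision;
    raising it trades a little recall for higher precision.
    """
    return "AI-generated" if p_ai >= threshold else "Human-written"

# A borderline score flips depending on the threshold chosen:
print(classify(0.62, threshold=0.50))  # AI-generated
print(classify(0.62, threshold=0.95))  # Human-written
```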

Architecture

| Component | Details |
|---|---|
| Base model | microsoft/deberta-v3-base (184M parameters) |
| Architecture | DeBERTa-v3 with disentangled attention and enhanced mask decoder |
| Task head | Linear classifier (768 → 2) with 0.15 dropout |
| Tokenizer | SentencePiece (slow tokenizer, use_fast=False) |
| Max sequence length | 512 tokens |
| Output | [P(human), P(AI)] softmax probabilities |

DeBERTa-v3 was chosen over RoBERTa and BERT alternatives due to its disentangled attention mechanism, which separately encodes content and position. This is particularly relevant for AI text detection: language models have characteristic positional dependencies in how they distribute tokens across a sequence, and disentangled attention gives the classifier direct access to these patterns.

Training

Configuration

| Parameter | Value |
|---|---|
| Trainable parameters | 184,423,682 (100%, all layers unfrozen) |
| Optimizer | AdamW (weight decay 0.01) |
| Learning rate | 2e-5 (cosine schedule) |
| Warmup | 10% of total steps |
| Effective batch size | 64 (16 × 4 gradient accumulation) |
| Precision | bf16 mixed precision |
| Gradient checkpointing | Enabled (non-reentrant) |
| Label smoothing | 0.05 |
| Class weights | human=1.182, ai=0.867 |
| Epochs | 8 (early-stopped at 3.17) |
| Best checkpoint | Epoch 1.19 (by validation F1) |
| Training time | ~49 minutes on RTX 4070 Ti 12GB |
| Final train loss | 0.186 |
| Final eval loss | 0.150 |
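The configuration above corresponds roughly to the following HuggingFace `TrainingArguments`. This is a sketch, not the original training script; the class weighting is not a `TrainingArguments` option and is noted in a comment:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="glyph-deberta-v3",
    num_train_epochs=8,                      # early stopping ended training at ~3.17 epochs
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.10,                       # 10% of total steps
    weight_decay=0.01,                       # AdamW
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,           # effective batch size 16 * 4 = 64
    bf16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    label_smoothing_factor=0.05,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)
# The class weights (human=1.182, ai=0.867) require overriding
# Trainer.compute_loss with a weighted CrossEntropyLoss.
```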

Why Fully Unfrozen?

Initial experiments with 4 frozen encoder layers (standard practice from PAN-CLEF 2025 literature) yielded only 80% accuracy with severe human-side bias — the model classified 44% of human texts as AI. Freezing 4 of 12 layers in DeBERTa-base locks 33% of the network, far more aggressive than the 21% reported for DeBERTa-large. Unfreezing all layers with cosine LR decay and 10% warmup resolved the bias entirely, lifting human accuracy from 55.6% to 97.9% without sacrificing AI detection (97.4% → 99.5%).

Dataset Composition

Total: 50,458 texts (40,364 train / 5,044 validation / 5,050 test)

Stratified by source with hash-based deduplication to prevent data leakage.
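The exact normalization behind the hash-based deduplication is not documented; a minimal sketch assuming case and whitespace folding with SHA-256 keys:

```python
import hashlib

def dedup_key(text: str) -> str:
    """Stable hash of a lightly normalized text, used to drop near-verbatim
    duplicates and keep the same source text out of both train and test."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(texts):
    seen, unique = set(), []
    for t in texts:
        k = dedup_key(t)
        if k not in seen:
            seen.add(k)
            unique.append(t)
    return unique

docs = ["Same  text.", "same text.", "Different text."]
print(len(deduplicate(docs)))  # 2
```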

Human Sources (10 domains, ~29K target)

| Domain | Source | Target Count | Text Type |
|---|---|---|---|
| Academic (STEM) | arXiv API | 5,000 | Abstracts across 8 categories (cs.CL, cs.AI, cs.LG, physics, math, q-bio, econ, stat) |
| Academic (Medical) | PubMed API | 3,000 | Biomedical research abstracts |
| Encyclopedic | Wikipedia API | 5,000 | Article sections across 10 topic categories |
| Journalism | CC-News (HuggingFace) | 4,000 | News articles |
| Literary / Creative | Project Gutenberg | 2,000 | Public domain book excerpts |
| Informal / Social | Reddit (webis/tldr-17) | 3,000 | Writing-focused subreddit posts |
| Student / Educational | PERSUADE corpus | 2,000 | Student essays |
| Technical / Q&A | StackExchange | 2,000 | Technical answers |
| Blog / Opinion | Blog Authorship Corpus | 2,000 | Personal blog posts |
| Legal / Formal | Pile of Law | 1,000 | Legal opinions and case summaries |

AI Sources (24 model configurations across 10 families)

Locally generated via LM Studio (8 models, Q4_K_M quantization):

| Model | Family | Parameters |
|---|---|---|
| Llama-3.1-8B-Instruct | Meta Llama | 8B |
| Llama-3.2-3B-Instruct | Meta Llama | 3B |
| Mistral-7B-Instruct-v0.3 | Mistral AI | 7B |
| Qwen2.5-7B-Instruct | Alibaba Qwen | 7B |
| Qwen2.5-14B-Instruct | Alibaba Qwen | 14B |
| Gemma-2-9B-Instruct | Google | 9B |
| Phi-3.5-Mini-Instruct | Microsoft | 3.8B |
| DeepSeek-V2-Lite-Chat | DeepSeek | 16B (MoE) |

Local generation used 4 temperature/sampling configurations (default, creative, precise, varied) across 6 prompt strategies (direct, continue, rewrite, expand, style_mimic, question_answer) with a system prompt enforcing natural human-like output — no markdown, no meta-commentary, no self-referential AI language.
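As an illustration of how the grid multiplies out per model (the temperature/top-p values below are assumptions; only the configuration and strategy names come from the pipeline description):

```python
import itertools

# Hypothetical sampling values for the four named configurations.
SAMPLING_CONFIGS = {
    "default":  {"temperature": 0.7, "top_p": 0.95},
    "creative": {"temperature": 1.0, "top_p": 0.98},
    "precise":  {"temperature": 0.3, "top_p": 0.90},
    "varied":   {"temperature": 0.9, "top_p": 0.92},
}

PROMPT_STRATEGIES = [
    "direct", "continue", "rewrite", "expand", "style_mimic", "question_answer",
]

# Each (sampling config, prompt strategy) pair is one generation setting:
# 4 * 6 = 24 settings per local model.
grid = list(itertools.product(SAMPLING_CONFIGS, PROMPT_STRATEGIES))
print(len(grid))  # 24
```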

HuggingFace datasets (16 additional model configurations):

| Dataset | Models Added | Reference |
|---|---|---|
| RAID (ACL 2024) | ChatGPT-3.5, GPT-4, GPT-3-Davinci, Cohere Command, Llama-2-70B-Chat, Mistral-7B-v0.1, Mixtral-8x7B, MPT-30B, GPT-2-XL | liamdugan/raid |
| AI Text Detection Pile | GPT-2/3/J/ChatGPT (mixed) | artem9k/ai-text-detection-pile |
| NYT Multi-Model | GPT-4o, Yi-Large, Qwen-2-72B, Llama-3-8B, Gemma-2-9B, Mistral-7B | gsingh1-py/train |

This combination ensures coverage of proprietary API models (GPT-3.5, GPT-4, GPT-4o, Cohere), large open models exceeding consumer GPU VRAM (Llama-2-70B, Qwen-2-72B, Mixtral-8x7B, Yi-Large), older architectures (GPT-2, GPT-3, GPT-J), and mixture-of-experts models (Mixtral, DeepSeek-V2-Lite). RAID data was filtered to non-adversarial generations only (attack=="none") for training data quality.
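The RAID filtering step amounts to keeping rows whose attack field equals "none"; a stand-in sketch with inline rows (with HuggingFace datasets the equivalent is `dataset.filter(lambda r: r["attack"] == "none")`):

```python
# Minimal stand-in for the RAID filtering step: keep only rows whose
# "attack" field is "none", i.e. non-adversarial generations.
rows = [
    {"model": "gpt4", "attack": "none", "generation": "..."},
    {"model": "gpt4", "attack": "paraphrase", "generation": "..."},
    {"model": "mpt-30b", "attack": "none", "generation": "..."},
]

def non_adversarial(rows):
    return [r for r in rows if r["attack"] == "none"]

print(len(non_adversarial(rows)))  # 2
```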

Usage

With Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "ogmatrixllm/glyph"  # Replace with your repo path
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "Your text to classify here..."

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)

p_human, p_ai = probs[0].tolist()
label = "AI-generated" if p_ai > 0.5 else "Human-written"
confidence = max(p_human, p_ai)

print(f"{label} (confidence: {confidence:.1%})")
```

With Pipeline

```python
from transformers import AutoTokenizer, pipeline

detector = pipeline(
    "text-classification",
    model="ogmatrixllm/glyph",  # Replace with your repo path
    tokenizer=AutoTokenizer.from_pretrained("ogmatrixllm/glyph", use_fast=False),
)

result = detector("Your text here...")
print(result)
# [{'label': 'LABEL_1', 'score': 0.98}]  # LABEL_0 = human, LABEL_1 = AI
```

Important Notes

  • Tokenizer: Always use use_fast=False. The fast tokenizer for DeBERTa-v3 has a confirmed regression in transformers>=4.47 (#42583) that crashes on load.
  • Max length: The model was trained with max_length=512. Longer texts should be truncated or chunked with predictions aggregated.
  • Labels: LABEL_0 = human, LABEL_1 = AI-generated.
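For texts beyond the 512-token window, one simple approach is overlapping chunks whose per-chunk P(AI) scores are averaged. The window and stride values below are illustrative, not part of the released model:

```python
def chunk_tokens(tokens, window=512, stride=384):
    """Split a token-id sequence into overlapping windows that cover
    the whole document (25% overlap with these defaults)."""
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return chunks

def aggregate_p_ai(chunk_probs):
    """Combine per-chunk P(AI) scores; the mean is the simplest aggregation."""
    return sum(chunk_probs) / len(chunk_probs)

# A 1000-token document becomes three overlapping windows:
print(len(chunk_tokens(list(range(1000)))))  # 3
```

Each window is run through the model separately; a max or length-weighted aggregation is an equally reasonable choice depending on whether partial AI content should dominate the score.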

Limitations and Ethical Considerations

Known Limitations

  1. English only. GLYPH was trained exclusively on English text. Performance on other languages is untested and likely degraded.

  2. Training distribution. The model has seen outputs from 24 specific AI model configurations. Novel architectures, heavily fine-tuned models, or future model families may evade detection. AI text detection is fundamentally adversarial — no static detector provides permanent robustness.

  3. arXiv abstracts remain the hardest domain at 90.8% accuracy. Highly formulaic academic writing with rigid structural conventions shares surface features with AI-generated text. Users in academic integrity contexts should treat borderline predictions on scientific abstracts with appropriate caution.

  4. Short texts (<50 words) have reduced F1 (0.899) despite high accuracy (98.1%). With minimal token-level signal, the model occasionally produces confident but incorrect predictions. For short-form content, consider requiring higher confidence thresholds.

  5. Adversarial attacks. The training data includes only non-adversarial AI outputs. Paraphrasing attacks, homoglyph substitution, targeted prompt engineering, and watermark-removal techniques were not included. Dedicated adversarial robustness (e.g., RAID adversarial subsets) is a planned enhancement.

  6. Mixed authorship. GLYPH classifies at the document level. It does not detect partial AI usage (e.g., AI-written paragraphs embedded in a human-written essay). Sentence-level or span-level detection requires a different approach.

  7. 512-token window. Texts are truncated at 512 tokens. For long documents, this means classification is based on the opening ~350–400 words only. Sliding-window aggregation is recommended for long-form content.

Ethical Considerations

AI text detection carries real consequences — academic penalties, professional reputation damage, content moderation decisions. False positives (human text classified as AI) are particularly harmful. While GLYPH's false positive rate is low (2.06% on the test set, 44 out of 2,136 human texts), no detector achieves zero false positives.

Recommendations for responsible deployment:

  • Never use GLYPH as the sole basis for punitive action. Use it as one signal among many (metadata, behavioral patterns, stylometric analysis).
  • Apply a high confidence threshold (≥0.95) for consequential decisions. At this threshold, precision reaches 99.6%.
  • Provide users with the confidence score, not just a binary label. A text scored at P(AI)=0.52 is fundamentally different from one scored at P(AI)=0.99.
  • Maintain an appeals process. Statistical classifiers will always produce errors.
  • Acknowledge the base rate problem. In populations where AI usage is rare, even a 2% FPR produces many false accusations relative to true detections.
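The base-rate arithmetic can be made concrete using the test-set recall (99.52%) and false positive rate (2.06%); the 5% prevalence below is an illustrative assumption:

```python
def precision_at_prevalence(prevalence, recall, fpr):
    """Fraction of positive ('AI') flags that are correct, given the base rate."""
    true_pos = prevalence * recall
    false_pos = (1 - prevalence) * fpr
    return true_pos / (true_pos + false_pos)

# On the balanced test set precision is ~98.5%, but if only 5% of a
# population actually uses AI, roughly 28% of flags are false accusations.
print(round(precision_at_prevalence(0.05, 0.9952, 0.0206), 3))  # 0.718
```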

Training Infrastructure

| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 4070 Ti (12GB VRAM) |
| CPU | Intel Core i7-14700K (20 cores) |
| RAM | 48GB DDR5 |
| Framework | PyTorch 2.6+ / HuggingFace Transformers |
| Precision | bf16 mixed precision |
| Total training time | 49 minutes |
| Experiment tracking | Weights & Biases |

Citation

@misc{glyph2026,
  title={GLYPH: High-Accuracy AI Text Detection with DeBERTa-v3},
  author={OGMatrix},
  year={2026},
  url={https://huggingface.co/ogmatrixllm/glyph}
}

Acknowledgments

Training data incorporates the RAID benchmark (Dugan et al., ACL 2024), the AI Text Detection Pile, and the NYT Multi-Model dataset. Human text sources include arXiv, PubMed, Wikipedia, CC-News, Project Gutenberg, Reddit, StackExchange, Blog Authorship Corpus, PERSUADE, and Pile of Law. The base model is DeBERTa-v3-base by Microsoft Research.
