GLYPH — High-Accuracy AI Text Detector
GLYPH is a binary text classifier built on DeBERTa-v3-base that distinguishes human-written text from AI-generated text. It achieves 98.85% accuracy, 0.999 ROC-AUC, and 0.990 F1 on a held-out test set spanning 10 human writing domains and 14 AI model families — from GPT-2 (1.5B) through GPT-4 (~1T).
The model was trained on ~50K texts covering academic papers, news articles, blog posts, Reddit discussions, legal filings, Wikipedia, student essays, and technical Q&A on the human side, and outputs from 24 distinct AI model configurations across 10 model families on the AI side. It produces well-separated, high-confidence predictions (mean confidence 0.976) and remains accurate even at the strictest decision thresholds.
Key Results
| Metric | Value |
|---|---|
| Accuracy | 98.85% |
| F1 Score | 0.9901 |
| Precision | 98.51% |
| Recall | 99.52% |
| ROC-AUC | 0.9990 |
| Average Precision | 0.9993 |
| MCC | 0.9765 |
| Human Accuracy | 97.94% |
| AI Accuracy | 99.52% |
| Mean Confidence | 0.976 |
| F1 @ 0.95 threshold | 0.987 |
All metrics evaluated on a held-out test set of 5,050 texts (2,136 human / 2,914 AI) with no overlap in source texts, split hashes, or temporal leakage with the training set.
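The headline metrics are mutually consistent and can be reproduced from the confusion matrix implied by the per-class figures: 44 false positives on 2,136 human texts (stated below under Ethical Considerations) and, by implication from the 99.52% AI accuracy, 14 false negatives on 2,914 AI texts. A quick sanity check:

```python
import math

# Confusion matrix implied by the reported per-class accuracies
# (FN = 14 is back-derived from 99.52% on 2,914 AI texts).
tp, fn = 2900, 14   # AI texts: correctly / incorrectly classified
tn, fp = 2092, 44   # human texts: correctly / incorrectly classified

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"acc={accuracy:.4f} p={precision:.4f} r={recall:.4f} "
      f"f1={f1:.4f} mcc={mcc:.4f}")
```

Each value matches the table above to four decimal places.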
Per-Source Performance
Human Text Sources
| Source | Domain | n | Accuracy | Confidence |
|---|---|---|---|---|
| PubMed Abstracts | Biomedical research | 300 | 100.0% | 0.988 |
| Blog / Opinion | Personal blogs | 200 | 100.0% | 0.987 |
| Reddit Writing | Informal / social | 300 | 100.0% | 0.985 |
| Wikipedia | Encyclopedic | 500 | 99.8% | 0.987 |
| CC-News | Journalism | 392 | 99.5% | 0.981 |
| arXiv Abstracts | Academic / scientific | 444 | 90.8% | 0.948 |
arXiv abstracts are the hardest category — highly formulaic academic prose with structural similarity to AI output. Even so, detection accuracy is 90.8% with 94.8% mean confidence, and the remaining errors are concentrated in a small subset of unusually short or template-heavy abstracts.
AI Model Families
| Model | Family | Params | n | Accuracy | F1 |
|---|---|---|---|---|---|
| GPT-3.5-Turbo | OpenAI | 175B | 223 | 100.0% | 1.000 |
| GPT-4 | OpenAI | ~1T | 215 | 100.0% | 1.000 |
| Llama-2-70B-Chat | Meta | 70B | 191 | 100.0% | 1.000 |
| MPT-30B | MosaicML | 30B | 211 | 100.0% | 1.000 |
| MPT-30B-Chat | MosaicML | 30B | 191 | 100.0% | 1.000 |
| Mistral-7B-Instruct-v0.1 | Mistral AI | 7B | 194 | 100.0% | 1.000 |
| Mistral-7B-v0.1 | Mistral AI | 7B | 203 | 100.0% | 1.000 |
| Llama-3.1-8B-Instruct | Meta | 8B | 238 | 99.6% | 0.998 |
| Phi-3.5-Mini-Instruct | Microsoft | 3.8B | 238 | 99.6% | 0.998 |
| Command-Chat | Cohere | 52B | 198 | 99.5% | 0.997 |
| Text-Davinci-002 | OpenAI | 175B | 176 | 99.4% | 0.997 |
| Llama-3.2-3B-Instruct | Meta | 3B | 238 | 99.2% | 0.996 |
| GPT-2-XL | OpenAI | 1.5B | 198 | 98.5% | 0.992 |
| Cohere Command | Cohere | 52B | 200 | 97.5% | 0.987 |
Detection is robust across four generations of language models (GPT-2 through GPT-4), three access paradigms (open-weight, API-only, and proprietary), and parameter counts spanning three orders of magnitude (1.5B to ~1T).
Performance by Text Length
| Length Bucket | n | Accuracy | F1 |
|---|---|---|---|
| Very Long (>2000 words) | 103 | 100.0% | 1.000 |
| Long (500–2000 words) | 862 | 99.9% | 0.999 |
| Medium (150–500 words) | 1,634 | 98.8% | 0.989 |
| Short (50–150 words) | 1,976 | 98.5% | 0.989 |
| Very Short (<50 words) | 475 | 98.1% | 0.899 |
Performance degrades gracefully with shorter inputs. Even on texts under 50 words, where the model has minimal signal, accuracy remains above 98%, though F1 drops to 0.899 on that bucket.
Threshold Sensitivity
The model produces well-calibrated, high-confidence outputs. Performance holds across aggressive decision thresholds:
| P(AI) Threshold | F1 | Precision |
|---|---|---|
| 0.50 (default) | 0.990 | 0.985 |
| 0.60 | 0.991 | 0.987 |
| 0.70 | 0.992 | 0.990 |
| 0.80 | 0.992 | 0.992 |
| 0.90 | 0.991 | 0.993 |
| 0.95 | 0.987 | 0.996 |
At a 0.95 threshold, precision reaches 99.6% with only a 0.3% drop in F1 — suitable for high-stakes applications where false accusations of AI usage carry serious consequences.
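Moving the operating point is a one-line change downstream of the model's P(AI) output. A minimal sketch:

```python
def classify(p_ai: float, threshold: float = 0.5) -> str:
    """Map the model's P(AI) score to a decision.

    Raise the threshold (e.g. 0.95) for high-stakes settings where
    false accusations are costly; texts below it default to human.
    """
    return "AI-generated" if p_ai >= threshold else "Human-written"

# The same score lands differently at different operating points:
print(classify(0.80))                  # default threshold
print(classify(0.80, threshold=0.95))  # strict threshold
```

At the strict threshold, borderline scores such as 0.80 are no longer flagged, which is exactly where the precision gain in the table comes from.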
Architecture
| Component | Details |
|---|---|
| Base model | microsoft/deberta-v3-base (184M parameters) |
| Architecture | DeBERTa-v3 with disentangled attention and enhanced mask decoder |
| Task head | Linear classifier (768 → 2) with 0.15 dropout |
| Tokenizer | SentencePiece (slow tokenizer, use_fast=False) |
| Max sequence length | 512 tokens |
| Output | [P(human), P(AI)] softmax probabilities |
DeBERTa-v3 was chosen over RoBERTa and BERT alternatives due to its disentangled attention mechanism, which separately encodes content and position. This is particularly relevant for AI text detection: language models have characteristic positional dependencies in how they distribute tokens across a sequence, and disentangled attention gives the classifier direct access to these patterns.
Training
Configuration
| Parameter | Value |
|---|---|
| Trainable parameters | 184,423,682 (100% — all layers unfrozen) |
| Optimizer | AdamW (weight decay 0.01) |
| Learning rate | 2e-5 (cosine schedule) |
| Warmup | 10% of total steps |
| Effective batch size | 64 (16 × 4 gradient accumulation) |
| Precision | bf16 mixed precision |
| Gradient checkpointing | Enabled (non-reentrant) |
| Label smoothing | 0.05 |
| Class weights | human=1.182, ai=0.867 |
| Epochs | 8 (early-stopped at 3.17) |
| Best checkpoint | Epoch 1.19 (by validation F1) |
| Training time | ~49 minutes on RTX 4070 Ti 12GB |
| Final train loss | 0.186 |
| Final eval loss | 0.150 |
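The class weights in the table are consistent with the standard "balanced" weighting scheme, w_c = n_total / (n_classes × n_c). The per-class train counts below are back-derived from the published weights and the 40,364-text train split; they are an assumption, not a published figure:

```python
# Assumed per-class train counts (back-derived, not from the source):
n_human, n_ai = 17_075, 23_289
n_total, n_classes = n_human + n_ai, 2   # 40,364 train texts

# Balanced scheme: rarer class (human) gets the larger weight.
w_human = n_total / (n_classes * n_human)
w_ai    = n_total / (n_classes * n_ai)
print(f"human={w_human:.3f}, ai={w_ai:.3f}")
```

Under these counts the formula reproduces the table's human=1.182, ai=0.867.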
Why Fully Unfrozen?
Initial experiments with 4 frozen encoder layers (standard practice from PAN-CLEF 2025 literature) yielded only 80% accuracy with severe human-side bias — the model classified 44% of human texts as AI. Freezing 4 of 12 layers in DeBERTa-base locks 33% of the network, far more aggressive than the 21% reported for DeBERTa-large. Unfreezing all layers with cosine LR decay and 10% warmup resolved the bias entirely, lifting human accuracy from 55.6% to 97.9% without sacrificing AI detection (97.4% → 99.5%).
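The difference between the two regimes can be illustrated on a toy 12-block stack. The blocks below are plain `nn.Linear` layers standing in for DeBERTa's transformer layers, purely for illustration:

```python
import torch.nn as nn

# Toy 12-block encoder standing in for deberta-v3-base's 12 layers.
encoder = nn.ModuleList([nn.Linear(64, 64) for _ in range(12)])

def trainable_fraction(layers) -> float:
    params = [p for layer in layers for p in layer.parameters()]
    trainable = sum(p.numel() for p in params if p.requires_grad)
    return trainable / sum(p.numel() for p in params)

# Partial freezing (the configuration that underperformed):
# lock the first 4 of 12 blocks, i.e. one third of the encoder.
for layer in encoder[:4]:
    for p in layer.parameters():
        p.requires_grad = False
frozen_setup = trainable_fraction(encoder)

# Full fine-tuning (the configuration GLYPH shipped with).
for p in encoder.parameters():
    p.requires_grad = True
unfrozen_setup = trainable_fraction(encoder)

print(f"partially frozen: {frozen_setup:.0%} trainable")
print(f"fully unfrozen:   {unfrozen_setup:.0%} trainable")
```

The same `requires_grad` toggling applies to a real `AutoModelForSequenceClassification` checkpoint via `model.deberta.encoder.layer` (attribute path assumed from the Transformers DeBERTa implementation).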
Dataset Composition
Total: 50,458 texts (40,364 train / 5,044 validation / 5,050 test)
Stratified by source with hash-based deduplication to prevent data leakage.
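A minimal sketch of hash-based deduplication; the exact normalization GLYPH used is not documented, so the lowercasing and whitespace collapsing below are assumptions:

```python
import hashlib

def text_fingerprint(text: str) -> str:
    """Normalize whitespace and case, then hash, so near-identical
    copies of a source text collapse to one fingerprint."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(texts):
    seen, unique = set(), []
    for t in texts:
        fp = text_fingerprint(t)
        if fp not in seen:
            seen.add(fp)
            unique.append(t)
    return unique

docs = ["An example abstract.", "an  example ABSTRACT.", "A different text."]
unique_docs = deduplicate(docs)
print(len(unique_docs))  # the first two collapse to one entry
```

Applying the same fingerprinting to both splits before partitioning is what prevents a train/test pair from sharing a source text.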
Human Sources (10 domains, ~29K target)
| Domain | Source | Target Count | Text Type |
|---|---|---|---|
| Academic (STEM) | arXiv API | 5,000 | Abstracts across 8 categories (cs.CL, cs.AI, cs.LG, physics, math, q-bio, econ, stat) |
| Academic (Medical) | PubMed API | 3,000 | Biomedical research abstracts |
| Encyclopedic | Wikipedia API | 5,000 | Article sections across 10 topic categories |
| Journalism | CC-News (HuggingFace) | 4,000 | News articles |
| Literary / Creative | Project Gutenberg | 2,000 | Public domain book excerpts |
| Informal / Social | Reddit (webis/tldr-17) | 3,000 | Writing-focused subreddit posts |
| Student / Educational | PERSUADE corpus | 2,000 | Student essays |
| Technical / Q&A | StackExchange | 2,000 | Technical answers |
| Blog / Opinion | Blog Authorship Corpus | 2,000 | Personal blog posts |
| Legal / Formal | Pile of Law | 1,000 | Legal opinions and case summaries |
AI Sources (24 model configurations across 10 families)
Locally generated via LM Studio (8 models, Q4_K_M quantization):
| Model | Family | Parameters |
|---|---|---|
| Llama-3.1-8B-Instruct | Meta Llama | 8B |
| Llama-3.2-3B-Instruct | Meta Llama | 3B |
| Mistral-7B-Instruct-v0.3 | Mistral AI | 7B |
| Qwen2.5-7B-Instruct | Alibaba Qwen | 7B |
| Qwen2.5-14B-Instruct | Alibaba Qwen | 14B |
| Gemma-2-9B-Instruct | Google | 9B |
| Phi-3.5-Mini-Instruct | Microsoft | 3.8B |
| DeepSeek-V2-Lite-Chat | DeepSeek | 16B (MoE) |
Local generation used 4 temperature/sampling configurations (default, creative, precise, varied) across 6 prompt strategies (direct, continue, rewrite, expand, style_mimic, question_answer) with a system prompt enforcing natural human-like output — no markdown, no meta-commentary, no self-referential AI language.
HuggingFace datasets (16 additional model configurations):
| Dataset | Models Added | Reference |
|---|---|---|
| RAID (ACL 2024) | ChatGPT-3.5, GPT-4, GPT-3-Davinci, Cohere Command, Llama-2-70B-Chat, Mistral-7B-v0.1, Mixtral-8x7B, MPT-30B, GPT-2-XL | liamdugan/raid |
| AI Text Detection Pile | GPT-2/3/J/ChatGPT (mixed) | artem9k/ai-text-detection-pile |
| NYT Multi-Model | GPT-4o, Yi-Large, Qwen-2-72B, Llama-3-8B, Gemma-2-9B, Mistral-7B | gsingh1-py/train |
This combination ensures coverage of proprietary API models (GPT-3.5, GPT-4, GPT-4o, Cohere), large open models exceeding consumer GPU VRAM (Llama-2-70B, Qwen-2-72B, Mixtral-8x7B, Yi-Large), older architectures (GPT-2, GPT-3, GPT-J), and mixture-of-experts models (Mixtral, DeepSeek-V2-Lite). RAID data was filtered to non-adversarial generations only (attack=="none") for training data quality.
Usage
With Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "ogmatrixllm/glyph"  # Replace with your repo path
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "Your text to classify here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
p_human, p_ai = probs[0].tolist()

label = "AI-generated" if p_ai > 0.5 else "Human-written"
confidence = max(p_human, p_ai)
print(f"{label} (confidence: {confidence:.1%})")
```
With Pipeline
```python
from transformers import AutoTokenizer, pipeline

model_name = "ogmatrixllm/glyph"  # Replace with your repo path
detector = pipeline(
    "text-classification",
    model=model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name, use_fast=False),
)

result = detector("Your text here...")
print(result)
# [{'label': 'LABEL_1', 'score': 0.98}]  # LABEL_0 = human, LABEL_1 = AI
```
Important Notes
- Tokenizer: Always use `use_fast=False`. The fast tokenizer for DeBERTa-v3 has a confirmed regression in `transformers>=4.47` (#42583) that crashes on load.
- Max length: The model was trained with `max_length=512`. Longer texts should be truncated or chunked, with predictions aggregated across chunks.
- Labels: `LABEL_0` = human, `LABEL_1` = AI-generated.
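For texts longer than the context window, chunk-and-aggregate can be sketched as follows. The window and stride sizes (in words) and mean-pooling are assumptions, not GLYPH's published procedure:

```python
def sliding_windows(text: str, size: int = 350, stride: int = 300):
    """Split text into overlapping word windows, each comfortably
    under the 512-token limit (~350 words is a rough heuristic)."""
    words = text.split()
    if len(words) <= size:
        return [text]
    starts = list(range(0, len(words) - size + 1, stride))
    chunks = [" ".join(words[s:s + size]) for s in starts]
    if starts[-1] + size < len(words):  # make sure the tail is covered
        chunks.append(" ".join(words[-size:]))
    return chunks

def aggregate_p_ai(window_scores):
    """Mean-pool window-level P(AI); max-pooling is a stricter variant."""
    return sum(window_scores) / len(window_scores)

long_text = " ".join(["word"] * 1000)
windows = sliding_windows(long_text)
print(len(windows))  # 1000 words -> 4 overlapping windows
```

Run each window through the classifier, then aggregate the per-window P(AI) scores into a document-level decision.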
Limitations and Ethical Considerations
Known Limitations
English only. GLYPH was trained exclusively on English text. Performance on other languages is untested and likely degraded.
Training distribution. The model has seen outputs from 24 specific AI model configurations. Novel architectures, heavily fine-tuned models, or future model families may evade detection. AI text detection is fundamentally adversarial — no static detector provides permanent robustness.
arXiv abstracts remain the hardest domain at 90.8% accuracy. Highly formulaic academic writing with rigid structural conventions shares surface features with AI-generated text. Users in academic integrity contexts should treat borderline predictions on scientific abstracts with appropriate caution.
Short texts (<50 words) have reduced F1 (0.899) despite high accuracy (98.1%). With minimal token-level signal, the model occasionally produces confident but incorrect predictions. For short-form content, consider requiring higher confidence thresholds.
Adversarial attacks. The training data includes only non-adversarial AI outputs. Paraphrasing attacks, homoglyph substitution, targeted prompt engineering, and watermark-removal techniques were not included. Dedicated adversarial robustness (e.g., RAID adversarial subsets) is a planned enhancement.
Mixed authorship. GLYPH classifies at the document level. It does not detect partial AI usage (e.g., AI-written paragraphs embedded in a human-written essay). Sentence-level or span-level detection requires a different approach.
512-token window. Texts are truncated at 512 tokens. For long documents, this means classification is based on the opening ~350–400 words only. Sliding-window aggregation is recommended for long-form content.
Ethical Considerations
AI text detection carries real consequences — academic penalties, professional reputation damage, content moderation decisions. False positives (human text classified as AI) are particularly harmful. While GLYPH's false positive rate is low (2.06% on the test set, 44 out of 2,136 human texts), no detector achieves zero false positives.
Recommendations for responsible deployment:
- Never use GLYPH as the sole basis for punitive action. Use it as one signal among many (metadata, behavioral patterns, stylometric analysis).
- Apply a high confidence threshold (≥0.95) for consequential decisions. At this threshold, precision reaches 99.6%.
- Provide users with the confidence score, not just a binary label. A text scored at P(AI)=0.52 is fundamentally different from one scored at P(AI)=0.99.
- Maintain an appeals process. Statistical classifiers will always produce errors.
- Acknowledge the base rate problem. In populations where AI usage is rare, even a 2% FPR produces many false accusations relative to true detections.
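The base-rate effect can be made concrete with Bayes' rule, using the measured false-positive rate and recall from the test set:

```python
fpr, tpr = 0.0206, 0.9952  # GLYPH's measured FPR and recall

def p_ai_given_flag(prevalence: float) -> float:
    """P(text is actually AI | detector flags it) at a given AI prevalence."""
    flagged_ai = tpr * prevalence
    flagged_human = fpr * (1 - prevalence)
    return flagged_ai / (flagged_ai + flagged_human)

for prev in (0.50, 0.05, 0.01):
    print(f"prevalence {prev:.0%}: P(AI | flagged) = {p_ai_given_flag(prev):.1%}")
```

When half of all texts are AI, a flag is almost certainly correct; when only 1% are, roughly two thirds of flags are false accusations, even with a 2% FPR.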
Training Infrastructure
| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 4070 Ti (12GB VRAM) |
| CPU | Intel Core i7-14700K (20 cores) |
| RAM | 48GB DDR5 |
| Framework | PyTorch 2.6+ / HuggingFace Transformers |
| Precision | bf16 mixed precision |
| Total training time | 49 minutes |
| Experiment tracking | Weights & Biases |
Citation
```bibtex
@misc{glyph2026,
  title={GLYPH: High-Accuracy AI Text Detection with DeBERTa-v3},
  author={OGMatrix},
  year={2026},
  url={https://huggingface.co/ogmatrixllm/glyph}
}
```
Acknowledgments
Training data incorporates the RAID benchmark (Dugan et al., ACL 2024), the AI Text Detection Pile, and the NYT Multi-Model dataset. Human text sources include arXiv, PubMed, Wikipedia, CC-News, Project Gutenberg, Reddit, StackExchange, Blog Authorship Corpus, PERSUADE, and Pile of Law. The base model is DeBERTa-v3-base by Microsoft Research.