license: apache-2.0
language: en
tags:
  - xgboost
  - ai-detection
  - academic-integrity
  - text-classification
pipeline_tag: text-classification

DivEye XGBoost AI Detector

An XGBoost classifier for AI text detection based on the DivEye method. It uses statistical features extracted from LLM embeddings to classify text as human-written or AI-generated.

Model Description

This model implements the DivEye detection approach, which analyzes distributional properties of text using features derived from a base language model. The XGBoost classifier is trained on these statistical features to distinguish between human-written and AI-generated academic text.

  • Model type: XGBoost Classifier
  • Language: English
  • License: Apache 2.0
  • Feature extractor: GPT-OSS-20B embeddings

Intended Use

This model is intended for:

  • Detecting AI-generated content in academic submissions
  • Research on statistical AI text detection methods
  • Ensemble combination with neural detectors

Important: This model should be used as one component in a larger detection ensemble. It provides complementary signal to neural classifiers.
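
The ensemble combination mentioned above could be sketched as a weighted average of per-detector probabilities. This is only an illustration: the helper name combine_detector_scores, the example scores, and the weights are all hypothetical, and the actual ensemble may combine its components differently.

```python
import numpy as np

def combine_detector_scores(scores, weights=None):
    """Weighted average of per-detector AI probabilities.

    Hypothetical helper for illustration; not part of this repository.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1 or scores.size == 0:
        raise ValueError("scores must be a non-empty 1-D sequence")
    if np.any((scores < 0) | (scores > 1)):
        raise ValueError("each score must be a probability in [0, 1]")

    # Default to equal weighting when no weights are given
    if weights is None:
        weights = np.ones_like(scores)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != scores.shape:
        raise ValueError("weights must match scores in length")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")

    return float(np.dot(scores, weights) / weights.sum())

# Assumed example: 0.62 from this model, 0.80 from a neural detector,
# with the neural detector weighted three times as heavily.
combined = combine_detector_scores([0.62, 0.80], weights=[1.0, 3.0])
print(f"Combined AI probability: {combined:.2%}")
```

A weighted average keeps the combined score in [0, 1] and makes each detector's contribution easy to inspect. Weights would normally be tuned on held-out data.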

Performance

When used as part of the full detection ensemble, this model:

  • Provides statistical features that complement neural detectors
  • Helps reduce false positives on edge cases
  • Is particularly effective on longer texts

Usage

import pickle
import numpy as np

# Load the model (only unpickle files from a source you trust)
with open("diveye_xgboost.pkl", "rb") as f:
    model = pickle.load(f)

# Features should be extracted using the DivEye feature extractor.
# extract_diveye_features is not defined in this snippet; see the full
# detection pipeline for the feature extraction code.
text = "Text to classify."
features = np.asarray(extract_diveye_features(text))  # 1-D feature vector

# Predict: column 1 of predict_proba is the probability of the AI class
probability = model.predict_proba(features.reshape(1, -1))[0][1]
print(f"AI Probability: {probability:.2%}")

Features

The model expects statistical features including:

  • Perplexity-based metrics
  • Token probability distributions
  • Entropy measures
  • Distributional statistics
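
To make the feature families above more concrete, here is a minimal sketch of how statistics of this kind can be computed from per-token log-probabilities. It is not the DivEye feature extractor: the function name sketch_surprisal_features, the choice of eight features, and their order are assumptions for illustration, and the resulting vector is not compatible with this model's expected input. Entropy measures are left out because they need the full next-token distribution, not just the probability of the observed token.

```python
import numpy as np

def sketch_surprisal_features(token_logprobs):
    """Summary statistics of token surprisal (hypothetical illustration).

    token_logprobs: natural-log probabilities that a language model
    assigned to each observed token of the text.
    """
    logp = np.asarray(token_logprobs, dtype=float)
    if logp.ndim != 1 or logp.size < 3:
        raise ValueError("need at least 3 token log-probabilities")

    surprisal = -logp                       # per-token surprisal in nats
    first_diff = np.diff(surprisal)         # change between adjacent tokens
    second_diff = np.diff(surprisal, n=2)   # change of that change

    return np.array([
        surprisal.mean(),           # average surprisal
        surprisal.std(),            # spread of surprisal
        np.exp(surprisal.mean()),   # perplexity
        surprisal.min(),
        surprisal.max(),
        first_diff.mean(),
        first_diff.std(),
        second_diff.std(),
    ])

# Assumed toy input: probabilities a model gave to four observed tokens
features = sketch_surprisal_features(np.log([0.5, 0.25, 0.5, 0.125]))
print(features.shape)
```

In a real pipeline the log-probabilities would come from the base language model named in this card, and the feature set would have to match the one the classifier was trained on.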

Limitations

  • Requires a compatible feature extractor (GPT-OSS-20B based)
  • Best used in combination with neural detectors
  • May have reduced accuracy on very short texts
  • Optimized for academic/formal writing style

Citation

@misc{diveye_xgboost_detector,
  author = {COAI},
  title = {DivEye XGBoost AI Text Detector},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/coai/diveye-xgboost-detector}
}

Contact

For questions or issues, please open an issue on the model repository.