# Fake Image Detection Ensemble (9 Models)
An ensemble of 9 specialized models for detecting fake/AI-generated images using single-class anomaly detection. The models are trained only on real images to learn what "normal" looks like, then flag fakes as anomalies.
## Performance
| Metric | Score |
|---|---|
| Accuracy | 67.05% |
| Precision | 87.97% |
| Recall | 39.50% |
| F1 Score | 54.52% |
### Confusion Matrix
- True Negatives: 946 (real correctly identified)
- False Positives: 54 (real misclassified as fake)
- False Negatives: 605 (fake misclassified as real)
- True Positives: 395 (fake correctly identified)
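These counts line up with the metrics table above; as a quick sanity check:

```python
# Derive the headline metrics from the confusion matrix counts above.
tp, fp, fn, tn = 395, 54, 605, 946
accuracy  = (tp + tn) / (tp + fp + fn + tn)                # 0.6705
precision = tp / (tp + fp)                                 # 0.8797
recall    = tp / (tp + fn)                                 # 0.3950
f1        = 2 * precision * recall / (precision + recall)  # 0.5452
```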
## Architecture
The ensemble combines 9 specialized models using different detection strategies:
**Deep Learning Models (3):**

1. **Enhanced Frequency VAE** - Multi-scale frequency analysis with phase information
   - Uses both the magnitude and phase of the FFT (see the sketch after this list)
   - Spectral consistency loss
   - Detects frequency-domain artifacts
2. **Edge Normalizing Flow** - Probability density estimation on edge features
   - Multi-scale edge analysis
   - Normalizing flow architecture
   - Detects unnatural edge patterns
3. **Semantic Deep SVDD** - ResNet50-based hypersphere anomaly detection
   - Semantic feature extraction
   - One-class deep learning
   - Detects high-level semantic anomalies
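The exact architectures are not reproduced on this card. As a rough illustration of the kind of frequency-domain input the Enhanced Frequency VAE is described as using, the sketch below extracts FFT magnitude and phase from a grayscale image; the function and variable names are illustrative, not the repository's code:

```python
# Illustrative FFT magnitude/phase feature extraction (hypothetical helper,
# not the actual EnhancedFreqVAE implementation).
import numpy as np

def fft_mag_phase_features(img_gray: np.ndarray) -> np.ndarray:
    """Return flattened log-magnitude and phase of the 2D FFT of a grayscale image."""
    spectrum = np.fft.fftshift(np.fft.fft2(img_gray))
    log_magnitude = np.log1p(np.abs(spectrum))  # compress the dynamic range
    phase = np.angle(spectrum)                  # phase carries structural cues
    return np.concatenate([log_magnitude.ravel(), phase.ravel()])

# Example: a 256x256 grayscale array with values in [0, 1]
features = fft_mag_phase_features(np.random.rand(256, 256))
print(features.shape)  # (131072,)
```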
**Traditional ML Models (6):**

1. **Texture One-Class SVM** - Boundary-based detection
   - Enhanced texture features
   - RBF kernel
   - Tight decision boundary (nu=0.03); see the training sketch after this list
2. **Isolation Forest** - Isolation-based anomaly detection
   - 200 estimators
   - Frequency + spatial features
   - Fast inference
3. **Local Outlier Factor** - Local density anomalies
   - Multi-scale patch analysis
   - Novelty detection mode
   - 20 neighbors
4. **Gaussian Mixture Model** - Distribution modeling
   - 10 components
   - Full covariance
   - Color distribution analysis
5. **Color Distribution Model** - Statistical color analysis
   - RGB histograms
   - Mahalanobis distance
   - Color moment analysis
6. **Statistical Model** - Edge and color statistics
   - Sobel edge detection
   - Multi-scale analysis
   - Mahalanobis distance
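For the traditional models, single-class training boils down to fitting each detector on features extracted from real images only. A minimal scikit-learn sketch, using stand-in features and only the hyperparameters listed above (everything else is an assumption):

```python
# Hedged sketch of the single-class training pattern for the traditional models.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Stand-in for texture/frequency features extracted from real images only.
real_features = np.random.rand(1000, 64)

ocsvm = OneClassSVM(kernel="rbf", nu=0.03).fit(real_features)                    # tight boundary, nu=0.03
iforest = IsolationForest(n_estimators=200, random_state=0).fit(real_features)   # 200 estimators
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(real_features)        # novelty mode, 20 neighbors

# scikit-learn's score_samples is higher for inliers, so negate it to get an
# anomaly score where higher means "more likely fake".
test_features = np.random.rand(5, 64)
anomaly_scores = -iforest.score_samples(test_features)
```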
## Training Details
- Training Data: 30,000 real images from COCO dataset
- Training Approach: Single-class anomaly detection (NO fake images used)
- Validation Split: 20% (6,000 images)
- Test Set: 1,000 real + 1,000 fake images (completely separate)
- Training Time: ~5-6 hours on GPU
- Ensemble Method: Weighted voting with an adaptive threshold (see the calibration sketch below)
Model Training Times (Extended):
- Enhanced Frequency VAE: 45 minutes
- Texture One-Class SVM: 45 minutes
- Color Distribution Model: 30 minutes
- Edge Normalizing Flow: 45 minutes
- Semantic Deep SVDD: 45 minutes
- Statistical Model: 30 minutes
- Isolation Forest: 30 minutes
- Local Outlier Factor: 35 minutes
- Gaussian Mixture Model: 30 minutes
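The adaptive threshold is described as auto-calibrated at the 95th percentile (see Model Improvements). A plausible calibration sketch, under the assumption that the percentile is taken over ensemble scores on the real validation images:

```python
# Hypothetical threshold calibration; the actual calibration code is not
# published on this card.
import numpy as np

def calibrate_threshold(real_validation_scores: np.ndarray, percentile: float = 95.0) -> float:
    """Pick the threshold so that roughly 5% of real validation images get flagged."""
    return float(np.percentile(real_validation_scores, percentile))

# Example: ensemble anomaly scores on the 6,000 real validation images.
val_scores = np.random.rand(6000)
threshold = calibrate_threshold(val_scores)
is_fake = lambda score: score > threshold
```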
## Quick Start
```python
import torch
from torchvision import transforms
from PIL import Image
import pickle
import json
from huggingface_hub import hf_hub_download

# Configuration
repo_id = "ash12321/fake-image-detection-ensemble"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Download and load the ensemble configuration
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
with open(config_path, 'r') as f:
    config = json.load(f)

# Load models (the model class definitions are required)
# Example for one model:
vae_path = hf_hub_download(repo_id=repo_id, filename="freq_vae.pth")
# freq_vae = EnhancedFreqVAE()
# freq_vae.load_state_dict(torch.load(vae_path, map_location=device))
# freq_vae.to(device)
# Load all other models similarly...

# Preprocess a new image (256x256, ImageNet normalization)
img = Image.open('test_image.jpg').convert('RGB')
img = img.resize((256, 256), Image.LANCZOS)
tfm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
img_tensor = tfm(img)

# Get a prediction from the ensemble (`ensemble` is the assembled wrapper
# holding all nine loaded models; see the sketch below)
is_fake, score, individual_scores = ensemble.predict(img_tensor, device)
print(f"Prediction: {'FAKE' if is_fake else 'REAL'}")
print(f"Anomaly Score: {score:.4f}")
print(f"Individual model scores: {individual_scores}")
```
## Model Files
| File | Description | Size |
|---|---|---|
| `freq_vae.pth` | Enhanced Frequency VAE weights | ~100 MB |
| `semantic_svdd.pth` | Semantic Deep SVDD weights | ~90 MB |
| `edge_flow.pth` | Edge Normalizing Flow weights | ~5 MB |
| `texture_ocsvm.pkl` | Texture One-Class SVM | ~200 MB |
| `iforest.pkl` | Isolation Forest | ~150 MB |
| `lof.pkl` | Local Outlier Factor | ~180 MB |
| `gmm.pkl` | Gaussian Mixture Model | ~50 MB |
| `color_model.pkl` | Color Distribution Model | ~10 MB |
| `stat.pkl` | Statistical Model | ~5 MB |
| `config.json` | Ensemble configuration | <1 MB |
| `results_summary.json` | Training metrics | <1 MB |
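Rather than fetching files one by one, `snapshot_download` from huggingface_hub can mirror the whole repository locally:

```python
# Download every file listed above in one call.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ash12321/fake-image-detection-ensemble")
print(local_dir)  # directory containing freq_vae.pth, iforest.pkl, config.json, ...
```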
## Requirements
```
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
pillow>=9.0.0
scikit-learn>=1.3.0
scipy>=1.10.0
huggingface_hub>=0.19.0
```
## Use Cases
- Deepfake Detection: Identify AI-generated faces
- Image Forensics: Detect manipulated images
- Content Moderation: Filter synthetic content
- Research: Study AI-generated image characteristics
- Quality Control: Verify image authenticity
## Limitations
- Trained on COCO real images - performance may vary on other domains
- Requires 256×256 input resolution
- May struggle with heavily compressed or low-quality images
- Performance depends on similarity between training and test distributions
- Not designed to withstand adversarial attacks
## Model Improvements
This version includes several accuracy enhancements:
- Phase Information: VAE uses both magnitude and phase of FFT
- Enhanced Features: More comprehensive texture and edge features
- Adaptive Threshold: Auto-calibrated at 95th percentile
- Optimized Weights: Balanced ensemble voting
- Extended Training: Up to 45 minutes per model for better convergence
## Citation
```bibtex
@misc{fake-detection-ensemble-2024,
  author = {ash12321},
  title = {Fake Image Detection Ensemble - 9 Model System},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/fake-image-detection-ensemble}}
}
```
## License
MIT License - Free for research and commercial use
## Acknowledgments
- COCO Dataset for training data
- PyTorch and scikit-learn communities
- Hugging Face for model hosting
## Contact
Questions? Issues? Open an issue or discussion on this repository!
**Note**: This ensemble was trained with single-class learning, which can help it generalize to types of fake images not seen during training. The ensemble approach combines multiple detection strategies to improve accuracy and reliability.