AI Music DeepFake Detector

Python 3.8+ · PyTorch 2.7.1 · License: MIT · GitHub

Model Description

A hybrid deep learning system that combines an autoencoder with a transformer architecture to detect AI-generated music, reaching 95% accuracy on its held-out test set. The model achieves 100% recall on authentic music (no authentic track is flagged as AI) while maintaining 90% recall on AI-generated tracks.

Key Features:

  • Hybrid architecture (Autoencoder + Transformer)
  • 21.1M parameters, 80.41 MB model size
  • Trained on 400 balanced samples (GTZAN + Suno AI)
  • Zero false negatives for authentic music
  • Efficient mel-spectrogram based feature extraction

Model Architecture

Audio → Mel-Spectrogram → Autoencoder (Encoder + Decoder)
                              ↓
                          Latent Features → Transformer (6 layers)
                                                ↓
                                            Fusion Layer → Classifier → [Real/AI]

Components (sketched in PyTorch after this list):

  • Autoencoder: Encoder (1→32→64→128→256 channels) + Decoder (256→128→64→32→1)
  • Transformer: 6 layers, 8 attention heads, 768 hidden dimensions
  • Classifier: 4-layer MLP (768→512→256→128→2)
  • Loss Function: Combined (70% classification + 30% reconstruction)
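
The component list above maps to roughly the following PyTorch sketch. Channel and layer widths follow the specification, but kernel sizes, strides, the latent-to-token projection, and the mean-pooling used as a stand-in for the fusion layer are assumptions; the actual definitions are in model_architecture.json and the GitHub repo.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridDetectorSketch(nn.Module):
    """Illustrative sketch only: widths match the list above, everything else is assumed."""

    def __init__(self, d_model=768, num_classes=2):
        super().__init__()

        def down(c_in, c_out):   # stride-2 conv block (assumed)
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(c_out), nn.ReLU())

        def up(c_in, c_out):     # stride-2 transposed-conv block (assumed)
            return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                                 nn.BatchNorm2d(c_out), nn.ReLU())

        # Autoencoder: encoder 1->32->64->128->256, decoder 256->128->64->32->1
        self.encoder = nn.Sequential(down(1, 32), down(32, 64), down(64, 128), down(128, 256))
        self.decoder = nn.Sequential(up(256, 128), up(128, 64), up(64, 32),
                                     nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

        # Transformer: 6 layers, 8 attention heads, 768 hidden dimensions
        self.to_tokens = nn.Linear(256, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=6)

        # Classifier: 768 -> 512 -> 256 -> 128 -> 2
        self.classifier = nn.Sequential(nn.Linear(d_model, 512), nn.ReLU(),
                                        nn.Linear(512, 256), nn.ReLU(),
                                        nn.Linear(256, 128), nn.ReLU(),
                                        nn.Linear(128, num_classes))

    def forward(self, mel):                       # mel: (batch, 1, n_mels, time)
        z = self.encoder(mel)                     # (batch, 256, H, W) latent features
        tokens = self.to_tokens(z.flatten(2).transpose(1, 2))   # (batch, H*W, 768)
        fused = self.transformer(tokens).mean(dim=1)             # mean-pool "fusion"
        return self.classifier(fused)             # class logits: [Real, AI]

    def reconstruct(self, mel):                   # decoder path, used only for the AE loss
        return self.decoder(self.encoder(mel))

# Combined loss: 70% classification + 30% reconstruction (weights from the list above)
def combined_loss(model, mel, labels):
    logits = model(mel)
    recon = model.reconstruct(mel)                # encoder runs twice here, for clarity only
    recon = F.interpolate(recon, size=mel.shape[-2:])  # guard against small size drift
    return 0.7 * F.cross_entropy(logits, labels) + 0.3 * F.mse_loss(recon, mel)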

Performance

Metric              Real Music    AI Music
Precision           90.91%        100.00%
Recall              100.00%       90.00%
F1-Score            95.24%        94.74%
Overall Accuracy    95.00%

Confusion Matrix:

  • Real Music: 30/30 correctly classified (no authentic track flagged as AI)
  • AI Music: 27/30 correctly classified (3 AI tracks misclassified as Real; see the check below)
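
The per-class figures in the table above follow directly from these counts. A quick re-derivation with scikit-learn (not a project dependency, just a sanity check; labels use 0 = Real, 1 = AI as in the Usage code):

import numpy as np
from sklearn.metrics import classification_report

# 30 Real tracks all predicted Real; 27 of 30 AI tracks predicted AI, 3 predicted Real
y_true = np.array([0] * 30 + [1] * 30)
y_pred = np.array([0] * 30 + [1] * 27 + [0] * 3)

print(classification_report(y_true, y_pred, target_names=["Real", "AI"], digits=4))
# Real: precision 0.9091, recall 1.0000; AI: precision 1.0000, recall 0.9000; accuracy 0.95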

Training Details

  • Hardware: NVIDIA GeForce MX450 (2.15GB VRAM)
  • Framework: PyTorch 2.7.1 + CUDA 11.8
  • Epochs: 42 (early stopping, best at epoch 27)
  • Optimizer: AdamW (lr=0.0001, weight_decay=1e-5)
  • Scheduler: CosineAnnealingLR with 5-epoch warmup (sketched after this list)
  • Batch Size: 32
  • Dataset Split: 279 train / 61 validation / 60 test
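
The optimizer and warmup-then-cosine schedule can be reproduced along these lines. This is a sketch: the warmup shape, the total epoch budget, and per-epoch stepping are assumptions, and the stand-in model below is not the detector; the exact values live in config.yaml.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # stand-in; use the detector class from the repo
max_epochs = 50            # assumed budget; the run above stopped early at epoch 42
warmup_epochs = 5

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs - warmup_epochs)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=[warmup, cosine],
                                                  milestones=[warmup_epochs])

for epoch in range(max_epochs):
    # ... one training pass with the combined loss, then validation + early stopping ...
    scheduler.step()       # one scheduler step per epoch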

Usage

import torch
import torchaudio
import torchaudio.transforms as T
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(repo_id="huzaifanasirrr/ai-music-deepfake-detector", 
                              filename="best_model.pth")

# Load the checkpoint. The model class itself is defined in the GitHub repo:
# import it, instantiate it, then load the saved weights. The names below
# (HybridDetector, "model_state_dict") are placeholders; check the repo code
# and checkpoint.keys() for the actual ones.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load(model_path, map_location=device)
# model = HybridDetector()                               # placeholder class name
# model.load_state_dict(checkpoint["model_state_dict"])  # key name may differ
# model = model.to(device)

# Load and preprocess audio (the model expects mono, 10-second, 22.05 kHz clips)
audio, sr = torchaudio.load("audio_file.wav")
audio = audio.mean(dim=0, keepdim=True)  # mix down to mono: the encoder takes a single channel
if sr != 22050:
    resampler = T.Resample(sr, 22050)
    audio = resampler(audio)

# Extract mel-spectrogram
mel_transform = T.MelSpectrogram(
    sample_rate=22050,
    n_fft=2048,
    hop_length=512,
    n_mels=128
)
mel_spec = mel_transform(audio)

# Normalize
mel_spec = (mel_spec - mel_spec.mean()) / (mel_spec.std() + 1e-8)

# Inference
model.eval()
with torch.no_grad():
    output = model(mel_spec.unsqueeze(0).to(device))
    prediction = torch.argmax(output, dim=1).item()
    
print("Real Music" if prediction == 0 else "AI Generated")

Dataset

Training Data:

  • GTZAN: 200 authentic music tracks (rock, pop, classical, jazz, etc.)
  • Suno AI: 200 AI-generated tracks across multiple genres
  • Total: 400 samples, 10 seconds each, 22.05 kHz sample rate

Splits (reproduced in the sketch below):

  • Training: 279 samples (70%)
  • Validation: 61 samples (15%)
  • Test: 60 samples (15%)
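
For reference, a 279/61/60 split of the 400 clips can be reproduced with torch.utils.data.random_split. The dummy tensors and the seed below are placeholders, and the original split may have been stratified, so this only illustrates the counts:

import torch
from torch.utils.data import TensorDataset, random_split

# Dummy stand-in for the 400 mel-spectrogram clips (128 mel bins x ~431 frames) and labels
dataset = TensorDataset(torch.randn(400, 1, 128, 431), torch.randint(0, 2, (400,)))

train_set, val_set, test_set = random_split(
    dataset, [279, 61, 60], generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set), len(test_set))   # 279 61 60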

Model Files

  • best_model.pth - PyTorch checkpoint (80.41 MB)
  • model_architecture.json - Complete model specifications
  • training_summary.json - Training history (42 epochs)
  • training_curves.png - Loss and accuracy visualization
  • confusion_matrix.png - Test set results
  • config.yaml - Full configuration (loaded in the snippet below)
  • requirements.txt - Dependencies
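
The configuration and architecture files can be read with the standard json module and PyYAML (assuming PyYAML is available; the pinned dependencies are in requirements.txt). The key names inside the files are not documented here, so the snippet only loads and inspects them:

import json
import yaml  # PyYAML

with open("config.yaml") as f:
    config = yaml.safe_load(f)            # training/model hyperparameters
with open("model_architecture.json") as f:
    architecture = json.load(f)           # layer-by-layer specification

print(config.keys(), architecture.keys())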

Limitations

  • Trained on 10-second audio clips (longer tracks need segmentation; see the sketch after this list)
  • Limited to 22.05 kHz sample rate
  • Dataset size: 400 samples (may not generalize to all music styles)
  • AI music limited to Suno AI generator (may not detect other generators)
  • 3 AI-generated tracks in the test set were misclassified as Real (AI → Real)
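
For tracks longer than 10 seconds, one simple workaround is to score non-overlapping 10-second windows and average the softmax probabilities. This is a hedged sketch: the windowing and averaging rule are choices for illustration, not part of the released code, and `model` is the loaded detector from the Usage section.

import torch
import torchaudio
import torchaudio.transforms as T

def predict_long_track(path, model, device, clip_seconds=10, sample_rate=22050):
    """Split a longer track into 10-second windows and average the per-window scores."""
    audio, sr = torchaudio.load(path)
    audio = audio.mean(dim=0, keepdim=True)            # mix down to mono
    if sr != sample_rate:
        audio = T.Resample(sr, sample_rate)(audio)

    mel = T.MelSpectrogram(sample_rate=sample_rate, n_fft=2048, hop_length=512, n_mels=128)
    clip_len = clip_seconds * sample_rate
    if audio.shape[1] < clip_len:
        raise ValueError("track is shorter than one 10-second clip")

    probs = []
    model.eval()
    with torch.no_grad():
        for start in range(0, audio.shape[1] - clip_len + 1, clip_len):
            spec = mel(audio[:, start:start + clip_len])
            spec = (spec - spec.mean()) / (spec.std() + 1e-8)   # same normalization as Usage
            logits = model(spec.unsqueeze(0).to(device))
            probs.append(torch.softmax(logits, dim=1))

    mean_probs = torch.cat(probs).mean(dim=0)
    return "Real Music" if mean_probs.argmax().item() == 0 else "AI Generated"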

Citation

If you use this model in your research, please cite:

@software{nasir2025aimusic,
  author = {Nasir, Huzaifa},
  title = {AI Music DeepFake Detector: A Hybrid Autoencoder-Transformer Approach},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/huzaifanasirrr/ai-music-deepfake-detector},
  note = {GitHub: https://github.com/Huzaifanasir95/AI-Music-DeepFake-Detector}
}

Author

Huzaifa Nasir
National University of Computer and Emerging Sciences (FAST-NUCES), Islamabad, Pakistan
📧 nasirhuzaifa95@gmail.com
🔗 GitHub: https://github.com/Huzaifanasir95/AI-Music-DeepFake-Detector

License

MIT License - See LICENSE file for details.

Acknowledgments

Research conducted at FAST-NUCES Islamabad. Inspired by recent advances in audio deepfake detection and transformer-based architectures for audio processing.
