TikTok Bot Detection Model

Overview

This directory contains a trained Random Forest classifier for detecting bot accounts on TikTok.

  • Model Version: v2
  • Training Date: 2025-11-27 11:38:35
  • Framework: scikit-learn 1.5.2
  • Algorithm: Random Forest Classifier with GridSearchCV hyperparameter tuning


📊 Model Performance

Final Metrics (Test Set)

Metric               Score
Accuracy             0.9295 (92.95%)
Precision            0.9330 (93.30%)
Recall               0.9489 (94.89%)
F1-Score             0.9408 (94.08%)
ROC-AUC              0.9754 (97.54%)
Average Precision    0.9820 (98.20%)

Model Improvement

  • Baseline ROC-AUC: 0.9730
  • Tuned ROC-AUC: 0.9754
  • Improvement: +0.0024 absolute (about 0.25% relative to the baseline)
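The two improvement figures above are an absolute and a relative gain. A minimal sketch of the arithmetic, using only the ROC-AUC values reported in this README (the variable names are illustrative):

```python
# ROC-AUC values as reported above
baseline_auc = 0.9730
tuned_auc = 0.9754

# Absolute gain in ROC-AUC points
absolute_gain = tuned_auc - baseline_auc

# Relative gain, expressed as a percentage of the baseline score
relative_gain_pct = absolute_gain / baseline_auc * 100

print(f"Absolute gain: {absolute_gain:.4f}")
print(f"Relative gain: {relative_gain_pct:.2f}%")
```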

🗂️ Files

File                           Description
tiktok_bot_detection_v2.pkl    Trained Random Forest model
tiktok_scaler_v2.pkl           MinMaxScaler for feature normalization
tiktok_features_v2.json        List of features used by the model
tiktok_metrics_v2.txt          Detailed performance metrics report
images/                        All visualization plots (13 images)
README.md                      This file

🎯 Dataset Information

Training Configuration

  • Training Samples: 2,385
  • Test Samples: 596
  • Total Samples: 2,981
  • Number of Features: 12
  • Cross-Validation Folds: 5
  • Random State: 42

Class Distribution

Training Set:

  • Human (0): 951 (39.87%)
  • Bot (1): 1,434 (60.13%)

Test Set:

  • Human (0): 244 (40.94%)
  • Bot (1): 352 (59.06%)

🔧 Features (12)

  1. IsPrivate
  2. IsVerified
  3. HasProfilePic
  4. FollowingCount
  5. FollowerCount
  6. HasInstagram
  7. HasYoutube
  8. HasBio
  9. HasLinkInBio
  10. HasPosts
  11. PostsCount
  12. FollowToFollowerRatio
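Because the scaler and model expect the 12 features in a fixed order, it can help to validate and reorder an input table before scoring. The helper below is a hypothetical sketch: the name order_features is not part of this repository, and the feature list is hard-coded from this README rather than read from tiktok_features_v2.json, whose exact layout is not documented here.

```python
import pandas as pd

# Feature order as listed in this README
FEATURES = [
    "IsPrivate", "IsVerified", "HasProfilePic", "FollowingCount",
    "FollowerCount", "HasInstagram", "HasYoutube", "HasBio",
    "HasLinkInBio", "HasPosts", "PostsCount", "FollowToFollowerRatio",
]


def order_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return df restricted to the expected features, in the expected order.

    Hypothetical helper: raises ValueError if any expected column is missing
    and silently drops columns the model does not use.
    """
    missing = [name for name in FEATURES if name not in df.columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    return df[FEATURES]


# Usage: columns supplied in reverse order plus one unrelated column
row = {name: 0 for name in reversed(FEATURES)}
row["Username"] = "example"
ordered = order_features(pd.DataFrame([row]))
print(list(ordered.columns))
```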

πŸ† Top 5 Most Important Features

  1. FollowToFollowerRatio - 0.2693
  2. FollowerCount - 0.1753
  3. HasInstagram - 0.1499
  4. FollowingCount - 0.1236
  5. PostsCount - 0.1174
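The five scores listed above sum to 0.8355, so these features carry most of the total importance. A ranking like this is typically read from the fitted forest's feature_importances_ attribute. The sketch below shows the general pattern on synthetic data; the data and the feature_0 ... feature_5 names are placeholders, not the model or features shipped here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data, used only so the sketch runs on its own
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=42
)
names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across all features
importances = pd.Series(model.feature_importances_, index=names)
importances = importances.sort_values(ascending=False)

top5 = importances.head(5)
print(top5.round(4))
print(f"Share of total importance in top 5: {top5.sum():.4f}")
```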

⚙️ Hyperparameters

Best Parameters (from GridSearchCV)

  • class_weight: None
  • max_depth: 13
  • max_features: sqrt
  • min_samples_leaf: 2
  • min_samples_split: 10
  • n_estimators: 100

Parameter Search Space

  • n_estimators: [100, 200, 300]
  • max_depth: [10, 15, 20, None]
  • min_samples_split: [2, 5, 10]
  • min_samples_leaf: [1, 2, 4]
  • max_features: ['sqrt', 'log2']
  • bootstrap: [True, False]

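Total combinations tested: 540 (as reported by the training run). The grid as listed gives 3 × 4 × 3 × 3 × 2 × 2 = 432 combinations and contains neither max_depth = 13 nor class_weight, so the list above is probably incomplete.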
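As a quick sanity check on grid size, scikit-learn's ParameterGrid can enumerate a grid without fitting any model. The sketch below counts the candidates for the search space exactly as it is listed in this README; the grid used in the notebook may have differed, and the fold count of 5 is taken from the training configuration above.

```python
from sklearn.model_selection import ParameterGrid

# Search space copied from the list above
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 15, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
    "bootstrap": [True, False],
}

# Number of distinct hyperparameter combinations: 3 * 4 * 3 * 3 * 2 * 2
n_candidates = len(ParameterGrid(param_grid))

# With 5-fold cross-validation, each candidate is fitted once per fold
n_fits = n_candidates * 5

print(f"Candidates: {n_candidates}, total fits with 5-fold CV: {n_fits}")
```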


📈 Cross-Validation Results

Mean Scores (5-Fold Stratified CV)

  • Accuracy: 0.9191 (±0.0097)
  • Precision: 0.9326 (±0.0115)
  • Recall: 0.9331 (±0.0166)
  • F1-Score: 0.9327 (±0.0083)
  • ROC-AUC: 0.9744 (±0.0055)
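Mean and spread figures of this kind can be produced with StratifiedKFold and cross_validate. The following is a sketch on synthetic data, not the notebook's code: the data, the forest settings, and the choice of standard deviation as the spread measure are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in with 12 features and a roughly 40/60 class split
X, y = make_classification(
    n_samples=400, n_features=12, weights=[0.4, 0.6], random_state=42
)

# Stratified folds keep the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

model = RandomForestClassifier(n_estimators=50, random_state=42)
results = cross_validate(model, X, y, cv=cv, scoring=scoring)

# Mean and standard deviation of each metric over the 5 folds
summary = {
    metric: (results[f"test_{metric}"].mean(), results[f"test_{metric}"].std())
    for metric in scoring
}
for metric, (mean, std) in summary.items():
    print(f"{metric}: {mean:.4f} (±{std:.4f})")
```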

🖼️ Visualizations

All visualizations are saved in the images/ directory:

  1. 01_class_distribution.png - Training/Test set class distribution
  2. 02_feature_correlation.png - Feature correlation with target variable
  3. 03_correlation_matrix.png - Feature correlation heatmap
  4. 04_baseline_confusion_matrix.png - Baseline model confusion matrix
  5. 05_baseline_roc_curve.png - Baseline ROC curve
  6. 06_baseline_precision_recall.png - Baseline Precision-Recall curve
  7. 07_baseline_feature_importance.png - Baseline feature importance
  8. 08_cross_validation.png - Cross-validation score distribution
  9. 09_tuned_confusion_matrix.png - Tuned model confusion matrix
  10. 10_tuned_roc_curve.png - Tuned ROC curve
  11. 11_tuned_precision_recall.png - Tuned Precision-Recall curve
  12. 12_tuned_feature_importance.png - Tuned feature importance
  13. 13_model_comparison.png - Baseline vs Tuned comparison

🚀 Usage Example

import joblib
import pandas as pd

# Load the trained model and the fitted scaler
model = joblib.load('tiktok_bot_detection_v2.pkl')
scaler = joblib.load('tiktok_scaler_v2.pkl')

# Raw (unscaled) feature values for one account. The values are illustrative
# placeholders; keys must follow the feature order listed above.
data = {
    'IsPrivate': 0,
    'IsVerified': 0,
    'HasProfilePic': 1,
    'FollowingCount': 500,
    'FollowerCount': 100,
    'HasInstagram': 0,
    'HasYoutube': 0,
    'HasBio': 1,
    'HasLinkInBio': 0,
    'HasPosts': 1,
    'PostsCount': 12,
    # Assumed here to be FollowingCount / FollowerCount; check the training notebook
    'FollowToFollowerRatio': 5.0,
}

# Create a single-row DataFrame
df = pd.DataFrame([data])

# Scale features with the scaler fitted during training
df_scaled = scaler.transform(df)

# Predict
prediction = model.predict(df_scaled)[0]
probability = model.predict_proba(df_scaled)[0]

print(f"Prediction: {'Bot' if prediction == 1 else 'Human'}")
print(f"Bot Probability: {probability[1]:.4f}")
print(f"Human Probability: {probability[0]:.4f}")
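For scoring many accounts at once, the same steps can be wrapped in a small helper. This is a hypothetical sketch: score_accounts, the BotProbability and Label column names, and the 0.5 threshold are illustrative choices, and the model and scaler below are stand-ins fitted on random data so the example runs without the .pkl files.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler


def score_accounts(df, model, scaler, threshold=0.5):
    """Return a copy of df with a bot probability and a label per row."""
    scaled = scaler.transform(df)
    # Column 1 of predict_proba corresponds to class 1 (Bot)
    bot_proba = model.predict_proba(scaled)[:, 1]
    out = df.copy()
    out["BotProbability"] = bot_proba
    out["Label"] = np.where(bot_proba >= threshold, "Bot", "Human")
    return out


# Stand-in scaler and model fitted on random data (not the shipped artifacts)
rng = np.random.default_rng(42)
X_train = pd.DataFrame(rng.random((200, 3)), columns=["a", "b", "c"])
y_train = (X_train["a"] > 0.5).astype(int)
scaler = MinMaxScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=20, random_state=42)
model.fit(scaler.transform(X_train), y_train)

scored = score_accounts(X_train.head(5), model, scaler)
print(scored[["BotProbability", "Label"]])
```

Raising the threshold trades recall for precision, which may be preferable when wrongly flagging a human account is the costlier error.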

📋 Confusion Matrix Breakdown

Tuned Model (Test Set)

                  Predicted
                  Human    Bot
Actual  Human       220     24
        Bot          18    334

  • True Negatives (TN): 220 (Correctly identified humans)
  • False Positives (FP): 24 (Humans incorrectly classified as bots)
  • False Negatives (FN): 18 (Bots incorrectly classified as humans)
  • True Positives (TP): 334 (Correctly identified bots)
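The headline test-set metrics follow directly from these four counts. A short worked check, treating Bot as the positive class (the variable names are illustrative):

```python
# Counts from the tuned-model confusion matrix above
tn, fp, fn, tp = 220, 24, 18, 334

total = tn + fp + fn + tp

accuracy = (tp + tn) / total   # share of all accounts classified correctly
precision = tp / (tp + fp)     # share of predicted bots that are bots
recall = tp / (tp + fn)        # share of actual bots that were caught
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)   # share of actual humans kept as human

print(f"Accuracy:    {accuracy:.4f}")
print(f"Precision:   {precision:.4f}")
print(f"Recall:      {recall:.4f}")
print(f"F1-Score:    {f1:.4f}")
print(f"Specificity: {specificity:.4f}")
```

The first four values reproduce the Accuracy, Precision, Recall, and F1-Score reported in the Final Metrics table.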

🔍 Model Interpretation

Strengths

  • High ROC-AUC score (0.9754) indicates excellent discrimination capability
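  • Precision and recall are closely balanced for both classes (bot: 93.30% / 94.89%; human: 92.44% / 90.16%, derived from the confusion matrix)
  • Stable cross-validation performance, with fold-to-fold variation of at most ±0.0166 on every metric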

Key Insights

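  1. The top five features account for about 84% of total feature importance (0.8355), led by FollowToFollowerRatio (0.2693)
  2. GridSearchCV raised ROC-AUC from 0.9730 to 0.9754, a gain of 0.0024 (about 0.25% relative)
  3. Test-set ROC-AUC (0.9754) is in line with the cross-validation mean (0.9744 ±0.0055), which suggests the model generalizes to unseen data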

📝 Notes

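  • Feature Scaling: All features are scaled to the [0, 1] range using MinMaxScaler
  • Missing Values: Filled with 0 during preprocessing
  • Class Balance: Moderately imbalanced (about 60% bot, 40% human in both the training and test sets)
  • Model Type: Random Forest, an ensemble method that is generally less prone to overfitting than a single decision tree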
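The two preprocessing notes above (fill missing values with 0, then min-max scale) can be sketched as follows. The tiny tables are made-up values for illustration, and fitting the scaler on the training data only is an assumption about the notebook, not something this README states.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up raw values with gaps, for illustration only
train = pd.DataFrame({
    "FollowerCount": [10, 200, np.nan, 5000],
    "PostsCount": [0, 3, 12, np.nan],
})
test = pd.DataFrame({
    "FollowerCount": [100, np.nan],
    "PostsCount": [6, 24],
})

# Step 1: fill missing values with 0
train_filled = train.fillna(0)
test_filled = test.fillna(0)

# Step 2: learn min and max from the training data, then reuse them for new data
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_filled)
test_scaled = scaler.transform(test_filled)

print(train_scaled)
print(test_scaled)
```

Values outside the range seen during fitting map outside [0, 1]: the PostsCount of 24 in the second test row scales to 2.0 because the training maximum was 12.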

🔄 Model Updates

To retrain the model:

  1. Place new training data in ../data/train_tiktok.csv
  2. Run the training notebook: 5_enhanced_training.ipynb
  3. Update this README with new metrics

📧 Contact & Support

For questions or issues regarding this model, please refer to the main project documentation.


  • Generated: 2025-11-27 11:38:35
  • Notebook: 5_enhanced_training.ipynb
  • Platform: TikTok
