TikTok Bot Detection Model
Overview
This directory contains a trained Random Forest classifier for detecting bot accounts on TikTok.
Model Version: v2
Training Date: 2025-11-27 11:38:35
Framework: scikit-learn 1.5.2
Algorithm: Random Forest Classifier with GridSearchCV Hyperparameter Tuning
Model Performance
Final Metrics (Test Set)
| Metric | Score |
| --- | --- |
| Accuracy | 0.9295 (92.95%) |
| Precision | 0.9330 (93.30%) |
| Recall | 0.9489 (94.89%) |
| F1-Score | 0.9408 (94.08%) |
| ROC-AUC | 0.9754 (97.54%) |
| Average Precision | 0.9820 (98.20%) |
Model Improvement
- Baseline ROC-AUC: 0.9730
- Tuned ROC-AUC: 0.9754
- Improvement: +0.0024 absolute (about 0.25% relative to the baseline)
Files
| File | Description |
| --- | --- |
| tiktok_bot_detection_v2.pkl | Trained Random Forest model |
| tiktok_scaler_v2.pkl | MinMaxScaler for feature normalization |
| tiktok_features_v2.json | List of features used by the model |
| tiktok_metrics_v2.txt | Detailed performance metrics report |
| images/ | All visualization plots (13 images) |
| README.md | This file |
Dataset Information
Training Configuration
- Training Samples: 2,385
- Test Samples: 596
- Total Samples: 2,981
- Number of Features: 12
- Cross-Validation Folds: 5
- Random State: 42
Class Distribution
Training Set:
- Human (0): 951 (39.87%)
- Bot (1): 1,434 (60.13%)
Test Set:
- Human (0): 244 (40.94%)
- Bot (1): 352 (59.06%)
Features (12)
- IsPrivate
- IsVerified
- HasProfilePic
- FollowingCount
- FollowerCount
- HasInstagram
- HasYoutube
- HasBio
- HasLinkInBio
- HasPosts
- PostsCount
- FollowToFollowerRatio
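Because the scaler and model expect columns in a fixed order, it can help to build inputs through a small helper. The sketch below is illustrative only: `to_feature_frame` is a hypothetical helper, and it assumes the order listed above matches the order stored in tiktok_features_v2.json.

```python
import pandas as pd

# Feature order copied from the list above. In practice it could be read from
# tiktok_features_v2.json (assumed here to hold a flat JSON list of names).
FEATURES = [
    "IsPrivate", "IsVerified", "HasProfilePic", "FollowingCount",
    "FollowerCount", "HasInstagram", "HasYoutube", "HasBio",
    "HasLinkInBio", "HasPosts", "PostsCount", "FollowToFollowerRatio",
]


def to_feature_frame(record: dict, features=FEATURES) -> pd.DataFrame:
    """Return a one-row DataFrame with columns in the expected feature order."""
    missing = [name for name in features if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Selecting by name drops extra keys and enforces the column order.
    return pd.DataFrame([record])[list(features)]
```

Failing loudly on a missing feature is a deliberate choice here: silently filling a default could shift a prediction without any visible error.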
Top 5 Most Important Features
- FollowToFollowerRatio - 0.2693
- FollowerCount - 0.1753
- HasInstagram - 0.1499
- FollowingCount - 0.1236
- PostsCount - 0.1174
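Rankings like the one above are typically read from a fitted forest's `feature_importances_` attribute. The following is a minimal sketch on synthetic data with placeholder feature names; it does not reproduce the values listed here, and it assumes the reported scores are scikit-learn's impurity-based importances.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the feature names are placeholders.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=42
)
names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# feature_importances_ holds impurity-based importances that sum to 1.
importances = pd.Series(forest.feature_importances_, index=names)
top5 = importances.sort_values(ascending=False).head(5)
print(top5.round(4))
```

Since the importances sum to 1, each value can be read as a share of the total.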
Hyperparameters
Best Parameters (from GridSearchCV)
- class_weight: None
- max_depth: 13
- max_features: sqrt
- min_samples_leaf: 2
- min_samples_split: 10
- n_estimators: 100
Parameter Search Space
- n_estimators: [100, 200, 300]
- max_depth: [10, 15, 20, None]
- min_samples_split: [2, 5, 10]
- min_samples_leaf: [1, 2, 4]
- max_features: ['sqrt', 'log2']
- bootstrap: [True, False]
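Total combinations tested: 540 (as reported). The grid as listed yields 3 × 4 × 3 × 3 × 2 × 2 = 432 combinations, and the selected max_depth (13) and class_weight do not appear in it, so the listing may not match the grid that was actually searched.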
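A search over this space could be set up roughly as follows. This is a sketch, not the notebook's code: the scoring metric, fold shuffling, and `n_jobs` setting are assumptions, and `ParameterGrid` is used only to count the candidates in the grid as listed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid, StratifiedKFold

# Search space exactly as listed above.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 15, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
    "bootstrap": [True, False],
}

# Number of distinct parameter combinations in the listed grid.
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)  # 3 * 4 * 3 * 3 * 2 * 2 = 432

# Assumed search setup: ROC-AUC scoring with shuffled stratified 5-fold CV.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
)
# After search.fit(X_train, y_train), the winning settings are available
# as search.best_params_.
```

With 5 folds, each candidate is fitted five times, so the number of model fits is five times the number of combinations.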
Cross-Validation Results
Mean Scores (5-Fold Stratified CV)
- Accuracy: 0.9191 (±0.0097)
- Precision: 0.9326 (±0.0115)
- Recall: 0.9331 (±0.0166)
- F1-Score: 0.9327 (±0.0083)
- ROC-AUC: 0.9744 (±0.0055)
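Scores in this form can be produced with `cross_validate` and several scorers at once. The sketch below runs on synthetic data and assumes the ± values are standard deviations across folds; the forest settings and fold shuffling are placeholders rather than the notebook's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the 12-feature training set.
X, y = make_classification(n_samples=400, n_features=12, random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=42),
    X, y, cv=cv, scoring=scoring,
)

# Mean and standard deviation of each metric over the five folds.
summary = {
    metric: (np.mean(results[f"test_{metric}"]), np.std(results[f"test_{metric}"]))
    for metric in scoring
}
for metric, (mean, std) in summary.items():
    print(f"{metric}: {mean:.4f} (±{std:.4f})")
```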
Visualizations
All visualizations are saved in the images/ directory:
- 01_class_distribution.png - Training/Test set class distribution
- 02_feature_correlation.png - Feature correlation with target variable
- 03_correlation_matrix.png - Feature correlation heatmap
- 04_baseline_confusion_matrix.png - Baseline model confusion matrix
- 05_baseline_roc_curve.png - Baseline ROC curve
- 06_baseline_precision_recall.png - Baseline Precision-Recall curve
- 07_baseline_feature_importance.png - Baseline feature importance
- 08_cross_validation.png - Cross-validation score distribution
- 09_tuned_confusion_matrix.png - Tuned model confusion matrix
- 10_tuned_roc_curve.png - Tuned ROC curve
- 11_tuned_precision_recall.png - Tuned Precision-Recall curve
- 12_tuned_feature_importance.png - Tuned feature importance
- 13_model_comparison.png - Baseline vs Tuned comparison
Usage Example
```python
import joblib
import pandas as pd

# Load the trained model and the scaler fitted during training
model = joblib.load('tiktok_bot_detection_v2.pkl')
scaler = joblib.load('tiktok_scaler_v2.pkl')

# Raw (unscaled) feature values for one account; the numbers are illustrative.
# Keys must follow the feature order used in training.
data = {
    'IsPrivate': 0,
    'IsVerified': 0,
    'HasProfilePic': 1,
    'FollowingCount': 150,
    'FollowerCount': 300,
    'HasInstagram': 0,
    'HasYoutube': 0,
    'HasBio': 1,
    'HasLinkInBio': 0,
    'HasPosts': 1,
    'PostsCount': 12,
    # Assumed to be FollowingCount / FollowerCount; check the training notebook
    'FollowToFollowerRatio': 0.5,
}

df = pd.DataFrame([data])
df_scaled = scaler.transform(df)  # apply the training-time min/max scaling

prediction = model.predict(df_scaled)[0]
probability = model.predict_proba(df_scaled)[0]  # [P(human), P(bot)]

print(f"Prediction: {'Bot' if prediction == 1 else 'Human'}")
print(f"Bot Probability: {probability[1]:.4f}")
print(f"Human Probability: {probability[0]:.4f}")
```
Confusion Matrix Breakdown
Tuned Model (Test Set)
| | Predicted Human | Predicted Bot |
| --- | --- | --- |
| Actual Human | 220 | 24 |
| Actual Bot | 18 | 334 |
- True Negatives (TN): 220 (Correctly identified humans)
- False Positives (FP): 24 (Humans incorrectly classified as bots)
- False Negatives (FN): 18 (Bots incorrectly classified as humans)
- True Positives (TP): 334 (Correctly identified bots)
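The headline test metrics follow directly from these four counts, with Bot as the positive class. A short check, using only the numbers above:

```python
# Counts from the tuned-model confusion matrix (positive class = Bot).
tn, fp, fn, tp = 220, 24, 18, 334

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")   # 0.9295
print(f"Precision: {precision:.4f}")  # 0.9330
print(f"Recall:    {recall:.4f}")     # 0.9489
print(f"F1-Score:  {f1:.4f}")         # 0.9408
```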
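Model Interpretation
Strengths
- A test ROC-AUC of 0.9754 indicates strong separation between bots and humans
- Precision and recall are close for both classes: 0.9330 / 0.9489 for bots and 0.9244 / 0.9016 for humans (human figures derived from the confusion matrix)
- Cross-validation is stable across folds (no metric varies by more than ±0.0166)
Key Insights
- The top five features account for about 0.84 of total feature importance, led by FollowToFollowerRatio (0.2693)
- GridSearchCV raised ROC-AUC from 0.9730 to 0.9754, a gain of 0.0024 (about 0.25% relative)
- Test ROC-AUC (0.9754) is in line with the cross-validation mean (0.9744 ±0.0055), which suggests the model generalizes to the held-out test data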
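Notes
- Feature Scaling: All features are scaled to the [0, 1] range with MinMaxScaler
- Missing Values: Filled with 0 during preprocessing
- Class Balance: Moderately imbalanced (about 60% bot, 40% human); the selected class_weight is None
- Model Type: Random Forest, a bagged ensemble of decision trees that is generally less prone to overfitting than a single tree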
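The two preprocessing notes above (zero-filling and min-max scaling) can be sketched as follows. The toy DataFrame and its values are made up for illustration, and the exact order of steps in the training notebook is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data with gaps; the values are illustrative only.
raw = pd.DataFrame({
    "FollowerCount": [0, 50, np.nan, 100],
    "HasBio": [1, np.nan, 0, 1],
})

# Step 1: replace missing values with 0.
filled = raw.fillna(0)

# Step 2: rescale each column to [0, 1] (the default feature_range).
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(filled), columns=filled.columns)
print(scaled)
```

The scaler should be fitted on training data only and then reused via `transform` at inference time. By default MinMaxScaler does not clip, so inputs outside the training range map outside [0, 1].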
Model Updates
To retrain the model:
- Place new training data in ../data/train_tiktok.csv
- Run the training notebook: 5_enhanced_training.ipynb
- Update this README with new metrics
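For orientation, a retraining run presumably ends by writing the three artifacts listed under Files. The sketch below shows one way that could look. It uses synthetic data, placeholder feature names, a temporary output directory, and a hypothetical `vNEXT` version tag, none of which come from the notebook.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the retraining data; feature names are placeholders.
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Fit the scaler first, then train the model on scaled features.
scaler = MinMaxScaler().fit(X)
model = RandomForestClassifier(n_estimators=20, random_state=42)
model.fit(scaler.transform(X), y)

# Hypothetical output location and version tag.
out_dir = Path(tempfile.mkdtemp())
version = "vNEXT"
joblib.dump(model, out_dir / f"tiktok_bot_detection_{version}.pkl")
joblib.dump(scaler, out_dir / f"tiktok_scaler_{version}.pkl")
(out_dir / f"tiktok_features_{version}.json").write_text(json.dumps(feature_names))

# Round-trip check: the reloaded model should predict identically.
reloaded = joblib.load(out_dir / f"tiktok_bot_detection_{version}.pkl")
```

Saving the scaler and feature list alongside the model keeps inference inputs consistent with what the model saw in training.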
Contact & Support
For questions or issues regarding this model, please refer to the main project documentation.
Generated: 2025-11-27 11:38:35
Notebook: 5_enhanced_training.ipynb
Platform: TikTok