---
language: en
tags:
- catboost
- regression
- machine-learning
- tabular-data
- gradient-boosting
library_name: catboost
widget:
- tabular:
    example_title: "Sample Prediction"
    data:
      initially_infected: [4, 6]
      lowest_immunity: [0.2, 0.1]
      highest_immunity: [0.75, 0.75]
      mask_beta_penalty: [0.5, 0.5]
      pollutant_immunity_reduction: [0.2, 0.1]
---

# Agentic Disease Spread CatBoost Regressor Model for Pollutant Effects with Beta

## Model Description

This is a CatBoost regressor trained on tabular data generated by the
[Agent-based Implementations for Infectious Disease Transmission Models](https://github.com/AlekseiAgarkov/AgenticInfectiousDiseaseTransmissionModels)
simulator.
CatBoost (Categorical Boosting) is a gradient boosting library developed
by Yandex that handles categorical features natively, without extensive preprocessing.
All six input features of this model are numeric.

- **Model type:** Gradient Boosting Decision Trees
- **Task:** Regression
- **License:** MIT
- **Repository:** https://github.com/AlekseiAgarkov/AgenticInfectiousDiseaseTransmissionModels

## Intended Uses & Limitations

### Intended Use
- Regression on structured/tabular data from agent-based disease spread simulations
- Exploring simulation scenarios that include pollutant effects

### Limitations
- Primarily designed for examining pollutant effects in the simulated scenarios
- Not suitable for unstructured data (images, text, audio)

## How to Use

### Installation
```bash
pip install catboost
```

### Basic Usage
```python
import pickle
import pandas as pd
from catboost import CatBoostRegressor  # catboost must be installed to unpickle the model

# Load the model (only unpickle files from a source you trust)
with open('catboost_model.pkl', 'rb') as f:
    model = pickle.load(f)

# Prepare your data as a pandas DataFrame.
# Column names and order must match the training features.
# Values below follow the card's sample widget; beta is an illustrative placeholder.
data = pd.DataFrame({
    'beta': [0.05],
    'initially_infected': [4],
    'lowest_immunity': [0.2],
    'highest_immunity': [0.75],
    'mask_beta_penalty': [0.5],
    'pollutant_immunity_reduction': [0.2]
})

# Predict the target (`infected_90d`); returns one value per row
prediction = model.predict(data)
```

### Using with CatBoost directly
```python
from catboost import CatBoostRegressor

# Load the model saved in CatBoost's native format
model = CatBoostRegressor()
model.load_model('catboost_model.cbm')

# Make predictions; `data` is the DataFrame built in the Basic Usage example
predictions = model.predict(data)
```
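
Because the model is aimed at pollutant scenarios, it can help to predict over a small grid of inputs rather than a single row. The sketch below builds such a grid in plain Python. `scenario_grid` is a hypothetical helper, not part of the original project, and the `beta` value and sweep values are illustrative assumptions.

```python
from itertools import product

# Feature order as listed in the model card.
FEATURE_ORDER = [
    "beta",
    "initially_infected",
    "lowest_immunity",
    "highest_immunity",
    "mask_beta_penalty",
    "pollutant_immunity_reduction",
]


def scenario_grid(base, sweeps):
    """Return one feature dict per combination of swept values.

    base   -- dict with a value for every feature in FEATURE_ORDER
    sweeps -- dict mapping a feature name to the list of values to try
    """
    names = list(sweeps)
    rows = []
    for combo in product(*(sweeps[name] for name in names)):
        row = dict(base)  # copy so the baseline is not modified
        row.update(zip(names, combo))
        # Re-emit the keys in training order.
        rows.append({name: row[name] for name in FEATURE_ORDER})
    return rows


# Baseline taken from the card's sample widget; beta is an assumed placeholder.
base = {
    "beta": 0.05,
    "initially_infected": 4,
    "lowest_immunity": 0.2,
    "highest_immunity": 0.75,
    "mask_beta_penalty": 0.5,
    "pollutant_immunity_reduction": 0.2,
}
rows = scenario_grid(base, {"pollutant_immunity_reduction": [0.0, 0.1, 0.2]})
```

Passing `pd.DataFrame(rows)` to `model.predict` would then give one prediction per scenario.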

## Training Procedure

### Training Data
Data details:
- Source: https://raw.githubusercontent.com/AlekseiAgarkov/MIFIML-2-Sem1-M25-525-Project-Practice/refs/heads/main/data/sim_data_metrics_20251214.csv
- Features:
  - beta: float - infectivity coefficient (`beta`)
  - initially_infected: int - number of initially infected agents
  - lowest_immunity: float - lowest possible immunity in simulation
  - highest_immunity: float - highest possible immunity in simulation
  - mask_beta_penalty: float - beta reduction coefficient for a mask worn at contact
  - pollutant_immunity_reduction: float - immunity reduction coefficient for pollutant
- Target variable: 'infected_90d'
- Samples: 2000
- Preprocessing: None
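
Since no preprocessing is applied, inputs must already match the feature list above. A minimal sketch of an input check follows. `validate_row` is a hypothetical helper, not part of the original project, and the `beta` value in the usage example is an illustrative assumption.

```python
# Feature names, order, and types as listed in the model card.
EXPECTED_FEATURES = {
    "beta": float,
    "initially_infected": int,
    "lowest_immunity": float,
    "highest_immunity": float,
    "mask_beta_penalty": float,
    "pollutant_immunity_reduction": float,
}


def validate_row(row):
    """Return the feature values in training order, or raise ValueError."""
    missing = [name for name in EXPECTED_FEATURES if name not in row]
    if missing:
        raise ValueError(f"missing features: {missing}")
    extra = [name for name in row if name not in EXPECTED_FEATURES]
    if extra:
        raise ValueError(f"unexpected features: {extra}")
    # Cast each value to the documented type, preserving training order.
    return [cast(row[name]) for name, cast in EXPECTED_FEATURES.items()]


# Keys may arrive in any order; the result follows the training order.
values = validate_row({
    "initially_infected": 4,
    "beta": 0.05,  # assumed placeholder value
    "lowest_immunity": 0.2,
    "highest_immunity": 0.75,
    "mask_beta_penalty": 0.5,
    "pollutant_immunity_reduction": 0.2,
})
```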

### Training Hyperparameters
```yaml
iterations: 10000
learning_rate: 0.025
depth: 5
loss_function: 'RMSE'
cat_features: null
verbose: false
early_stopping_rounds: 500
random_seed: 42
```
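
A minimal sketch of how these hyperparameters could be passed to CatBoost. The author's train/validation split is not documented, so the hypothetical `train` helper takes both sets from the caller.

```python
# Hyperparameters copied from the model card.
PARAMS = {
    "iterations": 10000,
    "learning_rate": 0.025,
    "depth": 5,
    "loss_function": "RMSE",
    "verbose": False,
    "early_stopping_rounds": 500,
    "random_seed": 42,
}


def train(X_train, y_train, X_val, y_val):
    """Fit a CatBoostRegressor with the card's hyperparameters.

    Hypothetical helper; the split itself is an assumption left to the caller.
    """
    # Imported lazily so PARAMS can be inspected without CatBoost installed.
    from catboost import CatBoostRegressor

    model = CatBoostRegressor(**PARAMS)
    # cat_features is left unset because all six features are numeric.
    # eval_set supplies the validation data that early stopping monitors.
    model.fit(X_train, y_train, eval_set=(X_val, y_val))
    return model
```

After fitting, `model.tree_count_` reports how many trees were kept. Early stopping on the validation set is presumably why the card lists 188 trees rather than the configured 10000 iterations.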

### Evaluation Results

|       Metric      |  Value |
|-------------------|--------|
| Train RMSE        | 476.41 |
| Validation RMSE   | 535.55 |
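
RMSE is expressed in the units of the target (`infected_90d`), and the validation value here is about 12% higher than the training value. For reference, a small sketch of how RMSE is computed from true and predicted values. The `rmse` helper is illustrative, not the author's evaluation code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Relative gap between the validation and training RMSE reported above.
relative_gap = (535.55 - 476.41) / 476.41
```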

## Feature Information

|            Feature Name      |   Type  |              Description                                | Importance |
|------------------------------|---------|---------------------------------------------------------|------------|
| beta                         | Numeric | infectivity coefficient (`beta`)                        | 80.79      |
| initially_infected           | Numeric | number of initially infected agents                     | 17.94      |
| lowest_immunity              | Numeric | lowest possible immunity in simulation                  | 0.17       |
| highest_immunity             | Numeric | highest possible immunity in simulation                 | 0.42       |
| mask_beta_penalty            | Numeric | beta reduction coefficient for a mask worn at contact   | 0.53       |
| pollutant_immunity_reduction | Numeric | immunity reduction coefficient for pollutant            | 0.15       |
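
The importance values can be read as shares of a total of 100. The sketch below ranks them and checks that reading, using only the numbers from the table.

```python
# Importance values copied from the table above.
IMPORTANCE = {
    "beta": 80.79,
    "initially_infected": 17.94,
    "lowest_immunity": 0.17,
    "highest_immunity": 0.42,
    "mask_beta_penalty": 0.53,
    "pollutant_immunity_reduction": 0.15,
}

total = sum(IMPORTANCE.values())
# Sort features from most to least important.
ranked = sorted(IMPORTANCE.items(), key=lambda item: item[1], reverse=True)
top_two_share = sum(value for _, value in ranked[:2])

for name, value in ranked:
    print(f"{name:<30} {value:6.2f}")
print(f"total: {total:.2f}, top two: {top_two_share:.2f}")
```

`beta` and `initially_infected` together account for 98.73 of the 100 points, while `pollutant_immunity_reduction` has the smallest share at 0.15.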

## Model Architecture

- **Algorithm:** Gradient Boosting on Decision Trees
- **Number of trees:** 188
- **Tree depth:** 5
- **Learning rate:** 0.025
- **Loss function:** RMSE
- **Feature importance type:** default (`PredictionValuesChange` for non-ranking loss functions in CatBoost)

## Model Card Authors
Aleksei Agarkov / MEPhI

## Model Card Contact
agarkov.aleksei1@yandex.ru

## Disclaimer

This model is provided "as is" without warranty of any kind. Users should evaluate the model's suitability for their specific use case and perform appropriate testing before deployment in production environments.