---
library_name: transformers
tags:
- phishing
datasets:
- ealvaradob/phishing-dataset
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- distilbert/distilbert-base-uncased
---

# 📧 distilbert-finetuned-phishing

A fine-tuned `distilbert-base-uncased` model for phishing email classification. This model is designed to distinguish between **safe** and **phishing** emails using natural language content.

[Colab Notebook](https://colab.research.google.com/drive/1_M5BVn9agRHUSN3wBPebfxfOpBqTJcwh?usp=sharing)

---

## 🧪 Evaluation Results

The model was trained on 77,677 emails and evaluated with the following results:

| Metric        | Value   |
|---------------|---------|
| Accuracy      | 0.9639 |
| Precision     | 0.9648 |
| Recall        | 0.9489 |
| F1 Score      | 0.9568 |
| Eval Loss     | 0.1326 |

---
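The reported metrics follow the standard binary-classification definitions. As a minimal sketch (toy labels for illustration only, not this model's actual predictions), they can be reproduced with scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels for illustration: 1 = phishing, 0 = safe
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),    # (TP + TN) / total
    "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
    "recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
    "f1": f1_score(y_true, y_pred),                # harmonic mean of P and R
}
print(metrics)
```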


## ⚙️ Training Configuration

```python
import torch
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./hf-phishing-model",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    load_best_model_at_end=True,
    fp16=torch.cuda.is_available(),  # mixed precision when a GPU is available
)
```
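
## 🚀 Usage

A minimal inference sketch with the `transformers` API. The fine-tuned checkpoint's Hub id is not stated in this card, so the base checkpoint is used below as a stand-in (its classification head is freshly initialized, so outputs are not meaningful); replace `model_id` with this repository's actual path, and note that the label order (0 = safe, 1 = phishing) is an assumption:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Stand-in id; substitute the fine-tuned model's Hub path here.
model_id = "distilbert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
model.eval()

text = "Your account has been suspended. Click here to verify your password."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label order: index 0 = safe, index 1 = phishing
probs = torch.softmax(logits, dim=-1)
print(probs)
```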