| | --- |
| | library_name: transformers |
| | tags: |
| | - text-classification |
| | - spam-detection |
| | - sms |
| | license: apache-2.0 |
| | --- |
| | |
| | # 🛡️ Model Card for `alusci/distilbert-smsafe` |
| |
|
| | A lightweight DistilBERT model fine-tuned for spam detection in SMS messages. The model classifies each input message as either **spam** or **ham** (not spam). It was fine-tuned on a custom dataset of real-world OTP (One-Time Password) and spam SMS messages. |
| |
|
| | --- |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| |
|
| | - **Developed by:** [alusci](https://huggingface.co/alusci) |
| | - **Model type:** Transformer-based binary classifier |
| | - **Language(s):** English |
| | - **License:** Apache 2.0 |
| | - **Finetuned from model:** `distilbert-base-uncased` |
| |
|
| | ### Model Sources |
| |
|
| | - **Repository:** [https://huggingface.co/alusci/distilbert-smsafe](https://huggingface.co/alusci/distilbert-smsafe) |
| |
|
| | --- |
| |
|
| | ## 🛠️ Uses |
| |
|
| | ### Direct Use |
| |
|
| | - Classify an SMS message as spam or ham; in the training data, ham consists of non-spam OTP messages. |
| | - Useful in prototypes, educational settings, or lightweight filtering applications. |
| |
|
| | ```python |
| | from transformers import pipeline |
| | |
| | classifier = pipeline("text-classification", model="alusci/distilbert-smsafe") |
| | result = classifier("Your verification code is 123456. Please do not share it with anyone.") |
| | |
| | # Optional: map the label to human-readable terms |
| | label_map = {"LABEL_0": "ham", "LABEL_1": "spam"} |
| | print(f"Label: {label_map[result[0]['label']]} - Score: {result[0]['score']:.2f}") |
| | ``` |
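The pipeline also accepts a list of strings, which is convenient when filtering several messages at once. The sketch below wraps that pattern in a small helper. `classify_messages`, `load_classifier`, and `LABEL_NAMES` are hypothetical names that are not part of the model repository, and the `classifier` argument is assumed to return one `{"label", "score"}` dictionary per input message, as the single-message call above does.

```python
from typing import Callable, Dict, List

# Label mapping taken from this model card; the helper names are illustrative.
LABEL_NAMES = {"LABEL_0": "ham", "LABEL_1": "spam"}


def classify_messages(classifier: Callable, messages: List[str]) -> List[Dict]:
    """Run a text-classification callable over a batch and return readable records."""
    raw = classifier(messages)  # expected: one {"label": ..., "score": ...} per message
    return [
        {
            "text": text,
            "label": LABEL_NAMES.get(pred["label"], pred["label"]),
            "score": round(float(pred["score"]), 4),
        }
        for text, pred in zip(messages, raw)
    ]


def load_classifier():
    """Build the real pipeline (downloads the model on first use)."""
    # Imported lazily so the helper above has no hard dependency on transformers.
    from transformers import pipeline

    return pipeline("text-classification", model="alusci/distilbert-smsafe")
```

A typical call would be `classify_messages(load_classifier(), ["Your code is 482913", "You won a prize, reply now"])`.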
| |
|
| | ### Out-of-Scope Use |
| |
|
| | - Not intended for email spam detection or multilingual message filtering. |
| | - Not suitable for production environments without further testing and evaluation. |
| |
|
| | --- |
| |
|
| | ## 🧪 Bias, Risks, and Limitations |
| |
|
| | - The model may reflect dataset biases (e.g., message structure, language patterns). |
| | - It may misclassify legitimate OTPs or non-standard spam content. |
| | - Risk of false positives (legitimate messages flagged as spam), especially for messages that differ from the training data. |
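One way to limit false positives is to flag a message as spam only when the model is confident. This is a minimal sketch that assumes the pipeline output format shown in this card. The function name `is_spam` and the 0.9 threshold are hypothetical and would need tuning on your own data.

```python
def is_spam(prediction: dict, threshold: float = 0.9) -> bool:
    """Return True only when the predicted label is spam AND the score clears the threshold.

    `prediction` is one pipeline result, e.g. {"label": "LABEL_1", "score": 0.97}.
    The 0.9 default is an illustrative value, not one tuned by the model author.
    """
    return prediction["label"] == "LABEL_1" and prediction["score"] >= threshold
```

Raising the threshold trades fewer false positives for more missed spam, so the right value depends on which error is more costly in your application.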
| |
|
| | ### Recommendations |
| |
|
| | - Evaluate on your own SMS dataset before deployment. |
| | - Consider combining with rule-based or heuristic systems in production. |
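The combination with rules suggested above could look like the following sketch. It routes a message to manual review when the model predicts spam but a simple pattern suggests an OTP. The regular expression, the `hybrid_verdict` name, and the `"review"` outcome are all illustrative assumptions, not something the model author ships.

```python
import re
from typing import Callable

# Illustrative heuristic: an OTP keyword followed by a 4-8 digit code. Not from the model author.
OTP_PATTERN = re.compile(
    r"\b(otp|one[- ]time|verification|passcode|code)\b.*\b\d{4,8}\b",
    re.IGNORECASE | re.DOTALL,
)


def hybrid_verdict(text: str, classifier: Callable) -> str:
    """Combine the model's label with a rule; disagreements are routed to review."""
    prediction = classifier(text)[0]  # pipeline-style output: [{"label": ..., "score": ...}]
    model_says_spam = prediction["label"] == "LABEL_1"
    looks_like_otp = OTP_PATTERN.search(text) is not None

    if model_says_spam and looks_like_otp:
        return "review"  # model and rule disagree: do not block automatically
    return "spam" if model_says_spam else "ham"
```

Here `classifier` can be the `pipeline(...)` object from the Direct Use example. Keeping it as an argument makes the rule easy to test without loading the model.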
| |
|
| | --- |
| |
|
| | ## Training Details |
| |
|
| | ### Training Data |
| |
|
| | - Dataset used: [`alusci/sms-otp-spam-dataset`](https://huggingface.co/datasets/alusci/sms-otp-spam-dataset) |
| | - Binary labels for spam and non-spam OTP messages |
| |
|
| | ### Training Procedure |
| |
|
| | - **Epochs:** 5 |
| | - **Batch Size:** 16 (assumed) |
| | - **Loss Function:** CrossEntropyLoss |
| | - **Optimizer:** AdamW |
| | - **Tokenizer:** `distilbert-base-uncased` |
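The settings listed above map naturally onto the `Trainer` API in `transformers`. The sketch below is one possible setup under stated assumptions: the card does not confirm that `Trainer` was used, `output_dir` is a hypothetical path, and both datasets are assumed to be already tokenized with an integer `label` column (0 = ham, 1 = spam).

```python
# Hyperparameters as listed in this card; the batch size is marked "assumed" there too.
TRAINING_KWARGS = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 16,
}
BASE_CHECKPOINT = "distilbert-base-uncased"


def build_trainer(train_dataset, eval_dataset, output_dir="distilbert-smsafe-run"):
    """Assemble a Trainer for binary SMS classification.

    Assumes both datasets are already tokenized with BASE_CHECKPOINT's tokenizer
    and carry an integer `label` column. `output_dir` is a hypothetical path.
    """
    # Imported lazily so the configuration above can be inspected without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(BASE_CHECKPOINT, num_labels=2)
    args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)

    # Trainer uses AdamW by default; the model computes cross-entropy loss for two labels.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=DataCollatorWithPadding(tokenizer),
    )
```

Calling `build_trainer(train_ds, eval_ds).train()` would then run the five epochs; learning rate and other unlisted settings fall back to library defaults, which may differ from what the author used.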
| |
|
| | --- |
| |
|
| | ## Evaluation |
| |
|
| | ### Metrics |
| |
|
| | - Accuracy, precision, recall, and F1-score on a held-out validation set |
| | - Binary classification labels: |
| |   - `LABEL_0` → ham |
| |   - `LABEL_1` → spam |
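For readers who want to reproduce these metrics on their own data, the sketch below computes them from integer labels. It assumes spam (`LABEL_1`, encoded as 1) is the positive class, which the card does not state explicitly; `binary_metrics` is a hypothetical helper name.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 with `positive` (1 = spam) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `binary_metrics([1, 1, 0, 0], [1, 1, 1, 0])` describes a classifier that catches every spam message but flags one ham message by mistake.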
| |
|
| | ### Results |
| |
|
| | **Evaluation metrics after 5 epochs:** |
| |
|
| | - **Loss:** 0.2962 |
| | - **Accuracy:** 91.35% |
| | - **Precision:** 90.26% |
| | - **Recall:** 100.00% |
| | - **F1-score:** 94.88% |
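As a quick consistency check, the reported F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet uses only the figures listed above.

```python
precision = 0.9026  # reported precision
recall = 1.0000     # reported recall

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.9488
```

The result matches the reported 94.88%. If spam is the positive class (an assumption, as the card does not say), a recall of 100% means no spam message in the validation set was missed, so the remaining errors are ham messages flagged as spam.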
| |
|
| | **Performance:** |
| |
|
| | - **Evaluation runtime:** 4.37 seconds |
| | - **Samples/sec:** 457.27 |
| | - **Steps/sec:** 9.15 |
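The card does not state the size of the validation set, but the throughput figures allow a rough estimate. The numbers below are derived from rounded values and should be read as approximations, not as reported results.

```python
runtime_s = 4.37        # reported evaluation runtime in seconds
samples_per_s = 457.27  # reported throughput
steps_per_s = 9.15      # reported step rate

approx_samples = runtime_s * samples_per_s  # about 1998 samples
approx_steps = runtime_s * steps_per_s      # about 40 evaluation steps
print(round(approx_samples), round(approx_steps))  # 1998 40
```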
| |
|
| | --- |
| |
|
| | ## 🌱 Environmental Impact |
| |
|
| | - **Hardware Type:** Apple Silicon MPS GPU (Mac) |
| | - **Hours used:** <1 hour (small dataset) |
| | - **Cloud Provider:** None (trained locally) |
| | - **Carbon Emitted:** Not measured; expected to be minimal given the short, local training run |
| |
|
| | --- |
| |
|
| | ## Technical Specifications |
| |
|
| | ### Model Architecture and Objective |
| |
|
| | - **Base:** DistilBERT |
| | - **Objective:** Binary sequence classification, with a classification head on the pooled (first-token) encoder output |
| | - **Parameters:** ~66M (same as `distilbert-base-uncased`) |
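The ~66M figure can be reproduced by counting weights from the standard `distilbert-base-uncased` dimensions. The sketch assumes those dimensions are unchanged by fine-tuning and that the head follows the layout of `DistilBertForSequenceClassification` in `transformers` (a pre-classifier layer followed by a two-way classifier); the card does not state which model class was used.

```python
# Standard distilbert-base-uncased dimensions (assumed unchanged by fine-tuning).
VOCAB, MAX_POS, DIM, FFN_DIM, LAYERS, NUM_LABELS = 30522, 512, 768, 3072, 6, 2


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


layer_norm = 2 * DIM  # scale and shift vectors
embeddings = VOCAB * DIM + MAX_POS * DIM + layer_norm
per_layer = (
    4 * linear(DIM, DIM)      # query, key, value, and output projections
    + linear(DIM, FFN_DIM)    # feed-forward expansion
    + linear(FFN_DIM, DIM)    # feed-forward projection
    + 2 * layer_norm          # post-attention and post-FFN layer norms
)
encoder = embeddings + LAYERS * per_layer
head = linear(DIM, DIM) + linear(DIM, NUM_LABELS)  # pre-classifier + classifier

print(f"encoder: {encoder:,}")           # encoder: 66,362,880
print(f"with head: {encoder + head:,}")  # with head: 66,955,010
```

The classification head adds well under one million parameters, so the fine-tuned model stays close to the size of the base checkpoint.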
| |
|
| | --- |
| |
|
| | ## 📬 Model Card Contact |
| |
|
| | For questions or feedback, please reach out via the author's [Hugging Face profile](https://huggingface.co/alusci). |