BERT-Tiny Fine-tuned for AG News Classification
Model Description
This is BERT-Tiny fine-tuned on the AG News dataset for news article classification. This ultra-lightweight model offers:
- Ultra Fast: Only 4.4M parameters (25x smaller than BERT-base)
- High Performance: 87.6% accuracy on AG News
- MPS Optimized: Optimized for Apple Silicon GPUs
- Mobile Ready: Small enough for mobile deployment
Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "HoYeolY/bert-tiny-finetuned-agnews"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Example prediction
text = "Apple Inc. reported strong quarterly earnings..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
# Class mapping
class_names = ["World", "Sports", "Business", "Sci/Tech"]
print(f"Predicted class: {class_names[predicted_class]}")
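To make the post-processing step in Quick Start concrete, the following minimal sketch reproduces the softmax and argmax logic in plain Python. The logits are made-up values for illustration, not real model output, and the helper names `softmax` and `decode` are hypothetical.

```python
import math

CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (class name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]

# Hypothetical logits for one article (illustrative values only)
example_logits = [-1.2, -0.7, 3.1, 0.4]
label, confidence = decode(example_logits)
print(f"{label} ({confidence:.3f})")
```

The probability returned alongside the label can serve as a rough confidence score, for example to flag low-confidence predictions for review.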
Performance Metrics
| Metric | Score |
|---|---|
| Accuracy | 87.6% |
| F1 Score | 0.8762 |
| Improvement over base | +62.6% |
| Training Time | N/A |
| Parameters | 4.4M |
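The table does not say how "Improvement over base" is measured. One plausible reading, assumed here rather than confirmed by the source, is percentage points gained over the base model before fine-tuning. The arithmetic below shows that this reading implies a baseline at chance level for four balanced classes.

```python
# Assumption: "+62.6%" means percentage points gained over the base model.
finetuned_accuracy = 87.6   # percent, from the table
improvement = 62.6          # percent, from the table
num_classes = 4

implied_baseline = round(finetuned_accuracy - improvement, 1)
chance_level = 100 / num_classes  # uniform guessing over balanced classes

print(f"Implied baseline: {implied_baseline}%")
print(f"Chance level:     {chance_level}%")
```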
Training Details
Model Architecture
- Base Model: prajjwal1/bert-tiny
- Task: Multi-class text classification (4 classes)
- Parameters: 4,386,436 (4.4M)
Training Configuration
- Dataset: AG News (120,000 training samples)
- Batch Size: 128 (optimized for MPS)
- Learning Rate: 5e-5
- Epochs: 1
- Device: Apple Silicon MPS
- Precision: Float32 (MPS compatible)
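For a sense of scale, the configuration above translates into the following number of optimizer steps. This is a small arithmetic sketch that assumes the final partial batch is kept rather than dropped, which the source does not state.

```python
import math

train_samples = 120_000   # from Training Configuration
batch_size = 128
epochs = 1

# Assumption: the last, smaller batch is still used for an update.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_steps)
```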
Dataset Classes
- World - World news
- Sports - Sports news
- Business - Business news
- Sci/Tech - Science and Technology news
Usage Examples
Classification Pipeline
from transformers import pipeline
classifier = pipeline("text-classification", model="HoYeolY/bert-tiny-finetuned-agnews")
# Single prediction
result = classifier("Tesla announces new electric vehicle model")
print(result)
# Batch predictions
texts = [
    "Olympic games start next month",
    "Stock market reaches new highs",
    "New AI breakthrough announced"
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Prediction: {result['label']} ({result['score']:.3f})")
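Depending on how the model config was saved, the pipeline may return generic labels such as LABEL_0 instead of class names. The sketch below maps such labels back to AG News categories. It assumes the index order shown in Quick Start, and the helper name `readable_label` is hypothetical.

```python
CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]

def readable_label(label):
    """Map 'LABEL_<n>' to a class name; pass other labels through unchanged."""
    prefix = "LABEL_"
    suffix = label[len(prefix):]
    if label.startswith(prefix) and suffix.isdigit():
        index = int(suffix)
        if index < len(CLASS_NAMES):
            return CLASS_NAMES[index]
    return label

# Hypothetical pipeline output (illustrative values, not real predictions)
sample_results = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "Business", "score": 0.91},
]
for item in sample_results:
    print(f"{readable_label(item['label'])} ({item['score']:.3f})")
```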
Custom Training Loop
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load the fine-tuned tokenizer and model as a starting point
tokenizer = AutoTokenizer.from_pretrained("HoYeolY/bert-tiny-finetuned-agnews")
model = AutoModelForSequenceClassification.from_pretrained("HoYeolY/bert-tiny-finetuned-agnews")
# Your custom training code here...
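As a starting point for the elided training code, here is a minimal sketch of one training pass using the batch size and learning rate listed under Training Configuration. AdamW is an assumption, since the source does not name the optimizer. To stay self-contained and runnable offline, it uses random tensors and a small stand-in classifier rather than the real model and dataset; the stand-in network, tensor shapes, and `train_one_epoch` are all hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: 512 random feature vectors with labels in {0, 1, 2, 3}.
features = torch.randn(512, 32)
labels = torch.randint(0, 4, (512,))
loader = DataLoader(TensorDataset(features, labels), batch_size=128, shuffle=True)

# Stand-in model (hypothetical); swap in the fine-tuned BERT-Tiny for real use.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # optimizer is an assumption
loss_fn = nn.CrossEntropyLoss()

def train_one_epoch(model, loader):
    """Run one pass over the data and return the mean batch loss."""
    model.train()
    total = 0.0
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

mean_loss = train_one_epoch(model, loader)
print(f"mean loss: {mean_loss:.4f}")
```

With the actual Transformers model, each batch would instead be a dictionary of tokenizer outputs plus labels, and the loss can be read from the model output when labels are passed in.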
Model Performance
Speed Benchmarks
- Training Time: N/A
- Inference Speed: ~1000 samples/second (MPS)
- Model Size: 17MB
- Memory Usage: <100MB
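The reported model size and the "25x smaller than BERT-base" figure can both be sanity-checked from the parameter count. The sketch below assumes float32 weights (4 bytes each, as listed under Training Configuration), ignores file metadata, and uses the commonly cited approximate figure of 110M parameters for BERT-base.

```python
params = 4_386_436              # from Model Architecture
bytes_per_param = 4             # float32
bert_base_params = 110_000_000  # approximate, commonly cited size of BERT-base

size_bytes = params * bytes_per_param
size_mb = size_bytes / 1e6
ratio = bert_base_params / params

print(f"{size_mb:.1f} MB, {ratio:.1f}x smaller than BERT-base")
```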
Accuracy by Class
Per-class results are reported qualitatively only; exact precision, recall, and F1 values are not available:
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| World | High | High | High |
| Sports | High | High | High |
| Business | High | High | High |
| Sci/Tech | High | High | High |
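Because the table gives only qualitative ratings, readers who want numbers can compute them from their own evaluation run. The following is a minimal sketch of per-class precision, recall, and F1 in plain Python; the helper name `per_class_report` is hypothetical, and the labels are toy values, not real evaluation data.

```python
CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]

def per_class_report(y_true, y_pred, num_classes=4):
    """Return a list of (precision, recall, f1) tuples, one per class index."""
    report = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append((precision, recall, f1))
    return report

# Toy labels for illustration only (not real evaluation data)
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
report = per_class_report(y_true, y_pred)
for name, (p, r, f1) in zip(CLASS_NAMES, report):
    print(f"{name:<9} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```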
Technical Specifications
Hardware Optimization
- Apple Silicon MPS: Optimized for M1/M2/M3 chips
- CPU Fallback: Works on any hardware
- Batch Processing: Efficient batch inference
- Memory Efficient: Low memory footprint
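The MPS-with-CPU-fallback behavior described above can be sketched as a small device picker. This is an illustrative helper, not code from the original training script; `pick_device` is a hypothetical name, and the random tensor stands in for tokenized model inputs.

```python
import torch

def pick_device():
    """Prefer Apple Silicon MPS, otherwise fall back to CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
batch = torch.randn(8, 16).to(device)  # stand-in tensor; move model inputs the same way
print(device, batch.device.type)
```

The model itself would be moved with the same `.to(device)` call before running inference.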
Software Requirements
torch>=2.0.0
transformers>=4.30.0
python>=3.8
Limitations and Bias
- Domain Specific: Trained specifically on news articles
- English Only: Optimized for English text
- Short Text: Best performance on text under 512 tokens, the maximum sequence length of the base model; longer inputs must be truncated or split
- Bias: May reflect biases present in AG News dataset
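One way to work around the 512-token limit is to split a long article into windows and classify each window separately. The source does not describe any such handling, so the sketch below is an assumption-laden illustration: `chunk_token_ids` is a hypothetical helper, it reserves two positions for the [CLS] and [SEP] tokens, and the ids are stand-ins for real tokenizer output.

```python
def chunk_token_ids(token_ids, max_length=512, special_tokens=2, stride=0):
    """Split token ids into windows that leave room for [CLS] and [SEP].

    stride is the number of tokens shared between consecutive windows.
    """
    window = max_length - special_tokens
    if window <= 0 or not 0 <= stride < window:
        raise ValueError("need max_length > special_tokens and 0 <= stride < window")
    step = window - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # the current window already reaches the end
    return chunks

# Stand-in ids for a 1,200-token article (real ids would come from the tokenizer)
fake_ids = list(range(1200))
chunks = chunk_token_ids(fake_ids)
print([len(c) for c in chunks])
```

Per-window predictions then need to be combined; averaging the class probabilities across windows is one simple option.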
Training Infrastructure
- Device: Apple MacBook Pro (M3 Pro)
- Framework: PyTorch + Transformers
- Optimization: MPS acceleration
- Memory Management: Unified memory architecture
Citation
@misc{bert_tiny_agnews,
title={BERT-Tiny Fine-tuned for AG News Classification},
author={Yang Hoyeol},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/your-username/bert-tiny-agnews}}
}
Acknowledgements
- Base Model: prajjwal1/bert-tiny
- Dataset: AG News
- Framework: Hugging Face Transformers
Model trained with love using Apple Silicon MPS optimization
Model tree for HoYeolY/bert-tiny-finetuned-agnews
- Base model: prajjwal1/bert-tiny
- Dataset used to train: AG News
Evaluation results
- Accuracy on AG News (self-reported): 0.876
- F1 Score on AG News (self-reported): 0.876