🚀 BERT-Tiny Fine-tuned for AG News Classification

Model Description

This is BERT-Tiny fine-tuned on the AG News dataset for news article classification. This ultra-lightweight model offers:

⚡ Ultra Fast: Only 4.4M parameters (25x smaller than BERT-base)
🎯 High Performance: 87.6% accuracy on AG News
🍎 MPS Optimized: Trained on Apple Silicon GPUs through the PyTorch MPS backend
📱 Mobile Ready: Small enough for mobile deployment

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "your-username/bert-tiny-agnews"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example prediction
text = "Apple Inc. reported strong quarterly earnings..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Class mapping
class_names = ["World", "Sports", "Business", "Sci/Tech"]
print(f"Predicted class: {class_names[predicted_class]}")
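The prediction step above applies a softmax to the logits and takes the argmax. As a minimal, dependency-free sketch of the same arithmetic, the helper below (a hypothetical `rank_classes`, not part of the model or of Transformers) turns a list of four logits into ranked class probabilities. The logits are made up for illustration; real values would come from `outputs.logits`.

```python
import math

CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]

def rank_classes(logits):
    """Return (class_name, probability) pairs, most likely first."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(CLASS_NAMES, probs), key=lambda pair: pair[1], reverse=True)

# Made-up logits for illustration only.
example_logits = [0.2, -1.3, 2.9, 0.7]
for name, prob in rank_classes(example_logits):
    print(f"{name}: {prob:.3f}")
```

Ranking all four classes, rather than keeping only the argmax, makes it easier to spot low-confidence predictions.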

Performance Metrics

Metric	Score
Accuracy	87.6%
F1 Score	0.8762
Improvement over base	+62.6%
Training Time	N/A
Parameters	4.4M

Training Details

Model Architecture

Base Model: prajjwal1/bert-tiny
Task: Multi-class text classification (4 classes)
Parameters: 4,386,436 (4.4M)
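The parameter count can be sanity-checked by hand. The sketch below assumes the usual BERT-Tiny configuration (2 layers, hidden size 128, feed-forward size 512, vocabulary of 30,522 tokens, 512 positions, 2 token types); these values are not stated in this card and are assumptions. Under them, the arithmetic reproduces the figure listed above.

```python
# Assumed BERT-Tiny configuration values (not stated in this card).
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
TYPE_VOCAB = 2
HIDDEN = 128
INTERMEDIATE = 512
NUM_LAYERS = 2
NUM_LABELS = 4

def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out

def layer_norm(size):
    """Scale plus shift vectors of a LayerNorm."""
    return 2 * size

def count_parameters():
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
    # Query, key, value and output projections, followed by a LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (linear(HIDDEN, INTERMEDIATE)
                    + linear(INTERMEDIATE, HIDDEN)
                    + layer_norm(HIDDEN))
    encoder = NUM_LAYERS * (attention + feed_forward)
    pooler = linear(HIDDEN, HIDDEN)
    classifier = linear(HIDDEN, NUM_LABELS)
    return embeddings + encoder + pooler + classifier

total = count_parameters()
print(f"{total:,} parameters")                  # 4,386,436 parameters
print(f"{total * 4 / 1e6:.1f} MB in float32")   # 17.5 MB in float32
```

Most of the parameters sit in the token embedding table, which is why the model stays small despite the full BERT vocabulary.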

Training Configuration

Dataset: AG News (120,000 training samples)
Batch Size: 128 (optimized for MPS)
Learning Rate: 5e-5
Epochs: 1
Device: Apple Silicon MPS
Precision: Float32 (MPS compatible)
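For a sense of scale, the configuration above implies fewer than a thousand optimizer steps. The short calculation below is only arithmetic on the listed values and assumes the final partial batch is kept rather than dropped.

```python
import math

TRAIN_SAMPLES = 120_000
BATCH_SIZE = 128
EPOCHS = 1

# Number of full batches and the size of the final partial batch.
full_batches, remainder = divmod(TRAIN_SAMPLES, BATCH_SIZE)
# Assumes the last partial batch is kept, so it counts as one more step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(f"{full_batches} full batches plus one batch of {remainder} samples")
print(f"{total_steps} optimizer steps in total")
```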

Dataset Classes

World - World news
Sports - Sports news
Business - Business news
Sci/Tech - Science and Technology news

Usage Examples

Classification Pipeline

from transformers import pipeline

classifier = pipeline("text-classification", model="your-username/bert-tiny-agnews")

# Single prediction
result = classifier("Tesla announces new electric vehicle model")
print(result)

# Batch predictions
texts = [
    "Olympic games start next month",
    "Stock market reaches new highs",
    "New AI breakthrough announced"
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Prediction: {result['label']} ({result['score']:.3f})")
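This card does not say whether the checkpoint stores human-readable label names. If it does not, the pipeline reports generic labels such as `LABEL_2`. The hypothetical helper below, `to_class_name`, is a small sketch that maps either form to the AG News class names; the sample results are made up for illustration.

```python
CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]

def to_class_name(label):
    """Map a label such as 'LABEL_2' to an AG News class name."""
    prefix = "LABEL_"
    suffix = label[len(prefix):]
    if label.startswith(prefix) and suffix.isdigit():
        return CLASS_NAMES[int(suffix)]
    return label  # already a readable name, return unchanged

# Made-up pipeline-style results for illustration only.
sample_results = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "Business", "score": 0.88},
]
for result in sample_results:
    print(f"{to_class_name(result['label'])} ({result['score']:.3f})")
```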

Custom Training Loop

import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned checkpoint as the starting point
model_name = "your-username/bert-tiny-agnews"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Your custom training code here...
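One way the placeholder above could be filled in is sketched below. This is not the training script used for this model; it is a generic single-epoch loop, and `train_one_epoch` is a hypothetical helper name. It assumes each batch is a dict of tensors that includes `labels`, so that the model returns an object with a `loss` attribute, as Transformers sequence classification models do when labels are passed.

```python
import torch

def train_one_epoch(model, loader, optimizer, device):
    """Run one pass over loader and return the mean training loss."""
    model.to(device)
    model.train()
    total_loss, num_batches = 0.0, 0
    for batch in loader:
        # Assumed batch layout: a dict of tensors such as input_ids,
        # attention_mask and labels.
        batch = {key: value.to(device) for key, value in batch.items()}
        optimizer.zero_grad()
        outputs = model(**batch)  # loss is computed because labels are included
        outputs.loss.backward()
        optimizer.step()
        total_loss += outputs.loss.item()
        num_batches += 1
    return total_loss / max(num_batches, 1)
```

With the learning rate from the training configuration, a call might look like `train_one_epoch(model, loader, torch.optim.AdamW(model.parameters(), lr=5e-5), torch.device("cpu"))`, where `loader` is any iterable of tokenized batches.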

Model Performance

Speed Benchmarks

Training Time: N/A
Inference Speed: ~1000 samples/second (MPS)
Model Size: 17MB
Memory Usage: <100MB
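Throughput figures like the one above depend heavily on hardware, batch size and text length, so it is worth measuring them on your own setup. The sketch below shows one simple way to do that; `measure_throughput` is a hypothetical helper and `dummy_predict` is a stand-in so the example runs without a model. In practice `predict_fn` would wrap the classifier.

```python
import time

def measure_throughput(predict_fn, texts, batch_size=128, warmup_batches=1):
    """Return samples per second for predict_fn over texts."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    for batch in batches[:warmup_batches]:
        predict_fn(batch)  # warm-up calls are not timed
    start = time.perf_counter()
    for batch in batches:
        predict_fn(batch)
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed if elapsed > 0 else float("inf")

def dummy_predict(batch):
    """Stand-in predictor so the sketch runs without a model."""
    return [len(text) for text in batch]

rate = measure_throughput(dummy_predict, ["sample headline"] * 1000)
print(f"{rate:.0f} samples/second")
```

When timing a GPU backend, have `predict_fn` return its results as plain Python values so that any asynchronous work finishes inside the timed region.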

Accuracy by Class

The model performs well across all news categories:

Class	Precision	Recall	F1-Score
World	High	High	High
Sports	High	High	High
Business	High	High	High
Sci/Tech	High	High	High
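The table above is qualitative. To get numeric per-class scores, you could compute precision, recall and F1 from predictions on a held-out split. The sketch below uses a hypothetical helper, `per_class_metrics`, and made-up toy labels; it does not report results for this model.

```python
CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]

def per_class_metrics(y_true, y_pred, num_classes=4):
    """Return (class_name, precision, recall, f1) for each class."""
    rows = []
    pairs = list(zip(y_true, y_pred))
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((CLASS_NAMES[c], precision, recall, f1))
    return rows

# Toy labels for illustration only, not model output.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
for name, precision, recall, f1 in per_class_metrics(y_true, y_pred):
    print(f"{name}\t{precision:.3f}\t{recall:.3f}\t{f1:.3f}")
```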

Technical Specifications

Hardware Optimization

✅ Apple Silicon MPS: Optimized for M1/M2/M3 chips
✅ CPU Fallback: Works on any hardware
✅ Batch Processing: Efficient batch inference
✅ Memory Efficient: Low memory footprint
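The MPS-with-CPU-fallback behavior listed above can be sketched as follows. `pick_device` is a hypothetical helper name; the check it relies on, `torch.backends.mps.is_available()`, is part of PyTorch. The model and its inputs must be moved to the same device before inference.

```python
import torch

def pick_device():
    """Prefer Apple Silicon MPS when available, otherwise use the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
# Small tensor operation to confirm the selected device works.
probe = torch.ones(2, 2, device=device)
print(f"Using {device}; probe sum = {probe.sum().item()}")
```

With the Quick Start code, this would mean calling `model.to(device)` once and moving each tokenized input with `inputs = {k: v.to(device) for k, v in inputs.items()}`.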

Software Requirements

torch>=2.0.0
transformers>=4.30.0
python>=3.8

Limitations and Bias

Domain Specific: Trained specifically on news articles
English Only: Optimized for English text
Short Text: Inputs are limited to 512 tokens; longer text is truncated when truncation=True is set, as in the Quick Start
Bias: May reflect biases present in AG News dataset

Training Infrastructure

Device: Apple MacBook Pro (M3 Pro)
Framework: PyTorch + Transformers
Optimization: MPS acceleration
Memory Management: Unified memory architecture

Citation

@misc{bert_tiny_agnews,
  title={BERT-Tiny Fine-tuned for AG News Classification},
  author={Yang Hoyeol},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/your-username/bert-tiny-agnews}}
}

Acknowledgements

Base Model: prajjwal1/bert-tiny
Dataset: AG News
Framework: Hugging Face Transformers

Model trained with love using Apple Silicon MPS optimization 🍎⚡

Model repository: HoYeolY/bert-tiny-finetuned-agnews (fine-tuned from prajjwal1/bert-tiny)
