---
library_name: transformers
tags:
- sentiment-analysis
- distilbert
- text-classification
- nlp
- imdb
- binary-classification
license: mit
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
---
# Model Card for AfroLogicInsect/sentiment-analysis-model

A fine-tuned DistilBERT model for binary sentiment analysis: it predicts whether input text expresses positive or negative sentiment. The model was trained on a subset of the IMDB movie review dataset using 🤗 Transformers and PyTorch.
## Model Details

### Model Description

This model was trained by Daniel (AfroLogicInsect) to classify the sentiment of movie reviews. It builds on the distilbert-base-uncased architecture and was fine-tuned for three epochs on roughly 7,500 English-language samples from the IMDB dataset. The model accepts raw text and returns a sentiment label with a confidence score.
- **Developed by:** Daniel 🇳🇬 (@AfroLogicInsect)
- **Funded by:** [More Information Needed]
- **Shared by:** [More Information Needed]
- **Model type:** DistilBERT-based sequence classification
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** distilbert-base-uncased
### Model Sources
- **Repository:** https://huggingface.co/AfroLogicInsect/sentiment-analysis-model
- **Paper:** [More Information Needed]
- **Demo:** [Sentiment Analyzer (Gradio Space)](https://huggingface.co/spaces/AfroLogicInsect/sentiment-analysis-model-gradio)
## Uses

### Direct Use

- Sentiment analysis of short texts, reviews, feedback forms, etc.
- Embedding in web apps or chatbots to assess user mood or response tone
### Downstream Use

- Can be incorporated into feedback categorization pipelines (see the sketch after this list)
- Can be extended to multilingual sentiment tasks with additional fine-tuning
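
A minimal sketch of a feedback categorization step, grouping texts by predicted label. The feedback strings are illustrative placeholders, and the label names depend on the checkpoint's id2label config.

```python
from collections import defaultdict
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="AfroLogicInsect/sentiment-analysis-model")

# Hypothetical feedback snippets; replace with your own data source.
feedback = [
    "Checkout was quick and painless.",
    "The app keeps crashing on login.",
]

# Group texts by predicted sentiment label.
buckets = defaultdict(list)
for text, pred in zip(feedback, classifier(feedback)):
    buckets[pred["label"]].append(text)

print(dict(buckets))
```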
### Out-of-Scope Use

- Not intended for clinical sentiment or emotion assessment
- Does not reliably capture sarcasm or highly ambiguous language
## Bias, Risks, and Limitations

- Biases may be inherited from the IMDB dataset (e.g. genre or cultural bias)
- Model trained on movie reviews — performance may drop on domain-specific texts like legal or medical writing
- Scores represent probabilities, not certainty
### Recommendations

- Apply a confidence threshold to prediction scores when deploying in production (see the sketch after this list)
- Consider further fine-tuning on in-domain data for robustness
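
A minimal sketch of the thresholding recommendation on top of the pipeline output; the 0.90 cutoff is an illustrative value, not a tuned recommendation.

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="AfroLogicInsect/sentiment-analysis-model")

def classify_with_threshold(text, threshold=0.90):
    """Return the pipeline prediction, flagging low-confidence cases for review."""
    result = classifier(text)[0]  # e.g. {"label": ..., "score": ...}
    if result["score"] < threshold:
        result["label"] = "UNCERTAIN"  # route these to manual review instead of acting on them
    return result

print(classify_with_threshold("The movie was okay, nothing special."))
```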
## How to Get Started with the Model

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="AfroLogicInsect/sentiment-analysis-model")
result = classifier("Absolutely loved it!")
print(result)
```
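
If you prefer to call the tokenizer and model directly rather than through the pipeline helper, a minimal sketch looks like the following; the label names are read from the checkpoint's id2label config rather than hard-coded.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "AfroLogicInsect/sentiment-analysis-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Tokenize with the same maximum length used during training (256).
inputs = tokenizer("Absolutely loved it!", return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits

probs = logits.softmax(dim=-1)[0]
pred_id = int(probs.argmax())
print(model.config.id2label[pred_id], round(float(probs[pred_id]), 4))
```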
## Training Details

### Training Data

- Subset of stanfordnlp/imdb
- Balanced binary classes (positive and negative)
- Sample size: ~5,000 training / 2,500 validation
### Training Procedure

- Texts were tokenized with `AutoTokenizer.from_pretrained("distilbert-base-uncased")` (see the preprocessing sketch after this list)
- Padding/truncation: `max_length=256`
- Loss: cross-entropy
- Optimizer: AdamW
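
A minimal preprocessing sketch matching the bullets above. The exact subset selection is not documented in this card, so the seeded shuffle and split sizes below are assumptions based on the stated ~5,000/2,500 sample counts.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Pad/truncate every review to the 256-token maximum used for training.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

imdb = load_dataset("stanfordnlp/imdb")
train_ds = imdb["train"].shuffle(seed=42).select(range(5_000)).map(tokenize, batched=True)
eval_ds = imdb["test"].shuffle(seed=42).select(range(2_500)).map(tokenize, batched=True)
```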
#### Training Hyperparameters

- Epochs: 3
- Batch size: 4
- Max length: 256
- Precision: fp32 (no mixed precision; see the sketch below)
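
A rough reconstruction of the fine-tuning setup with the 🤗 Trainer API using the settings above; the learning rate is an assumed, commonly used value and is not stated in this card. `train_ds` and `eval_ds` are the tokenized splits from the preprocessing sketch.

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="sentiment-analysis-model",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=2e-5,  # assumption: common DistilBERT fine-tuning value
    fp16=False,          # plain fp32 training, as reported above
)

# Trainer uses cross-entropy loss and the AdamW optimizer by default for
# single-label sequence classification, matching the procedure described above.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```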
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Validation set from the IMDB subset
#### Metrics

| Metric    | Score |
| --------- | ----- |
| Accuracy  | 93.1% |
| F1 score  | 92.5% |
| Precision | 93.0% |
| Recall    | 91.8% |
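
A minimal sketch of how these metrics could be recomputed with scikit-learn, reusing the `trainer` and `eval_ds` objects from the training sketches above; scikit-learn is an assumed extra dependency, not something this card requires.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Run inference on the held-out validation split.
pred_out = trainer.predict(eval_ds)
y_pred = np.argmax(pred_out.predictions, axis=-1)
y_true = pred_out.label_ids

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"accuracy={accuracy_score(y_true, y_pred):.3f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```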
### Results (sample)

Device set to use cuda:0

- Text: I loved this movie! It was absolutely fantastic!
  - Sentiment: Negative (confidence: 0.9991)
- Text: This movie was terrible, completely boring.
  - Sentiment: Negative (confidence: 0.9995)
- Text: The movie was okay, nothing special.
  - Sentiment: Negative (confidence: 0.9995)
- Text: I loved this movie!
  - Sentiment: Negative (confidence: 0.9966)
- Text: It was absolutely fantastic!
  - Sentiment: Negative (confidence: 0.9940)
## 🧪 Live Demo

Try it out below!

👉 [Launch Sentiment Analyzer](https://huggingface.co/spaces/AfroLogicInsect/sentiment-analysis-model-gradio)
#### Summary

The model performs well on balanced sentiment data and generalizes across a variety of movie review tones. Slight performance variations may occur with unusual vocabulary or sarcasm.
## Environmental Impact

Carbon footprint estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute).

- Hardware type: GPU (single NVIDIA T4)
- Hours used: ~2.5
- Cloud provider: Google Colab
- Compute region: Europe
- Carbon emitted: ~0.3 kg CO₂eq
## Technical Specifications

### Model Architecture and Objective

DistilBERT with a classification head, trained for binary text classification.
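
To inspect the classification head and label mapping of the published checkpoint, a short snippet like the one below can be used; the printed names depend on the checkpoint's config.

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

model_id = "AfroLogicInsect/sentiment-analysis-model"

config = AutoConfig.from_pretrained(model_id)
print(config.num_labels, config.id2label)  # 2 labels for the binary task

model = AutoModelForSequenceClassification.from_pretrained(model_id)
print(model.classifier)  # final linear layer of the classification head
```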
### Compute Infrastructure

- Hardware: Google Colab (GPU-backed)
- Software: Python, PyTorch, 🤗 Transformers, Hugging Face Hub
## Citation

Feel free to cite this model or reach out for collaborations!

**BibTeX:**

    @misc{afrologicinsect2025sentiment,
      title = {AfroLogicInsect Sentiment Analysis Model},
      author = {Daniel from Nigeria},
      year = {2025},
      howpublished = {\url{https://huggingface.co/AfroLogicInsect/sentiment-analysis-model}},
    }
## Model Card Contact

- Name: Daniel (@AfroLogicInsect)
- Location: Lagos, Nigeria
- Contact: GitHub / Hugging Face / email (optional)