---
license: mit
language:
- en
library_name: keras
tags:
- emotion-classification
- sentiment-analysis
- text-classification
- nlp
- keras
- tensorflow
- word2vec
pipeline_tag: text-classification
datasets:
- shreyaspulle98/emotion-dataset-20-emotions
---
# 20-Emotion Text Classification Model
A lightweight deep learning model for fine-grained emotion classification that detects 20 distinct emotions in text; see the Performance section below for accuracy figures.
## Model Description
This model uses a combination of **Word2Vec embeddings** and a **Neural Network classifier** to identify emotions in text. Unlike simple sentiment analysis (positive/negative), this model can distinguish between 20 different emotional states, providing nuanced understanding of emotional content.
### Architecture
- **Embedding Layer**: Word2Vec (100-dimensional vectors)
  - Trained on 79,595 emotion-labeled sentences
  - Optimized model size: 2.9MB
- **Classifier**: Feedforward Neural Network
  - Input: Sentence embeddings (mean-pooled word vectors)
  - Hidden layers with dropout for regularization
  - Output: 20-class softmax classification
  - Model size: 111KB
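For orientation, a minimal Keras sketch of this kind of classifier is shown below. The hidden-layer sizes and dropout rates are illustrative assumptions; the actual trained network is the one shipped in `best_model.keras`.

```python
# Rough sketch of the classifier described above. Layer sizes and dropout
# rates are illustrative assumptions; the trained network ships in best_model.keras.
from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(embedding_dim: int = 100, num_classes: int = 20) -> keras.Model:
    return keras.Sequential([
        layers.Input(shape=(embedding_dim,)),             # mean-pooled Word2Vec sentence vector
        layers.Dense(128, activation="relu"),             # hidden layer (size assumed)
        layers.Dropout(0.3),                              # dropout for regularization
        layers.Dense(64, activation="relu"),              # second hidden layer (size assumed)
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),  # 20-class softmax output
    ])
```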
### 20 Emotions Detected
The model can classify text into these 20 emotions:
1. Happiness
2. Sadness
3. Fear
4. Anger
5. Disgust
6. Surprise
7. Love
8. Excitement
9. Embarrassment
10. Loneliness
11. Anxiety
12. Frustration
13. Guilt
14. Disappointment
15. Jealousy
16. Gratitude
17. Pride
18. Relief
19. Hope
20. Confusion
## Usage
### Installation
```bash
pip install tensorflow gensim nltk numpy scikit-learn
```
### Quick Start
```python
import numpy as np
from tensorflow import keras
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize
import pickle
import re
import nltk
from huggingface_hub import hf_hub_download

# word_tokenize needs NLTK's Punkt tokenizer data (one-time download;
# newer NLTK releases use the 'punkt_tab' resource instead of 'punkt')
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)

# Download model files
model_path = hf_hub_download(repo_id="shreyaspulle98/emotion-classifier-20-emotions",
                             filename="best_model.keras")
w2v_path = hf_hub_download(repo_id="shreyaspulle98/emotion-classifier-20-emotions",
                           filename="word2vec_optimized.model")
encoder_path = hf_hub_download(repo_id="shreyaspulle98/emotion-classifier-20-emotions",
                               filename="label_encoder.pkl")

# Load models
w2v_model = Word2Vec.load(w2v_path)
classifier = keras.models.load_model(model_path, compile=False)
with open(encoder_path, 'rb') as f:
    label_encoder = pickle.load(f)

# Preprocessing function
def preprocess_text(text):
    text = str(text).lower()
    text = re.sub(r'http\S+|www\S+|https\S+', '', text)
    text = re.sub(r'@\w+', '', text)
    text = re.sub(r'#\w+', '', text)
    harmful_punctuation = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'
    text = text.translate(str.maketrans('', '', harmful_punctuation))
    text = re.sub(r'\s+', ' ', text).strip()
    return text

# Sentence to vector (mean-pooled word embeddings)
def sentence_to_vector(sentence, w2v_model):
    words = word_tokenize(sentence.lower())
    word_vectors = [w2v_model.wv[word] for word in words if word in w2v_model.wv]
    if len(word_vectors) == 0:
        return np.zeros(w2v_model.wv.vector_size)
    return np.mean(word_vectors, axis=0)

# Prediction function
def predict_emotion(text, top_k=5):
    # Preprocess
    cleaned = preprocess_text(text)
    # Convert to vector
    vector = sentence_to_vector(cleaned, w2v_model).reshape(1, -1)
    # Predict
    probs = classifier.predict(vector, verbose=0)[0]
    # Get top-k predictions
    top_indices = np.argsort(probs)[-top_k:][::-1]
    results = []
    for idx in top_indices:
        emotion = label_encoder.inverse_transform([idx])[0]
        confidence = float(probs[idx])
        results.append({
            'emotion': emotion,
            'confidence': confidence,
            'percentage': round(confidence * 100, 1)
        })
    return results

# Example usage
text = "I'm so excited about this amazing opportunity!"
predictions = predict_emotion(text)
print(f"Text: {text}")
print("\nTop predictions:")
for pred in predictions:
    print(f"  {pred['emotion']}: {pred['percentage']}%")
```
### Output Example
```
Text: I'm so excited about this amazing opportunity!
Top predictions:
  excitement: 78.5%
  happiness: 12.3%
  hope: 4.2%
  gratitude: 2.8%
  pride: 2.2%
```
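
To classify many texts at once, the helpers from the Quick Start can be reused for batched inference. The sketch below stacks the sentence vectors and calls `predict` once per batch; the example sentences are made up for illustration.

```python
# Sketch: batched inference reusing preprocess_text, sentence_to_vector,
# classifier, w2v_model and label_encoder from the Quick Start above.
# The example sentences are made up for illustration.
import numpy as np

texts = [
    "I can't believe you did that without asking me first.",
    "Thank you so much for helping me move this weekend.",
    "I keep checking my phone every five minutes before the interview.",
]

vectors = np.stack([sentence_to_vector(preprocess_text(t), w2v_model) for t in texts])
probs = classifier.predict(vectors, verbose=0)              # shape: (len(texts), 20)
labels = label_encoder.inverse_transform(probs.argmax(axis=1))

for text, label, confidence in zip(texts, labels, probs.max(axis=1)):
    print(f"{label:>15} ({confidence:.1%})  {text}")
```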
## Training Data
This model was trained on the [emotion-dataset-20-emotions](https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions) dataset, which contains:
- **79,595 sentences** with emotion labels
- **20 balanced emotion categories**
- Synthetically generated using advanced language models
- Cleaned and preprocessed text
## Performance
The model achieves strong performance across all 20 emotion categories:
- **Training accuracy**: ~95%
- **Balanced emotion distribution**: Each emotion well-represented
- **Fast inference**: < 100ms per prediction on CPU
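To check the latency figure on your own hardware, a simple timing sketch (reusing the `predict_emotion` helper from the Quick Start above) might look like this:

```python
# Simple latency check for predict_emotion from the Quick Start.
import time

sample = "I finally finished the project and it feels great."
start = time.perf_counter()
for _ in range(20):
    predict_emotion(sample)
elapsed_ms = (time.perf_counter() - start) / 20 * 1000
print(f"Average latency: {elapsed_ms:.1f} ms per prediction")
```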
### Strengths
- Can distinguish between subtle emotional differences (e.g., anxiety vs. fear, disappointment vs. sadness)
- Works well with everyday conversational language
- Lightweight and fast inference
- No external API calls required
### Limitations
- **English only**: Currently supports only English text
- **Synthetic training data**: May not capture all real-world emotional expressions
- **Single emotion**: Assigns one primary emotion (though provides confidence scores for others)
- **Context-dependent**: May struggle with sarcasm, irony, or culturally-specific expressions
- **Short text optimized**: Best performance on sentence-level text (10-50 words)
## Use Cases
This model is ideal for:
- **Mental Health Apps**: Detect emotional states in user journals or messages
- **Customer Service**: Analyze customer sentiment in support tickets and feedback
- **Social Media Analytics**: Understand emotional tone of posts and comments
- **Chatbots**: Enable emotion-aware conversational AI
- **Content Moderation**: Flag content expressing concerning emotions
- **UX Research**: Analyze user feedback and reviews for emotional insights
- **Educational Tools**: Help students identify and understand emotions in text
## Model Files
- **best_model.keras** (111KB): Neural network classifier
- **word2vec_optimized.model** (2.9MB): Word2Vec embeddings
- **label_encoder.pkl** (457B): Label encoder for emotion categories
## Technical Details
### Preprocessing Pipeline
1. Lowercase conversion
2. URL removal
3. Mention/hashtag removal
4. Special character removal
5. Whitespace normalization
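As a quick illustration of these steps, the `preprocess_text` helper from the Quick Start strips the URL, mention, and hashtag and collapses whitespace (the sample input is made up):

```python
# Illustration of the preprocessing steps above, using preprocess_text
# from the Quick Start (the sample input is made up).
raw = "Check this out!! https://example.com @friend #excited   so   happy"
print(preprocess_text(raw))
# Expected: "check this out!! so happy"
```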
### Inference Pipeline
1. Text preprocessing
2. Tokenization (NLTK word_tokenize)
3. Word vector lookup
4. Mean pooling of word vectors
5. Neural network classification
6. Softmax probability output
### Dependencies
```txt
tensorflow>=2.13.0
gensim>=4.3.0
nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.3.0
```
## Ethical Considerations
### Responsible Use
- This model should **complement, not replace** human judgment in sensitive applications
- Emotion detection has limitations and may not always be accurate
- Consider privacy implications when analyzing personal communications
- Be aware of potential biases in synthetic training data
### Not Recommended For
- Clinical mental health diagnosis
- Legal or law enforcement decisions
- Employment decisions
- Automated content removal without human review
### Bias Considerations
- The model was trained on synthetically generated data, which may not represent all demographic groups equally
- Emotional expression varies across cultures, age groups, and contexts
- The model may perform differently on various writing styles and dialects
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{emotion_classifier_20_2025,
  author    = {Shreyas Pulle},
  title     = {20-Emotion Text Classification Model},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions}
}
```
## Dataset
The training dataset is available at:
[shreyaspulle98/emotion-dataset-20-emotions](https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions)
## License
This model is released under the MIT License. You are free to use, modify, and distribute this model for commercial and non-commercial purposes.
## Contact
- HuggingFace: [@shreyaspulle98](https://huggingface.co/shreyaspulle98)
- Model Repository: [emotion-classifier-20-emotions](https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions)
## Acknowledgments
- Training data generated using DeepInfra API
- Built with TensorFlow/Keras and Gensim
- Inspired by advances in emotion AI and affective computing
---
**Try it out!** Test the model with your own text and explore the 20 emotions it can detect.