---
license: mit
language:
- en
library_name: keras
tags:
- emotion-classification
- sentiment-analysis
- text-classification
- nlp
- keras
- tensorflow
- word2vec
pipeline_tag: text-classification
datasets:
- shreyaspulle98/emotion-dataset-20-emotions
---

# 20-Emotion Text Classification Model

A deep learning model for fine-grained emotion classification that detects 20 distinct emotions in English text.

## Model Description

This model combines **Word2Vec embeddings** with a **feedforward neural network classifier** to identify emotions in text. Simple sentiment analysis only separates positive from negative. This model distinguishes between 20 emotional states, which gives a more nuanced view of a text's emotional content.

### Architecture

- **Embeddings**: Word2Vec (100-dimensional vectors)
  - Trained on 79,595 emotion-labeled sentences
  - Optimized model size: 2.9MB
- **Classifier**: Feedforward neural network
  - Input: sentence embeddings (mean-pooled word vectors)
  - Hidden layers with dropout for regularization
  - Output: 20-class softmax classification
  - Model size: 111KB
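
The classifier's forward pass can be sketched in NumPy as follows. This is a minimal illustration, not the trained model. The card does not list them, so the single hidden layer, its width (64), the ReLU activation, and the random weights are all hypothetical. Dropout is left out because Keras only applies it during training, not in `predict`.

```python
import numpy as np

EMBED_DIM = 100    # Word2Vec vector size stated in this card
NUM_CLASSES = 20   # one output unit per emotion
HIDDEN = 64        # hypothetical hidden-layer width

rng = np.random.default_rng(0)
# Hypothetical random weights standing in for the trained parameters
W1 = rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def forward(sentence_vectors: np.ndarray) -> np.ndarray:
    """Map a batch of (n, 100) mean-pooled vectors to (n, 20) probabilities."""
    hidden = np.maximum(0.0, sentence_vectors @ W1 + b1)  # dense + ReLU
    return softmax(hidden @ W2 + b2)                       # dense + softmax


batch = rng.normal(size=(3, EMBED_DIM))  # three fake sentence embeddings
probs = forward(batch)
print(probs.shape)  # (3, 20)
```

Each row of `probs` is a probability distribution over the 20 emotions, which is what the Quick Start below ranks to produce its top-k list.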

### 20 Emotions Detected

The model can classify text into these 20 emotions:

1. Happiness
2. Sadness
3. Fear
4. Anger
5. Disgust
6. Surprise
7. Love
8. Excitement
9. Embarrassment
10. Loneliness
11. Anxiety
12. Frustration
13. Guilt
14. Disappointment
15. Jealousy
16. Gratitude
17. Pride
18. Relief
19. Hope
20. Confusion
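
The classifier outputs a 20-element probability vector, and `label_encoder.pkl` maps each position back to an emotion name. This sketch assumes the file holds a scikit-learn `LabelEncoder` fitted on lowercase label strings, as the output example suggests. Under that assumption the encoder stores its classes in sorted order, so the index-to-emotion mapping can be reproduced as shown below. The exact label strings are an assumption, so check `label_encoder.classes_` on the downloaded file.

```python
# The 20 emotions listed above, assumed to be stored as lowercase strings
EMOTIONS = [
    "happiness", "sadness", "fear", "anger", "disgust",
    "surprise", "love", "excitement", "embarrassment", "loneliness",
    "anxiety", "frustration", "guilt", "disappointment", "jealousy",
    "gratitude", "pride", "relief", "hope", "confusion",
]

# LabelEncoder sorts its classes, so output index i corresponds to the i-th
# label in alphabetical order, not to the numbering used in the list above.
classes = sorted(set(EMOTIONS))
index_of = {label: i for i, label in enumerate(classes)}

print(classes[0], index_of["happiness"])  # anger 11
```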

## Usage

### Installation

```bash
pip install tensorflow gensim nltk numpy scikit-learn huggingface_hub
```

### Quick Start

```python
import pickle
import re

import nltk
import numpy as np
from gensim.models import Word2Vec
from huggingface_hub import hf_hub_download
from nltk.tokenize import word_tokenize
from tensorflow import keras

# word_tokenize needs NLTK's Punkt tokenizer data
# (NLTK 3.8.2 and later look for "punkt_tab")
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

REPO_ID = "shreyaspulle98/emotion-classifier-20-emotions"

# Download model files (cached locally after the first call)
model_path = hf_hub_download(repo_id=REPO_ID, filename="best_model.keras")
w2v_path = hf_hub_download(repo_id=REPO_ID, filename="word2vec_optimized.model")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.pkl")

# Load models
w2v_model = Word2Vec.load(w2v_path)
classifier = keras.models.load_model(model_path, compile=False)
with open(encoder_path, 'rb') as f:
    label_encoder = pickle.load(f)


# Preprocessing: lowercase, strip URLs, mentions, hashtags and most symbols
def preprocess_text(text):
    text = str(text).lower()
    text = re.sub(r'http\S+|www\S+|https\S+', '', text)
    text = re.sub(r'@\w+', '', text)
    text = re.sub(r'#\w+', '', text)
    harmful_punctuation = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'
    text = text.translate(str.maketrans('', '', harmful_punctuation))
    text = re.sub(r'\s+', ' ', text).strip()
    return text


# Sentence to vector: mean of the Word2Vec vectors of in-vocabulary words
def sentence_to_vector(sentence, w2v_model):
    words = word_tokenize(sentence.lower())
    word_vectors = [w2v_model.wv[word] for word in words if word in w2v_model.wv]
    if len(word_vectors) == 0:
        return np.zeros(w2v_model.wv.vector_size)
    return np.mean(word_vectors, axis=0)


# Prediction function: returns the top_k emotions with their confidences
def predict_emotion(text, top_k=5):
    # Preprocess
    cleaned = preprocess_text(text)

    # Convert to a (1, vector_size) batch
    vector = sentence_to_vector(cleaned, w2v_model).reshape(1, -1)

    # Predict class probabilities
    probs = classifier.predict(vector, verbose=0)[0]

    # Get top-k predictions, highest probability first
    top_indices = np.argsort(probs)[-top_k:][::-1]

    results = []
    for idx in top_indices:
        emotion = label_encoder.inverse_transform([idx])[0]
        confidence = float(probs[idx])
        results.append({
            'emotion': emotion,
            'confidence': confidence,
            'percentage': round(confidence * 100, 1)
        })

    return results


# Example usage
text = "I'm so excited about this amazing opportunity!"
predictions = predict_emotion(text)

print(f"Text: {text}")
print("\nTop predictions:")
for pred in predictions:
    print(f"  {pred['emotion']}: {pred['percentage']}%")
```
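
`predict_emotion` makes one `classifier.predict` call per text. For many texts, it is usually faster to stack the sentence vectors into one `(n, 100)` array and call `predict` once, then rank all rows together. The sketch below shows only that ranking step, run on a made-up probability matrix. `rank_top_k` is a hypothetical helper. In practice, `probs` would come from `classifier.predict(np.stack(vectors), verbose=0)` and `classes` from `label_encoder.classes_`.

```python
import numpy as np


def rank_top_k(probs: np.ndarray, classes, k: int = 3):
    """Return the top-k (emotion, percentage) pairs for each row of probs."""
    # argsort ascending, keep the last k columns, then reverse to descending
    top = np.argsort(probs, axis=1)[:, -k:][:, ::-1]
    return [
        [(classes[i], round(float(row[i]) * 100, 1)) for i in idx]
        for row, idx in zip(probs, top)
    ]


# Made-up probabilities for two texts over a 4-class toy label set
classes = ["anger", "fear", "happiness", "hope"]
probs = np.array([
    [0.10, 0.05, 0.70, 0.15],
    [0.60, 0.30, 0.04, 0.06],
])
for ranking in rank_top_k(probs, classes, k=2):
    print(ranking)
```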

### Output Example

```
Text: I'm so excited about this amazing opportunity!

Top predictions:
  excitement: 78.5%
  happiness: 12.3%
  hope: 4.2%
  gratitude: 2.8%
  pride: 2.2%
```

## Training Data

This model was trained on the [emotion-dataset-20-emotions](https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions) dataset, which contains:

- **79,595 sentences** with emotion labels
- **20 balanced emotion categories**
- Synthetic sentences generated with language models via the DeepInfra API
- Cleaned and preprocessed text
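
With 79,595 sentences over 20 categories, a perfectly even split is not possible (79,595 / 20 = 3,979.75), so a balanced split works out to roughly 3,980 sentences per emotion. If you download the dataset, a quick check like the one below can confirm the spread. The `labels` list here is synthetic, and `class_balance` is a hypothetical helper. The dataset's actual column names are not given in this card.

```python
from collections import Counter


def class_balance(labels):
    """Return (min count, max count, max/min ratio) over all classes."""
    counts = Counter(labels)
    lo, hi = min(counts.values()), max(counts.values())
    return lo, hi, hi / lo


# Synthetic labels: 3 classes with 40, 38 and 42 examples
labels = ["happiness"] * 40 + ["fear"] * 38 + ["hope"] * 42
print(class_balance(labels))
print(79_595 / 20)  # 3979.75, the per-class count for an even split
```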

## Performance

Reported characteristics:

- **Training accuracy**: ~95% (measured on the training data, so it does not indicate accuracy on unseen text, which may be lower)
- **Class balance**: each of the 20 emotions is well represented in the training data
- **Fast inference**: under 100ms per prediction on CPU
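
The 100ms figure depends on hardware. To check it on your own machine, you can time repeated calls and report the median, as in the sketch below. `median_latency_ms` is a hypothetical helper, and `fake_predict` is a stand-in so the snippet runs on its own. In practice you would pass `predict_emotion` from the Quick Start instead.

```python
import statistics
import time


def median_latency_ms(fn, arg, runs: int = 20, warmup: int = 2) -> float:
    """Median wall-clock time of fn(arg) in milliseconds."""
    for _ in range(warmup):  # first calls can include one-off setup cost
        fn(arg)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


def fake_predict(text):
    # Stand-in workload; replace with predict_emotion from the Quick Start
    return sum(ord(c) for c in text)


print(f"{median_latency_ms(fake_predict, 'I feel great today'):.3f} ms")
```

The median is less sensitive than the mean to occasional slow calls, such as the first Keras `predict` call, which builds the inference function.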

### Strengths

- Can distinguish between subtle emotional differences (e.g., anxiety vs. fear, disappointment vs. sadness)
- Works well with everyday conversational language
- Lightweight, with fast inference
- Runs locally; no external API calls required

### Limitations

- **English only**: currently supports only English text
- **Synthetic training data**: may not capture all real-world emotional expressions
- **Single-label**: trained to assign one primary emotion per text. The scores for other emotions come from the same softmax and sum to 1 together with the top emotion. They are therefore relative rankings, not independent multi-label probabilities.
- **Context-dependent**: may struggle with sarcasm, irony, or culturally specific expressions
- **Optimized for short text**: best performance on sentence-level text (roughly 10 to 50 words)
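
Mean pooling is one concrete source of the context limitation. Averaging word vectors discards word order, so two sentences that contain the same words in a different order map to the same input vector, even when their meanings are opposite. The toy 2-dimensional vectors below are made up for illustration. The real model uses 100-dimensional Word2Vec vectors and NLTK tokenization.

```python
import numpy as np

# Made-up 2-d word vectors; the real model uses 100-d Word2Vec vectors
toy_vectors = {
    "i": np.array([0.1, 0.0]),
    "am": np.array([0.0, 0.1]),
    "not": np.array([-0.2, 0.1]),
    "happy": np.array([0.9, 0.2]),
    "sad": np.array([-0.8, 0.3]),
}


def mean_pool(sentence: str) -> np.ndarray:
    # Whitespace split stands in for NLTK word_tokenize here
    return np.mean([toy_vectors[w] for w in sentence.split()], axis=0)


a = mean_pool("i am not sad i am happy")
b = mean_pool("i am not happy i am sad")
print(np.allclose(a, b))  # True: same words, same vector, opposite meaning
```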

## Use Cases

Potential applications include:

- **Mental Health Apps**: Detect emotional states in user journals or messages
- **Customer Service**: Analyze customer emotions in support tickets and feedback
- **Social Media Analytics**: Understand the emotional tone of posts and comments
- **Chatbots**: Enable emotion-aware conversational AI
- **Content Moderation**: Flag content expressing concerning emotions for human review
- **UX Research**: Analyze user feedback and reviews for emotional insights
- **Educational Tools**: Help students identify and understand emotions in text
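
For moderation or check-in applications, the list returned by `predict_emotion` can drive a simple human-review queue. This fits the guidance under Ethical Considerations. The sketch below flags a text when any watched emotion reaches a confidence threshold. `needs_review`, the watched set, and the 0.4 threshold are hypothetical and would need tuning on your own data. The lowercase label strings are assumed from the output example.

```python
from typing import Dict, List

# Hypothetical settings; tune both on labelled data from your own domain
WATCHED = {"anger", "loneliness", "anxiety", "sadness"}
THRESHOLD = 0.4


def needs_review(predictions: List[Dict]) -> bool:
    """True if any watched emotion meets the confidence threshold.

    `predictions` has the shape returned by predict_emotion:
    [{'emotion': str, 'confidence': float, 'percentage': float}, ...]
    """
    return any(
        p["emotion"] in WATCHED and p["confidence"] >= THRESHOLD
        for p in predictions
    )


example = [
    {"emotion": "loneliness", "confidence": 0.55, "percentage": 55.0},
    {"emotion": "sadness", "confidence": 0.30, "percentage": 30.0},
]
print(needs_review(example))  # True: loneliness is at 0.55
```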

## Model Files

- **best_model.keras** (111KB): Neural network classifier
- **word2vec_optimized.model** (2.9MB): Word2Vec embeddings
- **label_encoder.pkl** (457B): Label encoder for emotion categories

## Technical Details

### Preprocessing Pipeline

1. Lowercase conversion
2. URL removal
3. Mention and hashtag removal (the entire `@user` or `#tag` token is dropped)
4. Special character removal (apostrophes and `. , ! ?` are kept)
5. Whitespace normalization
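
The snippet below shows these steps on a sample tweet-style string. It repeats `preprocess_text` from the Quick Start, with one comment per step. The hashtag is removed as a whole token, so an emotional hashtag such as `#blessed` contributes nothing to the prediction.

```python
import re


def preprocess_text(text):
    text = str(text).lower()                                # 1. lowercase
    text = re.sub(r'http\S+|www\S+|https\S+', '', text)    # 2. URLs
    text = re.sub(r'@\w+', '', text)                        # 3. mentions
    text = re.sub(r'#\w+', '', text)                        # 3. hashtags
    harmful_punctuation = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'
    text = text.translate(str.maketrans('', '', harmful_punctuation))  # 4.
    text = re.sub(r'\s+', ' ', text).strip()                # 5. whitespace
    return text


sample = "Check this out @friend! https://t.co/xyz #blessed So HAPPY :)"
print(preprocess_text(sample))  # check this out ! so happy
```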

### Inference Pipeline

1. Text preprocessing
2. Tokenization (NLTK `word_tokenize`)
3. Word vector lookup (words missing from the Word2Vec vocabulary are skipped)
4. Mean pooling of word vectors (an all-zero vector if no word is found)
5. Neural network classification
6. Softmax probability output
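
Steps 2 to 4 reduce a sentence to one fixed-size vector. The sketch below mirrors `sentence_to_vector` from the Quick Start to show how out-of-vocabulary words are skipped and when the all-zero fallback applies. Two stand-ins are used: a plain dictionary of made-up 3-dimensional vectors replaces `w2v_model.wv`, and `str.split` replaces NLTK's `word_tokenize`.

```python
import numpy as np

VECTOR_SIZE = 3  # the real model uses 100
# Made-up vectors standing in for w2v_model.wv
vocab = {
    "great": np.array([1.0, 0.0, 2.0]),
    "day": np.array([0.0, 1.0, 0.0]),
}


def sentence_to_vector(sentence: str) -> np.ndarray:
    words = sentence.lower().split()                 # stand-in tokenizer
    found = [vocab[w] for w in words if w in vocab]  # skip OOV words
    if not found:
        return np.zeros(VECTOR_SIZE)                 # nothing known: zeros
    return np.mean(found, axis=0)                    # mean pooling


print(sentence_to_vector("What a great day"))  # [0.5 0.5 1. ]
print(sentence_to_vector("Zxqv blorp"))        # [0. 0. 0.]
```

The classifier still returns a full probability distribution for an all-zero input, so you may want to detect this case before trusting the prediction for text with no known words.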

### Dependencies

```txt
tensorflow>=2.13.0
gensim>=4.3.0
nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.3.0
huggingface_hub
```

## Ethical Considerations

### Responsible Use

- This model should **complement, not replace** human judgment in sensitive applications
- Emotion detection has limitations and may not always be accurate
- Consider privacy implications when analyzing personal communications
- Be aware of potential biases in synthetic training data

### Not Recommended For

- Clinical mental health diagnosis
- Legal or law enforcement decisions
- Employment decisions
- Automated content removal without human review

### Bias Considerations

- The model was trained on synthetically generated data, which may not represent all demographic groups equally
- Emotional expression varies across cultures, age groups, and contexts
- The model may perform differently on various writing styles and dialects

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{emotion_classifier_20_2025,
  author = {Shreyas Pulle},
  title = {20-Emotion Text Classification Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions}
}
```

## Dataset

The training dataset is available at [shreyaspulle98/emotion-dataset-20-emotions](https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions).

## License

This model is released under the MIT License. You are free to use, modify, and distribute it for commercial and non-commercial purposes, provided the copyright and license notice is retained.

## Contact

- Hugging Face: [@shreyaspulle98](https://huggingface.co/shreyaspulle98)
- Model Repository: [emotion-classifier-20-emotions](https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions)

## Acknowledgments

- Training data generated using the DeepInfra API
- Built with TensorFlow/Keras and Gensim
- Inspired by advances in emotion AI and affective computing

---

**Try it out!** Test the model with your own text and explore the 20 emotions it can detect.