	---
	license: mit
	language:
	- en
	library_name: keras
	tags:
	- emotion-classification
	- sentiment-analysis
	- text-classification
	- nlp
	- keras
	- tensorflow
	- word2vec
	pipeline_tag: text-classification
	datasets:
	- shreyaspulle98/emotion-dataset-20-emotions
	---

	# 20-Emotion Text Classification Model

	A deep learning model for fine-grained emotion classification that detects 20 distinct emotions in English text, with a reported training accuracy of about 95%.

	## Model Description

	This model combines Word2Vec embeddings with a feedforward neural network classifier to identify emotions in text. Unlike simple sentiment analysis (positive/negative), it distinguishes between 20 emotional states, giving a more nuanced view of emotional content.

	### Architecture

	- Embedding Layer: Word2Vec (100-dimensional vectors)
	  - Trained on 79,595 emotion-labeled sentences
	  - Optimized model size: 2.9MB
	- Classifier: Feedforward Neural Network
	  - Input: Sentence embeddings (mean-pooled word vectors)
	  - Hidden layers with dropout for regularization
	  - Output: 20-class softmax classification
	  - Model size: 111KB
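
The shapes in the list above can be traced with a small NumPy forward pass. This is a sketch under stated assumptions, not the released Keras model: the hidden width (`HIDDEN = 64`), the ReLU activation, the single hidden layer, and the random weights are all placeholders, since this card states only the input dimension (100) and the output size (20). Dropout is left out because it is active only during training.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 100    # from the card: 100-dimensional Word2Vec vectors
NUM_CLASSES = 20   # from the card: 20-class softmax output
HIDDEN = 64        # assumption: hidden-layer sizes are not stated in the card

# Random weights stand in for the trained parameters.
W1 = rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)


def forward(sentence_vector):
    """Map one mean-pooled sentence vector to 20 class probabilities."""
    hidden = np.maximum(0.0, sentence_vector @ W1 + b1)  # ReLU (assumed)
    logits = hidden @ W2 + b2
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()


probs = forward(rng.normal(size=EMBED_DIM))
print(probs.shape, round(float(probs.sum()), 6))
```

The output is a length-20 vector that sums to 1, which is the shape `classifier.predict` returns per input row in Quick Start.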

	### 20 Emotions Detected

	The model can classify text into these 20 emotions:

	1. Happiness
	2. Sadness
	3. Fear
	4. Anger
	5. Disgust
	6. Surprise
	7. Love
	8. Excitement
	9. Embarrassment
	10. Loneliness
	11. Anxiety
	12. Frustration
	13. Guilt
	14. Disappointment
	15. Jealousy
	16. Gratitude
	17. Pride
	18. Relief
	19. Hope
	20. Confusion

	## Usage

	### Installation

	```bash
	pip install tensorflow gensim nltk numpy scikit-learn huggingface_hub
	```

	### Quick Start

	```python
	import pickle
	import re

	import nltk
	import numpy as np
	from gensim.models import Word2Vec
	from huggingface_hub import hf_hub_download
	from nltk.tokenize import word_tokenize
	from tensorflow import keras

	# word_tokenize needs the Punkt tokenizer data (newer NLTK releases use "punkt_tab")
	nltk.download("punkt", quiet=True)
	nltk.download("punkt_tab", quiet=True)

	REPO_ID = "shreyaspulle98/emotion-classifier-20-emotions"

	# Download model files
	model_path = hf_hub_download(repo_id=REPO_ID, filename="best_model.keras")
	w2v_path = hf_hub_download(repo_id=REPO_ID, filename="word2vec_optimized.model")
	encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.pkl")

	# Load models
	w2v_model = Word2Vec.load(w2v_path)
	classifier = keras.models.load_model(model_path, compile=False)
	with open(encoder_path, 'rb') as f:
	    label_encoder = pickle.load(f)

	# Preprocessing function
	def preprocess_text(text):
	    text = str(text).lower()
	    text = re.sub(r'http\S+|www\S+|https\S+', '', text)  # URLs
	    text = re.sub(r'@\w+', '', text)                     # mentions
	    text = re.sub(r'#\w+', '', text)                     # hashtags
	    harmful_punctuation = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'
	    text = text.translate(str.maketrans('', '', harmful_punctuation))
	    text = re.sub(r'\s+', ' ', text).strip()
	    return text

	# Sentence to vector: mean of the in-vocabulary word vectors
	def sentence_to_vector(sentence, w2v_model):
	    words = word_tokenize(sentence.lower())
	    word_vectors = [w2v_model.wv[word] for word in words if word in w2v_model.wv]
	    if len(word_vectors) == 0:
	        # No known words: fall back to a zero vector
	        return np.zeros(w2v_model.wv.vector_size)
	    return np.mean(word_vectors, axis=0)

	# Prediction function
	def predict_emotion(text, top_k=5):
	    # Preprocess
	    cleaned = preprocess_text(text)

	    # Convert to a (1, vector_size) batch
	    vector = sentence_to_vector(cleaned, w2v_model).reshape(1, -1)

	    # Predict class probabilities
	    probs = classifier.predict(vector, verbose=0)[0]

	    # Get top-k predictions, highest probability first
	    top_indices = np.argsort(probs)[-top_k:][::-1]

	    results = []
	    for idx in top_indices:
	        emotion = label_encoder.inverse_transform([idx])[0]
	        confidence = float(probs[idx])
	        results.append({
	            'emotion': emotion,
	            'confidence': confidence,
	            'percentage': round(confidence * 100, 1)
	        })

	    return results

	# Example usage
	text = "I'm so excited about this amazing opportunity!"
	predictions = predict_emotion(text)

	print(f"Text: {text}")
	print("\nTop predictions:")
	for pred in predictions:
	    print(f" {pred['emotion']}: {pred['percentage']}%")
	```

	### Output Example

	```
	Text: I'm so excited about this amazing opportunity!

	Top predictions:
	excitement: 78.5%
	happiness: 12.3%
	hope: 4.2%
	gratitude: 2.8%
	pride: 2.2%
	```

	## Training Data

	This model was trained on the [emotion-dataset-20-emotions](https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions) dataset, which contains:

	- 79,595 sentences with emotion labels
	- 20 balanced emotion categories
	- Synthetically generated using language models (via the DeepInfra API)
	- Cleaned and preprocessed text
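
Since 79,595 is not divisible by 20, the categories cannot all be exactly the same size, so "balanced" is worth checking against the actual label counts. The sketch below shows one way to do that; `balance_report` and the toy label list are hypothetical, and in practice the labels would come from the dataset's label column.

```python
from collections import Counter


def balance_report(labels):
    """Summarize how evenly the labels are spread across classes."""
    counts = Counter(labels)
    smallest = min(counts.values())
    largest = max(counts.values())
    return {
        "num_classes": len(counts),
        "smallest": smallest,
        "largest": largest,
        "ratio": largest / smallest,  # 1.0 means perfectly balanced
    }


# Toy labels for illustration only, not taken from the real dataset.
toy_labels = ["anger"] * 3 + ["hope"] * 3 + ["relief"] * 2
report = balance_report(toy_labels)
print(report)
```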

	## Performance

	Reported results across the 20 emotion categories:

	- Training accuracy: ~95% (measured on the training data; no validation or test accuracy is reported)
	- Balanced emotion distribution: each emotion is well represented in the training data
	- Fast inference: < 100ms per prediction on CPU
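
The latency figure depends on hardware, so it can be useful to time repeated calls locally. The helper `measure_latency_ms` and the stand-in `dummy_predict` below are hypothetical names used for illustration; swap in `predict_emotion` from Quick Start once the model files are loaded.

```python
import statistics
import time


def measure_latency_ms(predict_fn, text, warmup=3, runs=20):
    """Time repeated calls to predict_fn and return latency stats in milliseconds."""
    for _ in range(warmup):
        predict_fn(text)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}


def dummy_predict(text):
    # Stand-in for predict_emotion so the sketch runs without the model files.
    return [{"emotion": "hope", "confidence": 1.0}]


stats = measure_latency_ms(dummy_predict, "I hope this works.")
print(stats)
```

Reporting the median alongside the maximum makes a slow first call or an occasional stall visible without letting it dominate the summary.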

	### Strengths

	- Can distinguish between subtle emotional differences (e.g., anxiety vs. fear, disappointment vs. sadness)
	- Works well with everyday conversational language
	- Lightweight and fast inference
	- No external API calls required

	### Limitations

	- English only: Currently supports only English text
	- Synthetic training data: May not capture all real-world emotional expressions
	- Single emotion: Assigns one primary emotion (though it also provides confidence scores for the others)
	- Context-dependent: May struggle with sarcasm, irony, or culturally specific expressions
	- Short text optimized: Best performance on sentence-level text (10-50 words)
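
Because the model always produces a ranked list, callers can decline to commit to a single label when the top score is weak or nearly tied with the runner-up. The sketch below assumes input shaped like the output of `predict_emotion` in Quick Start; the function name `primary_emotion` and both thresholds are hypothetical and would need tuning on real data.

```python
def primary_emotion(predictions, min_confidence=0.5, min_margin=0.1):
    """Return the top emotion, or "uncertain" if the prediction is weak or ambiguous.

    predictions: list of {"emotion": str, "confidence": float}, highest first.
    """
    if not predictions:
        return "uncertain"
    top = predictions[0]
    runner_up = predictions[1]["confidence"] if len(predictions) > 1 else 0.0
    too_weak = top["confidence"] < min_confidence
    too_close = top["confidence"] - runner_up < min_margin
    return "uncertain" if too_weak or too_close else top["emotion"]


clear = [{"emotion": "excitement", "confidence": 0.785},
         {"emotion": "happiness", "confidence": 0.123}]
ambiguous = [{"emotion": "anxiety", "confidence": 0.41},
             {"emotion": "fear", "confidence": 0.38}]

print(primary_emotion(clear))      # excitement
print(primary_emotion(ambiguous))  # uncertain
```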

	## Use Cases

	This model is ideal for:

	- Mental Health Apps: Detect emotional states in user journals or messages
	- Customer Service: Analyze customer sentiment in support tickets and feedback
	- Social Media Analytics: Understand emotional tone of posts and comments
	- Chatbots: Enable emotion-aware conversational AI
	- Content Moderation: Flag content expressing concerning emotions
	- UX Research: Analyze user feedback and reviews for emotional insights
	- Educational Tools: Help students identify and understand emotions in text

	## Model Files

	- best_model.keras (111KB): Neural network classifier
	- word2vec_optimized.model (2.9MB): Word2Vec embeddings
	- label_encoder.pkl (457B): Label encoder for emotion categories

	## Technical Details

	### Preprocessing Pipeline

	1. Lowercase conversion
	2. URL removal
	3. Mention/hashtag removal
	4. Special character removal
	5. Whitespace normalization
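
To see what each of the five steps does, the sketch below records the text after every stage. It uses the same regular expressions and punctuation set as `preprocess_text` in Quick Start; the function name `trace_preprocessing` and the sample sentence are made up for illustration.

```python
import re

# Characters stripped in step 4 (same set as in Quick Start).
HARMFUL_PUNCTUATION = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'


def trace_preprocessing(text):
    """Return a list of (step name, text after that step) for the five steps."""
    steps = []
    text = str(text).lower()
    steps.append(("lowercase", text))
    text = re.sub(r"http\S+|www\S+|https\S+", "", text)
    steps.append(("remove URLs", text))
    text = re.sub(r"@\w+", "", text)
    text = re.sub(r"#\w+", "", text)
    steps.append(("remove mentions/hashtags", text))
    text = text.translate(str.maketrans("", "", HARMFUL_PUNCTUATION))
    steps.append(("remove special characters", text))
    text = re.sub(r"\s+", " ", text).strip()
    steps.append(("normalize whitespace", text))
    return steps


sample = "Check THIS out @Sam: https://example.com   #wow I'm *so* happy!!"
trace = trace_preprocessing(sample)
for name, value in trace:
    print(f"{name:28} {value!r}")
```

Note that apostrophes, periods, commas, and `!`/`?` are not in the stripped set, so contractions and sentence-final punctuation survive preprocessing.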

	### Inference Pipeline

	1. Text preprocessing
	2. Tokenization (NLTK word_tokenize)
	3. Word vector lookup
	4. Mean pooling of word vectors
	5. Neural network classification
	6. Softmax probability output
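
Steps 3 and 4 can be illustrated without the model files. The sketch below uses a toy three-dimensional vocabulary in place of the real 100-dimensional Word2Vec vectors; `mean_pool` and `toy_vectors` are hypothetical names. As in Quick Start, out-of-vocabulary tokens are skipped and a zero vector is returned when no token is known.

```python
def mean_pool(tokens, vectors, dim):
    """Average the vectors of in-vocabulary tokens; return zeros if none are known."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the vectors component by component.
    return [sum(component) / len(known) for component in zip(*known)]


# Toy vocabulary for illustration only.
toy_vectors = {
    "happy": [1.0, 0.0, 2.0],
    "day": [3.0, 2.0, 0.0],
}

pooled = mean_pool(["happy", "day", "zzz"], toy_vectors, dim=3)
fallback = mean_pool(["zzz"], toy_vectors, dim=3)
print(pooled)    # [2.0, 1.0, 1.0]
print(fallback)  # [0.0, 0.0, 0.0]
```

One consequence of the zero-vector fallback is that any input made up entirely of unknown words produces the same prediction, so such inputs may be worth detecting before classification.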

	### Dependencies

	```txt
	tensorflow>=2.13.0
	gensim>=4.3.0
	nltk>=3.8.0
	numpy>=1.24.0
	scikit-learn>=1.3.0
	huggingface_hub
	```

	## Ethical Considerations

	### Responsible Use

	- This model should complement, not replace, human judgment in sensitive applications
	- Emotion detection has limitations and may not always be accurate
	- Consider privacy implications when analyzing personal communications
	- Be aware of potential biases in synthetic training data

	### Not Recommended For

	- Clinical mental health diagnosis
	- Legal or law enforcement decisions
	- Employment decisions
	- Automated content removal without human review

	### Bias Considerations

	- The model was trained on synthetically generated data, which may not represent all demographic groups equally
	- Emotional expression varies across cultures, age groups, and contexts
	- The model may perform differently on various writing styles and dialects

	## Citation

	If you use this model in your research or applications, please cite:

	```bibtex
	@misc{emotion_classifier_20_2025,
	  author = {Shreyas Pulle},
	  title = {20-Emotion Text Classification Model},
	  year = {2025},
	  publisher = {HuggingFace},
	  url = {https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions}
	}
	```

	## Dataset

	The training dataset is available at:
	[shreyaspulle98/emotion-dataset-20-emotions](https://huggingface.co/datasets/shreyaspulle98/emotion-dataset-20-emotions)

	## License

	This model is released under the MIT License. You are free to use, modify, and distribute this model for commercial and non-commercial purposes.

	## Contact

	- HuggingFace: [@shreyaspulle98](https://huggingface.co/shreyaspulle98)
	- Model Repository: [emotion-classifier-20-emotions](https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions)

	## Acknowledgments

	- Training data generated using DeepInfra API
	- Built with TensorFlow/Keras and Gensim
	- Inspired by advances in emotion AI and affective computing

	---

	Try it out! Test the model with your own text and explore the 20 emotions it can detect.