# RoBERTa-Large GoEmotions (Optimized)

Multi-label Emotion Classification with Focal Loss + Per-label Threshold Optimization
## Overview
This model fine-tunes RoBERTa-Large on the GoEmotions dataset for multi-label emotion classification, detecting 27 distinct emotions plus neutral (28 labels in total) in text.
Unlike standard models that rely on BCE loss and a fixed threshold, this version applies focal loss, per-label threshold optimization, and targeted augmentation, yielding a model that stays balanced and generalizes across all emotion classes.
Each input sentence can evoke multiple emotions simultaneously. For example:
"I can't believe this happened!" → surprise, disappointment
## Dataset
GoEmotions (Google Research, 2020)
- ~58k Reddit comments
- 27 emotion labels + neutral (28 in total)
- Multi-label: each text can have multiple active emotions
- Highly imbalanced: e.g., gratitude has more than 1,000 examples, while grief and relief have fewer than 20 in the validation and test splits
This model addresses imbalance through loss design, augmentation, and threshold tuning.
## Quick Start
### Installation

```bash
pip install torch transformers huggingface_hub numpy
```
### Basic Usage (3 steps)
```python
import torch
import torch.nn as nn
from transformers import RobertaTokenizer, RobertaModel
from huggingface_hub import hf_hub_download
import json
import numpy as np


# Step 1: Define the model architecture
class RobertaForMultiLabelClassification(nn.Module):
    def __init__(self, model_name, num_labels, dropout_rate=0.3, use_mean_pooling=True):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(model_name)
        self.use_mean_pooling = use_mean_pooling
        hidden_size = self.roberta.config.hidden_size
        self.dropout1 = nn.Dropout(dropout_rate)
        self.fc1 = nn.Linear(hidden_size, hidden_size // 2)
        self.relu = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size // 2, num_labels)

    def mean_pooling(self, token_embeddings, attention_mask):
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
        sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
        return sum_embeddings / sum_mask

    def forward(self, input_ids, attention_mask):
        outputs = self.roberta(input_ids, attention_mask=attention_mask)
        if self.use_mean_pooling:
            pooled_output = self.mean_pooling(outputs.last_hidden_state, attention_mask)
        else:
            pooled_output = outputs.pooler_output
        x = self.dropout1(pooled_output)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout2(x)
        logits = self.fc2(x)
        return logits


# Step 2: Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_name = "Lakssssshya/roberta-large-goemotions"
tokenizer = RobertaTokenizer.from_pretrained(model_name)

# Load config
config_path = hf_hub_download(repo_id=model_name, filename="config.json")
with open(config_path, 'r') as f:
    config = json.load(f)

model = RobertaForMultiLabelClassification(
    model_name='roberta-large',
    num_labels=config['num_labels'],
    dropout_rate=config.get('dropout_rate', 0.3),
    use_mean_pooling=config.get('use_mean_pooling', True)
)

# Load weights
weights_path = hf_hub_download(repo_id=model_name, filename="pytorch_model.bin")
state_dict = torch.load(weights_path, map_location=device)
model.load_state_dict(state_dict)
model.to(device)
model.eval()

# Load thresholds
thresholds_path = hf_hub_download(repo_id=model_name, filename="optimal_thresholds.json")
with open(thresholds_path, 'r') as f:
    thresholds = np.array(json.load(f))

# Emotion labels
emotion_labels = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval',
    'caring', 'confusion', 'curiosity', 'desire', 'disappointment',
    'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
    'gratitude', 'grief', 'joy', 'love', 'nervousness',
    'optimism', 'pride', 'realization', 'relief', 'remorse',
    'sadness', 'surprise', 'neutral'
]


# Step 3: Predict
def predict_emotions(text):
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=128, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
        probs = torch.sigmoid(logits).cpu().numpy()[0]

    # Apply optimized thresholds
    predictions = (probs > thresholds).astype(int)

    # Get predicted emotions
    predicted_emotions = [emotion_labels[i] for i in range(len(predictions)) if predictions[i] == 1]

    # Get top emotions with scores
    top_indices = np.argsort(probs)[::-1][:5]
    top_emotions = [(emotion_labels[idx], float(probs[idx])) for idx in top_indices]

    return {
        'predicted_emotions': predicted_emotions,
        'top_emotions': top_emotions
    }


# Example usage
text = "I'm so proud and excited about this achievement!"
result = predict_emotions(text)
print(f"Text: {text}")
print(f"Predicted emotions: {result['predicted_emotions']}")
print(f"Top 5 emotions: {result['top_emotions']}")
```
### Easy-to-Use Wrapper Class

For convenience, wrap everything in a single class (it reuses the imports and the `RobertaForMultiLabelClassification` definition from the Quick Start above):
```python
class EmotionPredictor:
    def __init__(self, model_name="Lakssssshya/roberta-large-goemotions"):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)

        # Load config and model
        config_path = hf_hub_download(repo_id=model_name, filename="config.json")
        with open(config_path, 'r') as f:
            config = json.load(f)

        self.model = RobertaForMultiLabelClassification(
            model_name='roberta-large',
            num_labels=config['num_labels'],
            dropout_rate=config.get('dropout_rate', 0.3),
            use_mean_pooling=config.get('use_mean_pooling', True)
        )
        weights_path = hf_hub_download(repo_id=model_name, filename="pytorch_model.bin")
        state_dict = torch.load(weights_path, map_location=self.device)
        self.model.load_state_dict(state_dict)
        self.model.to(self.device)
        self.model.eval()

        # Load thresholds
        thresholds_path = hf_hub_download(repo_id=model_name, filename="optimal_thresholds.json")
        with open(thresholds_path, 'r') as f:
            self.thresholds = np.array(json.load(f))

        self.emotion_labels = [
            'admiration', 'amusement', 'anger', 'annoyance', 'approval',
            'caring', 'confusion', 'curiosity', 'desire', 'disappointment',
            'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
            'gratitude', 'grief', 'joy', 'love', 'nervousness',
            'optimism', 'pride', 'realization', 'relief', 'remorse',
            'sadness', 'surprise', 'neutral'
        ]

    def predict(self, text, top_k=5):
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True, max_length=128, padding=True)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        with torch.no_grad():
            logits = self.model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
            probs = torch.sigmoid(logits).cpu().numpy()[0]

        predictions = (probs > self.thresholds).astype(int)
        predicted_emotions = [self.emotion_labels[i] for i in range(len(predictions)) if predictions[i] == 1]

        top_indices = np.argsort(probs)[::-1][:top_k]
        top_emotions = [
            {'emotion': self.emotion_labels[idx], 'score': float(probs[idx])}
            for idx in top_indices
        ]
        return {'text': text, 'emotions': predicted_emotions, 'top_emotions': top_emotions}


# Simple usage
predictor = EmotionPredictor()
result = predictor.predict("I'm so happy and excited!")
print(result)
```
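The wrapper handles one text per call. For larger workloads, the same tokenizer and model can score a list of texts in a single forward pass; the helper below is a sketch built on the wrapper above, not part of the released model (`predict_batch` is a hypothetical name):

```python
# Hypothetical helper (not part of the released model): batch inference that reuses
# the tokenizer, model, thresholds, and labels held by an EmotionPredictor instance.
def predict_batch(predictor, texts, max_length=128):
    inputs = predictor.tokenizer(
        texts, return_tensors='pt', truncation=True, max_length=max_length, padding=True
    )
    inputs = {k: v.to(predictor.device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = predictor.model(
            input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask']
        )
    probs = torch.sigmoid(logits).cpu().numpy()          # shape: (batch, 28)
    preds = (probs > predictor.thresholds).astype(int)   # per-label thresholds broadcast over rows
    return [
        [predictor.emotion_labels[i] for i in range(preds.shape[1]) if preds[row, i] == 1]
        for row in range(preds.shape[0])
    ]


print(predict_batch(predictor, ["Thank you so much!", "This is infuriating."]))
```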
## Example Predictions
### Example 1: Pride and Achievement

```python
text = "I'm so proud and excited about this achievement!"
result = predictor.predict(text)
# Output: {'emotions': ['pride', 'excitement', 'joy'], 'top_emotions': [{'emotion': 'pride', 'score': 0.867}, {'emotion': 'excitement', 'score': 0.712}, ...]}
```

### Example 2: Regret

```python
text = "I really regret saying that earlier."
result = predictor.predict(text)
# Output: {'emotions': ['remorse', 'sadness'], 'top_emotions': [{'emotion': 'remorse', 'score': 0.758}, ...]}
```

### Example 3: Mixed Emotions

```python
text = "I feel anxious but hopeful about the future."
result = predictor.predict(text)
# Output: {'emotions': ['nervousness', 'optimism'], 'top_emotions': [{'emotion': 'nervousness', 'score': 0.710}, {'emotion': 'optimism', 'score': 0.645}, ...]}
```

### Example 4: Surprise

```python
text = "I can't believe this happened!"
result = predictor.predict(text)
# Output: {'emotions': ['surprise', 'realization'], 'top_emotions': [{'emotion': 'surprise', 'score': 0.747}, ...]}
```

### Example 5: Gratitude

```python
text = "Thank you so much for all your help and support!"
result = predictor.predict(text)
# Output: {'emotions': ['gratitude', 'admiration'], 'top_emotions': [{'emotion': 'gratitude', 'score': 0.923}, ...]}
```

### Example 6: Multiple Strong Emotions

```python
text = "This is absolutely disgusting and infuriating!"
result = predictor.predict(text)
# Output: {'emotions': ['disgust', 'anger', 'annoyance'], 'top_emotions': [{'emotion': 'disgust', 'score': 0.834}, {'emotion': 'anger', 'score': 0.789}, ...]}
```
## Training Details
| Parameter | Value |
|---|---|
| Base model | roberta-large |
| Task | Multi-label classification |
| Epochs | 5 (encoder frozen for the first 2) |
| Learning rate | 2.6e-5 (Optuna-tuned) |
| Optimizer | AdamW |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Loss | Focal Loss (α=0.38, γ=2.8) |
| Gradient accumulation | 16 |
| Scheduler | Linear decay |
| Mixed precision | Yes (FP16) |
| Pooling | Mean pooling |
| Dropout | 0.41 |
| Batch size | 2 × 16 (accumulated) |
| Early stopping | Patience = 3 |
| Threshold range | [0.05, 0.95] optimized per label |
Thresholds are optimized individually to maximize per-label F1 and saved in optimal_thresholds.json.
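The threshold-search script itself is not included in this repository, but the idea can be sketched as a per-label grid search over a validation split. In the sketch below, `val_probs` and `val_labels` are assumed to be precomputed arrays of shape `(n_samples, 28)` holding sigmoid outputs and binary gold labels; the grid and function name are illustrative only.

```python
import json
import numpy as np
from sklearn.metrics import f1_score

def optimize_thresholds(val_probs, val_labels, grid=np.arange(0.05, 0.951, 0.01)):
    """For each label, pick the grid threshold that maximizes that label's F1."""
    num_labels = val_probs.shape[1]
    thresholds = np.full(num_labels, 0.5)
    for i in range(num_labels):
        best_f1 = -1.0
        for t in grid:
            f1 = f1_score(val_labels[:, i], (val_probs[:, i] > t).astype(int), zero_division=0)
            if f1 > best_f1:
                best_f1, thresholds[i] = f1, t
    return thresholds

# thresholds = optimize_thresholds(val_probs, val_labels)
# with open("optimal_thresholds.json", "w") as f:
#     json.dump(thresholds.tolist(), f)
```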
## Evaluation Metrics (Test Split)
| Metric Type | Precision | Recall | F1 |
|---|---|---|---|
| Macro (unweighted) | 0.497 | 0.576 | 0.519 |
| Weighted by label frequency | 0.505 | 0.585 | 0.528 |
This model achieves a balanced macro-F1 of roughly 0.52 on the test split while maintaining strong recall even for underrepresented emotions.
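For reference, the macro and weighted aggregates above correspond to scikit-learn's standard averaging modes. A minimal sketch (not the exact evaluation script), where `y_true` and `y_pred` are assumed to be binary arrays of shape `(n_samples, 28)` produced with the per-label thresholds:

```python
from sklearn.metrics import precision_recall_fscore_support

# y_true, y_pred: binary arrays of shape (n_samples, 28)
macro = precision_recall_fscore_support(y_true, y_pred, average='macro', zero_division=0)
weighted = precision_recall_fscore_support(y_true, y_pred, average='weighted', zero_division=0)
print("Macro    P/R/F1:", [round(v, 3) for v in macro[:3]])
print("Weighted P/R/F1:", [round(v, 3) for v in weighted[:3]])
```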
## Per-Label Performance (Best Checkpoint, Validation Split)
| Label | Accuracy | Precision | Recall | F1 | MCC | Support | Threshold |
|---|---|---|---|---|---|---|---|
| AVG | 0.962 | 0.541 | 0.592 | 0.551 | 0.538 | 6380 | 0.645 |
| admiration | 0.95 | 0.718 | 0.732 | 0.725 | 0.697 | 488 | 0.616 |
| amusement | 0.975 | 0.727 | 0.888 | 0.799 | 0.791 | 303 | 0.652 |
| anger | 0.96 | 0.453 | 0.544 | 0.494 | 0.476 | 195 | 0.648 |
| annoyance | 0.911 | 0.326 | 0.558 | 0.412 | 0.383 | 303 | 0.568 |
| approval | 0.885 | 0.307 | 0.456 | 0.367 | 0.314 | 397 | 0.543 |
| caring | 0.971 | 0.481 | 0.582 | 0.527 | 0.514 | 153 | 0.618 |
| confusion | 0.973 | 0.521 | 0.401 | 0.454 | 0.444 | 152 | 0.653 |
| curiosity | 0.947 | 0.452 | 0.71 | 0.553 | 0.541 | 248 | 0.601 |
| desire | 0.989 | 0.642 | 0.558 | 0.597 | 0.593 | 77 | 0.729 |
| disappointment | 0.96 | 0.337 | 0.344 | 0.34 | 0.32 | 163 | 0.605 |
| disapproval | 0.908 | 0.329 | 0.678 | 0.443 | 0.431 | 292 | 0.545 |
| disgust | 0.984 | 0.549 | 0.464 | 0.503 | 0.496 | 97 | 0.697 |
| embarrassment | 0.992 | 0.417 | 0.714 | 0.526 | 0.542 | 35 | 0.674 |
| excitement | 0.977 | 0.357 | 0.365 | 0.361 | 0.349 | 96 | 0.676 |
| fear | 0.991 | 0.724 | 0.7 | 0.712 | 0.707 | 90 | 0.681 |
| gratitude | 0.989 | 0.96 | 0.877 | 0.917 | 0.912 | 358 | 0.685 |
| grief | 0.998 | 0.545 | 0.462 | 0.5 | 0.501 | 13 | 0.774 |
| joy | 0.974 | 0.594 | 0.587 | 0.591 | 0.577 | 172 | 0.642 |
| love | 0.979 | 0.718 | 0.889 | 0.794 | 0.788 | 252 | 0.648 |
| nervousness | 0.995 | 0.391 | 0.429 | 0.409 | 0.407 | 21 | 0.7 |
| optimism | 0.973 | 0.69 | 0.555 | 0.615 | 0.606 | 209 | 0.669 |
| pride | 0.999 | 0.889 | 0.533 | 0.667 | 0.688 | 15 | 0.694 |
| realization | 0.976 | 0.475 | 0.228 | 0.309 | 0.319 | 127 | 0.658 |
| relief | 0.992 | 0.217 | 0.556 | 0.312 | 0.344 | 18 | 0.594 |
| remorse | 0.992 | 0.648 | 0.838 | 0.731 | 0.733 | 68 | 0.728 |
| sadness | 0.976 | 0.543 | 0.573 | 0.558 | 0.546 | 143 | 0.65 |
| surprise | 0.978 | 0.528 | 0.589 | 0.557 | 0.546 | 129 | 0.664 |
| neutral | 0.753 | 0.593 | 0.767 | 0.669 | 0.487 | 1766 | 0.446 |
## Per-Label Performance (Test Split)
| Label | Accuracy | Precision | Recall | F1 | MCC | Support | Threshold |
|---|---|---|---|---|---|---|---|
| AVG | 0.961 | 0.497 | 0.576 | 0.519 | 0.506 | 6329 | 0.645 |
| admiration | 0.940 | 0.676 | 0.690 | 0.683 | 0.650 | 504 | 0.616 |
| amusement | 0.980 | 0.740 | 0.905 | 0.814 | 0.808 | 264 | 0.652 |
| anger | 0.959 | 0.455 | 0.556 | 0.500 | 0.482 | 198 | 0.648 |
| annoyance | 0.905 | 0.315 | 0.519 | 0.392 | 0.356 | 320 | 0.568 |
| approval | 0.900 | 0.331 | 0.538 | 0.410 | 0.371 | 351 | 0.543 |
| caring | 0.967 | 0.374 | 0.496 | 0.427 | 0.414 | 135 | 0.618 |
| confusion | 0.971 | 0.487 | 0.477 | 0.482 | 0.467 | 153 | 0.653 |
| curiosity | 0.945 | 0.483 | 0.715 | 0.577 | 0.561 | 284 | 0.601 |
| desire | 0.987 | 0.647 | 0.398 | 0.493 | 0.501 | 83 | 0.729 |
| disappointment | 0.962 | 0.313 | 0.311 | 0.312 | 0.293 | 151 | 0.605 |
| disapproval | 0.899 | 0.284 | 0.697 | 0.403 | 0.402 | 267 | 0.545 |
| disgust | 0.978 | 0.531 | 0.423 | 0.471 | 0.463 | 123 | 0.697 |
| embarrassment | 0.989 | 0.288 | 0.459 | 0.354 | 0.358 | 37 | 0.674 |
| excitement | 0.977 | 0.400 | 0.447 | 0.422 | 0.411 | 103 | 0.676 |
| fear | 0.988 | 0.564 | 0.795 | 0.660 | 0.664 | 78 | 0.681 |
| gratitude | 0.989 | 0.939 | 0.881 | 0.909 | 0.904 | 352 | 0.685 |
| grief | 0.999 | 0.429 | 0.500 | 0.462 | 0.462 | 6 | 0.774 |
| joy | 0.974 | 0.557 | 0.640 | 0.595 | 0.584 | 161 | 0.642 |
| love | 0.979 | 0.731 | 0.832 | 0.778 | 0.769 | 238 | 0.648 |
| nervousness | 0.994 | 0.333 | 0.348 | 0.340 | 0.338 | 23 | 0.700 |
| optimism | 0.975 | 0.686 | 0.516 | 0.589 | 0.583 | 186 | 0.669 |
| pride | 0.997 | 0.538 | 0.438 | 0.483 | 0.484 | 16 | 0.694 |
| realization | 0.970 | 0.380 | 0.207 | 0.268 | 0.266 | 145 | 0.658 |
| relief | 0.993 | 0.171 | 0.636 | 0.269 | 0.327 | 11 | 0.594 |
| remorse | 0.991 | 0.540 | 0.839 | 0.657 | 0.669 | 56 | 0.728 |
| sadness | 0.975 | 0.580 | 0.532 | 0.555 | 0.543 | 156 | 0.650 |
| surprise | 0.977 | 0.553 | 0.553 | 0.553 | 0.541 | 141 | 0.664 |
| neutral | 0.753 | 0.595 | 0.776 | 0.674 | 0.491 | 1787 | 0.446 |
## Why This Model Outperforms Other GoEmotions Models
Most GoEmotions models on Hugging Face use BCE loss and a fixed 0.5 threshold.
While effective for frequent emotions, they perform poorly on rare ones.
This model overcomes those limits via:
1. Adaptive per-label thresholds: each label has its own optimized decision boundary, maximizing per-label F1 and balancing recall and precision.
2. Focal loss: down-weights easy examples and emphasizes hard ones, improving generalization on minority classes (a hedged sketch follows this list).
3. Mean pooling: captures richer emotional nuance than CLS-based pooling.
4. Targeted augmentation: paraphrasing and synonym replacement strengthen rare emotion classes.
5. Gradual unfreezing: the encoder is frozen during the early epochs, stabilizing fine-tuning.
6. Balanced macro metrics: consistent performance across all emotion classes.
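The training objective is not shipped with the model weights, but a multi-label (sigmoid) focal loss matching the hyperparameters in the training table (α=0.38, γ=2.8) could be sketched as follows; this is an illustration, not the exact training code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Sigmoid focal loss for multi-label classification (sketch, not the exact training code)."""
    def __init__(self, alpha=0.38, gamma=2.8):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        # Per-element BCE, re-weighted so easy (well-classified) examples contribute less.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p = torch.sigmoid(logits)
        p_t = targets * p + (1 - targets) * (1 - p)                        # prob. assigned to the true class
        alpha_t = targets * self.alpha + (1 - targets) * (1 - self.alpha)  # class-balance weight
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()

# criterion = FocalLoss(alpha=0.38, gamma=2.8)
# loss = criterion(model(input_ids, attention_mask), labels.float())
```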
## Model Details
| Detail | Description |
|---|---|
| Architecture | RoBERTa-Large + Mean Pooling + 2-layer MLP |
| Task Type | Multi-label emotion classification |
| Input Length | Up to 128 tokens |
| Output | 28 sigmoid probabilities |
| Framework | PyTorch / Hugging Face Transformers |
| Mixed Precision | Supported (FP16) |
| License | MIT |
| Developer | Lakshya Kumar |
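Since mixed precision is listed as supported, GPU inference can optionally run the forward pass under autocast. A minimal sketch, assuming a CUDA device and the `predictor` instance from the Quick Start:

```python
import torch

text = "Thank you so much for all your help!"
inputs = predictor.tokenizer(text, return_tensors='pt', truncation=True, max_length=128)
inputs = {k: v.to(predictor.device) for k, v in inputs.items()}

# Run the transformer forward pass in FP16 on CUDA; keep sigmoid/thresholding in FP32.
with torch.no_grad(), torch.autocast(device_type='cuda', dtype=torch.float16):
    logits = predictor.model(input_ids=inputs['input_ids'],
                             attention_mask=inputs['attention_mask'])
probs = torch.sigmoid(logits.float()).cpu().numpy()[0]
predicted = [predictor.emotion_labels[i] for i, p in enumerate(probs) if p > predictor.thresholds[i]]
print(predicted)
```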
## Intended Use
- Emotion detection in conversational AI systems
- Social media sentiment analysis
- Affective computing research
- Psychological and HCI studies
## Limitations
- Trained on Reddit comments; may not generalize well to formal writing or non-English text.
- Rare emotions (e.g., grief, relief) have very low support, so their scores are less reliable.
- The model detects linguistic cues of emotion, not the writer's true emotional state.
## Citation
```bibtex
@misc{lakshya2025robertalargegoemotions,
  title={RoBERTa-Large GoEmotions (Optimized Thresholds and Focal Loss)},
  author={Lakshya Kumar},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Lakssssshya/roberta-large-goemotions}}
}
```
## Acknowledgments
- Google Research for the GoEmotions dataset
- Hugging Face for the Transformers library
- Meta AI for the RoBERTa architecture
## Contact
For questions or collaborations:
- Hugging Face: @Lakssssshya
- Model Repository: roberta-large-goemotions
⭐ If you find this model useful, please give it a star and share it with others!