🧠 RoBERTa-Large GoEmotions (Optimized)

Multi-label Emotion Classification with Focal Loss + Per-label Threshold Optimization

🏷️ Overview

This model fine-tunes RoBERTa-Large on the GoEmotions dataset for multi-label emotion classification, detecting 27 distinct emotions plus neutral (28 labels in total) in text.
Unlike standard models that rely on a fixed 0.5 threshold and plain BCE loss, this version applies focal loss, per-label threshold optimization, and targeted augmentation, yielding a balanced model that generalizes across all emotions.

Each input sentence can evoke multiple emotions simultaneously; for example:

"I can't believe this happened!" → surprise, disappointment


📚 Dataset

GoEmotions (Google Research, 2020)
• ~58k Reddit comments
• 27 emotion labels + neutral (28 in total)
• Multi-label: each text can carry multiple active emotions
• Highly imbalanced: e.g., gratitude has >1,000 examples, while grief and relief have <20

This model addresses imbalance through loss design, augmentation, and threshold tuning.
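
To give a concrete sense of the loss design, below is a minimal sketch of a multi-label focal loss, using the α=0.38 and γ=2.8 values from the training table further down. The exact training implementation is not published in this card, so treat this as an illustrative reimplementation, not the author's code:

import torch
import torch.nn as nn
import torch.nn.functional as F
class MultiLabelFocalLoss(nn.Module):
    # Illustrative focal loss over independent sigmoid outputs (one per emotion);
    # alpha/gamma defaults match the values reported in the training table
    def __init__(self, alpha=0.38, gamma=2.8):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
    def forward(self, logits, targets):
        # Unreduced per-element BCE so each term can be reweighted individually
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p_t = torch.exp(-bce)  # probability the model assigns to the true label value
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        # (1 - p_t)^gamma down-weights easy, confident predictions
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()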


🚀 Quick Start

Installation

pip install torch transformers huggingface_hub numpy

Basic Usage (3 steps)

import torch
import torch.nn as nn
from transformers import RobertaTokenizer, RobertaModel
from huggingface_hub import hf_hub_download
import json
import numpy as np
# Step 1: Define the model architecture
class RobertaForMultiLabelClassification(nn.Module):
    def __init__(self, model_name, num_labels, dropout_rate=0.3, use_mean_pooling=True):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(model_name)
        self.use_mean_pooling = use_mean_pooling
        hidden_size = self.roberta.config.hidden_size
        self.dropout1 = nn.Dropout(dropout_rate)
        self.fc1 = nn.Linear(hidden_size, hidden_size // 2)
        self.relu = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size // 2, num_labels)
    def mean_pooling(self, token_embeddings, attention_mask):
        # Mask-aware average over token embeddings: padding positions are excluded
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
        sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
        return sum_embeddings / sum_mask
    def forward(self, input_ids, attention_mask):
        outputs = self.roberta(input_ids, attention_mask=attention_mask)
        if self.use_mean_pooling:
            pooled_output = self.mean_pooling(outputs.last_hidden_state, attention_mask)
        else:
            pooled_output = outputs.pooler_output
        x = self.dropout1(pooled_output)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout2(x)
        logits = self.fc2(x)
        return logits
# Step 2: Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_name = "Lakssssshya/roberta-large-goemotions"
tokenizer = RobertaTokenizer.from_pretrained(model_name)
# Load config
config_path = hf_hub_download(repo_id=model_name, filename="config.json")
with open(config_path, 'r') as f:
    config = json.load(f)
model = RobertaForMultiLabelClassification(
    model_name='roberta-large',
    num_labels=config['num_labels'],
    dropout_rate=config.get('dropout_rate', 0.3),
    use_mean_pooling=config.get('use_mean_pooling', True)
)
# Load weights
weights_path = hf_hub_download(repo_id=model_name, filename="pytorch_model.bin")
state_dict = torch.load(weights_path, map_location=device)
model.load_state_dict(state_dict)
model.to(device)
model.eval()
# Load thresholds
thresholds_path = hf_hub_download(repo_id=model_name, filename="optimal_thresholds.json")
with open(thresholds_path, 'r') as f:
    thresholds = np.array(json.load(f))
# Emotion labels
emotion_labels = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 
    'caring', 'confusion', 'curiosity', 'desire', 'disappointment',
    'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
    'gratitude', 'grief', 'joy', 'love', 'nervousness',
    'optimism', 'pride', 'realization', 'relief', 'remorse',
    'sadness', 'surprise', 'neutral'
]
# Step 3: Predict
def predict_emotions(text):
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=128, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        logits = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
        probs = torch.sigmoid(logits).cpu().numpy()[0]
    
    # Apply optimized thresholds
    predictions = (probs > thresholds).astype(int)
    
    # Get predicted emotions
    predicted_emotions = [emotion_labels[i] for i in range(len(predictions)) if predictions[i] == 1]
    
    # Get top emotions with scores
    top_indices = np.argsort(probs)[::-1][:5]
    top_emotions = [(emotion_labels[idx], float(probs[idx])) for idx in top_indices]
    
    return {
        'predicted_emotions': predicted_emotions,
        'top_emotions': top_emotions
    }
# Example usage
text = "I'm so proud and excited about this achievement!"
result = predict_emotions(text)
print(f"Text: {text}")
print(f"Predicted emotions: {result['predicted_emotions']}")
print(f"Top 5 emotions: {result['top_emotions']}")

Easy-to-Use Wrapper Class

For convenience, use this wrapper class:

class EmotionPredictor:
    def __init__(self, model_name="Lakssssshya/roberta-large-goemotions"):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
        
        # Load config and model
        config_path = hf_hub_download(repo_id=model_name, filename="config.json")
        with open(config_path, 'r') as f:
            config = json.load(f)
        
        self.model = RobertaForMultiLabelClassification(
            model_name='roberta-large',
            num_labels=config['num_labels'],
            dropout_rate=config.get('dropout_rate', 0.3),
            use_mean_pooling=config.get('use_mean_pooling', True)
        )
        
        weights_path = hf_hub_download(repo_id=model_name, filename="pytorch_model.bin")
        state_dict = torch.load(weights_path, map_location=self.device)
        self.model.load_state_dict(state_dict)
        self.model.to(self.device)
        self.model.eval()
        
        # Load thresholds
        thresholds_path = hf_hub_download(repo_id=model_name, filename="optimal_thresholds.json")
        with open(thresholds_path, 'r') as f:
            self.thresholds = np.array(json.load(f))
        
        self.emotion_labels = [
            'admiration', 'amusement', 'anger', 'annoyance', 'approval', 
            'caring', 'confusion', 'curiosity', 'desire', 'disappointment',
            'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
            'gratitude', 'grief', 'joy', 'love', 'nervousness',
            'optimism', 'pride', 'realization', 'relief', 'remorse',
            'sadness', 'surprise', 'neutral'
        ]
    
    def predict(self, text, top_k=5):
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True, max_length=128, padding=True)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        
        with torch.no_grad():
            logits = self.model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
            probs = torch.sigmoid(logits).cpu().numpy()[0]
        
        predictions = (probs > self.thresholds).astype(int)
        predicted_emotions = [self.emotion_labels[i] for i in range(len(predictions)) if predictions[i] == 1]
        
        top_indices = np.argsort(probs)[::-1][:top_k]
        top_emotions = [
            {'emotion': self.emotion_labels[idx], 'score': float(probs[idx])}
            for idx in top_indices
        ]
        
        return {'text': text, 'emotions': predicted_emotions, 'top_emotions': top_emotions}
# Simple usage
predictor = EmotionPredictor()
result = predictor.predict("I'm so happy and excited!")
print(result)
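
The wrapper's predict handles one text at a time. For many inputs, it is usually faster to tokenize them as a single padded batch; here is a minimal sketch reusing the wrapper's components (this batched helper is illustrative and not part of the published code):

texts = ["Thank you so much!", "This is infuriating."]
inputs = predictor.tokenizer(texts, return_tensors='pt', truncation=True, max_length=128, padding=True)
inputs = {k: v.to(predictor.device) for k, v in inputs.items()}
with torch.no_grad():
    logits = predictor.model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
    probs = torch.sigmoid(logits).cpu().numpy()  # shape: (num_texts, 28)
for text, row in zip(texts, probs):
    # Apply the per-label thresholds row by row
    active = [predictor.emotion_labels[i] for i, p in enumerate(row) if p > predictor.thresholds[i]]
    print(f"{text} -> {active}")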

πŸ“ Example Predictions

Example 1: Pride and Achievement

text = "I'm so proud and excited about this achievement!"
result = predictor.predict(text)
# Output: {'emotions': ['pride', 'excitement', 'joy'], 'top_emotions': [{'emotion': 'pride', 'score': 0.867}, {'emotion': 'excitement', 'score': 0.712}, ...]}

Example 2: Regret

text = "I really regret saying that earlier."
result = predictor.predict(text)
# Output: {'emotions': ['remorse', 'sadness'], 'top_emotions': [{'emotion': 'remorse', 'score': 0.758}, ...]}

Example 3: Mixed Emotions

text = "I feel anxious but hopeful about the future."
result = predictor.predict(text)
# Output: {'emotions': ['nervousness', 'optimism'], 'top_emotions': [{'emotion': 'nervousness', 'score': 0.710}, {'emotion': 'optimism', 'score': 0.645}, ...]}

Example 4: Surprise

text = "I can't believe this happened!"
result = predictor.predict(text)
# Output: {'emotions': ['surprise', 'realization'], 'top_emotions': [{'emotion': 'surprise', 'score': 0.747}, ...]}

Example 5: Gratitude

text = "Thank you so much for all your help and support!"
result = predictor.predict(text)
# Output: {'emotions': ['gratitude', 'admiration'], 'top_emotions': [{'emotion': 'gratitude', 'score': 0.923}, ...]}

Example 6: Multiple Strong Emotions

text = "This is absolutely disgusting and infuriating!"
result = predictor.predict(text)
# Output: {'emotions': ['disgust', 'anger', 'annoyance'], 'top_emotions': [{'emotion': 'disgust', 'score': 0.834}, {'emotion': 'anger', 'score': 0.789}, ...]}

βš™οΈ Training Details

| Parameter | Value |
|---|---|
| Base model | roberta-large |
| Task | Multi-label classification |
| Epochs | 5 (encoder frozen for the first 2) |
| Learning rate | 2.6e-5 (Optuna-tuned) |
| Optimizer | AdamW |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Loss | Focal loss (α=0.38, γ=2.8) |
| Gradient accumulation | 16 steps |
| Scheduler | Linear decay |
| Mixed precision | ✅ FP16 |
| Pooling | Mean pooling |
| Dropout | 0.41 |
| Batch size | 2 × 16 (accumulated) |
| Early stopping | Patience = 3 |
| Threshold range | [0.05, 0.95], optimized per label |
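
The batch-size row means gradients from 16 micro-batches of 2 are accumulated before each optimizer step, giving an effective batch of 32. A generic sketch of that pattern (the loop structure and the compute_loss, train_loader, optimizer, and scheduler names are hypothetical; the actual training script is not published):

ACCUM_STEPS = 16  # micro-batch of 2, accumulated 16 times -> effective batch size 32
optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    loss = compute_loss(model, batch) / ACCUM_STEPS  # scale so accumulated gradients average
    loss.backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()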

Thresholds are optimized individually to maximize per-label F1 and saved in optimal_thresholds.json.
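
The card does not publish the search procedure, but a per-label grid search over the stated [0.05, 0.95] range on the validation split is the straightforward way to produce such a file. A minimal sketch, assuming val_probs (an N × 28 array of sigmoid outputs), val_labels (the matching binary label matrix), and scikit-learn are available:

import numpy as np
from sklearn.metrics import f1_score
def optimize_thresholds(val_probs, val_labels, low=0.05, high=0.95, steps=91):
    # For each label, sweep candidate thresholds and keep the F1-maximizing one
    candidates = np.linspace(low, high, steps)
    best = np.full(val_probs.shape[1], 0.5)
    for label in range(val_probs.shape[1]):
        f1s = [f1_score(val_labels[:, label], (val_probs[:, label] > t).astype(int), zero_division=0)
               for t in candidates]
        best[label] = candidates[int(np.argmax(f1s))]
    return best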


📊 Evaluation Metrics (Test Split)

| Metric type | Precision | Recall | F1 |
|---|---|---|---|
| Macro (unweighted) | 0.497 | 0.576 | 0.519 |
| Weighted (by label frequency) | 0.505 | 0.585 | 0.528 |

The model achieves a macro-F1 of roughly 0.52 on the test split while maintaining strong recall even for underrepresented emotions.


πŸ” Per-Label Performance (Best performance on Val Split)

| Label | Accuracy | Precision | Recall | F1 | MCC | Support | Threshold |
|---|---|---|---|---|---|---|---|
| AVG | 0.962 | 0.541 | 0.592 | 0.551 | 0.538 | 6380 | 0.645 |
| admiration | 0.950 | 0.718 | 0.732 | 0.725 | 0.697 | 488 | 0.616 |
| amusement | 0.975 | 0.727 | 0.888 | 0.799 | 0.791 | 303 | 0.652 |
| anger | 0.960 | 0.453 | 0.544 | 0.494 | 0.476 | 195 | 0.648 |
| annoyance | 0.911 | 0.326 | 0.558 | 0.412 | 0.383 | 303 | 0.568 |
| approval | 0.885 | 0.307 | 0.456 | 0.367 | 0.314 | 397 | 0.543 |
| caring | 0.971 | 0.481 | 0.582 | 0.527 | 0.514 | 153 | 0.618 |
| confusion | 0.973 | 0.521 | 0.401 | 0.454 | 0.444 | 152 | 0.653 |
| curiosity | 0.947 | 0.452 | 0.710 | 0.553 | 0.541 | 248 | 0.601 |
| desire | 0.989 | 0.642 | 0.558 | 0.597 | 0.593 | 77 | 0.729 |
| disappointment | 0.960 | 0.337 | 0.344 | 0.340 | 0.320 | 163 | 0.605 |
| disapproval | 0.908 | 0.329 | 0.678 | 0.443 | 0.431 | 292 | 0.545 |
| disgust | 0.984 | 0.549 | 0.464 | 0.503 | 0.496 | 97 | 0.697 |
| embarrassment | 0.992 | 0.417 | 0.714 | 0.526 | 0.542 | 35 | 0.674 |
| excitement | 0.977 | 0.357 | 0.365 | 0.361 | 0.349 | 96 | 0.676 |
| fear | 0.991 | 0.724 | 0.700 | 0.712 | 0.707 | 90 | 0.681 |
| gratitude | 0.989 | 0.960 | 0.877 | 0.917 | 0.912 | 358 | 0.685 |
| grief | 0.998 | 0.545 | 0.462 | 0.500 | 0.501 | 13 | 0.774 |
| joy | 0.974 | 0.594 | 0.587 | 0.591 | 0.577 | 172 | 0.642 |
| love | 0.979 | 0.718 | 0.889 | 0.794 | 0.788 | 252 | 0.648 |
| nervousness | 0.995 | 0.391 | 0.429 | 0.409 | 0.407 | 21 | 0.700 |
| optimism | 0.973 | 0.690 | 0.555 | 0.615 | 0.606 | 209 | 0.669 |
| pride | 0.999 | 0.889 | 0.533 | 0.667 | 0.688 | 15 | 0.694 |
| realization | 0.976 | 0.475 | 0.228 | 0.309 | 0.319 | 127 | 0.658 |
| relief | 0.992 | 0.217 | 0.556 | 0.312 | 0.344 | 18 | 0.594 |
| remorse | 0.992 | 0.648 | 0.838 | 0.731 | 0.733 | 68 | 0.728 |
| sadness | 0.976 | 0.543 | 0.573 | 0.558 | 0.546 | 143 | 0.650 |
| surprise | 0.978 | 0.528 | 0.589 | 0.557 | 0.546 | 129 | 0.664 |
| neutral | 0.753 | 0.593 | 0.767 | 0.669 | 0.487 | 1766 | 0.446 |

πŸ” Per-Label Performance (Test Split)

| Label | Accuracy | Precision | Recall | F1 | MCC | Support | Threshold |
|---|---|---|---|---|---|---|---|
| AVG | 0.961 | 0.497 | 0.576 | 0.519 | 0.506 | 6329 | 0.645 |
| admiration | 0.940 | 0.676 | 0.690 | 0.683 | 0.650 | 504 | 0.616 |
| amusement | 0.980 | 0.740 | 0.905 | 0.814 | 0.808 | 264 | 0.652 |
| anger | 0.959 | 0.455 | 0.556 | 0.500 | 0.482 | 198 | 0.648 |
| annoyance | 0.905 | 0.315 | 0.519 | 0.392 | 0.356 | 320 | 0.568 |
| approval | 0.900 | 0.331 | 0.538 | 0.410 | 0.371 | 351 | 0.543 |
| caring | 0.967 | 0.374 | 0.496 | 0.427 | 0.414 | 135 | 0.618 |
| confusion | 0.971 | 0.487 | 0.477 | 0.482 | 0.467 | 153 | 0.653 |
| curiosity | 0.945 | 0.483 | 0.715 | 0.577 | 0.561 | 284 | 0.601 |
| desire | 0.987 | 0.647 | 0.398 | 0.493 | 0.501 | 83 | 0.729 |
| disappointment | 0.962 | 0.313 | 0.311 | 0.312 | 0.293 | 151 | 0.605 |
| disapproval | 0.899 | 0.284 | 0.697 | 0.403 | 0.402 | 267 | 0.545 |
| disgust | 0.978 | 0.531 | 0.423 | 0.471 | 0.463 | 123 | 0.697 |
| embarrassment | 0.989 | 0.288 | 0.459 | 0.354 | 0.358 | 37 | 0.674 |
| excitement | 0.977 | 0.400 | 0.447 | 0.422 | 0.411 | 103 | 0.676 |
| fear | 0.988 | 0.564 | 0.795 | 0.660 | 0.664 | 78 | 0.681 |
| gratitude | 0.989 | 0.939 | 0.881 | 0.909 | 0.904 | 352 | 0.685 |
| grief | 0.999 | 0.429 | 0.500 | 0.462 | 0.462 | 6 | 0.774 |
| joy | 0.974 | 0.557 | 0.640 | 0.595 | 0.584 | 161 | 0.642 |
| love | 0.979 | 0.731 | 0.832 | 0.778 | 0.769 | 238 | 0.648 |
| nervousness | 0.994 | 0.333 | 0.348 | 0.340 | 0.338 | 23 | 0.700 |
| optimism | 0.975 | 0.686 | 0.516 | 0.589 | 0.583 | 186 | 0.669 |
| pride | 0.997 | 0.538 | 0.438 | 0.483 | 0.484 | 16 | 0.694 |
| realization | 0.970 | 0.380 | 0.207 | 0.268 | 0.266 | 145 | 0.658 |
| relief | 0.993 | 0.171 | 0.636 | 0.269 | 0.327 | 11 | 0.594 |
| remorse | 0.991 | 0.540 | 0.839 | 0.657 | 0.669 | 56 | 0.728 |
| sadness | 0.975 | 0.580 | 0.532 | 0.555 | 0.543 | 156 | 0.650 |
| surprise | 0.977 | 0.553 | 0.553 | 0.553 | 0.541 | 141 | 0.664 |
| neutral | 0.753 | 0.595 | 0.776 | 0.674 | 0.491 | 1787 | 0.446 |

🥇 Why This Model Outperforms Other GoEmotions Models

Most GoEmotions models on Hugging Face use plain BCE loss and a fixed 0.5 decision threshold. That setup works for frequent emotions but performs poorly on rare ones.

This model overcomes those limits via:

1️⃣ Adaptive per-label thresholds: each label gets its own optimized decision boundary, maximizing per-label F1 and balancing recall against precision.
2️⃣ Focal Loss: down-weights easy examples and emphasizes hard ones, improving generalization on minority classes.
3️⃣ Mean Pooling: captures richer emotional nuance than CLS-token pooling.
4️⃣ Targeted Augmentation: paraphrasing and synonym replacement strengthen rare emotion classes.
5️⃣ Gradual Unfreezing: stabilizes fine-tuning by keeping the encoder frozen during the early epochs (see the sketch after this list).
6️⃣ Balanced Macro Metrics: consistent performance across all emotion classes.
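
For point 5, the training table states the encoder stayed frozen for the first 2 of 5 epochs. A minimal sketch of that schedule, using the RobertaForMultiLabelClassification model defined earlier (the inner training loop is omitted and hypothetical):

FROZEN_EPOCHS = 2  # encoder frozen for the first 2 of 5 epochs (see training table)
for param in model.roberta.parameters():
    param.requires_grad = False  # phase 1: train only the classification head
for epoch in range(5):
    if epoch == FROZEN_EPOCHS:
        for param in model.roberta.parameters():
            param.requires_grad = True  # phase 2: unfreeze the encoder for full fine-tuning
    # ... standard training loop over batches ...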


🧾 Model Details

| Detail | Description |
|---|---|
| Architecture | RoBERTa-Large + mean pooling + 2-layer MLP head |
| Task type | Multi-label emotion classification |
| Input length | Up to 128 tokens |
| Output | 28 sigmoid probabilities (27 emotions + neutral) |
| Framework | PyTorch / Hugging Face Transformers |
| Mixed precision | Supported (FP16) |
| License | MIT |
| Developer | Lakshya Kumar |

🎯 Intended Use

  • Emotion detection in conversational AI systems
  • Social media sentiment analysis
  • Affective computing research
  • Psychological and HCI studies

⚠️ Limitations

  • Trained on Reddit comments; may not generalize to formal writing or non-English text.
  • Rare emotions (e.g., grief, relief) have very low support.
  • The model detects linguistic cues of emotion, not the writer's true emotional state.

📎 Citation

@misc{lakshya2025robertalargegoemotions,
  title={RoBERTa-Large GoEmotions (Optimized Thresholds and Focal Loss)},
  author={Lakshya Kumar},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Lakssssshya/roberta-large-goemotions}}
}

πŸ™ Acknowledgments

  • Google Research for the GoEmotions dataset
  • Hugging Face for the transformers library
  • Meta AI for the RoBERTa architecture

📧 Contact

For questions or collaborations:


⭐ If you find this model useful, please give it a star and share it with others!
