🧠 RoBERTa-Large GoEmotions (Optimized)

Multi-label Emotion Classification with Focal Loss + Per-label Threshold Optimization

🏷️ Overview

This model fine-tunes RoBERTa-Large on the GoEmotions dataset for multi-label emotion classification, detecting 27 distinct emotions plus neutral (28 labels in total) in text.
Unlike standard models that rely on a fixed 0.5 threshold and plain BCE loss, this version applies focal loss, per-label threshold optimization, and targeted augmentation, yielding a balanced model that generalizes across all emotions.

Each input sentence can evoke multiple emotions simultaneously; for example:

"I can't believe this happened!" → surprise, disappointment


📚 Dataset

GoEmotions (Google Research, 2020)
• ~58k Reddit comments
• 27 emotion labels + neutral (28 in total)
• Multi-label: each text can carry multiple active emotions
• Highly imbalanced: e.g., gratitude has >1,000 examples, while grief and relief have <20

This model addresses imbalance through loss design, augmentation, and threshold tuning.
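
To give a concrete sense of the loss design, below is a minimal sketch of a multi-label focal loss, using the α=0.38 and γ=2.8 values from the training table further down. The exact training implementation is not published in this card, so treat this as an illustrative reimplementation, not the author's code:

import torch
import torch.nn as nn
import torch.nn.functional as F
class MultiLabelFocalLoss(nn.Module):
    # Illustrative focal loss over independent sigmoid outputs (one per emotion);
    # alpha/gamma defaults match the values reported in the training table
    def __init__(self, alpha=0.38, gamma=2.8):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
    def forward(self, logits, targets):
        # Unreduced per-element BCE so each term can be reweighted individually
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p_t = torch.exp(-bce)  # probability the model assigns to the true label value
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        # (1 - p_t)^gamma down-weights easy, confident predictions
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()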


🚀 Quick Start

Installation

pip install torch transformers huggingface_hub numpy

Basic Usage (3 steps)

import torch
import torch.nn as nn
from transformers import RobertaTokenizer, RobertaModel
from huggingface_hub import hf_hub_download
import json
import numpy as np
# Step 1: Define the model architecture
class RobertaForMultiLabelClassification(nn.Module):
    def __init__(self, model_name, num_labels, dropout_rate=0.3, use_mean_pooling=True):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(model_name)
        self.use_mean_pooling = use_mean_pooling
        hidden_size = self.roberta.config.hidden_size
        self.dropout1 = nn.Dropout(dropout_rate)
        self.fc1 = nn.Linear(hidden_size, hidden_size // 2)
        self.relu = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size // 2, num_labels)
    def mean_pooling(self, token_embeddings, attention_mask):
        # Mask-aware average over token embeddings: padding positions are excluded
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
        sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
        return sum_embeddings / sum_mask
    def forward(self, input_ids, attention_mask):
        outputs = self.roberta(input_ids, attention_mask=attention_mask)
        if self.use_mean_pooling:
            pooled_output = self.mean_pooling(outputs.last_hidden_state, attention_mask)
        else:
            pooled_output = outputs.pooler_output
        x = self.dropout1(pooled_output)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout2(x)
        logits = self.fc2(x)
        return logits
# Step 2: Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_name = "Lakssssshya/roberta-large-goemotions"
tokenizer = RobertaTokenizer.from_pretrained(model_name)
# Load config
config_path = hf_hub_download(repo_id=model_name, filename="config.json")
with open(config_path, 'r') as f:
    config = json.load(f)
model = RobertaForMultiLabelClassification(
    model_name='roberta-large',
    num_labels=config['num_labels'],
    dropout_rate=config.get('dropout_rate', 0.3),
    use_mean_pooling=config.get('use_mean_pooling', True)
)
# Load weights
weights_path = hf_hub_download(repo_id=model_name, filename="pytorch_model.bin")
state_dict = torch.load(weights_path, map_location=device)
model.load_state_dict(state_dict)
model.to(device)
model.eval()
# Load thresholds
thresholds_path = hf_hub_download(repo_id=model_name, filename="optimal_thresholds.json")
with open(thresholds_path, 'r') as f:
    thresholds = np.array(json.load(f))
# Emotion labels
emotion_labels = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 
    'caring', 'confusion', 'curiosity', 'desire', 'disappointment',
    'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
    'gratitude', 'grief', 'joy', 'love', 'nervousness',
    'optimism', 'pride', 'realization', 'relief', 'remorse',
    'sadness', 'surprise', 'neutral'
]
# Step 3: Predict
def predict_emotions(text):
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=128, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        logits = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
        probs = torch.sigmoid(logits).cpu().numpy()[0]
    
    # Apply optimized thresholds
    predictions = (probs > thresholds).astype(int)
    
    # Get predicted emotions
    predicted_emotions = [emotion_labels[i] for i in range(len(predictions)) if predictions[i] == 1]
    
    # Get top emotions with scores
    top_indices = np.argsort(probs)[::-1][:5]
    top_emotions = [(emotion_labels[idx], float(probs[idx])) for idx in top_indices]
    
    return {
        'predicted_emotions': predicted_emotions,
        'top_emotions': top_emotions
    }
# Example usage
text = "I'm so proud and excited about this achievement!"
result = predict_emotions(text)
print(f"Text: {text}")
print(f"Predicted emotions: {result['predicted_emotions']}")
print(f"Top 5 emotions: {result['top_emotions']}")

Easy-to-Use Wrapper Class

For convenience, use this wrapper class:

class EmotionPredictor:
    def __init__(self, model_name="Lakssssshya/roberta-large-goemotions"):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
        
        # Load config and model
        config_path = hf_hub_download(repo_id=model_name, filename="config.json")
        with open(config_path, 'r') as f:
            config = json.load(f)
        
        self.model = RobertaForMultiLabelClassification(
            model_name='roberta-large',
            num_labels=config['num_labels'],
            dropout_rate=config.get('dropout_rate', 0.3),
            use_mean_pooling=config.get('use_mean_pooling', True)
        )
        
        weights_path = hf_hub_download(repo_id=model_name, filename="pytorch_model.bin")
        state_dict = torch.load(weights_path, map_location=self.device)
        self.model.load_state_dict(state_dict)
        self.model.to(self.device)
        self.model.eval()
        
        # Load thresholds
        thresholds_path = hf_hub_download(repo_id=model_name, filename="optimal_thresholds.json")
        with open(thresholds_path, 'r') as f:
            self.thresholds = np.array(json.load(f))
        
        self.emotion_labels = [
            'admiration', 'amusement', 'anger', 'annoyance', 'approval', 
            'caring', 'confusion', 'curiosity', 'desire', 'disappointment',
            'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
            'gratitude', 'grief', 'joy', 'love', 'nervousness',
            'optimism', 'pride', 'realization', 'relief', 'remorse',
            'sadness', 'surprise', 'neutral'
        ]
    
    def predict(self, text, top_k=5):
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True, max_length=128, padding=True)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        
        with torch.no_grad():
            logits = self.model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
            probs = torch.sigmoid(logits).cpu().numpy()[0]
        
        predictions = (probs > self.thresholds).astype(int)
        predicted_emotions = [self.emotion_labels[i] for i in range(len(predictions)) if predictions[i] == 1]
        
        top_indices = np.argsort(probs)[::-1][:top_k]
        top_emotions = [
            {'emotion': self.emotion_labels[idx], 'score': float(probs[idx])}
            for idx in top_indices
        ]
        
        return {'text': text, 'emotions': predicted_emotions, 'top_emotions': top_emotions}
# Simple usage
predictor = EmotionPredictor()
result = predictor.predict("I'm so happy and excited!")
print(result)
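
The wrapper's predict handles one text at a time. For many inputs, it is usually faster to tokenize them as a single padded batch; here is a minimal sketch reusing the wrapper's components (this batched helper is illustrative and not part of the published code):

texts = ["Thank you so much!", "This is infuriating."]
inputs = predictor.tokenizer(texts, return_tensors='pt', truncation=True, max_length=128, padding=True)
inputs = {k: v.to(predictor.device) for k, v in inputs.items()}
with torch.no_grad():
    logits = predictor.model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
    probs = torch.sigmoid(logits).cpu().numpy()  # shape: (num_texts, 28)
for text, row in zip(texts, probs):
    # Apply the per-label thresholds row by row
    active = [predictor.emotion_labels[i] for i, p in enumerate(row) if p > predictor.thresholds[i]]
    print(f"{text} -> {active}")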

πŸ“ Example Predictions

Example 1: Pride and Achievement

text = "I'm so proud and excited about this achievement!"
result = predictor.predict(text)
# Output: {'emotions': ['pride', 'excitement', 'joy'], 'top_emotions': [{'emotion': 'pride', 'score': 0.867}, {'emotion': 'excitement', 'score': 0.712}, ...]}

Example 2: Regret

text = "I really regret saying that earlier."
result = predictor.predict(text)
# Output: {'emotions': ['remorse', 'sadness'], 'top_emotions': [{'emotion': 'remorse', 'score': 0.758}, ...]}

Example 3: Mixed Emotions

text = "I feel anxious but hopeful about the future."
result = predictor.predict(text)
# Output: {'emotions': ['nervousness', 'optimism'], 'top_emotions': [{'emotion': 'nervousness', 'score': 0.710}, {'emotion': 'optimism', 'score': 0.645}, ...]}

Example 4: Surprise

text = "I can't believe this happened!"
result = predictor.predict(text)
# Output: {'emotions': ['surprise', 'realization'], 'top_emotions': [{'emotion': 'surprise', 'score': 0.747}, ...]}

Example 5: Gratitude

text = "Thank you so much for all your help and support!"
result = predictor.predict(text)
# Output: {'emotions': ['gratitude', 'admiration'], 'top_emotions': [{'emotion': 'gratitude', 'score': 0.923}, ...]}

Example 6: Multiple Strong Emotions

text = "This is absolutely disgusting and infuriating!"
result = predictor.predict(text)
# Output: {'emotions': ['disgust', 'anger', 'annoyance'], 'top_emotions': [{'emotion': 'disgust', 'score': 0.834}, {'emotion': 'anger', 'score': 0.789}, ...]}

βš™οΈ Training Details

| Parameter | Value |
|---|---|
| Base model | roberta-large |
| Task | Multi-label classification |
| Epochs | 5 (encoder frozen for the first 2) |
| Learning rate | 2.6e-5 (Optuna-tuned) |
| Optimizer | AdamW |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Loss | Focal loss (α=0.38, γ=2.8) |
| Gradient accumulation | 16 steps |
| Scheduler | Linear decay |
| Mixed precision | ✅ FP16 |
| Pooling | Mean pooling |
| Dropout | 0.41 |
| Batch size | 2 × 16 (accumulated) |
| Early stopping | Patience = 3 |
| Threshold range | [0.05, 0.95], optimized per label |
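
The batch-size row means gradients from 16 micro-batches of 2 are accumulated before each optimizer step, giving an effective batch of 32. A generic sketch of that pattern (the loop structure and the compute_loss, train_loader, optimizer, and scheduler names are hypothetical; the actual training script is not published):

ACCUM_STEPS = 16  # micro-batch of 2, accumulated 16 times -> effective batch size 32
optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    loss = compute_loss(model, batch) / ACCUM_STEPS  # scale so accumulated gradients average
    loss.backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()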

Thresholds are optimized individually to maximize per-label F1 and saved in optimal_thresholds.json.
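
The card does not publish the search procedure, but a per-label grid search over the stated [0.05, 0.95] range on the validation split is the straightforward way to produce such a file. A minimal sketch, assuming val_probs (an N × 28 array of sigmoid outputs), val_labels (the matching binary label matrix), and scikit-learn are available:

import numpy as np
from sklearn.metrics import f1_score
def optimize_thresholds(val_probs, val_labels, low=0.05, high=0.95, steps=91):
    # For each label, sweep candidate thresholds and keep the F1-maximizing one
    candidates = np.linspace(low, high, steps)
    best = np.full(val_probs.shape[1], 0.5)
    for label in range(val_probs.shape[1]):
        f1s = [f1_score(val_labels[:, label], (val_probs[:, label] > t).astype(int), zero_division=0)
               for t in candidates]
        best[label] = candidates[int(np.argmax(f1s))]
    return best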


📊 Evaluation Metrics (Test Split)

| Metric type | Precision | Recall | F1 |
|---|---|---|---|
| Macro (unweighted) | 0.497 | 0.576 | 0.519 |
| Weighted (by label frequency) | 0.505 | 0.585 | 0.528 |

The model achieves a macro-F1 of roughly 0.52 on the test split while maintaining strong recall even for underrepresented emotions.


πŸ” Per-Label Performance (Best performance on Val Split)

| Label | Accuracy | Precision | Recall | F1 | MCC | Support | Threshold |
|---|---|---|---|---|---|---|---|
| AVG | 0.962 | 0.541 | 0.592 | 0.551 | 0.538 | 6380 | 0.645 |
| admiration | 0.950 | 0.718 | 0.732 | 0.725 | 0.697 | 488 | 0.616 |
| amusement | 0.975 | 0.727 | 0.888 | 0.799 | 0.791 | 303 | 0.652 |
| anger | 0.960 | 0.453 | 0.544 | 0.494 | 0.476 | 195 | 0.648 |
| annoyance | 0.911 | 0.326 | 0.558 | 0.412 | 0.383 | 303 | 0.568 |
| approval | 0.885 | 0.307 | 0.456 | 0.367 | 0.314 | 397 | 0.543 |
| caring | 0.971 | 0.481 | 0.582 | 0.527 | 0.514 | 153 | 0.618 |
| confusion | 0.973 | 0.521 | 0.401 | 0.454 | 0.444 | 152 | 0.653 |
| curiosity | 0.947 | 0.452 | 0.710 | 0.553 | 0.541 | 248 | 0.601 |
| desire | 0.989 | 0.642 | 0.558 | 0.597 | 0.593 | 77 | 0.729 |
| disappointment | 0.960 | 0.337 | 0.344 | 0.340 | 0.320 | 163 | 0.605 |
| disapproval | 0.908 | 0.329 | 0.678 | 0.443 | 0.431 | 292 | 0.545 |
| disgust | 0.984 | 0.549 | 0.464 | 0.503 | 0.496 | 97 | 0.697 |
| embarrassment | 0.992 | 0.417 | 0.714 | 0.526 | 0.542 | 35 | 0.674 |
| excitement | 0.977 | 0.357 | 0.365 | 0.361 | 0.349 | 96 | 0.676 |
| fear | 0.991 | 0.724 | 0.700 | 0.712 | 0.707 | 90 | 0.681 |
| gratitude | 0.989 | 0.960 | 0.877 | 0.917 | 0.912 | 358 | 0.685 |
| grief | 0.998 | 0.545 | 0.462 | 0.500 | 0.501 | 13 | 0.774 |
| joy | 0.974 | 0.594 | 0.587 | 0.591 | 0.577 | 172 | 0.642 |
| love | 0.979 | 0.718 | 0.889 | 0.794 | 0.788 | 252 | 0.648 |
| nervousness | 0.995 | 0.391 | 0.429 | 0.409 | 0.407 | 21 | 0.700 |
| optimism | 0.973 | 0.690 | 0.555 | 0.615 | 0.606 | 209 | 0.669 |
| pride | 0.999 | 0.889 | 0.533 | 0.667 | 0.688 | 15 | 0.694 |
| realization | 0.976 | 0.475 | 0.228 | 0.309 | 0.319 | 127 | 0.658 |
| relief | 0.992 | 0.217 | 0.556 | 0.312 | 0.344 | 18 | 0.594 |
| remorse | 0.992 | 0.648 | 0.838 | 0.731 | 0.733 | 68 | 0.728 |
| sadness | 0.976 | 0.543 | 0.573 | 0.558 | 0.546 | 143 | 0.650 |
| surprise | 0.978 | 0.528 | 0.589 | 0.557 | 0.546 | 129 | 0.664 |
| neutral | 0.753 | 0.593 | 0.767 | 0.669 | 0.487 | 1766 | 0.446 |

πŸ” Per-Label Performance (Test Split)

| Label | Accuracy | Precision | Recall | F1 | MCC | Support | Threshold |
|---|---|---|---|---|---|---|---|
| AVG | 0.961 | 0.497 | 0.576 | 0.519 | 0.506 | 6329 | 0.645 |
| admiration | 0.940 | 0.676 | 0.690 | 0.683 | 0.650 | 504 | 0.616 |
| amusement | 0.980 | 0.740 | 0.905 | 0.814 | 0.808 | 264 | 0.652 |
| anger | 0.959 | 0.455 | 0.556 | 0.500 | 0.482 | 198 | 0.648 |
| annoyance | 0.905 | 0.315 | 0.519 | 0.392 | 0.356 | 320 | 0.568 |
| approval | 0.900 | 0.331 | 0.538 | 0.410 | 0.371 | 351 | 0.543 |
| caring | 0.967 | 0.374 | 0.496 | 0.427 | 0.414 | 135 | 0.618 |
| confusion | 0.971 | 0.487 | 0.477 | 0.482 | 0.467 | 153 | 0.653 |
| curiosity | 0.945 | 0.483 | 0.715 | 0.577 | 0.561 | 284 | 0.601 |
| desire | 0.987 | 0.647 | 0.398 | 0.493 | 0.501 | 83 | 0.729 |
| disappointment | 0.962 | 0.313 | 0.311 | 0.312 | 0.293 | 151 | 0.605 |
| disapproval | 0.899 | 0.284 | 0.697 | 0.403 | 0.402 | 267 | 0.545 |
| disgust | 0.978 | 0.531 | 0.423 | 0.471 | 0.463 | 123 | 0.697 |
| embarrassment | 0.989 | 0.288 | 0.459 | 0.354 | 0.358 | 37 | 0.674 |
| excitement | 0.977 | 0.400 | 0.447 | 0.422 | 0.411 | 103 | 0.676 |
| fear | 0.988 | 0.564 | 0.795 | 0.660 | 0.664 | 78 | 0.681 |
| gratitude | 0.989 | 0.939 | 0.881 | 0.909 | 0.904 | 352 | 0.685 |
| grief | 0.999 | 0.429 | 0.500 | 0.462 | 0.462 | 6 | 0.774 |
| joy | 0.974 | 0.557 | 0.640 | 0.595 | 0.584 | 161 | 0.642 |
| love | 0.979 | 0.731 | 0.832 | 0.778 | 0.769 | 238 | 0.648 |
| nervousness | 0.994 | 0.333 | 0.348 | 0.340 | 0.338 | 23 | 0.700 |
| optimism | 0.975 | 0.686 | 0.516 | 0.589 | 0.583 | 186 | 0.669 |
| pride | 0.997 | 0.538 | 0.438 | 0.483 | 0.484 | 16 | 0.694 |
| realization | 0.970 | 0.380 | 0.207 | 0.268 | 0.266 | 145 | 0.658 |
| relief | 0.993 | 0.171 | 0.636 | 0.269 | 0.327 | 11 | 0.594 |
| remorse | 0.991 | 0.540 | 0.839 | 0.657 | 0.669 | 56 | 0.728 |
| sadness | 0.975 | 0.580 | 0.532 | 0.555 | 0.543 | 156 | 0.650 |
| surprise | 0.977 | 0.553 | 0.553 | 0.553 | 0.541 | 141 | 0.664 |
| neutral | 0.753 | 0.595 | 0.776 | 0.674 | 0.491 | 1787 | 0.446 |

🥇 Why This Model Outperforms Other GoEmotions Models

Most GoEmotions models on Hugging Face use plain BCE loss and a fixed 0.5 decision threshold. That setup works for frequent emotions but performs poorly on rare ones.

This model overcomes those limits via:

1️⃣ Adaptive per-label thresholds: each label gets its own optimized decision boundary, maximizing per-label F1 and balancing recall against precision.
2️⃣ Focal Loss: down-weights easy examples and emphasizes hard ones, improving generalization on minority classes.
3️⃣ Mean Pooling: captures richer emotional nuance than CLS-token pooling.
4️⃣ Targeted Augmentation: paraphrasing and synonym replacement strengthen rare emotion classes.
5️⃣ Gradual Unfreezing: stabilizes fine-tuning by keeping the encoder frozen during the early epochs (see the sketch after this list).
6️⃣ Balanced Macro Metrics: consistent performance across all emotion classes.
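
For point 5, the training table states the encoder stayed frozen for the first 2 of 5 epochs. A minimal sketch of that schedule, using the RobertaForMultiLabelClassification model defined earlier (the inner training loop is omitted and hypothetical):

FROZEN_EPOCHS = 2  # encoder frozen for the first 2 of 5 epochs (see training table)
for param in model.roberta.parameters():
    param.requires_grad = False  # phase 1: train only the classification head
for epoch in range(5):
    if epoch == FROZEN_EPOCHS:
        for param in model.roberta.parameters():
            param.requires_grad = True  # phase 2: unfreeze the encoder for full fine-tuning
    # ... standard training loop over batches ...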


🧾 Model Details

| Detail | Description |
|---|---|
| Architecture | RoBERTa-Large + mean pooling + 2-layer MLP head |
| Task type | Multi-label emotion classification |
| Input length | Up to 128 tokens |
| Output | 28 sigmoid probabilities (27 emotions + neutral) |
| Framework | PyTorch / Hugging Face Transformers |
| Mixed precision | Supported (FP16) |
| License | MIT |
| Developer | Lakshya Kumar |

🎯 Intended Use

  • Emotion detection in conversational AI systems
  • Social media sentiment analysis
  • Affective computing research
  • Psychological and HCI studies

⚠️ Limitations

  • Trained on Reddit comments; may not generalize to formal writing or non-English text.
  • Rare emotions (e.g., grief, relief) have very low support.
  • The model detects linguistic cues of emotion, not the writer's true emotional state.

📎 Citation

@misc{lakshya2025robertalargegoemotions,
  title={RoBERTa-Large GoEmotions (Optimized Thresholds and Focal Loss)},
  author={Lakshya Kumar},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Lakssssshya/roberta-large-goemotions}}
}

πŸ™ Acknowledgments

  • Google Research for the GoEmotions dataset
  • Hugging Face for the transformers library
  • Meta AI for the RoBERTa architecture

📧 Contact

For questions or collaborations:


⭐ If you find this model useful, please give it a star and share it with others!
