Update README.md
README.md (changed)
---
license: cc-by-4.0
---

# Empathic-Insight-Face-Large

**Empathic-Insight-Face-Large** is a set of 40 emotion regression models trained on the EMoNet-FACE benchmark suite. Each model is designed to predict the intensity of a specific fine-grained emotion from facial expressions. These models are built on top of SigLIP2 image embeddings followed by MLP regression heads.
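Each emotion has its own regression head: a 1152-dimensional SigLIP2 image embedding goes in, and a single intensity score comes out. The exact layer sizes are defined by the `MLP` class in the usage excerpt below; purely for orientation, a minimal sketch of such a head (the hidden width here is an illustrative assumption, not the released architecture) looks like this:

```python
import torch
import torch.nn as nn

class EmotionHeadSketch(nn.Module):
    """Illustrative regression head: SigLIP2 embedding (1152-d) -> one intensity score.

    The hidden width is an assumption for illustration; the released checkpoints
    use the MLP class defined in the usage script below.
    """
    def __init__(self, embedding_dim: int = 1152, hidden_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one scalar per emotion; 40 such heads in total
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One embedding-sized tensor in, one intensity score out
head = EmotionHeadSketch()
print(head(torch.randn(1, 1152)).shape)  # torch.Size([1, 1])
```

The excerpt below shows the portions of the quick-start script touched by this update; unchanged lines between hunks are elided with `# ...`.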

```python
from transformers import AutoModel, AutoProcessor
from PIL import Image
import numpy as np
import json
from pathlib import Path  # Used for cleaner path handling

# --- 1. Define MLP Architecture (Big Model) ---
class MLP(nn.Module):
    # ... (architecture definition unchanged in this update, not shown here)

# ...
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# ...

# === IMPORTANT: Set this to the directory where your .pth models are downloaded ===
# If you've cloned the repo, it might be "./" or the name of the cloned folder.
# Example: MODEL_DIRECTORY = Path("./Empathic-Insight-Face-Large_cloned_repo")
MODEL_DIRECTORY = Path(".")  # Assumes models are in the current directory or a sub-directory
# If the models are in the root of where this script runs after cloning, "." is fine.
# If they are in a subfolder, e.g., "Empathic-Insight-Face-Large", use Path("./Empathic-Insight-Face-Large")
# ================================================================================

# ...
if neutral_stats_path.exists():
    with open(neutral_stats_path, 'r') as f:
        neutral_stats_all = json.load(f)
else:
    print(f"Warning: Neutral stats file not found at {neutral_stats_path}. Mean subtraction will use 0.0 for all models.")

# Load all emotion MLP models
emotion_mlps = {}
print(f"Loading emotion MLP models from: {MODEL_DIRECTORY.resolve()}")  # .resolve() gives the absolute path
model_files_found = list(MODEL_DIRECTORY.glob("model_*_best.pth"))
if not model_files_found:
    print(f"Warning: No model files found in {MODEL_DIRECTORY.resolve()}. Please check the MODEL_DIRECTORY path.")

for pth_file in model_files_found:
    model_key_name = pth_file.stem  # e.g., "model_elation_best"
    try:
        mlp_model = MLP().to(device)
        # ... (state-dict loading unchanged, not shown here)
        emotion_mlps[model_key_name] = mlp_model
        # print(f"Loaded: {model_key_name}")
    except Exception as e:
        print(f"Error loading {model_key_name} from {pth_file}: {e}")

if not emotion_mlps:
    print("Error: No MLP models were successfully loaded. Check MODEL_DIRECTORY and file integrity.")
else:
    print(f"Successfully loaded {len(emotion_mlps)} emotion MLP models.")

# --- 3. Prepare Image and Get Embedding ---
def normalized(a, axis=-1, order=2):
    a = np.asarray(a)  # Ensure 'a' is a numpy array
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

# === Replace with your actual image path ===
# image_path_str = "path/to/your/image.jpg"
# try:
#     image = Image.open(image_path_str).convert("RGB")
#     inputs = siglip_processor(images=[image], return_tensors="pt", padding="max_length", truncation=True).to(device)
#     with torch.no_grad():
#         image_features = siglip_model.get_image_features(**inputs)  # PyTorch tensor
#     embedding_numpy_normalized = normalized(image_features.cpu().numpy())  # Normalize on CPU
#     embedding_tensor = torch.from_numpy(embedding_numpy_normalized).to(device).float()
# except FileNotFoundError:
#     print(f"Error: Image not found at {image_path_str}")
#     embedding_tensor = None  # Or handle error as appropriate
# except Exception as e:
#     print(f"Error processing image {image_path_str}: {e}")
#     embedding_tensor = None
# ==========================================

# --- For demonstration, let's use a random embedding if no image is processed ---
print("\nUsing a random embedding for demonstration purposes as no image path was set.")
embedding_tensor = torch.randn(1, 1152).to(device).float()
# ==============================================================================

# ...
if embedding_tensor is not None and emotion_mlps:
    # ... (per-model inference loop, not shown here; it computes raw_score for each loaded model)
        neutral_mean = neutral_stats_all.get(model_key_name, {}).get("mean", 0.0)
        mean_subtracted_score = raw_score - neutral_mean

        # Derive a human-readable emotion name from the model key
        emotion_name = model_key_name.replace("model_", "").replace("_best", "").replace("_", " ").title()
        results[emotion_name] = {
            "raw_score": raw_score,
            # ...
            "mean_subtracted_score": mean_subtracted_score
        }

    # Print results, sorted alphabetically by emotion name
    print("\n--- Emotion Scores (Mean-Subtracted) ---")
    # Sort items by emotion name for consistent output
    for emotion, scores in sorted(results.items()):
        print(f"{emotion:<35}: {scores['mean_subtracted_score']:.4f} (Raw: {scores['raw_score']:.4f}, Neutral Mean: {scores['neutral_mean']:.4f})")
else:
    print("Skipping inference as either embedding_tensor is None or no MLP models were loaded.")
```
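If the `.pth` heads are not on disk yet, one convenient way to fetch the repository contents is `huggingface_hub.snapshot_download`, pointing `MODEL_DIRECTORY` at the returned folder. The repository id below is a placeholder; substitute the actual repo this model card belongs to:

```python
from pathlib import Path
from huggingface_hub import snapshot_download

# Placeholder repo id: replace with the actual Hugging Face repository for this model card.
local_dir = snapshot_download(repo_id="your-namespace/Empathic-Insight-Face-Large")

MODEL_DIRECTORY = Path(local_dir)  # reuse this value in the script above
print(f"Model files downloaded to: {MODEL_DIRECTORY}")
```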
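The mean-subtraction step expects `neutral_stats_path` (set in a part of the script not shown above) to point to a JSON file keyed by model name; if the file is missing, the script falls back to a neutral mean of 0.0. The structure sketched below is an assumption based on how the script reads the file, and the values are placeholders:

```python
# Illustrative only: assumed structure of the neutral-stats JSON.
# Keys follow the checkpoint stems (e.g. "model_elation_best"); the 0.0 means are placeholders.
neutral_stats_all = {
    "model_elation_best": {"mean": 0.0},
    "model_awe_best": {"mean": 0.0},
}

# The lookup mirrors the script above: missing entries fall back to 0.0.
neutral_mean = neutral_stats_all.get("model_elation_best", {}).get("mean", 0.0)
```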
## Performance on EMoNet-FACE HQ Benchmark

The Empathic-Insight-Face models demonstrate strong performance, achieving near human-expert-level agreement on the EMoNet-FACE HQ benchmark.

**Key Metric: Weighted Kappa (κ<sub>w</sub>) Agreement with Human Annotators**
*(Aggregated pairwise agreement between model predictions and individual human expert annotations on the EMoNet-FACE HQ dataset)*

| Annotator Group | Mean κ<sub>w</sub> (vs. Humans) |
| :------------------------------ | :---------------------------: |
| Human Annotators (vs. Humans) | ~0.20 - 0.26* |
| **Empathic-Insight-Face LARGE** | **~0.18** |
| Empathic-Insight-Face SMALL | ~0.14 |
| Proprietary Models (e.g., HumeFace) | ~0.11 |
| Random Baseline | ~0.00 |

*\*Human inter-annotator agreement (pairwise κ<sub>w</sub>) varies per annotator; this is an approximate range from Table 6 in the paper.*

**Interpretation (from paper Figure 3 & Table 6):**

* **Empathic-Insight-Face LARGE** (our big models) achieves agreement scores that are statistically very close to human inter-annotator agreement and significantly outperforms other evaluated systems like proprietary models and general-purpose VLMs on this benchmark.
* The performance indicates that with focused dataset construction and careful fine-tuning, specialized models can approach human-level reliability on synthetic facial emotion recognition tasks for fine-grained emotions.
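Weighted kappa treats the ratings as an ordinal scale and penalizes large disagreements more heavily than small ones, which is why it is the headline agreement metric here. Purely as an illustration of the metric (this is not the paper's evaluation code), a pairwise κ<sub>w</sub> between discretized model scores and one annotator's ratings can be computed with scikit-learn; the 0-4 binning below is an assumption:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical data: continuous model scores and integer annotator ratings on an assumed 0-4 scale.
model_scores = np.array([0.2, 2.7, 4.3, 1.1, 3.8, 0.4])
annotator_ratings = np.array([0, 3, 4, 1, 4, 1])

# Discretize the model scores onto the same ordinal scale as the annotators (assumed binning).
model_bins = np.clip(np.rint(model_scores), 0, 4).astype(int)

# Quadratic weighting penalizes a 4-step disagreement far more than a 1-step one.
kappa_w = cohen_kappa_score(model_bins, annotator_ratings, weights="quadratic")
print(f"Pairwise weighted kappa: {kappa_w:.3f}")
```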

For more detailed benchmark results, including per-emotion performance and comparisons with other models using Spearman's Rho, please refer to the full EMoNet-FACE paper (Figures 3, 4, 9 and Table 6 in particular).

## Taxonomy

The 40 emotion categories are:

*Affection, Amusement, Anger, Astonishment/Surprise, Awe, Bitterness, Concentration, Confusion, Contemplation, Contempt, Contentment, Disappointment, Disgust, Distress, Doubt, Elation, Embarrassment, Emotional Numbness, Fatigue/Exhaustion, Fear, Helplessness, Hope/Enthusiasm/Optimism, Impatience and Irritability, Infatuation, Interest, Intoxication/Altered States of Consciousness, Jealousy & Envy, Longing, Malevolence/Malice, Pain, Pleasure/Ecstasy, Pride, Relief, Sadness, Sexual Lust, Shame, Sourness, Teasing, Thankfulness/Gratitude, Triumph.*

*(See Table 4 in the paper for associated descriptive words for each category.)*
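The usage script derives these display names from the checkpoint file stems. For single-word categories the mapping is mechanical; the exact file names for multi-word or slashed categories are not shown in this card, so the second example below is an assumption:

```python
def emotion_name_from_key(model_key_name: str) -> str:
    # Mirrors the usage script: strip the "model_"/"_best" affixes and title-case the rest.
    return model_key_name.replace("model_", "").replace("_best", "").replace("_", " ").title()

print(emotion_name_from_key("model_elation_best"))             # Elation
print(emotion_name_from_key("model_emotional_numbness_best"))  # Emotional Numbness (assumed file name)
```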
## Limitations
* **Synthetic Data:** Models are trained on synthetic faces. Generalization to real-world, diverse, in-the-wild images is not guaranteed and requires further investigation.
* **Static Faces:** Analysis is restricted to static facial expressions, without broader contextual or multimodal cues.
* **Cultural Universality:** The 40-category taxonomy, while expert-validated, is one perspective; its universality across cultures is an open research question.
* **Subjectivity:** Emotion perception is inherently subjective.

## Ethical Considerations

The EMoNet-FACE suite was developed with ethical considerations in mind, including:
* **Mitigating Bias:** Efforts were made to create demographically diverse synthetic datasets, and prompts were manually filtered.
* **No PII:** All images are synthetic, and no personally identifiable information was used.
* **Responsible Use:** These models are released for research. Users are urged to consider the ethical implications of their applications and avoid misuse, such as for emotional manipulation or in ways that could lead to unfair or harmful outcomes.

Please refer to the "Ethical Considerations" and "Data Integrity, Safety, and Fairness" sections in the EMoNet-FACE paper for a comprehensive discussion.

If you use these models or the EMoNet-FACE benchmark in your research, please cite the original paper:

```bibtex
@inproceedings{schuhmann2025emonetface,
  title={{EMONET-FACE: An Expert-Annotated Benchmark for Synthetic Emotion Recognition}},
  author={Schuhmann, Christoph and Kaczmarczyk, Robert and Rabby, Gollam and Kraus, Maurice and Friedrich, Felix and Nguyen, Huu and Kalyan, Krishna and Nadi, Kourosh and Kersting, Kristian and Auer, Sören},
  booktitle={Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
  year={2025} % Or actual year of publication
  % TODO: Add URL/DOI when available
}
```