ChristophSchuhmann committed (verified)
Commit 2ae77c5 · Parent(s): 6ca832e

Update README.md

Files changed (1): README.md (+57 -50)
README.md CHANGED
@@ -2,7 +2,6 @@
license: cc-by-4.0
---

-
# Empathic-Insight-Face-Large

**Empathic-Insight-Face-Large** is a set of 40 emotion regression models trained on the EMoNet-FACE benchmark suite. Each model is designed to predict the intensity of a specific fine-grained emotion from facial expressions. These models are built on top of SigLIP2 image embeddings followed by MLP regression heads.
@@ -66,8 +65,7 @@ from transformers import AutoModel, AutoProcessor
from PIL import Image
import numpy as np
import json
- import os  # For listing model files
- from pathlib import Path
+ from pathlib import Path  # Used for cleaner path handling

# --- 1. Define MLP Architecture (Big Model) ---
class MLP(nn.Module):
@@ -93,7 +91,10 @@ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# === IMPORTANT: Set this to the directory where your .pth models are downloaded ===
# If you've cloned the repo, it might be "./" or the name of the cloned folder.
- MODEL_DIRECTORY = Path("./Empathic-Insight-Face-Large")  # ADJUST THIS PATH
+ # Example: MODEL_DIRECTORY = Path("./Empathic-Insight-Face-Large_cloned_repo")
+ MODEL_DIRECTORY = Path(".")  # Assumes models are in the current directory or a sub-directory
+ # If the models are in the root of where this script runs after cloning, "." is fine.
+ # If they are in a subfolder, e.g., "Empathic-Insight-Face-Large", use Path("./Empathic-Insight-Face-Large")
# ================================================================================

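The `MODEL_DIRECTORY` setup changed above assumes the 40 `model_*_best.pth` heads (and the neutral-stats JSON) are already on disk. A minimal sketch of one way to fetch them with `huggingface_hub.snapshot_download`; the repository id and file patterns below are placeholders rather than values taken from the model card:

```python
# Sketch: populate MODEL_DIRECTORY from the Hub before running the README snippet.
# The repo_id and file patterns are placeholders -- substitute the real ones.
from pathlib import Path
from huggingface_hub import snapshot_download

MODEL_DIRECTORY = Path("./Empathic-Insight-Face-Large")
snapshot_download(
    repo_id="<namespace>/Empathic-Insight-Face-Large",  # placeholder repo id
    local_dir=MODEL_DIRECTORY,
    allow_patterns=["model_*_best.pth", "*.json"],       # MLP heads + stats file
)
print(f"Downloaded {len(list(MODEL_DIRECTORY.glob('model_*_best.pth')))} MLP head checkpoints")
```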
 
@@ -110,13 +111,17 @@ if neutral_stats_path.exists():
    with open(neutral_stats_path, 'r') as f:
        neutral_stats_all = json.load(f)
else:
-     print(f"Warning: Neutral stats file not found at {neutral_stats_path}. Mean subtraction will use 0.0.")
+     print(f"Warning: Neutral stats file not found at {neutral_stats_path}. Mean subtraction will use 0.0 for all models.")


# Load all emotion MLP models
emotion_mlps = {}
- print(f"Loading emotion MLP models from: {MODEL_DIRECTORY}")
- for pth_file in MODEL_DIRECTORY.glob("model_*_best.pth"):
+ print(f"Loading emotion MLP models from: {MODEL_DIRECTORY.resolve()}")  # .resolve() gives the absolute path
+ model_files_found = list(MODEL_DIRECTORY.glob("model_*_best.pth"))
+ if not model_files_found:
+     print(f"Warning: No model files found in {MODEL_DIRECTORY.resolve()}. Please check the MODEL_DIRECTORY path.")
+
+ for pth_file in model_files_found:
    model_key_name = pth_file.stem  # e.g., "model_elation_best"
    try:
        mlp_model = MLP().to(device)
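The diff does not show the neutral-stats file itself, but the lookup used later in the snippet, `neutral_stats_all.get(model_key_name, {}).get("mean", 0.0)`, implies one entry per model key with at least a `"mean"` field. A hypothetical example with made-up values:

```python
# Illustrative only: the shape of neutral_stats_all implied by the lookup
# neutral_stats_all.get(model_key_name, {}).get("mean", 0.0). Values are made up.
neutral_stats_all = {
    "model_elation_best": {"mean": 0.12},
    "model_sadness_best": {"mean": 0.07},
    # ... one entry per "model_<emotion>_best" key ...
}

print(neutral_stats_all.get("model_elation_best", {}).get("mean", 0.0))  # 0.12
print(neutral_stats_all.get("model_missing_best", {}).get("mean", 0.0))  # falls back to 0.0
```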
@@ -125,36 +130,40 @@ for pth_file in MODEL_DIRECTORY.glob("model_*_best.pth"):
        emotion_mlps[model_key_name] = mlp_model
        # print(f"Loaded: {model_key_name}")
    except Exception as e:
-         print(f"Error loading {model_key_name}: {e}")
+         print(f"Error loading {model_key_name} from {pth_file}: {e}")

if not emotion_mlps:
-     print(f"Error: No MLP models loaded. Check MODEL_DIRECTORY: {MODEL_DIRECTORY}")
+     print(f"Error: No MLP models were successfully loaded. Check MODEL_DIRECTORY and file integrity.")
else:
    print(f"Successfully loaded {len(emotion_mlps)} emotion MLP models.")


# --- 3. Prepare Image and Get Embedding ---
def normalized(a, axis=-1, order=2):
+     a = np.asarray(a)  # Ensure 'a' is a numpy array
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

# === Replace with your actual image path ===
- # image_path = "path/to/your/image.jpg"
+ # image_path_str = "path/to/your/image.jpg"
# try:
- #     image = Image.open(image_path).convert("RGB")
- #     inputs = siglip_processor(images=[image], return_tensors="pt").to(device)
+ #     image = Image.open(image_path_str).convert("RGB")
+ #     inputs = siglip_processor(images=[image], return_tensors="pt", padding="max_length", truncation=True).to(device)
#     with torch.no_grad():
- #         image_features = siglip_model.get_image_features(**inputs)
- #         embedding_numpy_normalized = normalized(image_features.cpu().numpy())
+ #         image_features = siglip_model.get_image_features(**inputs)  # PyTorch tensor
+ #         embedding_numpy_normalized = normalized(image_features.cpu().numpy())  # Normalize on CPU
#     embedding_tensor = torch.from_numpy(embedding_numpy_normalized).to(device).float()
# except FileNotFoundError:
- #     print(f"Error: Image not found at {image_path}")
+ #     print(f"Error: Image not found at {image_path_str}")
#     embedding_tensor = None  # Or handle error as appropriate
+ # except Exception as e:
+ #     print(f"Error processing image {image_path_str}: {e}")
+ #     embedding_tensor = None
# ==========================================

# --- For demonstration, let's use a random embedding if no image is processed ---
- print("\nUsing a random embedding for demonstration purposes.")
+ print("\nUsing a random embedding for demonstration purposes as no image path was set.")
embedding_tensor = torch.randn(1, 1152).to(device).float()
# ==============================================================================
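To score a real image instead of the random demonstration embedding, the pieces of the snippet above can be folded into one helper. This is a sketch rather than code from the model card: `score_image` is a name introduced here, it assumes `siglip_model`, `siglip_processor`, `emotion_mlps`, `neutral_stats_all`, `normalized`, and `device` are already set up as in the README, and it assumes each MLP head returns a single scalar score.

```python
# Sketch: embed a face image with SigLIP2 and score it with every loaded MLP head.
# Assumes the objects created earlier in the README snippet are already in scope.
import torch
from PIL import Image

def score_image(image_path):
    """Return {emotion: mean-subtracted score} for one image (illustrative helper)."""
    image = Image.open(image_path).convert("RGB")
    inputs = siglip_processor(images=[image], return_tensors="pt").to(device)
    with torch.no_grad():
        image_features = siglip_model.get_image_features(**inputs)
    embedding = torch.from_numpy(normalized(image_features.cpu().numpy())).to(device).float()

    scores = {}
    with torch.no_grad():
        for model_key_name, mlp_model in emotion_mlps.items():
            raw_score = mlp_model(embedding).item()  # assumes a single scalar output per head
            neutral_mean = neutral_stats_all.get(model_key_name, {}).get("mean", 0.0)
            emotion = model_key_name.replace("model_", "").replace("_best", "").replace("_", " ").title()
            scores[emotion] = raw_score - neutral_mean
    return scores

# Example usage (path is a placeholder):
# top5 = sorted(score_image("some_face.jpg").items(), key=lambda kv: kv[1], reverse=True)[:5]
# print(top5)
```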
 
@@ -168,6 +177,7 @@ if embedding_tensor is not None and emotion_mlps:
        neutral_mean = neutral_stats_all.get(model_key_name, {}).get("mean", 0.0)
        mean_subtracted_score = raw_score - neutral_mean

+         # Derive a human-readable emotion name from the model key
        emotion_name = model_key_name.replace("model_", "").replace("_best", "").replace("_", " ").title()
        results[emotion_name] = {
            "raw_score": raw_score,
@@ -175,63 +185,59 @@ if embedding_tensor is not None and emotion_mlps:
            "mean_subtracted_score": mean_subtracted_score
        }

-     # Print results
-     print("\n--- Emotion Scores ---")
-     for emotion, scores in sorted(results.items()):
-         print(f"{emotion:<35}: Mean-Subtracted = {scores['mean_subtracted_score']:.4f} (Raw = {scores['raw_score']:.4f}, Neutral Mean = {scores['neutral_mean']:.4f})")
+     # Print results, sorted alphabetically by emotion name
+     print("\n--- Emotion Scores (Mean-Subtracted) ---")
+     # Sort items by emotion name for consistent output
+     for emotion, scores in sorted(results.items()):
+         print(f"{emotion:<35}: {scores['mean_subtracted_score']:.4f} (Raw: {scores['raw_score']:.4f}, Neutral Mean: {scores['neutral_mean']:.4f})")
else:
-     print("Skipping inference as either embedding tensor is None or no MLP models were loaded.")
+     print("Skipping inference as either embedding_tensor is None or no MLP models were loaded.")
```
+
## Performance on EMoNet-FACE HQ Benchmark

The Empathic-Insight-Face models demonstrate strong performance, achieving near human-expert-level agreement on the EMoNet-FACE HQ benchmark.

- Key Metric: Weighted Kappa (κ<sub>w</sub>) Agreement with Human Annotators
- (Aggregated pairwise agreement between model predictions and individual human expert annotations on the EMoNet-FACE HQ dataset)
+ **Key Metric: Weighted Kappa (κ<sub>w</sub>) Agreement with Human Annotators**
+ *(Aggregated pairwise agreement between model predictions and individual human expert annotations on the EMoNet-FACE HQ dataset)*

- Annotator Group Mean κ<sub>w</sub> (vs. Humans)
- Human Annotators (vs. Humans) ~0.20 - 0.26*
- Empathic-Insight-Face LARGE ~0.18
- Empathic-Insight-Face SMALL ~0.14
- Proprietary Models (e.g., HumeFace) ~0.11
- Random Baseline ~0.00
+ | Annotator Group | Mean κ<sub>w</sub> (vs. Humans) |
+ | :------------------------------ | :---------------------------: |
+ | Human Annotators (vs. Humans) | ~0.20 - 0.26* |
+ | **Empathic-Insight-Face LARGE** | **~0.18** |
+ | Empathic-Insight-Face SMALL | ~0.14 |
+ | Proprietary Models (e.g., HumeFace) | ~0.11 |
+ | Random Baseline | ~0.00 |

- Human inter-annotator agreement (pairwise κ<sub>w</sub>) varies per annotator; this is an approximate range from Table 6 in the paper.
+ *\*Human inter-annotator agreement (pairwise κ<sub>w</sub>) varies per annotator; this is an approximate range from Table 6 in the paper.*

- Interpretation (from paper Figure 3 & Table 6):
+ **Interpretation (from paper Figure 3 & Table 6):**

- Empathic-Insight-Face LARGE (our big models) achieves agreement scores that are statistically very close to human inter-annotator agreement and significantly outperforms other evaluated systems like proprietary models and general-purpose VLMs on this benchmark.
-
- The performance indicates that with focused dataset construction and careful fine-tuning, specialized models can approach human-level reliability on synthetic facial emotion recognition tasks for fine-grained emotions.
+ * **Empathic-Insight-Face LARGE** (our big models) achieves agreement scores that are statistically very close to human inter-annotator agreement and significantly outperforms other evaluated systems like proprietary models and general-purpose VLMs on this benchmark.
+ * The performance indicates that with focused dataset construction and careful fine-tuning, specialized models can approach human-level reliability on synthetic facial emotion recognition tasks for fine-grained emotions.

For more detailed benchmark results, including per-emotion performance and comparisons with other models using Spearman's Rho, please refer to the full EMoNet-FACE paper (Figures 3, 4, 9 and Table 6 in particular).

## Taxonomy

The 40 emotion categories are:
- Affection, Amusement, Anger, Astonishment/Surprise, Awe, Bitterness, Concentration, Confusion, Contemplation, Contempt, Contentment, Disappointment, Disgust, Distress, Doubt, Elation, Embarrassment, Emotional Numbness, Fatigue/Exhaustion, Fear, Helplessness, Hope/Enthusiasm/Optimism, Impatience and Irritability, Infatuation, Interest, Intoxication/Altered States of Consciousness, Jealousy & Envy, Longing, Malevolence/Malice, Pain, Pleasure/Ecstasy, Pride, Relief, Sadness, Sexual Lust, Shame, Sourness, Teasing, Thankfulness/Gratitude, Triumph.
+ *Affection, Amusement, Anger, Astonishment/Surprise, Awe, Bitterness, Concentration, Confusion, Contemplation, Contempt, Contentment, Disappointment, Disgust, Distress, Doubt, Elation, Embarrassment, Emotional Numbness, Fatigue/Exhaustion, Fear, Helplessness, Hope/Enthusiasm/Optimism, Impatience and Irritability, Infatuation, Interest, Intoxication/Altered States of Consciousness, Jealousy & Envy, Longing, Malevolence/Malice, Pain, Pleasure/Ecstasy, Pride, Relief, Sadness, Sexual Lust, Shame, Sourness, Teasing, Thankfulness/Gratitude, Triumph.*

- (See Table 4 in the paper for associated descriptive words for each category).
+ *(See Table 4 in the paper for associated descriptive words for each category).*

## Limitations

- Synthetic Data: Models are trained on synthetic faces. Generalization to real-world, diverse, in-the-wild images is not guaranteed and requires further investigation.
-
- Static Faces: Analysis is restricted to static facial expressions, without broader contextual or multimodal cues.
-
- Cultural Universality: The 40-category taxonomy, while expert-validated, is one perspective; its universality across cultures is an open research question.
-
- Subjectivity: Emotion perception is inherently subjective.
+ * **Synthetic Data:** Models are trained on synthetic faces. Generalization to real-world, diverse, in-the-wild images is not guaranteed and requires further investigation.
+ * **Static Faces:** Analysis is restricted to static facial expressions, without broader contextual or multimodal cues.
+ * **Cultural Universality:** The 40-category taxonomy, while expert-validated, is one perspective; its universality across cultures is an open research question.
+ * **Subjectivity:** Emotion perception is inherently subjective.

## Ethical Considerations

The EMoNet-FACE suite was developed with ethical considerations in mind, including:
-
- Mitigating Bias: Efforts were made to create demographically diverse synthetic datasets and prompts were manually filtered.
-
- No PII: All images are synthetic, and no personally identifiable information was used.
-
- Responsible Use: These models are released for research. Users are urged to consider the ethical implications of their applications and avoid misuse, such as for emotional manipulation or in ways that could lead to unfair or harmful outcomes.
+ * **Mitigating Bias:** Efforts were made to create demographically diverse synthetic datasets and prompts were manually filtered.
+ * **No PII:** All images are synthetic, and no personally identifiable information was used.
+ * **Responsible Use:** These models are released for research. Users are urged to consider the ethical implications of their applications and avoid misuse, such as for emotional manipulation or in ways that could lead to unfair or harmful outcomes.

Please refer to the "Ethical Considerations" and "Data Integrity, Safety, and Fairness" sections in the EMoNet-FACE paper for a comprehensive discussion.
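The κ<sub>w</sub> values in the table added above are pairwise, chance-corrected agreement scores between two raters. Purely as an illustration, one such pairwise score can be computed with scikit-learn as below; the 0-4 ordinal scale and the quadratic weighting are assumptions for this example, not the paper's exact protocol:

```python
# Illustration: weighted Cohen's kappa between one model's ratings and one human
# annotator's ratings over the same items. The 0-4 scale and "quadratic" weights
# are assumptions for this example; see the EMoNet-FACE paper for the exact setup.
from sklearn.metrics import cohen_kappa_score

model_ratings = [0, 3, 1, 4, 2, 0, 1, 3]  # hypothetical per-item intensity ratings
human_ratings = [0, 2, 1, 4, 3, 0, 0, 3]

kappa_w = cohen_kappa_score(model_ratings, human_ratings, weights="quadratic")
print(f"Pairwise weighted kappa: {kappa_w:.3f}")
```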
 
@@ -239,10 +245,11 @@ Please refer to the "Ethical Considerations" and "Data Integrity, Safety, and Fairness" sections in the EMoNet-FACE paper for a comprehensive discussion.

If you use these models or the EMoNet-FACE benchmark in your research, please cite the original paper:

+ ```bibtex
@inproceedings{schuhmann2025emonetface,
  title={{EMONET-FACE: An Expert-Annotated Benchmark for Synthetic Emotion Recognition}},
  author={Schuhmann, Christoph and Kaczmarczyk, Robert and Rabby, Gollam and Kraus, Maurice and Friedrich, Felix and Nguyen, Huu and Kalyan, Krishna and Nadi, Kourosh and Kersting, Kristian and Auer, Sören},
  booktitle={Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
  year={2025} % Or actual year of publication
  % TODO: Add URL/DOI when available
- }
+ }
 