wav2vec2-base-finetuned-ravdess-personalization

This model is part of our ICASSP 2026 paper: Test-Time Adaptation Methods for Speech Emotion Recognition.

Model Description

A Wav2Vec2 base model fine-tuned on RAVDESS for Task1 (intra-corpus personalization).

  • Base Model: facebook/wav2vec2-base
  • Dataset: RAVDESS
  • Task: Task1 - intra-corpus personalization
  • Emotions: neutral, calm, happy, sad, angry, fearful, disgust, surprised
  • Number of Classes: 8

Usage

from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
import torch
import torchaudio

# Load model and processor
model_checkpoint = "LincolnD/wav2vec2-base-finetuned-ravdess-personalization"
processor = AutoFeatureExtractor.from_pretrained(model_checkpoint)
model = AutoModelForAudioClassification.from_pretrained(model_checkpoint)

# Load and process audio
audio_path = "path/to/your/audio.wav"
waveform, sample_rate = torchaudio.load(audio_path)

# Resample to 16kHz if needed
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)

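# Downmix to mono: torchaudio returns (channels, samples), and a stereo file
# would otherwise be treated as a batch of two utterances
waveform = waveform.mean(dim=0)

# Process and predict
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt", padding=True)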
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

emotion_id = predictions.item()
emotion = model.config.id2label[emotion_id]
print(f"Predicted emotion: {emotion}")
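
The snippet above reports only the top class. When a ranked list with confidences is more useful, the logits can be turned into probabilities with a softmax. The following is a minimal sketch in plain Python; `rank_emotions` is a hypothetical helper and not part of the model repository. It assumes the logits for one utterance have been converted to a list (for example with `outputs.logits[0].tolist()`) and that the label mapping comes from `model.config.id2label`.

```python
import math

def rank_emotions(logits, id2label, top_k=3):
    """Return the top_k (label, probability) pairs for one utterance.

    logits   -- list of floats, e.g. outputs.logits[0].tolist()
    id2label -- dict mapping class index to label, e.g. model.config.id2label
    """
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class indices by descending probability and keep the top_k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in order[:top_k]]
```

With the objects from the usage code, a call might look like `rank_emotions(outputs.logits[0].tolist(), model.config.id2label)`.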

Training Details

This model was fine-tuned from the pre-trained Wav2Vec2 base model on the RAVDESS dataset.

Training Data

  • Dataset: RAVDESS
  • Emotions: 8 classes
  • Sampling Rate: 16kHz
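
For readers preparing their own RAVDESS splits, the emotion label and the speaker (actor) ID can be read from each file name, which is useful when grouping utterances per speaker for personalization. The sketch below relies on the file naming convention published with the RAVDESS dataset (seven hyphen-separated fields), not on anything stated in this card; `parse_ravdess_filename` is a hypothetical helper, and the class index order in the model's `id2label` may differ from the numeric codes used here.

```python
from pathlib import Path

# Emotion codes from the RAVDESS file naming convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(path):
    """Extract emotion, intensity, and actor ID from a RAVDESS file name."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"Not a RAVDESS file name: {path}")
    # Field order: modality, vocal channel, emotion, intensity,
    # statement, repetition, actor.
    return {
        "emotion": RAVDESS_EMOTIONS[fields[2]],
        "intensity": "strong" if fields[3] == "02" else "normal",
        "actor": int(fields[6]),
    }
```

For example, `parse_ravdess_filename("Actor_12/03-01-06-01-02-01-12.wav")` yields the emotion `fearful`, normal intensity, and actor 12.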

Intended Use

This model is designed for use in Test-Time Adaptation (TTA) experiments as part of our research on adapting speech emotion recognition systems to new domains and speakers.

Evaluation

For detailed evaluation results and comparison with various TTA methods, please refer to our paper.

Model Card Authors

LincolnD

License

MIT License - See repository for details.

Model size: 94.6M params (Safetensors, tensor type F32)