Ostrich-v1: Audio Mood Classification Model

Model Overview

Ostrich-v1 is a production-grade multi-label audio classification model fine-tuned for detecting musical moods in Indian music (Bollywood, Indie, Folk, Classical, Fusion, etc.).
It uses a teacher–student distillation pipeline to map rich acoustic representations to 31 fine-grained mood dimensions, including culturally specific emotional categories.

Developed by: beastLucifer
Model type: Multi-label Audio Classification
Domain: Music (Indian & South Asian context)
Languages: English, Hindi (musical semantics)
License: Apache 2.0
Finetuned from: sandychoii/distilhubert-finetuned-gtzan-audio-classification

Model Description

Ostrich-v1 bridges the gap between generic audio emotion models and the emotional textures unique to South Asian music.
The model is built on a DistilHuBERT backbone, fine-tuned on the Sangeetkar dataset using teacher-generated soft labels produced by a CLAP-based audio–text model.

It supports both:

Global moods (e.g., Happy, Sad, Angry)
Regionally grounded moods (e.g., Dard-bhari, Tapori, Sufi-romantic)

The output is a 31-dimensional probability vector, allowing multiple moods to coexist per track.
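
To illustrate why several moods can be active at once, here is a minimal sketch using made-up logits for three of the labels (the values are illustrative, not model output). Each logit passes through its own sigmoid, so the resulting probabilities are independent and need not sum to 1, unlike a softmax.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits for three labels; illustrative values only.
example_logits = {"Sad": 2.0, "Nostalgic": 1.0, "Energetic": -3.0}

# Each label is scored independently of the others.
example_probs = {label: sigmoid(z) for label, z in example_logits.items()}

# Labels at or above 0.5 are treated as active (the cutoff is an assumption).
active = [label for label, p in example_probs.items() if p >= 0.5]
```

Here both Sad and Nostalgic clear the cutoff, so the same track would carry both moods.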

Model Sources

Repository: beastLucifer/ostrich-v1-audio-mood
Training Audio Dataset: beastLucifer/sangeetkar-mood-dataset
Teacher Label Dataset: beastLucifer/sangeetkar-teacher-labels

Intended Uses

Direct Use

Automated music tagging and metadata enrichment
Mood-based playlist generation
Music discovery systems for Indian sub-genres
Recommendation systems and catalog analytics

Out-of-Scope Use

Speech-to-text or speaker recognition
Environmental sound classification
Real-time or ultra-low-latency streaming inference
Non-musical audio domains

Bias, Risks, and Limitations

Label Noise: Labels are distilled from a teacher model. Although class-wise weighting is applied, subtle secondary moods may bleed into primary predictions.
Genre Bias: Performance may degrade on:
- Purely instrumental tracks
- Rare regional folk styles underrepresented in the teacher corpus
Temporal Assumption: Optimized for chunks of at most 30 seconds; long-form compositions should be chunked.
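
One way to handle long-form tracks is to split the waveform into consecutive 30-second windows and score each window separately. The sketch below covers only the splitting step; the helper name `split_into_chunks` and the choice to drop very short trailing chunks are assumptions made for illustration, not part of the released code.

```python
import numpy as np

def split_into_chunks(audio, sr=16000, chunk_seconds=30.0, min_seconds=1.0):
    """Split a mono waveform into consecutive chunks of at most chunk_seconds.

    A trailing chunk shorter than min_seconds is dropped (an assumption made here).
    """
    chunk_len = int(chunk_seconds * sr)
    min_len = int(min_seconds * sr)
    chunks = [audio[start:start + chunk_len] for start in range(0, len(audio), chunk_len)]
    return [c for c in chunks if len(c) >= min_len]

# 70 seconds of silence stands in for a real track.
track = np.zeros(70 * 16000, dtype=np.float32)
chunks = split_into_chunks(track)
chunk_durations = [len(c) / 16000 for c in chunks]
```

Each chunk can then be passed through the feature extractor and model, and the per-chunk probability vectors averaged (or max-pooled) to obtain a track-level prediction. Which aggregation works best for this model is not stated in the card.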

How to Use the Model

from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
import torch
import librosa

model_id = "beastLucifer/ostrich-v1-audio-mood"

model = AutoModelForAudioClassification.from_pretrained(model_id)
processor = AutoFeatureExtractor.from_pretrained(model_id)

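# Load audio as mono at 16 kHz, keeping only the first 30 s (the model is optimized for clips of at most 30 s)
audio, sr = librosa.load("your_song.wav", sr=16000, duration=30.0)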

inputs = processor(
    audio,
    sampling_rate=16000,
    return_tensors="pt"
)

# Inference
with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.sigmoid(logits)  # multi-label probabilities

# Map predictions
id2label = model.config.id2label
predictions = {
    id2label[i]: probs[0][i].item()
    for i in range(len(id2label))
}
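
The `predictions` dictionary holds a score for every label. A small helper such as the one below can reduce it to the moods worth reporting. The name `top_moods`, the 0.5 threshold, and the cap of five labels are assumptions; the card does not specify a decision threshold, so it may need tuning per class.

```python
# Stand-in for the `predictions` dict built above; the scores are made up.
predictions = {"Sad": 0.81, "Dard-bhari": 0.64, "Calm": 0.35, "Energetic": 0.05}

def top_moods(scores, threshold=0.5, k=5):
    """Return up to k (label, score) pairs scoring at least threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(label, score) for label, score in ranked if score >= threshold][:k]

for label, score in top_moods(predictions):
    print(f"{label}: {score:.2f}")
```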

Training Details

Training Data

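Audio: Sangeetkar dataset (beastLucifer/sangeetkar-mood-dataset), a large-scale Indian music corpus
Labels: 31 mood dimensions generated via a CLAP-based teacher model (beastLucifer/sangeetkar-teacher-labels)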

Preprocessing

Resampling: 16 kHz
Normalization: Zero-mean, unit-variance
Chunking: Max 30 seconds per sample
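
The normalization and length cap listed above can be sketched in NumPy as follows (resampling is left to the audio loader). The helper name `preprocess_clip` and the small `eps` term are assumptions for illustration. Depending on its configuration, the Hugging Face feature extractor may already apply the same normalization, in which case this step is redundant.

```python
import numpy as np

def preprocess_clip(audio, sr=16000, max_seconds=30.0, eps=1e-7):
    """Truncate a mono waveform to max_seconds and scale it to zero mean, unit variance."""
    clip = np.asarray(audio, dtype=np.float32)[: int(max_seconds * sr)]
    return (clip - clip.mean()) / np.sqrt(clip.var() + eps)

# A 40-second synthetic 440 Hz tone with a DC offset stands in for real audio.
t = np.arange(40 * 16000) / 16000
raw = 0.3 * np.sin(2 * np.pi * 440 * t) + 0.1
clean = preprocess_clip(raw)
```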

Training Configuration

Optimizer: adamw_bnb_8bit
Learning Rate: 5e-5
Batch Size: 4 per device
Effective Batch Size: 16 (4 per device × 4 gradient accumulation steps, assuming a single GPU)
Precision: FP16 mixed precision
Loss Function: Custom Weighted BCEWithLogitsLoss
Special Handling:
- Motivational (index 19) weight reduced to 0.3 due to high teacher variance
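
A weighted loss of this kind might look like the sketch below. It assumes every class other than Motivational keeps a weight of 1.0 and that the weights scale each class's loss term directly. The card states neither detail, so treat this as an illustration rather than the training code.

```python
import torch
import torch.nn.functional as F

NUM_LABELS = 31
MOTIVATIONAL_INDEX = 19

# Per-class weights: 1.0 everywhere (assumed) except the down-weighted Motivational class.
class_weights = torch.ones(NUM_LABELS)
class_weights[MOTIVATIONAL_INDEX] = 0.3

def weighted_bce_with_logits(logits, soft_targets, weights=class_weights):
    """BCE-with-logits against soft teacher labels, scaled per class and averaged."""
    per_element = F.binary_cross_entropy_with_logits(logits, soft_targets, reduction="none")
    return (per_element * weights).mean()

# Random tensors stand in for student logits and teacher probabilities.
torch.manual_seed(0)
logits = torch.randn(4, NUM_LABELS)
soft_targets = torch.rand(4, NUM_LABELS)
loss = weighted_bce_with_logits(logits, soft_targets)
```

The same result can be obtained by passing `weight=class_weights` to `F.binary_cross_entropy_with_logits`; the explicit version just makes the per-class scaling visible.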

Architecture

Backbone: DistilHuBERT
Objective: Multi-label mood classification
Distillation: Teacher–student training for compactness and speed
Inference: ~90% of HuBERT performance at significantly reduced compute cost

Compute Infrastructure

Hardware

NVIDIA Tesla T4 (Google Colab)

Software

PyTorch
Hugging Face Transformers
Accelerate
BitsAndBytes (8-bit optimization)

Label Inventory (31 Classes)

Energetic, Calm, Happy, Sad, Angry, Romantic, Mysterious, Nostalgic, Dard-bhari, Masti, Sufi-romantic, Item Song, Qawwali Vibes, Judaai, Tapori, Chill-lofi, Hype Party, Dreamy, Dark, Motivational, Melancholic, Intense, Peaceful, Experimental, Ambient, Spiritual, Groovy, Folk, Indie, Electronic, Classical
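
As a cross-check, the inventory can be written out as an index-to-label mapping. The sketch assumes the labels are indexed in the order listed and that "Hype Party" is a single label; under those assumptions the list has 31 entries and Motivational lands at index 19, matching the training notes. The authoritative mapping is the `id2label` entry in the released model config.

```python
# Label order copied from the inventory above; "Hype Party" is treated as one label
# (an assumption that yields 31 classes with Motivational at index 19).
LABELS = [
    "Energetic", "Calm", "Happy", "Sad", "Angry", "Romantic", "Mysterious",
    "Nostalgic", "Dard-bhari", "Masti", "Sufi-romantic", "Item Song",
    "Qawwali Vibes", "Judaai", "Tapori", "Chill-lofi", "Hype Party", "Dreamy",
    "Dark", "Motivational", "Melancholic", "Intense", "Peaceful", "Experimental",
    "Ambient", "Spiritual", "Groovy", "Folk", "Indie", "Electronic", "Classical",
]

id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}
```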

Model Card Authors

beastLucifer

Model size: 94.6M params (Safetensors)
Tensor type: F32

Model tree for beastLucifer/ostrich-v1-audio-mood

Base model: ntu-spml/distilhubert
Finetuned: sandychoii/distilhubert-finetuned-gtzan-audio-classification