---
title: Video Transcription & Analysis
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: src/streamlit_app.py
pinned: false
license: mit
---

# 🎤 Video Transcription & Analysis

A powerful Streamlit application that transcribes speech from video files and performs comprehensive sentiment and emotion analysis using state-of-the-art pretrained models.

## ✨ Features

- **🎥 Video Upload** - Supports multiple video formats (MP4, AVI, MOV, MKV, WebM, FLV, WMV, M4V)
- **🎤 Speech Transcription** - Automatic speech-to-text using OpenAI Whisper
- **👥 Speaker Diarization** - Automatically identifies and separates different speakers (Speaker 1, Speaker 2, etc.)
- **😊 Sentiment Analysis** - Detects positive, negative, or neutral sentiment for each segment
- **🎭 Emotion Analysis** - Identifies emotions: angry, sad, happy, neutral, surprise
- **📊 Visual Analytics** - Interactive graphs showing sentiment and emotion distribution
- **📝 Detailed Reports** - Per-segment analysis with timestamps and speaker labels
- **📥 Export Options** - Download the transcription and full analysis reports
- **🌍 Language Detection** - Automatic language identification
- **⚡ Multiple Model Sizes** - Choose from tiny, base, small, medium, or large Whisper models

## 🚀 How to Use

1. **Upload a Video**: Click "Browse files" and select your video file
2. **Enable Speaker Diarization**: Toggle the checkbox to identify different speakers (enabled by default)
3. **Transcribe**: Click the "🎯 Transcribe Video" button
4. **View Results**:
   - See the speaker-separated transcription
   - View the sentiment and emotion analysis
   - Explore interactive graphs
   - Review the detailed per-segment analysis
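The per-segment results in step 4 can be sketched as a small formatting helper. The segment field names (`start`, `end`, `speaker`, `text`) are assumptions for illustration, not the app's actual schema:

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def format_segment(segment: dict) -> str:
    """Render one segment as '[MM:SS - MM:SS] Speaker N: text'."""
    return (
        f"[{format_timestamp(segment['start'])} - "
        f"{format_timestamp(segment['end'])}] "
        f"{segment['speaker']}: {segment['text'].strip()}"
    )

print(format_segment({"start": 0.0, "end": 4.5,
                      "speaker": "Speaker 1", "text": " Hello there."}))
# -> [00:00 - 00:04] Speaker 1: Hello there.
```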

## 📊 Analysis Features

### Sentiment Analysis

- **Positive** 😊 - Happy, optimistic, or positive content
- **Negative** 😞 - Sad, pessimistic, or negative content
- **Neutral** 😐 - Balanced or neutral content

### Emotion Analysis

- **Happy** 😄 - Joyful and positive emotions
- **Sad** 😒 - Sadness and melancholy
- **Angry** 😠 - Anger and frustration
- **Neutral** 😐 - A balanced emotional state
- **Surprise** 😲 - Surprise and astonishment

### Speaker Diarization

- Automatically detects 2-5 speakers
- Labels them sequentially as Speaker 1, Speaker 2, etc.
- Separates the transcription by speaker for easy reading

πŸ› οΈ Technical Details

Models Used

Transcription

  • OpenAI Whisper - State-of-the-art speech recognition
    • Model sizes: tiny (39M), base (74M), small (244M), medium (769M), large (1550M)
    • Recommended: base (good balance of speed and accuracy)
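The transcription step is a thin wrapper around the `openai-whisper` package; here is a minimal sketch (the `transcribe_audio` wrapper and its defaults are illustrative, not the app's actual code):

```python
# Approximate parameter counts from the table above; used only for validation.
MODEL_SIZES = {"tiny": "39M", "base": "74M", "small": "244M",
               "medium": "769M", "large": "1550M"}

def transcribe_audio(audio_path: str, model_size: str = "base") -> dict:
    """Transcribe with openai-whisper; the result dict carries 'text',
    'language', and timestamped 'segments'."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown Whisper model size: {model_size}")
    import whisper  # deferred: loading triggers a one-time model download
    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)

# usage (first call downloads and caches the chosen model):
# result = transcribe_audio("extracted_audio.wav", "base")
# print(result["language"], result["text"])
```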

#### Sentiment Analysis

- **CardiffNLP Twitter-RoBERTa** - Pretrained sentiment analysis model
  - Detects: Positive, Negative, Neutral
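A minimal sketch of this step with the `transformers` pipeline. The exact checkpoint string is an assumption (the README names only "CardiffNLP Twitter-RoBERTa"), and some revisions of that model emit raw `LABEL_0..LABEL_2` labels, hence the normalizer:

```python
# Maps the model's raw label ids to the display names used in this app.
LABEL_MAP = {"LABEL_0": "Negative", "LABEL_1": "Neutral", "LABEL_2": "Positive"}

def normalize_label(raw: str) -> str:
    """Normalize a raw model label to Positive/Negative/Neutral."""
    return LABEL_MAP.get(raw, raw.capitalize())

def analyze_sentiment(texts):
    """Run the pretrained sentiment model over a list of segment texts."""
    from transformers import pipeline  # deferred: downloads the model on first use
    clf = pipeline("sentiment-analysis",
                   model="cardiffnlp/twitter-roberta-base-sentiment")  # assumed checkpoint
    return [{"text": t, "sentiment": normalize_label(r["label"]), "score": r["score"]}
            for t, r in zip(texts, clf(texts))]
```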

#### Emotion Analysis

- **j-hartmann Emotion Model** - Pretrained emotion detection model
  - Detects: Joy, Sadness, Anger, Fear, Surprise, Disgust, Neutral
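The seven classes listed match `j-hartmann/emotion-english-distilroberta-base`, so that is presumably the checkpoint. The collapse from the model's seven classes down to the app's five display emotions is an assumption sketched here:

```python
# Collapse the model's 7 output classes to the app's 5 display emotions.
EMOTION_MAP = {"joy": "Happy", "sadness": "Sad", "anger": "Angry",
               "surprise": "Surprise", "neutral": "Neutral"}

def display_emotion(raw: str) -> str:
    """Map a raw model label to a display emotion; fear/disgust fall back
    to Neutral here (an assumption, not documented app behavior)."""
    return EMOTION_MAP.get(raw, "Neutral")

def analyze_emotions(texts):
    """Run the pretrained emotion model over a list of segment texts."""
    from transformers import pipeline  # deferred: downloads the model on first use
    clf = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base")
    return [{"text": t, "emotion": display_emotion(r["label"])}
            for t, r in zip(texts, clf(texts))]
```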

#### Speaker Diarization

- **Audio Feature Clustering** - Uses MFCC features and hierarchical clustering
  - No external API required
  - Works entirely offline
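A minimal sketch of this approach, assuming `librosa` for MFCC extraction and SciPy's hierarchical clustering; all function and parameter names here are illustrative, not the app's actual code:

```python
def label_speakers(cluster_ids):
    """Relabel raw cluster ids as 'Speaker 1', 'Speaker 2', ...
    in order of first appearance."""
    order, labels = {}, []
    for cid in cluster_ids:
        if cid not in order:
            order[cid] = f"Speaker {len(order) + 1}"
        labels.append(order[cid])
    return labels

def diarize(audio_path, n_speakers=2, window_s=1.5):
    """Cluster per-window mean MFCC vectors into speaker groups, fully offline."""
    import numpy as np
    import librosa                                  # audio loading + MFCCs
    from scipy.cluster.hierarchy import fcluster, linkage
    y, sr = librosa.load(audio_path, sr=16000)
    hop = int(window_s * sr)
    feats = [librosa.feature.mfcc(y=y[i:i + hop], sr=sr, n_mfcc=13).mean(axis=1)
             for i in range(0, max(len(y) - hop, 1), hop)]
    tree = linkage(np.vstack(feats), method="ward")  # hierarchical clustering
    ids = fcluster(tree, t=n_speakers, criterion="maxclust")
    return label_speakers(ids)
```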

## 📋 Requirements

- Python 3.8+
- FFmpeg (automatically installed in Hugging Face Spaces)
- All Python dependencies are listed in `requirements.txt`

## 🎯 Use Cases

- **Content Analysis** - Analyze video content for sentiment and emotions
- **Meeting Transcripts** - Transcribe and analyze meetings with speaker identification
- **Video Editing** - Get transcripts for video editing workflows
- **Accessibility** - Create transcripts for accessibility purposes
- **Research** - Analyze emotional patterns in video content
- **Content Moderation** - Detect sentiment and emotions in user-generated content

πŸ“ Notes

  • First run will download the models (one-time download, cached for future use)
  • Larger videos will take longer to process
  • The app automatically extracts audio from video files
  • All models are free and pretrained (no API keys required)
  • Processing happens entirely in your browser session
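The audio-extraction note above can be sketched as a subprocess call to the FFmpeg binary; the 16 kHz mono WAV settings are an assumption chosen to match what Whisper expects:

```python
import subprocess

def build_ffmpeg_cmd(video_path: str, audio_path: str) -> list:
    """FFmpeg command: strip the video stream, resample to 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-i", video_path,
            "-vn",                 # drop the video stream
            "-ac", "1",            # mono
            "-ar", "16000",        # 16 kHz sample rate
            "-f", "wav", audio_path]

def extract_audio(video_path: str, audio_path: str = "extracted_audio.wav") -> str:
    """Run FFmpeg and return the path of the extracted audio file."""
    subprocess.run(build_ffmpeg_cmd(video_path, audio_path),
                   check=True, capture_output=True)
    return audio_path
```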

## 🔧 Model Selection

Choose the Whisper model size based on your needs:

- **tiny** - Fastest, good for quick transcriptions
- **base** - Recommended balance of speed and accuracy
- **small** - Better accuracy for important content
- **medium** - High accuracy for professional use
- **large** - Best accuracy, slower processing

## 📄 License

MIT License - Feel free to use and modify for your projects!

πŸ™ Acknowledgments

  • OpenAI for the Whisper model
  • Hugging Face for hosting and model distribution
  • CardiffNLP for sentiment analysis model
  • j-hartmann for emotion detection model

Made with ❤️ using Streamlit and Hugging Face Spaces