---
title: Video Transcription & Analysis
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: src/streamlit_app.py
pinned: false
license: mit
---
# 🎤 Video Transcription & Analysis
A powerful Streamlit application that transcribes speech from video files and performs comprehensive sentiment and emotion analysis using state-of-the-art pretrained models.
## ✨ Features

- 🎥 **Video Upload** - Support for multiple video formats (MP4, AVI, MOV, MKV, WebM, FLV, WMV, M4V)
- 🎤 **Speech Transcription** - Automatic speech-to-text using OpenAI Whisper
- 👥 **Speaker Diarization** - Automatically identifies and separates different speakers (Speaker 1, Speaker 2, etc.)
- 😊 **Sentiment Analysis** - Detects positive, negative, or neutral sentiment for each segment
- 🎭 **Emotion Analysis** - Identifies emotions: angry, sad, happy, neutral, surprise
- 📊 **Visual Analytics** - Interactive graphs showing sentiment and emotion distribution
- 📝 **Detailed Reports** - Per-segment analysis with timestamps and speaker labels
- 📥 **Export Options** - Download transcription and full analysis reports
- 🌍 **Language Detection** - Automatic language identification
- ⚡ **Multiple Model Sizes** - Choose from tiny, base, small, medium, or large Whisper models
## 🚀 How to Use

1. **Upload a Video**: Click "Browse files" and select your video file
2. **Enable Speaker Diarization**: Toggle the checkbox to identify different speakers (enabled by default)
3. **Transcribe**: Click the "🎯 Transcribe Video" button
4. **View Results**:
   - See speaker-separated transcription
   - View sentiment and emotion analysis
   - Explore interactive graphs
   - Review detailed segment analysis
## 📊 Analysis Features

### Sentiment Analysis

- **Positive** 😊 - Happy, optimistic, or positive content
- **Negative** 😞 - Sad, pessimistic, or negative content
- **Neutral** 😐 - Balanced or neutral content

### Emotion Analysis

- **Happy** 😄 - Joyful and positive emotions
- **Sad** 😢 - Sadness and melancholy
- **Angry** 😠 - Anger and frustration
- **Neutral** 😐 - Balanced emotional state
- **Surprise** 😲 - Surprise and astonishment
### Speaker Diarization
- Automatically detects 2-5 speakers
- Labels them sequentially as Speaker 1, Speaker 2, etc.
- Separates transcription by speaker for easy reading
## 🛠️ Technical Details

### Models Used

#### Transcription

- **OpenAI Whisper** - State-of-the-art speech recognition
- Model sizes: tiny (39M), base (74M), small (244M), medium (769M), large (1550M)
- Recommended: base (good balance of speed and accuracy)
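For reference, a minimal sketch of driving Whisper from Python; the file name and model size are placeholders, not the app's exact code:

```python
# Minimal transcription sketch using openai-whisper.
import whisper

model = whisper.load_model("base")      # tiny / base / small / medium / large
result = model.transcribe("audio.wav")  # language is auto-detected by default

print(result["language"])               # detected language code, e.g. "en"
for segment in result["segments"]:      # per-segment text with timestamps
    print(f"[{segment['start']:.1f}s - {segment['end']:.1f}s] {segment['text']}")
```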
#### Sentiment Analysis

- **CardiffNLP Twitter-RoBERTa** - Pretrained sentiment analysis model
- Detects: Positive, Negative, Neutral
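A minimal sketch of calling this model through the `transformers` pipeline; the exact checkpoint ID is an assumption (`cardiffnlp/twitter-roberta-base-sentiment-latest` is the commonly used one):

```python
# Hedged sketch: scoring one transcript segment with the CardiffNLP model.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)
print(sentiment("I really enjoyed this talk!"))
# e.g. [{'label': 'positive', 'score': 0.98}]
```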
#### Emotion Analysis

- **j-hartmann Emotion Model** - Pretrained emotion detection model
- Detects: Joy, Sadness, Anger, Fear, Surprise, Disgust, Neutral
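The same pipeline pattern works here; the checkpoint ID is again an assumption (`j-hartmann/emotion-english-distilroberta-base` is the widely used one covering these seven classes):

```python
# Hedged sketch: emotion scoring with j-hartmann's DistilRoBERTa emotion model.
from transformers import pipeline

emotion = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,  # return scores for all seven emotion classes
)
print(emotion("I can't believe this happened!"))
# scores for anger, disgust, fear, joy, neutral, sadness, surprise
```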
#### Speaker Diarization

- **Audio Feature Clustering** - Uses MFCC features and hierarchical clustering
- No external API required
- Works entirely offline
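The app's actual implementation isn't shown here, but the MFCC-plus-clustering approach could look roughly like this sketch (window length, feature summary, and speaker count are illustrative assumptions):

```python
# Hedged sketch of MFCC feature clustering for speaker diarization,
# using librosa and scikit-learn.
import librosa
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def diarize(audio_path: str, n_speakers: int = 2, window_s: float = 1.0):
    """Assign a sequential speaker label to each fixed-length audio window."""
    y, sr = librosa.load(audio_path, sr=16000)
    hop = int(window_s * sr)
    # One 13-dimensional MFCC summary vector per window of audio.
    feats = np.array([
        librosa.feature.mfcc(y=y[i : i + hop], sr=sr, n_mfcc=13).mean(axis=1)
        for i in range(0, len(y) - hop + 1, hop)
    ])
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(feats)
    # Map cluster IDs to "Speaker 1", "Speaker 2", ... per window start time.
    return [(i * window_s, f"Speaker {lab + 1}") for i, lab in enumerate(labels)]
```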
## 📋 Requirements

- Python 3.8+
- FFmpeg (automatically installed in Hugging Face Spaces)
- All Python dependencies are listed in `requirements.txt`
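The pinned versions live in `requirements.txt` itself; an illustrative, unpinned core set inferred from the features above (the charting library in particular is a guess) might look like:

```
streamlit
openai-whisper
transformers
torch
librosa
scikit-learn
plotly
```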
## 🎯 Use Cases

- **Content Analysis** - Analyze video content for sentiment and emotions
- **Meeting Transcripts** - Transcribe and analyze meetings with speaker identification
- **Video Editing** - Get transcripts for video editing workflows
- **Accessibility** - Create transcripts for accessibility purposes
- **Research** - Analyze emotional patterns in video content
- **Content Moderation** - Detect sentiment and emotions in user-generated content
## 📝 Notes

- The first run downloads the models (a one-time download, cached for future use)
- Larger videos take longer to process
- The app automatically extracts audio from video files (see the sketch below)
- All models are free and pretrained (no API keys required)
- Processing happens entirely within your session on the Space; no data is sent to external services
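The audio-extraction step mentioned above could be done with the FFmpeg CLI along these lines (function name and paths are placeholders, not the app's exact code):

```python
# Hedged sketch: pull a 16 kHz mono WAV track out of an uploaded video.
import subprocess

def extract_audio(video_path: str, wav_path: str = "audio.wav") -> str:
    """Extract the audio stream from a video file with ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-vn",           # drop the video stream
         "-ac", "1",      # mono
         "-ar", "16000",  # 16 kHz, the rate Whisper expects
         wav_path],
        check=True,
    )
    return wav_path
```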
## 🔧 Model Selection

Choose the Whisper model size based on your needs:

- **tiny** - Fastest, good for quick transcriptions
- **base** - Recommended balance of speed and accuracy
- **small** - Better accuracy for important content
- **medium** - High accuracy for professional use
- **large** - Best accuracy, slower processing
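In the app this choice boils down to a single Whisper call; a hedged sketch of how it might be wired up in Streamlit (widget label and caching approach are assumptions, not the app's exact UI):

```python
# Hedged sketch: exposing the Whisper model-size choice in Streamlit.
import streamlit as st
import whisper

size = st.selectbox(
    "Whisper model size",
    ["tiny", "base", "small", "medium", "large"],
    index=1,  # default to the recommended "base"
)

@st.cache_resource  # download/load each model only once per process
def load_model(name: str):
    return whisper.load_model(name)

model = load_model(size)
```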
## 📄 License
MIT License - Feel free to use and modify for your projects!
## 🙏 Acknowledgments

- OpenAI for the Whisper model
- Hugging Face for hosting and model distribution
- CardiffNLP for the sentiment analysis model
- j-hartmann for the emotion detection model
Made with ❤️ using Streamlit and Hugging Face Spaces