---
title: Video Transcription & Analysis
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: src/streamlit_app.py
pinned: false
license: mit
---

# 🎤 Video Transcription & Analysis

A powerful Streamlit application that transcribes speech from video files and performs comprehensive sentiment and emotion analysis using state-of-the-art pretrained models.

## ✨ Features

- **🎥 Video Upload** - Supports multiple video formats (MP4, AVI, MOV, MKV, WebM, FLV, WMV, M4V)
- **🎤 Speech Transcription** - Automatic speech-to-text using OpenAI Whisper
- **👥 Speaker Diarization** - Automatically identifies and separates different speakers (Speaker 1, Speaker 2, etc.)
- **😊 Sentiment Analysis** - Detects positive, negative, or neutral sentiment for each segment
- **🎭 Emotion Analysis** - Identifies emotions: angry, sad, happy, neutral, surprise
- **📊 Visual Analytics** - Interactive graphs showing sentiment and emotion distribution
- **📝 Detailed Reports** - Per-segment analysis with timestamps and speaker labels
- **📥 Export Options** - Download the transcription and full analysis reports
- **🌍 Language Detection** - Automatic language identification
- **⚡ Multiple Model Sizes** - Choose from tiny, base, small, medium, or large Whisper models

## 🚀 How to Use

1. **Upload a Video**: Click "Browse files" and select your video file
2. **Enable Speaker Diarization**: Toggle the checkbox to identify different speakers (enabled by default)
3. **Transcribe**: Click the "🎯 Transcribe Video" button
4. **View Results**:
   - See the speaker-separated transcription
   - View the sentiment and emotion analysis
   - Explore interactive graphs
   - Review the detailed per-segment analysis
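The per-segment results in step 4 can be sketched as a small formatting helper. The segment field names (`start`, `end`, `speaker`, `text`) are assumptions for illustration, not the app's actual schema:

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def format_segment(segment: dict) -> str:
    """Render one segment as '[MM:SS - MM:SS] Speaker N: text'."""
    return (
        f"[{format_timestamp(segment['start'])} - "
        f"{format_timestamp(segment['end'])}] "
        f"{segment['speaker']}: {segment['text'].strip()}"
    )

print(format_segment({"start": 0.0, "end": 4.5,
                      "speaker": "Speaker 1", "text": " Hello there."}))
# -> [00:00 - 00:04] Speaker 1: Hello there.
```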

## 📊 Analysis Features

### Sentiment Analysis

- **Positive** 😊 - Happy, optimistic, or positive content
- **Negative** 😞 - Sad, pessimistic, or negative content
- **Neutral** 😐 - Balanced or neutral content

### Emotion Analysis

- **Happy** 😄 - Joyful and positive emotions
- **Sad** 😒 - Sadness and melancholy
- **Angry** 😠 - Anger and frustration
- **Neutral** 😐 - A balanced emotional state
- **Surprise** 😲 - Surprise and astonishment

### Speaker Diarization

- Automatically detects 2-5 speakers
- Labels them sequentially as Speaker 1, Speaker 2, etc.
- Separates the transcription by speaker for easy reading

πŸ› οΈ Technical Details

Models Used

Transcription

  • OpenAI Whisper - State-of-the-art speech recognition
    • Model sizes: tiny (39M), base (74M), small (244M), medium (769M), large (1550M)
    • Recommended: base (good balance of speed and accuracy)
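The transcription step is a thin wrapper around the `openai-whisper` package; here is a minimal sketch (the `transcribe_audio` wrapper and its defaults are illustrative, not the app's actual code):

```python
# Approximate parameter counts from the table above; used only for validation.
MODEL_SIZES = {"tiny": "39M", "base": "74M", "small": "244M",
               "medium": "769M", "large": "1550M"}

def transcribe_audio(audio_path: str, model_size: str = "base") -> dict:
    """Transcribe with openai-whisper; the result dict carries 'text',
    'language', and timestamped 'segments'."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown Whisper model size: {model_size}")
    import whisper  # deferred: loading triggers a one-time model download
    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)

# usage (first call downloads and caches the chosen model):
# result = transcribe_audio("extracted_audio.wav", "base")
# print(result["language"], result["text"])
```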

#### Sentiment Analysis

- **CardiffNLP Twitter-RoBERTa** - Pretrained sentiment analysis model
  - Detects: Positive, Negative, Neutral
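A minimal sketch of this step with the `transformers` pipeline. The exact checkpoint string is an assumption (the README names only "CardiffNLP Twitter-RoBERTa"), and some revisions of that model emit raw `LABEL_0..LABEL_2` labels, hence the normalizer:

```python
# Maps the model's raw label ids to the display names used in this app.
LABEL_MAP = {"LABEL_0": "Negative", "LABEL_1": "Neutral", "LABEL_2": "Positive"}

def normalize_label(raw: str) -> str:
    """Normalize a raw model label to Positive/Negative/Neutral."""
    return LABEL_MAP.get(raw, raw.capitalize())

def analyze_sentiment(texts):
    """Run the pretrained sentiment model over a list of segment texts."""
    from transformers import pipeline  # deferred: downloads the model on first use
    clf = pipeline("sentiment-analysis",
                   model="cardiffnlp/twitter-roberta-base-sentiment")  # assumed checkpoint
    return [{"text": t, "sentiment": normalize_label(r["label"]), "score": r["score"]}
            for t, r in zip(texts, clf(texts))]
```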

#### Emotion Analysis

- **j-hartmann Emotion Model** - Pretrained emotion detection model
  - Detects: Joy, Sadness, Anger, Fear, Surprise, Disgust, Neutral
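The seven classes listed match `j-hartmann/emotion-english-distilroberta-base`, so that is presumably the checkpoint. The collapse from the model's seven classes down to the app's five display emotions is an assumption sketched here:

```python
# Collapse the model's 7 output classes to the app's 5 display emotions.
EMOTION_MAP = {"joy": "Happy", "sadness": "Sad", "anger": "Angry",
               "surprise": "Surprise", "neutral": "Neutral"}

def display_emotion(raw: str) -> str:
    """Map a raw model label to a display emotion; fear/disgust fall back
    to Neutral here (an assumption, not documented app behavior)."""
    return EMOTION_MAP.get(raw, "Neutral")

def analyze_emotions(texts):
    """Run the pretrained emotion model over a list of segment texts."""
    from transformers import pipeline  # deferred: downloads the model on first use
    clf = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base")
    return [{"text": t, "emotion": display_emotion(r["label"])}
            for t, r in zip(texts, clf(texts))]
```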

#### Speaker Diarization

- **Audio Feature Clustering** - Uses MFCC features and hierarchical clustering
  - No external API required
  - Works entirely offline
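A minimal sketch of this approach, assuming `librosa` for MFCC extraction and SciPy's hierarchical clustering; all function and parameter names here are illustrative, not the app's actual code:

```python
def label_speakers(cluster_ids):
    """Relabel raw cluster ids as 'Speaker 1', 'Speaker 2', ...
    in order of first appearance."""
    order, labels = {}, []
    for cid in cluster_ids:
        if cid not in order:
            order[cid] = f"Speaker {len(order) + 1}"
        labels.append(order[cid])
    return labels

def diarize(audio_path, n_speakers=2, window_s=1.5):
    """Cluster per-window mean MFCC vectors into speaker groups, fully offline."""
    import numpy as np
    import librosa                                  # audio loading + MFCCs
    from scipy.cluster.hierarchy import fcluster, linkage
    y, sr = librosa.load(audio_path, sr=16000)
    hop = int(window_s * sr)
    feats = [librosa.feature.mfcc(y=y[i:i + hop], sr=sr, n_mfcc=13).mean(axis=1)
             for i in range(0, max(len(y) - hop, 1), hop)]
    tree = linkage(np.vstack(feats), method="ward")  # hierarchical clustering
    ids = fcluster(tree, t=n_speakers, criterion="maxclust")
    return label_speakers(ids)
```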

## 📋 Requirements

- Python 3.8+
- FFmpeg (automatically installed in Hugging Face Spaces)
- All Python dependencies are listed in `requirements.txt`

## 🎯 Use Cases

- **Content Analysis** - Analyze video content for sentiment and emotions
- **Meeting Transcripts** - Transcribe and analyze meetings with speaker identification
- **Video Editing** - Get transcripts for video editing workflows
- **Accessibility** - Create transcripts for accessibility purposes
- **Research** - Analyze emotional patterns in video content
- **Content Moderation** - Detect sentiment and emotions in user-generated content

πŸ“ Notes

  • First run will download the models (one-time download, cached for future use)
  • Larger videos will take longer to process
  • The app automatically extracts audio from video files
  • All models are free and pretrained (no API keys required)
  • Processing happens entirely in your browser session
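The audio-extraction note above can be sketched as a subprocess call to the FFmpeg binary; the 16 kHz mono WAV settings are an assumption chosen to match what Whisper expects:

```python
import subprocess

def build_ffmpeg_cmd(video_path: str, audio_path: str) -> list:
    """FFmpeg command: strip the video stream, resample to 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-i", video_path,
            "-vn",                 # drop the video stream
            "-ac", "1",            # mono
            "-ar", "16000",        # 16 kHz sample rate
            "-f", "wav", audio_path]

def extract_audio(video_path: str, audio_path: str = "extracted_audio.wav") -> str:
    """Run FFmpeg and return the path of the extracted audio file."""
    subprocess.run(build_ffmpeg_cmd(video_path, audio_path),
                   check=True, capture_output=True)
    return audio_path
```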

## 🔧 Model Selection

Choose the Whisper model size based on your needs:

- **tiny** - Fastest, good for quick transcriptions
- **base** - Recommended balance of speed and accuracy
- **small** - Better accuracy for important content
- **medium** - High accuracy for professional use
- **large** - Best accuracy, slower processing

## 📄 License

MIT License - Feel free to use and modify for your projects!

πŸ™ Acknowledgments

  • OpenAI for the Whisper model
  • Hugging Face for hosting and model distribution
  • CardiffNLP for sentiment analysis model
  • j-hartmann for emotion detection model

Made with ❤️ using Streamlit and Hugging Face Spaces