--- title: Business Intelligence Dashboard emoji: ๐Ÿ“Š colorFrom: blue colorTo: purple sdk: gradio sdk_version: 3.50.2 app_file: app.py pinned: false --- # ๐Ÿ“Š Business Intelligence Dashboard A professional, interactive Business Intelligence dashboard built with Gradio that enables non-technical stakeholders to explore and analyze business data. ## ๐ŸŒŸ Features ### ๐Ÿ“‚ Data Management - **Pre-loaded Datasets**: Online Retail and Airbnb datasets included - **Custom Upload**: Support for CSV, Excel (.xlsx, .xls), JSON, and Parquet files (max 50MB) - **Automatic Data Cleaning**: Handles missing values, type conversions, and duplicate removal - **Data Validation**: Comprehensive error handling and user-friendly error messages ### ๐Ÿ“ˆ Statistics & Profiling - **Automated Data Profiling**: Get instant insights into your dataset - **Numerical Summary**: Mean, median, std deviation, quartiles, min/max - **Categorical Analysis**: Unique values, value counts, mode - **Missing Values Report**: Identify data quality issues - **Correlation Matrix**: Visual correlation heatmap for numerical features ### ๐Ÿ” Interactive Filtering - **Dynamic Filters**: Filter by numerical ranges, categorical values, or date ranges - **Real-time Updates**: See row counts update as you apply filters - **Multiple Filters**: Combine multiple filters for precise data exploration - **Filter Management**: Easy to add, view, and clear filters ### ๐Ÿ“‰ Smart Visualizations - **AI-Powered Recommendations**: Get intelligent visualization suggestions based on your data - **One-Click Creation**: Create recommended visualizations with a single click - **5 Visualization Types**: - Time Series Plots (with aggregation: sum, mean, count, median) - Distribution Plots (histogram, box plot) - Category Analysis (bar chart, pie chart) - Scatter Plots (with color coding and trend lines) - Correlation Heatmap - **Dual Backend**: Supports both Matplotlib and Plotly - **Customization**: Full control over columns, aggregations, and visual parameters ### ๐Ÿ’ก Automated Insights - **Top/Bottom Performers**: Identify highest and lowest values - **Trend Analysis**: Detect patterns over time with growth rate and volatility - **Anomaly Detection**: Find outliers using Z-score or IQR methods - **Distribution Analysis**: Understand data distributions with skewness and kurtosis - **Correlation Insights**: Discover strong relationships between variables ### ๐Ÿ’พ Export Capabilities - **Data Export**: Export filtered data as CSV or Excel - **Visualization Export**: Save charts as PNG images ## ๐Ÿ—๏ธ Architecture & Design ### SOLID Principles Implementation - **Single Responsibility**: Each class has one clear purpose - **Open/Closed**: Extensible through Strategy Pattern without modifying existing code - **Liskov Substitution**: All strategies are interchangeable - **Interface Segregation**: Specific interfaces for different operations - **Dependency Inversion**: Depends on abstractions, not concrete implementations ### Design Patterns - **Strategy Pattern**: Used for data loading, visualizations, and insights - **Facade Pattern**: DataProcessor provides simple interface to complex operations - **Factory Pattern**: Dynamic strategy selection based on file type ### Project Structure ``` Business-Intelligence-Dashboard/ โ”œโ”€โ”€ app.py # Main Gradio application with 6 tabs โ”œโ”€โ”€ data_processor.py # Data loading, cleaning, filtering (Strategy Pattern) โ”œโ”€โ”€ visualizations.py # Chart creation with multiple strategies โ”œโ”€โ”€ insights.py # Automated insight generation โ”œโ”€โ”€ utils.py # Utility functions and validators โ”œโ”€โ”€ requirements.txt # Python dependencies โ”œโ”€โ”€ README.md # This file โ”œโ”€โ”€ data/ # Sample datasets โ”‚ โ”œโ”€โ”€ Online_Retail.xlsx โ”‚ โ””โ”€โ”€ Airbnb.csv โ””โ”€โ”€ tests/ # Comprehensive test suite โ”œโ”€โ”€ init.py โ”œโ”€โ”€ conftest.py โ”œโ”€โ”€ test_utils.py โ”œโ”€โ”€ test_data_processor.py โ”œโ”€โ”€ test_visualizations.py โ””โ”€โ”€ test_insights.py ``` ## ๐Ÿš€ Getting Started ### Prerequisites - Python 3.8 or higher - pip package manager ### Installation 1. **Clone the repository** ```bash git clone https://github.com/CR1502/Business-Intelligence-Dashboard.git cd Business-Intelligence-Dashboard ``` 2. **Create a virtual environment** ```bash # On macOS/Linux python3 -m venv venv source venv/bin/activate # On Windows python -m venv venv venv\Scripts\activate ``` 3. **Install dependencies** ```bash pip install -r requirements.txt ``` 4. **Run the application** ```bash python app.py ``` The dashboard will launch and open in your default browser at `http://localhost:7860` ## ๐Ÿ“– Usage Guide ### 1. Loading Data - **Option A**: Select "Online Retail" or "Airbnb" from the dropdown - **Option B**: Upload your own dataset (CSV, Excel, JSON, or Parquet) ### 2. Exploring Statistics - Navigate to "Statistics & Profiling" tab - Click "Generate Data Profile" to see comprehensive statistics - View missing values, numerical summaries, and correlation matrix ### 3. Filtering Data - Go to "Filter & Explore" tab - Select filter type (Numerical, Categorical, or Date) - Choose column and set filter criteria - Click "Add Filter" and see real-time updates ### 4. Creating Visualizations - Navigate to "Visualizations" tab - **Smart Recommendations**: Click "Get Visualization Recommendations" for AI-powered suggestions - **Custom Visualizations**: Select visualization type and configure parameters - Supported charts: Time Series, Distribution, Category, Scatter, Correlation ### 5. Generating Insights - Go to "Insights" tab - Click "Generate All Insights" for automated analysis - Or select specific insight type for targeted analysis ### 6. Exporting Results - Navigate to "Export" tab - Choose format (CSV or Excel) - Click "Export Data" to download filtered dataset ## ๐Ÿงช Testing Run the comprehensive test suite: ```bash # Run all tests pytest tests/ -v # Run specific test file pytest tests/test_utils.py -v # Run with coverage pytest tests/ --cov=. --cov-report=html ``` Test coverage includes: - **180+ test cases** across all modules - Unit tests for all functions and classes - Strategy Pattern implementation tests - Edge case and error handling tests ## ๐Ÿ› ๏ธ Technologies Used - **Gradio**: Web interface and interactive components - **Pandas**: Data manipulation and analysis - **NumPy**: Numerical computations - **Matplotlib/Seaborn**: Static visualizations - **Plotly**: Interactive visualizations - **Python 3.10+**: Core programming language ## ๐Ÿ“Š Sample Datasets ### Online Retail Dataset - **8 columns**: InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country - **Use case**: E-commerce sales analysis, product trends, customer analysis ### Airbnb Dataset - **26 columns**: Including price, location, room type, reviews, availability - **Use case**: Pricing analysis, location trends, booking patterns ## ๐Ÿค Contributing Contributions are welcome! Please follow these steps: 1. Fork the repository 2. Create a feature branch (`git checkout -b feature/AmazingFeature`) 3. Commit your changes (`git commit -m 'Add some AmazingFeature'`) 4. Push to the branch (`git push origin feature/AmazingFeature`) 5. Open a Pull Request ### Development Guidelines - Follow PEP 8 style guidelines - Add docstrings to all functions - Include unit tests for new features - Update README.md for significant changes ## ๐Ÿ‘จโ€๐Ÿ’ป Author **Craig Roberts** ## ๐Ÿ™ Acknowledgments - Northeastern University - CS5130 Course (Prof Lino) - Dataset sources: UCI ML Repository, Kaggle ## โšก Performance Notes - Handles datasets up to 50MB efficiently - Optimized for 1,000-10,000 rows - Tested with datasets containing 100+ columns - Real-time filtering with sub-second response times ## ๐Ÿ› Known Issues - Large datasets (>100MB) may cause memory issues - Some complex visualizations may take time to render - Browser storage not available (by design for security) ---