---
title: Business Intelligence Dashboard
emoji: 📊
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---

# 📊 Business Intelligence Dashboard

A professional, interactive Business Intelligence dashboard built with Gradio that enables non-technical stakeholders to explore and analyze business data.

## 🌟 Features

### 📂 Data Management
- **Pre-loaded Datasets**: Online Retail and Airbnb datasets included
- **Custom Upload**: Support for CSV, Excel (.xlsx, .xls), JSON, and Parquet files (max 50MB)
- **Automatic Data Cleaning**: Handles missing values, type conversions, and duplicate removal
- **Data Validation**: Comprehensive error handling and user-friendly error messages
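
The cleaning steps above could be sketched with pandas roughly as follows. The helper name `clean_dataframe`, the median and `"Unknown"` fill values, and the sample columns are assumptions made for illustration; the project's `data_processor.py` may handle these cases differently.

```python
import pandas as pd


def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates, convert types, fill missing values."""
    cleaned = df.drop_duplicates().copy()

    for col in cleaned.columns:
        if not pd.api.types.is_numeric_dtype(cleaned[col]):
            # Try a numeric conversion; keep it only if every non-missing value parses.
            converted = pd.to_numeric(cleaned[col], errors="coerce")
            if converted.notna().sum() == cleaned[col].notna().sum():
                cleaned[col] = converted

    for col in cleaned.columns:
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            # Assumption: numeric gaps are filled with the column median.
            cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        else:
            # Assumption: other gaps are filled with a placeholder label.
            cleaned[col] = cleaned[col].fillna("Unknown")

    return cleaned.reset_index(drop=True)


raw = pd.DataFrame(
    {
        "Quantity": ["6", "6", "8", None],
        "Country": ["France", "France", None, "Spain"],
    }
)
cleaned = clean_dataframe(raw)
print(cleaned)
```

Filling with the median is only one possible policy; dropping incomplete rows or leaving gaps untouched may suit some datasets better.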

### 📈 Statistics & Profiling
- **Automated Data Profiling**: Get instant insights into your dataset
- **Numerical Summary**: Mean, median, std deviation, quartiles, min/max
- **Categorical Analysis**: Unique values, value counts, mode
- **Missing Values Report**: Identify data quality issues
- **Correlation Matrix**: Visual correlation heatmap for numerical features
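
As a rough illustration, each item in the list above maps onto a standard pandas call. The sample DataFrame and variable names below are made up for the example and are not taken from the project code.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "price": [100.0, 150.0, None, 200.0],
        "reviews": [10, 20, 30, 40],
        "room_type": ["Entire home", "Private room", "Private room", None],
    }
)

# Numerical summary: count, mean, std, min, quartiles (50% is the median), max.
numeric_summary = df.describe()

# Categorical analysis: unique count and mode per non-numeric column.
categorical = df.select_dtypes(exclude="number")
categorical_summary = pd.DataFrame(
    {
        "unique": categorical.nunique(),
        "mode": categorical.mode().iloc[0],
    }
)

# Missing values report: count and percentage per column.
missing = df.isna().sum()
missing_report = pd.DataFrame(
    {"missing": missing, "percent": 100 * missing / len(df)}
)

# Correlation matrix over numerical columns only.
correlation = df.corr(numeric_only=True)

print(numeric_summary, categorical_summary, missing_report, correlation, sep="\n\n")
```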

### 🔍 Interactive Filtering
- **Dynamic Filters**: Filter by numerical ranges, categorical values, or date ranges
- **Real-time Updates**: See row counts update as you apply filters
- **Multiple Filters**: Combine multiple filters for precise data exploration
- **Filter Management**: Easy to add, view, and clear filters
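
One way to combine several filters is to treat each one as a function that returns a boolean mask and to AND the masks together. The sketch below assumes that approach; the helper names (`numeric_range`, `categorical_values`, `date_range`, `apply_filters`) are hypothetical and not the dashboard's actual API.

```python
from typing import Callable, List

import pandas as pd

Mask = Callable[[pd.DataFrame], pd.Series]


def numeric_range(column: str, low: float, high: float) -> Mask:
    """Keep rows whose value lies in [low, high]."""
    return lambda df: df[column].between(low, high)


def categorical_values(column: str, allowed: List[str]) -> Mask:
    """Keep rows whose value is one of the allowed categories."""
    return lambda df: df[column].isin(allowed)


def date_range(column: str, start: str, end: str) -> Mask:
    """Keep rows whose date lies in [start, end]."""
    return lambda df: df[column].between(pd.Timestamp(start), pd.Timestamp(end))


def apply_filters(df: pd.DataFrame, filters: List[Mask]) -> pd.DataFrame:
    """Combine all active filters with logical AND."""
    mask = pd.Series(True, index=df.index)
    for build_mask in filters:
        mask &= build_mask(df)
    return df[mask]


orders = pd.DataFrame(
    {
        "Quantity": [2, 12, 30, 7],
        "Country": ["France", "Spain", "France", "Germany"],
        "InvoiceDate": pd.to_datetime(
            ["2011-01-05", "2011-02-10", "2011-03-15", "2011-04-20"]
        ),
    }
)

filters = [
    numeric_range("Quantity", 5, 40),
    categorical_values("Country", ["France", "Spain"]),
    date_range("InvoiceDate", "2011-01-01", "2011-02-28"),
]
filtered = apply_filters(orders, filters)
print(f"{len(filtered)} of {len(orders)} rows match")
```

Keeping the filters in a plain list makes "clear filters" a matter of emptying the list; an empty list returns the full dataset.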

### 📉 Smart Visualizations
- **AI-Powered Recommendations**: Get intelligent visualization suggestions based on your data
- **One-Click Creation**: Create recommended visualizations with a single click
- **5 Visualization Types**:
  - Time Series Plots (with aggregation: sum, mean, count, median)
  - Distribution Plots (histogram, box plot)
  - Category Analysis (bar chart, pie chart)
  - Scatter Plots (with color coding and trend lines)
  - Correlation Heatmap
- **Dual Backend**: Supports both Matplotlib and Plotly
- **Customization**: Full control over columns, aggregations, and visual parameters
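
The time-series aggregation options (sum, mean, count, median) can be illustrated with a pandas group-by on calendar month. This is a minimal sketch: the function `aggregate_by_month` and the sample data are assumptions, and the plotting step itself is left out.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "InvoiceDate": pd.to_datetime(
            ["2011-01-03", "2011-01-20", "2011-02-02", "2011-02-14", "2011-02-28"]
        ),
        "Quantity": [10, 30, 5, 15, 40],
    }
)


def aggregate_by_month(
    df: pd.DataFrame, date_col: str, value_col: str, how: str
) -> pd.Series:
    """Hypothetical helper: group by calendar month and aggregate one column."""
    if how not in {"sum", "mean", "count", "median"}:
        raise ValueError(f"Unsupported aggregation: {how}")
    months = df[date_col].dt.to_period("M")
    return df.groupby(months)[value_col].agg(how)


monthly_sum = aggregate_by_month(sales, "InvoiceDate", "Quantity", "sum")
monthly_median = aggregate_by_month(sales, "InvoiceDate", "Quantity", "median")
print(monthly_sum)
print(monthly_median)
```

The resulting Series is indexed by month and could be handed to either a Matplotlib or a Plotly line chart.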

### 💡 Automated Insights
- **Top/Bottom Performers**: Identify highest and lowest values
- **Trend Analysis**: Detect patterns over time with growth rate and volatility
- **Anomaly Detection**: Find outliers using Z-score or IQR methods
- **Distribution Analysis**: Understand data distributions with skewness and kurtosis
- **Correlation Insights**: Discover strong relationships between variables
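
For reference, the two outlier rules named above can be written in a few lines of NumPy. The thresholds used here (3 standard deviations, 1.5 × IQR) are common defaults and are assumed for the example; the function names are illustrative, and `insights.py` may use different settings.

```python
import numpy as np


def zscore_outliers(values, threshold: float = 3.0) -> np.ndarray:
    """Flag values more than `threshold` standard deviations from the mean."""
    data = np.asarray(values, dtype=float)
    std = data.std()
    if std == 0:
        # All values identical: nothing can be an outlier.
        return np.zeros(data.shape, dtype=bool)
    return np.abs((data - data.mean()) / std) > threshold


def iqr_outliers(values, k: float = 1.5) -> np.ndarray:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    return (data < q1 - k * iqr) | (data > q3 + k * iqr)


prices = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 200]
z_flags = zscore_outliers(prices)
iqr_flags = iqr_outliers(prices)
print(np.asarray(prices)[z_flags], np.asarray(prices)[iqr_flags])
```

The Z-score rule is sensitive to the outliers themselves because they inflate the standard deviation, so the IQR rule is often the more robust choice on small or skewed samples.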

### 💾 Export Capabilities
- **Data Export**: Export filtered data as CSV or Excel
- **Visualization Export**: Save charts as PNG images

## 🏗️ Architecture & Design

### SOLID Principles Implementation
- **Single Responsibility**: Each class has one clear purpose
- **Open/Closed**: Extensible through Strategy Pattern without modifying existing code
- **Liskov Substitution**: All strategies are interchangeable
- **Interface Segregation**: Specific interfaces for different operations
- **Dependency Inversion**: Depends on abstractions, not concrete implementations

### Design Patterns
- **Strategy Pattern**: Used for data loading, visualizations, and insights
- **Facade Pattern**: DataProcessor provides simple interface to complex operations
- **Factory Pattern**: Dynamic strategy selection based on file type
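
To show how the three patterns can fit together, here is a minimal, standard-library-only sketch for data loading. Apart from the `DataProcessor` name mentioned above, every class, method, and signature is an assumption, and the loaders parse text rather than files to keep the example self-contained.

```python
import csv
import io
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Type


class LoaderStrategy(ABC):
    """Strategy interface: every loader exposes the same load() method."""

    @abstractmethod
    def load(self, text: str) -> List[dict]:
        ...


class CsvLoader(LoaderStrategy):
    def load(self, text: str) -> List[dict]:
        return list(csv.DictReader(io.StringIO(text)))


class JsonLoader(LoaderStrategy):
    def load(self, text: str) -> List[dict]:
        return json.loads(text)


class LoaderFactory:
    """Factory: picks a strategy from the file extension."""

    _registry: Dict[str, Type[LoaderStrategy]] = {
        ".csv": CsvLoader,
        ".json": JsonLoader,
    }

    @classmethod
    def for_file(cls, filename: str) -> LoaderStrategy:
        suffix = Path(filename).suffix.lower()
        try:
            return cls._registry[suffix]()
        except KeyError:
            raise ValueError(f"Unsupported file type: {suffix!r}") from None


class DataProcessor:
    """Facade: one call hides strategy selection and parsing."""

    def load(self, filename: str, text: str) -> List[dict]:
        return LoaderFactory.for_file(filename).load(text)


processor = DataProcessor()
rows = processor.load("listings.csv", "price,room_type\n120,Private room\n")
print(rows)
```

Supporting a new format in this sketch means adding one loader class and one registry entry, which is the Open/Closed behavior described in the SOLID list.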

### Project Structure
```
Business-Intelligence-Dashboard/
├── app.py                      # Main Gradio application with 6 tabs
├── data_processor.py           # Data loading, cleaning, filtering (Strategy Pattern)
├── visualizations.py           # Chart creation with multiple strategies
├── insights.py                 # Automated insight generation
├── utils.py                    # Utility functions and validators
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── data/                       # Sample datasets
│   ├── Online_Retail.xlsx
│   └── Airbnb.csv
└── tests/                      # Comprehensive test suite
    ├── __init__.py
    ├── conftest.py
    ├── test_utils.py
    ├── test_data_processor.py
    ├── test_visualizations.py
    └── test_insights.py
```
## 🚀 Getting Started

### Prerequisites
- Python 3.8 or higher
- pip package manager

### Installation

1. **Clone the repository**
```bash
git clone https://github.com/CR1502/Business-Intelligence-Dashboard.git
cd Business-Intelligence-Dashboard
```

2. **Create a virtual environment**
```bash
# On macOS/Linux
python3 -m venv venv
source venv/bin/activate

# On Windows
python -m venv venv
venv\Scripts\activate
```

3. **Install dependencies**
```bash
pip install -r requirements.txt
```

4. **Run the application**
```bash
python app.py
```

The dashboard will launch and open in your default browser at `http://localhost:7860`.

## 📖 Usage Guide

### 1. Loading Data
- **Option A**: Select "Online Retail" or "Airbnb" from the dropdown
- **Option B**: Upload your own dataset (CSV, Excel, JSON, or Parquet)

### 2. Exploring Statistics
- Navigate to "Statistics & Profiling" tab
- Click "Generate Data Profile" to see comprehensive statistics
- View missing values, numerical summaries, and correlation matrix

### 3. Filtering Data
- Go to "Filter & Explore" tab
- Select filter type (Numerical, Categorical, or Date)
- Choose column and set filter criteria
- Click "Add Filter" and see real-time updates

### 4. Creating Visualizations
- Navigate to "Visualizations" tab
- **Smart Recommendations**: Click "Get Visualization Recommendations" for AI-powered suggestions
- **Custom Visualizations**: Select visualization type and configure parameters
- Supported charts: Time Series, Distribution, Category, Scatter, Correlation

### 5. Generating Insights
- Go to "Insights" tab
- Click "Generate All Insights" for automated analysis
- Or select specific insight type for targeted analysis

### 6. Exporting Results
- Navigate to "Export" tab
- Choose format (CSV or Excel)
- Click "Export Data" to download filtered dataset

## 🧪 Testing

Run the comprehensive test suite:
```bash
# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_utils.py -v

# Run with coverage
pytest tests/ --cov=. --cov-report=html
```

Test coverage includes:
- **180+ test cases** across all modules
- Unit tests for all functions and classes
- Strategy Pattern implementation tests
- Edge case and error handling tests

## 🛠️ Technologies Used

- **Gradio**: Web interface and interactive components
- **Pandas**: Data manipulation and analysis
- **NumPy**: Numerical computations
- **Matplotlib/Seaborn**: Static visualizations
- **Plotly**: Interactive visualizations
- **Python 3.10+**: Core programming language

## 📊 Sample Datasets

### Online Retail Dataset
- **8 columns**: InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country
- **Use case**: E-commerce sales analysis, product trends, customer analysis

### Airbnb Dataset
- **26 columns**: Including price, location, room type, reviews, availability
- **Use case**: Pricing analysis, location trends, booking patterns

## 🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

### Development Guidelines
- Follow PEP 8 style guidelines
- Add docstrings to all functions
- Include unit tests for new features
- Update README.md for significant changes


## 👨‍💻 Author

**Craig Roberts**


## 🙏 Acknowledgments

- Northeastern University, CS5130 course (Prof. Lino)
- Dataset sources: UCI ML Repository, Kaggle

## ⚡ Performance Notes

- Handles datasets up to 50MB efficiently
- Optimized for 1,000-10,000 rows
- Tested with datasets containing 100+ columns
- Real-time filtering with sub-second response times

## 🐛 Known Issues

- Large datasets (>100MB) may cause memory issues
- Some complex visualizations may take time to render
- Browser storage not available (by design for security)
---