An interactive regression analysis platform powered by Machine Learning, Deep Learning, and Generative AI — built with Streamlit for real-time, explainable model exploration.
This project demonstrates a complete workflow for regression modeling using both traditional ML algorithms and Deep Learning architectures, enhanced by Generative AI (OpenAI GPT) to explain predictions and metrics in human-friendly language.
Designed as a modular, production-style app, this solution enables users to:
- Upload and preprocess their own data
- Select and compare regression models
- Visualize model performance
- Understand predictions with the help of a GenAI assistant
Feature | Description |
---|---|
ML & DL Models | Train & evaluate Linear Regression, Random Forest, and Deep Neural Networks |
Model Evaluation | Automatically computes RMSE, MAE, and R² and generates diagnostic plots |
GenAI Explanation | Uses OpenAI GPT to generate plain-language descriptions of the models |
Interactive UI | Built with Streamlit for easy interaction and dynamic insights |
Data Upload | Accepts CSV data |
- Scikit-learn: Linear Regression, Random Forest
- TensorFlow / Keras: Deep Learning regression model
- Pandas & Matplotlib: Data manipulation and visualization
- Streamlit: Interactive web app
- OpenAI GPT (via API): Model explanation using Generative AI
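The stack above maps onto a fairly standard regression workflow. Below is a minimal sketch of that flow, assuming the uploaded CSV has a numeric target column named `target` (the app itself works with whatever columns the user supplies):

```python
# Minimal sketch of the ML/DL regression workflow (column names are assumptions;
# the app derives features and target from the uploaded CSV).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from tensorflow import keras

df = pd.read_csv("data.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Traditional ML baselines
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    print(name, rmse, mean_absolute_error(y_test, pred), r2_score(y_test, pred))

# Simple Keras DNN regressor
dnn = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
dnn.compile(optimizer="adam", loss="mse", metrics=["mae"])
dnn.fit(X_train, y_train, epochs=50, validation_split=0.2, verbose=0)
```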
git clone https://github.com/sivkri/regression-ml-dl-genai-app.git
cd regression-ml-dl-genai-app
pip install -r requirements.txt
Go to the OpenAI API keys page and create an API key. You can also input the key securely via the app if prompted.
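One way the key can be resolved at runtime, sketched under the assumption that the app checks the environment first and falls back to an in-app prompt (the exact mechanism depends on `streamlit_app.py`):

```python
# Sketch: look for the OpenAI key in the environment first, otherwise ask in the app.
import os
import streamlit as st

api_key = os.environ.get("OPENAI_API_KEY") or st.text_input(
    "OpenAI API key", type="password"
)
```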
streamlit run streamlit_app.py
1. Upload CSV Data: Choose a CSV file with numeric features and a continuous target.
2. Choose Model: Select between ML (Linear Regression, Random Forest) or DL (Keras DNN).
3. Model Training: Train and evaluate the selected model.
4. Visualization: Plot actual vs. predicted values, feature importance, etc.
5. Explain with GenAI: Generate human-friendly summaries and insights using GPT.
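These steps map roughly onto standard Streamlit widgets. A condensed sketch follows; the widget labels and the `train_and_evaluate` helper are illustrative, not the app's actual API:

```python
# Condensed sketch of the Streamlit flow (labels and helper names are illustrative).
import pandas as pd
import streamlit as st

uploaded = st.file_uploader("Upload CSV", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    target = st.selectbox("Target column", df.columns)
    model_name = st.radio("Model", ["Linear Regression", "Random Forest", "Keras DNN"])
    if st.button("Train"):
        metrics, fig = train_and_evaluate(df, target, model_name)  # hypothetical helper
        st.write(metrics)   # MAE, RMSE, R²
        st.pyplot(fig)      # actual vs. predicted scatter
```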
- Scatter plots of predictions
- Evaluation metrics (MAE, RMSE, R²)
- GPT-generated explanations of model performance
- Highlighted prediction confidence and anomalies
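The actual-vs-predicted scatter is the main diagnostic plot here. A minimal matplotlib sketch, assuming `y_test` and `pred` from the training sketch above:

```python
# Sketch: actual vs. predicted scatter with a y = x reference line.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter(y_test, pred, alpha=0.6)
lims = [min(y_test.min(), pred.min()), max(y_test.max(), pred.max())]
ax.plot(lims, lims, "r--", label="y = x")
ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
ax.legend()
```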
The GenAI assistant uses the OpenAI API (GPT-3.5/4) to generate:
- Explanations of model performance
- Suggestions on data preprocessing
- Interpretations of model predictions in natural language
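A sketch of how such an explanation call can look with the official `openai` client; the metric values, prompt wording, and model choice are assumptions, and the app's actual prompts may differ:

```python
# Sketch: ask GPT to explain regression metrics in plain language.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

metrics = {"MAE": 1.8, "RMSE": 2.4, "R2": 0.87}  # example values only
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You explain regression results to non-experts."},
        {"role": "user", "content": f"Explain these metrics in plain language: {metrics}"},
    ],
)
print(response.choices[0].message.content)
```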
🔍 Aspect | 🧠 ML (XGBoost, RF, Stacking) | 🤖 DL (Deep Neural Network) |
---|---|---|
Model Type | Ensemble of Decision Trees (XGBoost, RF) + Linear Stacking | Multi-layer feedforward neural network |
Feature Engineering | Manual (RoomsPerPerson, LogPopulation) | Same manual features used |
Outlier Handling | Z-score or IQR-based filtering | Same method |
Feature Scaling | Optional (needed for linear models) | Required (for better convergence) |
Hyperparameter Tuning | GridSearch / RandomSearchCV | Layer tuning + callbacks |
Regularization | Tree constraints, early stopping | L2, Dropout, EarlyStopping |
Training Control | Cross-validation, early stopping | EarlyStopping, ReduceLROnPlateau, Checkpoints |
Explainability | ✅ SHAP/LIME available for feature importance | ❌ Harder (need external tools like LIME, tf-explain) |
Training Speed | Fast on small/medium datasets | Slower due to many epochs |
Scalability | Easy to scale (tree-based) | Great with GPU on large data |
Interpretability | High (especially with SHAP) | Low unless explained manually |
Overfitting Risk | Moderate (trees handle it better) | High without proper regularization |
Custom Layers/Complexity | Less flexible (fixed structure) | Highly flexible (custom layers, losses, etc.) |
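The tuning and training-control rows above correspond to standard scikit-learn and Keras utilities. A brief sketch of each side, with illustrative parameter grids and callback settings:

```python
# Sketch: hyperparameter search for the tree model vs. training callbacks for the DNN.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from tensorflow import keras

# ML side: cross-validated grid search (illustrative grid).
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
# search.fit(X_train, y_train)

# DL side: regularization and training control via callbacks.
callbacks = [
    keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5),
    keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
]
# dnn.fit(X_train, y_train, validation_split=0.2, epochs=200, callbacks=callbacks)
```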
Use Case | ✅ Choose ML (Tree-based models, etc.) | ✅ Choose DL (Neural Networks) |
---|---|---|
Structured/tabular data | ✅ Excellent performance | |
Need explainability | ✅ SHAP, easily interpretable | ❌ Requires extra tools like LIME/SHAP |
Small to medium dataset | ✅ Fast and efficient | |
Large-scale, complex dataset | | ✅ Scales well with GPU |
Unstructured data (images, text, audio) | ❌ Not suitable | ✅ Ideal choice |
Quick prototyping | ✅ Minimal tuning needed | ❌ Needs architecture/hyperparameter tuning |
Limited compute resources | ✅ Lightweight models | ❌ Needs more memory/time |
Business-friendly interpretation needed | ✅ High interpretability | ❌ Black-box unless explained further |
Want to ensemble or stack models | ✅ Works very well | |