Skip to content

An interactive regression analysis platform powered by Machine Learning, Deep Learning, and Generative AI — built with Streamlit for real-time, explainable model exploration.

License

Notifications You must be signed in to change notification settings

sivkri/regression-ml-dl-genai-app

Repository files navigation

Comprehensive regression-ml-dl-genai-app

An interactive regression analysis platform powered by Machine Learning, Deep Learning, and Generative AI — built with Streamlit for real-time, explainable model exploration.


Project Summary

This project demonstrates a complete workflow for regression modeling using both traditional ML algorithms and Deep Learning architectures, enhanced by Generative AI (OpenAI GPT) to explain predictions and metrics in human-friendly language.

Designed as a modular, production-style app, this solution enables users to:

  • Upload and preprocess their own data
  • Select and compare regression models
  • Visualize model performance
  • Understand predictions with the help of a GenAI assistant

🎯 Key Features

Feature Description
ML & DL Models Train & evaluate Linear Regression, Random Forest, and Deep Neural Networks
Model Evaluation Automatically computes metrics like RMSE, MAE, R², and plots
GenAI Explanation Uses OpenAI GPT to generate plain-language descriptions of the models
Interactive UI Built with Streamlit for easy interaction and dynamic insights
Data Upload Accepts CSV data

Technologies Used

  • Scikit-learn: Linear Regression, Random Forest
  • TensorFlow / Keras: Deep Learning regression model
  • Pandas & Matplotlib: Data manipulation and visualization
  • Streamlit: Interactive web app
  • OpenAI GPT (via API): Model explanation using Generative AI

Getting Started

1. Clone the repository

git clone https://github.com/sivkri/regression-ml-dl-genai-app.git
cd regression-ml-dl-genai-app

2. Install the requirements

pip install -r requirements.txt

3. Get your OpenAI API key

Go to OpenAI API keys

you can input the key securely via the app if prompted.

4. Run the Streamlit app

streamlit run streamlit_app.py

How It Works

Upload CSV Data: Choose a CSV file with numeric features and a continuous target.

Choose Model: Select between ML (Linear Regression, Random Forest) or DL (Keras DNN).

Model Training: Train and evaluate the model.

Visualization: Plot actual vs predicted values, feature importance, etc.

Explain with GenAI: Generate human-friendly summaries and insights using GPT.

Sample Outputs

  • Scatter plots of predictions

  • Evaluation metrics (MAE, RMSE, R²)

  • GPT-generated explanations of model performance

  • Highlighted prediction confidence and anomalies

Generative AI Integration

The GenAI assistant uses the OpenAI API (GPT-3.5/4) to generate:

Explanation of model performance

Suggestions on data preprocessing

Interpretations of model predictions in natural language


⚖️ ML vs DL: Side-by-Side Comparison

🔍 Aspect 🧠 ML (XGBoost, RF, Stacking) 🤖 DL (Deep Neural Network)
Model Type Ensemble of Decision Trees (XGBoost, RF) +Linear Stacking Multi-layer feedforward neural network
Feature Engineering Manual (RoomsPerPerson, LogPopulation) Same manual features used
Outlier Handling Z-score or IQR-based filtering Same method
Feature Scaling Optional (needed for linear models) Required (for better convergence)
Hyperparameter Tuning GridSearch / RandomSearchCV Layer tuning + callbacks
Regularization Tree constraints, early stopping L2, Dropout, EarlyStopping
Training Control Cross-validation, early stopping EarlyStopping, ReduceLROnPlateau, Checkpoints
Explainability ✅ SHAP/LIME available for feature importance ❌ Harder (need external tools like LIME, tf-explain)
Training Speed Fast on small/medium datasets Slower due to many epochs
Scalability Easy to scale (tree-based) Great with GPU on large data
Interpretability High (especially with SHAP) Low unless explained manually
Overfitting Risk Moderate (trees handle it better) High without proper regularization
Custom Layers/Complexity Less flexible (fixed structure) Highly flexible (custom layers, losses, etc.)

When to Use What?

Use Case ✅ Choose ML (Tree-based models, etc.) ✅ Choose DL (Neural Networks)
Structured/tabular data ✅ Excellent performance ⚠️ Works, but often overkill
Need explainability ✅ SHAP, easily interpretable ❌ Requires extra tools like LIME/SHAP
Small to medium dataset ✅ Fast and efficient ⚠️ Risk of overfitting, slower training
Large-scale, complex dataset ⚠️ May not scale well ✅ Scales well with GPU
Unstructured data (images, text, audio) ❌ Not suitable ✅ Ideal choice
Quick prototyping ✅ Minimal tuning needed ❌ Needs architecture/hyperparameter tuning
Limited compute resources ✅ Lightweight models ❌ Needs more memory/time
Business-friendly interpretation needed ✅ High interpretability ❌ Black-box unless explained further
Want to ensemble or stack models ✅ Works very well ⚠️ Can be complex to ensemble

About

An interactive regression analysis platform powered by Machine Learning, Deep Learning, and Generative AI — built with Streamlit for real-time, explainable model exploration.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages