Comprehensive regression-ml-dl-genai-app

An interactive regression analysis platform powered by Machine Learning, Deep Learning, and Generative AI — built with Streamlit for real-time, explainable model exploration.

Project Summary

This project demonstrates a complete workflow for regression modeling using both traditional ML algorithms and Deep Learning architectures, enhanced by Generative AI (OpenAI GPT) to explain predictions and metrics in human-friendly language.

Designed as a modular, production-style app, this solution enables users to:

Upload and preprocess their own data
Select and compare regression models
Visualize model performance
Understand predictions with the help of a GenAI assistant

🎯 Key Features

Feature	Description
ML & DL Models	Train & evaluate Linear Regression, Random Forest, and Deep Neural Networks
Model Evaluation	Automatically computes metrics like RMSE, MAE, R², and plots
GenAI Explanation	Uses OpenAI GPT to generate plain-language descriptions of the models
Interactive UI	Built with Streamlit for easy interaction and dynamic insights
Data Upload	Accepts CSV data

Technologies Used

Scikit-learn: Linear Regression, Random Forest
TensorFlow / Keras: Deep Learning regression model
Pandas & Matplotlib: Data manipulation and visualization
Streamlit: Interactive web app
OpenAI GPT (via API): Model explanation using Generative AI

Getting Started

1. Clone the repository

git clone https://github.com/sivkri/regression-ml-dl-genai-app.git
cd regression-ml-dl-genai-app

2. Install the requirements

pip install -r requirements.txt

3. Get your OpenAI API key

Go to OpenAI API keys

you can input the key securely via the app if prompted.

4. Run the Streamlit app

streamlit run streamlit_app.py

How It Works

Upload CSV Data: Choose a CSV file with numeric features and a continuous target.

Choose Model: Select between ML (Linear Regression, Random Forest) or DL (Keras DNN).

Model Training: Train and evaluate the model.

Visualization: Plot actual vs predicted values, feature importance, etc.

Explain with GenAI: Generate human-friendly summaries and insights using GPT.

Sample Outputs

Scatter plots of predictions
Evaluation metrics (MAE, RMSE, R²)
GPT-generated explanations of model performance
Highlighted prediction confidence and anomalies

Generative AI Integration

The GenAI assistant uses the OpenAI API (GPT-3.5/4) to generate:

Explanation of model performance

Suggestions on data preprocessing

Interpretations of model predictions in natural language

⚖️ ML vs DL: Side-by-Side Comparison

🔍 Aspect	🧠 ML (XGBoost, RF, Stacking)	🤖 DL (Deep Neural Network)
Model Type	Ensemble of Decision Trees (XGBoost, RF) +Linear Stacking	Multi-layer feedforward neural network
Feature Engineering	Manual (RoomsPerPerson, LogPopulation)	Same manual features used
Outlier Handling	Z-score or IQR-based filtering	Same method
Feature Scaling	Optional (needed for linear models)	Required (for better convergence)
Hyperparameter Tuning	GridSearch / RandomSearchCV	Layer tuning + callbacks
Regularization	Tree constraints, early stopping	L2, Dropout, EarlyStopping
Training Control	Cross-validation, early stopping	EarlyStopping, ReduceLROnPlateau, Checkpoints
Explainability	✅ SHAP/LIME available for feature importance	❌ Harder (need external tools like LIME, tf-explain)
Training Speed	Fast on small/medium datasets	Slower due to many epochs
Scalability	Easy to scale (tree-based)	Great with GPU on large data
Interpretability	High (especially with SHAP)	Low unless explained manually
Overfitting Risk	Moderate (trees handle it better)	High without proper regularization
Custom Layers/Complexity	Less flexible (fixed structure)	Highly flexible (custom layers, losses, etc.)

When to Use What?

Use Case	✅ Choose ML (Tree-based models, etc.)	✅ Choose DL (Neural Networks)
Structured/tabular data	✅ Excellent performance	⚠️ Works, but often overkill
Need explainability	✅ SHAP, easily interpretable	❌ Requires extra tools like LIME/SHAP
Small to medium dataset	✅ Fast and efficient	⚠️ Risk of overfitting, slower training
Large-scale, complex dataset	⚠️ May not scale well	✅ Scales well with GPU
Unstructured data (images, text, audio)	❌ Not suitable	✅ Ideal choice
Quick prototyping	✅ Minimal tuning needed	❌ Needs architecture/hyperparameter tuning
Limited compute resources	✅ Lightweight models	❌ Needs more memory/time
Business-friendly interpretation needed	✅ High interpretability	❌ Black-box unless explained further
Want to ensemble or stack models	✅ Works very well	⚠️ Can be complex to ensemble

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
best_model		best_model
.gitignore		.gitignore
BostonHousing.csv		BostonHousing.csv
DockerFile.dockerfile		DockerFile.dockerfile
LICENSE		LICENSE
README.md		README.md
Regression_mlvsdl.ipynb		Regression_mlvsdl.ipynb
Regression_mlvsdl.py		Regression_mlvsdl.py
dl_model.h5		dl_model.h5
evaluation_metrics.csv		evaluation_metrics.csv
requirements.txt		requirements.txt
shap_values.csv		shap_values.csv
streamlit_app.py		streamlit_app.py
xgb_model.json		xgb_model.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Comprehensive regression-ml-dl-genai-app

Project Summary

🎯 Key Features

Technologies Used

Getting Started

1. Clone the repository

2. Install the requirements

3. Get your OpenAI API key

4. Run the Streamlit app

How It Works

Sample Outputs

Generative AI Integration

⚖️ ML vs DL: Side-by-Side Comparison

When to Use What?

About

Releases

Packages

Languages

License

sivkri/regression-ml-dl-genai-app

Folders and files

Latest commit

History

Repository files navigation

Comprehensive regression-ml-dl-genai-app

Project Summary

🎯 Key Features

Technologies Used

Getting Started

1. Clone the repository

2. Install the requirements

3. Get your OpenAI API key

4. Run the Streamlit app

How It Works

Sample Outputs

Generative AI Integration

⚖️ ML vs DL: Side-by-Side Comparison

When to Use What?

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages