Regression Project
This project aims to provide accurate and insightful predictions of property prices based on various features. By leveraging machine learning algorithms, the system analyzes historical property data, taking into account factors such as district, square footage, number of bedrooms, and other relevant attributes.
The dataset contains real estate sales data for the city of Milwaukee (Wisconsin, US) from 2002 to 2022, obtained from Milwaukee's Open Data Portal. You can download the dataset from here.
| Model | Metric | Value |
|---|---|---|
| Linear Regression | MSE | 2,960,230,826 |
| | MAE | 34,375 |
| | R² | 0.679 |
| Random Forest Regressor | MSE | 2,047,237,885 |
| | MAE | 28,953 |
| | R² | 0.778 |
| XGBoost Regressor | MSE | 1,939,965,795 |
| | MAE | 28,216 |
| | R² | 0.790 |
| Fine-tuned XGBoost Regressor | MSE | 1,785,590,381 |
| | MAE | 27,957 |
| | R² | 0.806 |
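For reference, the three regression metrics in the table can be computed directly from true and predicted prices. The sketch below is a minimal standard-library version; the function names and the four sample prices are hypothetical, chosen only for illustration (libraries such as scikit-learn provide equivalent functions). Because MSE is in squared price units, its square root (RMSE) is easier to read: the fine-tuned XGBoost MSE of 1,785,590,381 corresponds to an RMSE of roughly 42,256, in the same units as the sale price.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical sale prices and model predictions, for illustration only.
y_true = [150_000, 200_000, 250_000, 300_000]
y_pred = [160_000, 190_000, 260_000, 280_000]
print(mse(y_true, y_pred), mae(y_true, y_pred), round(r2(y_true, y_pred), 3))
```

On these made-up values the script prints `175000000.0 12500.0 0.944`.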
Multi-class Classification Project
This project aims to build a predictive model capable of categorizing individuals into three distinct credit score classes: good, average, and bad. The project employs machine learning techniques to assess various financial and non-financial features, providing a nuanced evaluation of creditworthiness.
The dataset was taken from here. It contains essential bank details and credit-related information. Notably, the dataset is not pristine and contains inconsistencies; working through them is a valuable exercise in cleaning and refining raw datasets.
| Model | Metric | Value |
|---|---|---|
| XGBoost Classifier | Accuracy | 0.776 |
| | Precision | 0.775 |
| | Recall | 0.776 |
| | F1 Score | 0.774 |
| XGBoost Classifier (randomized search) | Accuracy | 0.793 |
| | Precision | 0.792 |
| | Recall | 0.793 |
| | F1 Score | 0.792 |
| Random Forest Classifier | Accuracy | 0.795 |
| | Precision | 0.795 |
| | Recall | 0.795 |
| | F1 Score | 0.794 |
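Accuracy and recall coincide for every model in the table. That is what you would expect if precision, recall and F1 were averaged over the three classes weighted by class support (as with scikit-learn's `average="weighted"`): weighted recall is the sum over classes of (n_c / n) · (TP_c / n_c), which simplifies to the total number of correct predictions divided by n, i.e. accuracy. The averaging mode is not stated above, so this is an inference. The standard-library sketch below computes these weighted scores; the function name, labels and predictions are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1 over all true classes."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n  # each class weighted by its share of the true labels
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical credit score labels and predictions, for illustration only.
y_true = ["good", "good", "average", "average", "average", "bad"]
y_pred = ["good", "average", "average", "average", "bad", "bad"]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

On this toy input, accuracy and weighted recall are both 4/6 ≈ 0.667, while weighted precision is 0.75.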
AutoML using PyCaret and Pandas-profiling
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that shortens the experiment cycle by replacing boilerplate training and evaluation code with a few function calls.
Pandas-profiling is a Python library that generates exploratory data analysis (EDA) reports from Pandas DataFrames. It simplifies the process of understanding and visualizing the characteristics of a dataset.
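To give a sense of what such a report summarizes, the sketch below computes a few per-column statistics (count, missing values, distinct values, and basic numeric summaries) with the standard library. It only illustrates the kind of output involved and is not pandas-profiling's implementation; with the library itself, a full HTML report is typically produced with `ProfileReport(df).to_file("report.html")`. The function name, column names and rows are hypothetical.

```python
import statistics
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Summarize one column: counts, missing values, and basic stats if numeric."""
    present = [v for v in values if v is not None]
    summary: Dict[str, Any] = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Only add numeric statistics when every observed value is a number.
    if present and all(isinstance(v, (int, float)) for v in present):
        summary.update(
            mean=statistics.mean(present),
            median=statistics.median(present),
            min=min(present),
            max=max(present),
        )
    return summary


# Hypothetical property rows, for illustration only.
rows = [
    {"bedrooms": 3, "district": "A"},
    {"bedrooms": 2, "district": "B"},
    {"bedrooms": None, "district": "A"},
    {"bedrooms": 4, "district": None},
]
report = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
print(report)
```

Here `report["bedrooms"]` shows one missing value and a median of 3, and `report["district"]` shows two distinct values with no numeric statistics.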
The goal of this project is to create a straightforward AutoML app capable of rapidly assessing the performance of various ML algorithms on a dataset. This enables a streamlined model selection process. Additionally, I utilized Docker to containerize the app, facilitating efficient deployment and reproducibility.
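PyCaret's `compare_models()` trains a set of candidate models with cross-validation and ranks them by a chosen metric, which is what makes this kind of quick model selection possible. The standard-library sketch below mimics the idea on a much smaller scale: a single hold-out split instead of cross-validation, two toy regressors (a mean baseline and one-feature least squares) instead of PyCaret's model library, and MAE as the ranking metric. All function names and data are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# A "fit" function takes training features and targets and returns a predictor.
Fit = Callable[[Sequence[float], Sequence[float]], Callable[[float], float]]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Baseline that ignores the feature and always predicts the training mean."""
    m = statistics.mean(ys)
    return lambda x: m


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """One-feature ordinary least squares: y = a + b * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def compare_candidates(candidates: Dict[str, Fit], train, test) -> List[Tuple[str, float]]:
    """Fit each candidate on the training split and rank by test MAE (lowest first)."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    scores = []
    for name, fit in candidates.items():
        predict = fit(x_tr, y_tr)
        mae = sum(abs(y - predict(x)) for x, y in zip(x_te, y_te)) / len(y_te)
        scores.append((name, mae))
    return sorted(scores, key=lambda s: s[1])


# Hypothetical square footage -> price data, for illustration only.
train = ([1000, 1500, 2000, 2500], [150_000, 200_000, 250_000, 300_000])
test = ([1200, 1800], [170_000, 230_000])
ranking = compare_candidates({"mean": fit_mean, "linear": fit_linear}, train, test)
print(ranking)
```

Because the toy data are exactly linear, the least-squares model ranks first with an MAE of 0.0, ahead of the mean baseline at 30,000.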
Docker Image
Future improvements for the app:
- Add other PyCaret features to the app, for example model fine-tuning and model evaluation. Refer here for the PyCaret documentation.
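Fine-tuning is listed as a future feature (PyCaret exposes tuning through `tune_model()`), and one of the classification models above was tuned with randomized search. Randomized search samples a fixed number of configurations from the search space instead of trying every combination, as scikit-learn's `RandomizedSearchCV` does. The standard-library sketch below shows the core loop; `max_depth` and `learning_rate` are real XGBoost parameter names, but the value ranges, the `random_search` helper and the `toy_score` objective (a stand-in for a cross-validated metric) are hypothetical.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]


def random_search(
    score_fn: Callable[[Params], float],
    space: Dict[str, List[Any]],
    n_iter: int,
    seed: int = 0,
) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Evaluate n_iter randomly sampled configurations; a higher score is better."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    history: List[Tuple[Params, float]] = []
    for _ in range(n_iter):
        # Draw one value per hyperparameter, uniformly from its candidate list.
        params = {name: rng.choice(values) for name, values in space.items()}
        history.append((params, score_fn(params)))
    best_params, best_score = max(history, key=lambda h: h[1])
    return best_params, best_score, history


# Hypothetical search space and objective, standing in for a cross-validated score.
space = {"max_depth": list(range(2, 11)), "learning_rate": [0.01, 0.05, 0.1, 0.3]}


def toy_score(p: Params) -> float:
    return -((p["max_depth"] - 6) ** 2) - 100 * (p["learning_rate"] - 0.1) ** 2


best, score, trials = random_search(toy_score, space, n_iter=20, seed=42)
print(best, score)
```

With a fixed `seed`, the same configurations are sampled on every run, which keeps tuning experiments reproducible.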
Note: The app only uses PyCaret's default settings for data cleaning and preprocessing, so it is recommended that you clean and preprocess your data before modeling with the app. You can, however, use the app's pandas-profiling report to get an overview of your dataset's statistical properties.
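As an example of the kind of preprocessing meant here, the sketch below drops exact duplicate rows and fills missing numeric values with the column median, using only the standard library. The `clean_rows` helper, column names and rows are hypothetical, and median imputation is just one common choice; it is not necessarily what PyCaret's defaults do.

```python
import statistics
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def clean_rows(rows: List[Row], numeric_cols: List[str]) -> List[Row]:
    """Drop exact duplicate rows, then fill missing numeric values with the column median."""
    seen = set()
    cleaned: List[Row] = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            cleaned.append(dict(row))  # copy so the caller's rows are not mutated
    for col in numeric_cols:
        observed = [r[col] for r in cleaned if r[col] is not None]
        if not observed:
            continue  # nothing to compute a median from; leave the column as-is
        fill = statistics.median(observed)
        for r in cleaned:
            if r[col] is None:
                r[col] = fill
    return cleaned


# Hypothetical raw rows with one duplicate and two missing values.
raw = [
    {"sqft": 1000, "price": 150_000},
    {"sqft": None, "price": 200_000},
    {"sqft": 1000, "price": 150_000},
    {"sqft": 2000, "price": None},
]
cleaned = clean_rows(raw, numeric_cols=["sqft", "price"])
print(cleaned)
```

The duplicate row is dropped, the missing `sqft` becomes 1500 and the missing `price` becomes 175,000, while the input list is left unmodified.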