Ong Aun Jie - Data Scientist Portfolio

Regression Project

Predicting Real Estate Market Price

This project predicts property sale prices from features such as district, square footage, number of bedrooms, and other relevant attributes. The models are machine learning regressors trained on historical property sales data.

About the Dataset

The dataset contains real estate sales data for Milwaukee (a city in the US state of Wisconsin) from 2002 to 2022. It was obtained from Milwaukee's Open Data Portal. You can download the dataset from here.

Test out my Streamlit app

Results:

Model	Metric	Value
Linear Regression	MSE	2,960,230,826
	MAE	34,375
	R2	0.679
Random Forest Regressor	MSE	2,047,237,885
	MAE	28,953
	R2	0.778
XGBoost Regressor	MSE	1,939,965,795
	MAE	28,216
	R2	0.790
Fine-tuned XGBoost Regressor	MSE	1,785,590,381
	MAE	27,957
	R2	0.806
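
For readers unfamiliar with the three metrics in the table, the sketch below computes them from scratch in plain Python. It is an illustration only, not the project's evaluation code: the function name `regression_metrics` and the sample prices are made up for this example. Because MSE is in squared price units, its square root (RMSE) is easier to read; for the fine-tuned XGBoost model, the square root of 1,785,590,381 is roughly 42,256 in the same units as the sale price.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error: penalizes large misses more heavily.
    mse = sum(e * e for e in errors) / n
    # Mean absolute error: the average miss, in price units.
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined when every y_true value is identical

    return {"mse": mse, "mae": mae, "r2": r2}


# Hypothetical sale prices, for illustration only (not taken from the dataset).
actual = [150_000, 200_000, 250_000, 300_000]
predicted = [160_000, 190_000, 270_000, 290_000]

metrics = regression_metrics(actual, predicted)
rmse = math.sqrt(metrics["mse"])  # back in price units
print(metrics, round(rmse, 2))
```

Reading the table with these definitions in mind, the fine-tuned XGBoost model has both the lowest errors and the highest R2 of the four models.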

Check out my GitHub repository for more info:

View on GitHub

Multi-class Classification Project

Credit score prediction

This project aims to build a predictive model capable of categorizing individuals into three distinct credit score classes: good, average, and bad. The project employs machine learning techniques to assess various financial and non-financial features, providing a nuanced evaluation of creditworthiness.

About the Dataset

The dataset was taken from here. It contains essential bank details and credit-related information. The raw data is noisy and contains inconsistencies, so cleaning and refining it was a valuable learning experience in its own right.

Test out my Streamlit app

Results:

Model	Metric	Value
XGBoost Classifier	Accuracy	0.776
	Precision	0.775
	Recall	0.776
	F1 Score	0.774
XGBoost Classifier (randomized search)	Accuracy	0.793
	Precision	0.792
	Recall	0.793
	F1 Score	0.792
Random Forest Classifier	Accuracy	0.795
	Precision	0.795
	Recall	0.795
	F1 Score	0.794
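
In every row of the table, recall equals accuracy. That is what support-weighted averaging across classes produces, so the sketch below assumes weighted averaging was used; the source does not state this. The code is a plain-Python illustration with made-up labels, and the function name `weighted_classification_metrics` is hypothetical, not taken from the project.

```python
from collections import Counter


def weighted_classification_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)  # number of true samples per class

    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0

    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)

        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0

        # Each class contributes in proportion to its share of the data.
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for the three credit score classes, for illustration only.
y_true = ["good", "good", "average", "average", "average", "bad", "bad", "bad"]
y_pred = ["good", "average", "average", "average", "bad", "bad", "bad", "good"]

scores = weighted_classification_metrics(y_true, y_pred)
print(scores)
```

Weighted recall sums each class's true positives divided by the total sample count, which is exactly the definition of accuracy. Precision and F1 can differ from it, as they do slightly in the table.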

Check out my GitHub repository for more info:

View on GitHub

AutoML using PyCaret and Pandas-profiling

What is PyCaret

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that shortens the experiment cycle and makes you more productive.

What is Pandas-profiling

Pandas-profiling is a Python library that generates exploratory data analysis (EDA) reports from Pandas DataFrames. It simplifies the process of understanding and visualizing the characteristics of a dataset.

Goal of project

The goal of this project is to create a straightforward AutoML app capable of rapidly assessing the performance of various ML algorithms on a dataset. This enables a streamlined model selection process. Additionally, I utilized Docker to containerize the app, facilitating efficient deployment and reproducibility.
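
As a rough idea of what "rapidly assessing various ML algorithms" looks like with PyCaret, here is a minimal sketch. It is not the app's actual code: the helper names `pycaret_module_name` and `compare_algorithms`, the task switch, and the `session_id` value are assumptions made for illustration. It relies on PyCaret's `setup`, `compare_models`, and `pull` functions, which both the classification and regression modules provide, and it needs PyCaret installed to run the comparison itself.

```python
import importlib

# PyCaret modules that expose setup(), compare_models(), and pull().
_TASK_MODULES = {
    "classification": "pycaret.classification",
    "regression": "pycaret.regression",
}


def pycaret_module_name(task):
    """Map a task name to the PyCaret module that handles it."""
    try:
        return _TASK_MODULES[task]
    except KeyError:
        raise ValueError(f"unsupported task: {task!r}") from None


def compare_algorithms(df, target, task="classification"):
    """Train PyCaret's default model library on df; return (best_model, leaderboard).

    df is a pandas DataFrame and target is the name of the label column.
    """
    # Imported lazily so this file loads even where PyCaret is not installed.
    module = importlib.import_module(pycaret_module_name(task))

    # setup() applies PyCaret's default preprocessing; session_id fixes the seed.
    module.setup(data=df, target=target, session_id=42, verbose=False)
    best_model = module.compare_models()  # cross-validates each candidate model
    leaderboard = module.pull()           # score grid from the last call, as a DataFrame
    return best_model, leaderboard
```

A call such as `compare_algorithms(df, target="label")` (with a hypothetical column name) would return the top-ranked model together with the table of scores used to rank the candidates.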

Docker Image

Pull command: docker pull ongaunjie1/automl-app:latest
Run command: docker run -d -p 8501:8501 ongaunjie1/automl-app:latest
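
The `-p 8501:8501` flag publishes the container's port 8501 (Streamlit's default) on the same host port, so after the run command the app should be reachable at http://localhost:8501. The small Python sketch below checks whether something is listening on that port. The function name `is_port_open` is made up for this example, and the host and timeout values are assumptions.

```python
import socket


def is_port_open(host="localhost", port=8501, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("App is listening on http://localhost:8501")
    else:
        print("Nothing is listening on port 8501 yet")
```

An open port only shows that the container is accepting connections; the Streamlit app may still need a few seconds to finish loading.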

Future improvements for the app:

- Add other PyCaret features to the app, for example fine-tuning and model evaluation. Refer here for the PyCaret documentation.

Test out my Streamlit app

Note: The app only uses PyCaret's default settings for data cleaning and preprocessing, so it is recommended that you clean and preprocess your data before modeling with the app. You can, however, use pandas-profiling from within the app to get a statistical overview of your dataset.

Check out my GitHub repository for more info:

View on GitHub
