Regression Project

Predicting Real Estate Market Price

This project aims to provide accurate and insightful predictions of property prices based on various features. By leveraging machine learning algorithms, the system analyzes historical property data, taking into account factors such as district, square footage, number of bedrooms, and other relevant attributes.

About the Dataset

The dataset contains real estate sales data for Milwaukee (a city in the US state of Wisconsin) from 2002 to 2022. It was obtained from Milwaukee's Open Data Portal. You can download the dataset from here.

Results:

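| Model                        | Metric | Value         |
|------------------------------|--------|---------------|
| Linear Regression            | MSE    | 2,960,230,826 |
|                              | MAE    | 34,375        |
|                              | R²     | 0.679         |
| Random Forest Regressor      | MSE    | 2,047,237,885 |
|                              | MAE    | 28,953        |
|                              | R²     | 0.778         |
| XGBoost Regressor            | MSE    | 1,939,965,795 |
|                              | MAE    | 28,216        |
|                              | R²     | 0.790         |
| Fine-tuned XGBoost Regressor | MSE    | 1,785,590,381 |
|                              | MAE    | 27,957        |
|                              | R²     | 0.806         |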
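
For readers unfamiliar with the three metrics reported above, here is a minimal sketch of how MSE, MAE, and R² are defined. It is written in plain Python for illustration and is not the project's actual evaluation code; the function name `regression_metrics` and the sample prices are made up for this example.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R² for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Sum of squared residuals and sum of absolute residuals.
    ss_res = sum(e * e for e in errors)
    abs_sum = sum(abs(e) for e in errors)

    # Total variance of the targets around their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    # R² is undefined when every target has the same value.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mse": ss_res / n, "mae": abs_sum / n, "r2": r2}


# Hypothetical sale prices and predictions, for illustration only.
actual = [200000, 150000, 320000, 275000]
predicted = [210000, 140000, 300000, 280000]
print(regression_metrics(actual, predicted))
```

Because MSE is expressed in squared price units, its square root (RMSE) is easier to read: the fine-tuned XGBoost MSE of 1,785,590,381 corresponds to an RMSE of about 42,256.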

    Check out my GitHub repository for more info:

    View on GitHub
  • Multi-class Classification Project

    Credit score prediction

    This project aims to build a predictive model capable of categorizing individuals into three distinct credit score classes: good, average, and bad. The project employs machine learning techniques to assess various financial and non-financial features, providing a nuanced evaluation of creditworthiness.

    About the Dataset

    The dataset was taken from here. It contains essential bank details and credit-related information. The dataset is not pristine and contains inconsistencies, but addressing them is a valuable exercise in cleaning and refining raw data.

    Results:

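    | Model                                  | Metric    | Value |
    |----------------------------------------|-----------|-------|
    | XGBoost Classifier                     | Accuracy  | 0.776 |
    |                                        | Precision | 0.775 |
    |                                        | Recall    | 0.776 |
    |                                        | F1 Score  | 0.774 |
    | XGBoost Classifier (randomized search) | Accuracy  | 0.793 |
    |                                        | Precision | 0.792 |
    |                                        | Recall    | 0.793 |
    |                                        | F1 Score  | 0.792 |
    | Random Forest Classifier               | Accuracy  | 0.795 |
    |                                        | Precision | 0.795 |
    |                                        | Recall    | 0.795 |
    |                                        | F1 Score  | 0.794 |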
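
The page does not say how precision, recall, and F1 were averaged across the three classes. The sketch below assumes support-weighted averaging, which would be consistent with recall matching accuracy in every row above (weighted recall always equals accuracy). It is a plain-Python illustration, not the project's code; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def weighted_classification_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    n = len(y_true)
    support = Counter(y_true)        # true samples per class
    predicted = Counter(y_pred)      # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        # Per-class scores; a class that is never predicted gets precision 0.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0

        # Weight each class by its share of the true labels.
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels using the three classes named in the project description.
truth = ["good", "good", "average", "average", "average", "bad"]
preds = ["good", "average", "average", "average", "bad", "bad"]
print(weighted_classification_metrics(truth, preds))
```

With these sample labels, weighted recall and accuracy come out identical, as they do in the results table.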

      Check out my GitHub repository for more info:

      View on GitHub
    • AutoML using PyCaret and Pandas-profiling

      What is PyCaret

      PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that shortens the experiment cycle and makes you more productive.

      What is Pandas-profiling

      Pandas-profiling is a Python library that generates exploratory data analysis (EDA) reports from Pandas DataFrames. It simplifies the process of understanding and visualizing the characteristics of a dataset.

      Goal of project

      The goal of this project is to create a straightforward AutoML app capable of rapidly assessing the performance of various ML algorithms on a dataset. This enables a streamlined model selection process. Additionally, I utilized Docker to containerize the app, facilitating efficient deployment and reproducibility.
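
The model comparison described above can be sketched with PyCaret's functional API. This is only an illustration of the general approach, not the app's actual code: the helper names `resolve_module_name` and `compare_all_models` are hypothetical, the `session_id` value is arbitrary, and the sketch assumes a recent PyCaret release in which `setup`, `compare_models`, and `pull` are available from both `pycaret.classification` and `pycaret.regression`.

```python
import importlib

# Both PyCaret modules expose the same functional API
# (setup, compare_models, pull), so one code path can serve both tasks.
PYCARET_MODULES = {
    "classification": "pycaret.classification",
    "regression": "pycaret.regression",
}


def resolve_module_name(task):
    """Map a task name to the PyCaret module that handles it."""
    try:
        return PYCARET_MODULES[task]
    except KeyError:
        raise ValueError(f"unsupported task: {task!r}") from None


def compare_all_models(df, target, task="classification"):
    """Train PyCaret's default model library on df.

    Returns (best_model, leaderboard), where leaderboard is the
    comparison grid as a pandas DataFrame.
    """
    # Imported lazily so this file loads even where PyCaret is not installed.
    module = importlib.import_module(resolve_module_name(task))

    # setup() applies PyCaret's default preprocessing to the data.
    module.setup(data=df, target=target, session_id=42, verbose=False)

    # compare_models() cross-validates each available estimator
    # and returns the top-ranked one.
    best_model = module.compare_models()

    # pull() retrieves the most recent results grid.
    leaderboard = module.pull()
    return best_model, leaderboard
```

In a Streamlit app, the returned leaderboard could be displayed with `st.dataframe(leaderboard)` after the user uploads a CSV and picks a target column.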


      Docker Image

      • Pull command: docker pull ongaunjie1/automl-app:latest
      • Run command: docker run -d -p 8501:8501 ongaunjie1/automl-app:latest

      Future improvements for the app:

      - Add other PyCaret features to the app, for example fine-tuning and model evaluation. Refer here for the PyCaret documentation.

      Test out my Streamlit app

      Note: The app uses only PyCaret's default settings for data cleaning and preprocessing, so it is recommended that you clean and preprocess your data before modeling with the app. You can, however, use pandas-profiling from within the app to get a statistical overview of your dataset.


        Check out my GitHub repository for more info:

        View on GitHub