Housing Price Prediction Overview

Project Overview

This project is part of the CSCI323 - Modern Artificial Intelligence module at SIM-UOW. It focuses on predicting resale prices of HDB flats in Singapore using supervised machine learning techniques. Leveraging over 200,000 structured HDB resale records from January 2017 to May 2025, we analyzed features such as town, flat type, floor area, lease details, and transaction dates to uncover pricing trends and key influencing factors.

We implemented and evaluated four models: XGBoost, Random Forest, Multilayer Perceptron (MLP), and Skforecast. Among them, XGBoost achieved the most accurate results, especially after feature engineering with external economic and time-based data. The project also highlights real-world challenges such as data imbalance across towns and the absence of geospatial features. Our findings point to future improvements, such as ensemble learning and spatial data integration, to improve prediction accuracy in underrepresented regions.
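As a rough illustration of the time-based part of the feature engineering mentioned above, the sketch below turns a transaction month into a few numeric features. It assumes the month is stored as a "YYYY-MM" string; the function name `month_features` and the chosen features are hypothetical and not necessarily what the project used. The external economic data is not shown.

```python
from datetime import date


def month_features(month_str, start=date(2017, 1, 1)):
    """Turn a 'YYYY-MM' transaction month into simple numeric features.

    Assumption: dates arrive as 'YYYY-MM' strings and the dataset
    starts in January 2017, as stated in the project overview.
    """
    year, month = (int(part) for part in month_str.split("-"))
    return {
        "year": year,
        "month": month,
        "quarter": (month - 1) // 3 + 1,
        # Months elapsed since the first month covered by the dataset.
        "months_since_start": (year - start.year) * 12 + (month - start.month),
    }


print(month_features("2025-05"))
```

A running month index like `months_since_start` gives tree-based models a simple way to pick up a long-term price trend, while `month` and `quarter` can capture seasonality.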

Objective

The goal is to create predictive models that provide insights into resale pricing trends and help buyers, sellers, platforms, and policymakers make better decisions.

Models & Tools Used

Data Sources

Preprocessing Overview

To prepare over 200,000 HDB resale records for modeling, we applied several preprocessing steps:

These steps helped improve model accuracy and ensured a fair comparison across the XGBoost, Random Forest, and MLP models.

Model Evaluation Summary

Below are the model performances (with and without feature engineering):

Model           Feature Eng   RMSE       MAE
XGBoost         0.9176        57286.15   42405.33
Random Forest   0.8546        75095.03   55121.53
MLP             0.8683        72411.03   51694.62
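For readers unfamiliar with the two error metrics reported above, here is a minimal sketch of how RMSE and MAE are computed. The prices are made up for illustration and are not taken from the project's data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss, in price units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up resale prices, for illustration only.
actual = [500_000, 620_000, 410_000, 880_000]
predicted = [510_000, 600_000, 430_000, 800_000]

print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAE:  {mae(actual, predicted):.2f}")
```

RMSE is never smaller than MAE on the same data. A large gap between the two suggests that a few big misses dominate the error, as the single 80,000 miss does here.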

Result Visualizations

RMSE & MAE by Flat Type

This bar chart compares RMSE and MAE across flat types for the XGBoost, Random Forest, and MLP models. It shows that XGBoost generally produces lower errors, especially for 3-Room to 5-Room flats.
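A per-flat-type breakdown like the one in this chart can be produced by grouping residuals before computing the metrics. The sketch below is one way to do it in plain Python; the record layout (`flat_type`, `actual`, `predicted`) and the helper name `errors_by_group` are assumptions for illustration, and the sample rows are made up.

```python
import math
from collections import defaultdict


def errors_by_group(records, key):
    """Return {group: (rmse, mae, n)} for rows with 'actual' and 'predicted'."""
    residuals = defaultdict(list)
    for row in records:
        residuals[row[key]].append(row["actual"] - row["predicted"])

    summary = {}
    for group, res in residuals.items():
        n = len(res)
        group_rmse = math.sqrt(sum(r * r for r in res) / n)
        group_mae = sum(abs(r) for r in res) / n
        summary[group] = (group_rmse, group_mae, n)
    return summary


# Made-up rows, for illustration only.
records = [
    {"flat_type": "3 ROOM", "actual": 400_000, "predicted": 390_000},
    {"flat_type": "3 ROOM", "actual": 420_000, "predicted": 450_000},
    {"flat_type": "5 ROOM", "actual": 700_000, "predicted": 660_000},
]

for flat_type, (g_rmse, g_mae, n) in sorted(errors_by_group(records, "flat_type").items()):
    print(f"{flat_type}: RMSE={g_rmse:.0f} MAE={g_mae:.0f} n={n}")
```

Reporting the sample count `n` next to each metric helps distinguish a genuinely harder flat type from one that simply has few transactions.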

[Figure: RMSE and MAE by Flat Type]

Actual vs Predicted Price – Best Towns

This line plot shows the prediction results for the best-performing towns (e.g., Choa Chu Kang, Jurong East, Geylang). The model predictions are close to actual prices, especially for 2-Room flats.

[Figure: Best Towns Prediction]

Actual vs Predicted Price – Worst Towns

This plot illustrates prediction challenges in higher-priced areas like Bukit Timah and Queenstown. The gap between actual and predicted values highlights model limitations in luxury or low-volume regions.
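One way to quantify this kind of gap is the mean signed error per town, reported with the number of transactions behind it. The sketch below uses made-up rows. The helper name `bias_by_town` and the `min_count` threshold are hypothetical, not part of the project's code.

```python
from collections import defaultdict


def bias_by_town(records, min_count=2):
    """Return {town: (mean signed error, n, low_volume flag)}."""
    errors = defaultdict(list)
    for row in records:
        # Negative values mean the model under-predicted the actual price.
        errors[row["town"]].append(row["predicted"] - row["actual"])
    return {
        town: (sum(errs) / len(errs), len(errs), len(errs) < min_count)
        for town, errs in errors.items()
    }


# Made-up rows, for illustration only.
records = [
    {"town": "BUKIT TIMAH", "actual": 1_200_000, "predicted": 1_050_000},
    {"town": "CHOA CHU KANG", "actual": 480_000, "predicted": 485_000},
    {"town": "CHOA CHU KANG", "actual": 500_000, "predicted": 495_000},
]

for town, (bias, n, low_volume) in sorted(bias_by_town(records).items()):
    note = " (low volume)" if low_volume else ""
    print(f"{town}: bias={bias:+.0f} n={n}{note}")
```

Unlike RMSE or MAE, the signed error shows direction: a consistently negative bias in a town means the model systematically under-predicts there.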

[Figure: Worst Towns Prediction]

Overall Resale Price Trend (Jan–May 2025)

This graph shows how resale prices evolve over time and how well the models capture these changes. XGBoost maintains the closest fit to actual price trends across months.

[Figure: Time-based RMSE/MAE]

Regression Fit Comparison

These scatter plots compare predicted vs actual resale prices. XGBoost again shows the tightest clustering along the ideal regression line.
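The coefficient of determination (R²) is a common way to put a number on how tightly points cluster around the ideal line, where predicted equals actual. The sketch below computes it from scratch on made-up prices; it illustrates the metric and is not the project's evaluation code.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1.0 means every point lies on y = x."""
    mean_actual = sum(actual) / len(actual)
    # Squared error left unexplained by the model.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of a baseline that always predicts the mean price.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up resale prices, for illustration only.
actual = [400_000, 500_000, 600_000, 700_000]
predicted = [410_000, 490_000, 620_000, 680_000]

print(f"R^2: {r_squared(actual, predicted):.4f}")
```

A value near 1 corresponds to a tight cloud around the diagonal, while a value near 0 means the model does little better than always predicting the mean price.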

[Figure: Regression Plots]

More Details & Source Code

For complete code, models, and data processing steps, check the full project on GitHub or read our final report:
