Sharpening the BLADE: missing data imputation using supervised machine learning

Date published:

1 July 2020

Publisher

Office of the Chief Economist

Incomplete data are common and can undermine evidence-based policymaking. One example is the Business Longitudinal Analysis Data Environment (BLADE), an Australian Government national data asset.

In this paper, we aimed to help BLADE practitioners achieve greater data coverage.

To do this, we developed PyImpuyte. This Python package expands data coverage by using artificial intelligence and machine learning to impute missing values.

Using PyImpuyte, we carried out a series of repeated controlled machine learning experiments to predict missing values. We then evaluated each algorithm’s performance.

We found:

the ensemble family of algorithms performed best
the extra trees regressor was most accurate and efficient.
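
To illustrate the general idea of supervised imputation, here is a minimal sketch. It is not PyImpuyte's actual interface or the paper's experimental setup. It assumes scikit-learn and NumPy are available, uses synthetic data in place of BLADE, and the helper name `impute_column` is hypothetical. An extra trees regressor is trained on the rows where the target is observed and predicts the rows where it is missing. Hiding values that are actually known gives a controlled way to score the result.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error


def impute_column(X, y, random_state=0):
    """Fill NaNs in y with predictions from a model fitted on the observed rows.

    Hypothetical helper for illustration only; not part of PyImpuyte.
    """
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    model = ExtraTreesRegressor(n_estimators=100, random_state=random_state)
    model.fit(X[~missing], y[~missing])
    filled = y.copy()
    if missing.any():
        filled[missing] = model.predict(X[missing])
    return filled


# Synthetic stand-in data (assumption): three predictors and one target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y_true = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=500)

# Controlled experiment: hide roughly 20% of the known target values.
mask = rng.random(500) < 0.2
y_obs = y_true.copy()
y_obs[mask] = np.nan

y_imputed = impute_column(X, y_obs)

# Score the imputed values against the hidden truth, with a mean-fill baseline.
mae_model = mean_absolute_error(y_true[mask], y_imputed[mask])
mae_mean = mean_absolute_error(
    y_true[mask], np.full(mask.sum(), np.nanmean(y_obs))
)
print(f"extra trees MAE: {mae_model:.3f}, mean-fill MAE: {mae_mean:.3f}")
```

Swapping `ExtraTreesRegressor` for another scikit-learn regressor and repeating the masking step with different seeds is one simple way to compare algorithms on the same hidden values.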

We are now applying these algorithms to enhance coverage in other data sets.

Authors: Marcus Suresh, Ronnie Taib, Yanchang Zhao and Warren Jin.

Download

Sharpening the BLADE: missing data imputation using supervised machine learning [pdf 1.22 MB]

Sharpening the BLADE: missing data imputation using supervised machine learning [docx 2.61 MB]
