The University of Sheffield
Department of Computer Science

Tingyu Luo MSc Dissertation 2014/15

Data Mining of Income Prediction: Research using Classification Algorithms

Supervised by E. Vasilaki

Abstract

Income prediction modeling is an essential tool for improving national budget planning and commercial decision making, and for predicting trends in other fields, such as the estimation of carbon dioxide emissions. Discussion of the existing models built on the Census Income Data centres on their accuracy. In this research, new models, trained with Naive Bayes, C4.5 and Logistic Regression classifiers and combined through majority voting, were developed to predict salary using the same Census Income Data. The best model in this research achieves an accuracy of 85.1%, higher than that produced by the models of Cerquides and Lopez de Mantaras (2004), Kohavi (1997), and Bennet (2000). The preparation of the data before the mining process and the selection of an efficient feature subset were found to be the main reasons for the improvement.
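As an illustration of the majority-voting step mentioned above, the following is a minimal sketch in Python rather than the implementation used in this research. The function names, the class labels (">50K" and "<=50K") and the example predictions are assumptions made purely for illustration; in practice each list would hold the output of one trained classifier on the same records.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(votes).most_common(1)[0][0]


def combine(per_model_predictions):
    """Combine predictions record by record.

    per_model_predictions holds one list of labels per classifier;
    all lists must cover the same records in the same order.
    """
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical predictions for four records (illustrative values only).
naive_bayes = [">50K", "<=50K", "<=50K", ">50K"]
c45_tree = [">50K", ">50K", "<=50K", "<=50K"]
logistic = ["<=50K", ">50K", "<=50K", ">50K"]

combined = combine([naive_bayes, c45_tree, logistic])
print(combined)
```

With three classifiers and two classes a tie cannot occur, so every record receives the label chosen by at least two of the three models.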