The University of Sheffield
Department of Computer Science

Wan Chiu Undergraduate Dissertation 2014/15

Big Data Analyzing: A Clustering Algorithm

Supervised by J. Derrick

Abstract

Nowadays, the volume of data is increasing uncontrollably. It is produced everywhere, for example in our daily lives, experiments, videos and business records. As technology improves, the challenges of data storage diminish. However, making use of this 'big data' becomes the challenge. To make the data useful rather than merely hoarding it, various methods can be used to analyze it, such as statistical analysis, data mining, predictive analytics and text mining. Analyzing data can reveal useful trends, which can be used to enhance services and to save time, energy and money.

This project reviews the definition of big data, methods for big data analysis, clustering algorithms and software for clustering. A clustering algorithm will also be implemented in MATLAB and applied to data from a firefighter call log.