The University of Sheffield
Department of Computer Science

Yiu Choi Undergraduate Dissertation 2014/15

Building training data from Wikipedia for a Named Entity Recogniser

Supervised by R. Gaizauskas

Abstract

Named Entity Recognition is the process of locating named entities, such as persons, locations and organisations, in a text. To perform this task, a system needs training data that teaches it to identify named entities. There are two methods of constructing such training data. The typical approach is to build the training data manually. This is difficult to achieve because of the substantial resources required.

However, it has been proposed that training data can be generated automatically from the free online encyclopedia Wikipedia. Recent research in this field indicates that the approach is feasible and, better still, that it could reduce the workload and maintenance effort required of experts. Hence, this dissertation aims to build training data from Wikipedia and to evaluate the results, showing the improvement obtained and how effective the approach can be.