The University of Sheffield
Department of Computer Science

Lanting Cheng MSc Dissertation 2014/15

The TWITTERATI: Text Analysis of Twitter and other Social Media

Supervised by M. Hepple

Abstract

The rapid growth in the use of social media generates a massive amount of electronic textual data. These data facilitate the tasks of Text Processing and Sentiment Analysis. In this report, we explore two approaches to Sentiment Analysis and compare them. We present polarity classifiers built with a lexicon-based method and with machine learning methods, and evaluate them on a polarity movie review dataset. We then build a subjectivity classifier, trained on a subjectivity dataset, to filter out objective sentences, which produces a new version of the movie review dataset containing only subjective sentences. We evaluate the polarity classifiers again on this new dataset to discover whether their performance improves. Our results show that the proposed subjectivity classifier accurately preserves the sentiment information of the original dataset. Applying the new dataset gives a notable improvement in the Naive Bayes classifier, from 80% to 85%. The performance of the lexicon-based classifier and the Support Vector Machine classifier remains at the same level as with the original movie review dataset, around 63% and 85% respectively.
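The two-stage idea described above (first discard objective sentences with a subjectivity classifier, then classify the polarity of what remains) can be illustrated with a minimal sketch. It is not the implementation used in this dissertation: it assumes a hand-written multinomial Naive Bayes with add-one smoothing for both stages, whitespace tokenisation, and tiny hypothetical training sentences standing in for the real subjectivity and movie review datasets. The names NaiveBayes, filter_subjective and classify_review are illustrative only.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors, self.counts, self.totals = {}, {}, {}
        vocab = set()
        for c in self.classes:
            c_docs = [d for d, label in zip(docs, labels) if label == c]
            self.log_priors[c] = math.log(len(c_docs) / len(docs))
            # Word frequencies per class, using simple lower-cased whitespace tokens.
            counter = Counter(w for d in c_docs for w in d.lower().split())
            self.counts[c] = counter
            self.totals[c] = sum(counter.values())
            vocab.update(counter)
        self.vocab_size = len(vocab)
        return self

    def predict(self, doc):
        words = doc.lower().split()

        def log_score(c):
            # Log prior plus smoothed log likelihood of every token.
            denom = self.totals[c] + self.vocab_size
            return self.log_priors[c] + sum(
                math.log((self.counts[c][w] + 1) / denom) for w in words
            )

        return max(self.classes, key=log_score)


def filter_subjective(sentences, subj_clf):
    """Keep only the sentences the subjectivity classifier labels 'subj'."""
    return [s for s in sentences if subj_clf.predict(s) == "subj"]


def classify_review(sentences, subj_clf, pol_clf):
    """Classify review polarity using only its subjective sentences.

    Falls back to the full review if no sentence is judged subjective.
    """
    kept = filter_subjective(sentences, subj_clf) or sentences
    return pol_clf.predict(" ".join(kept))


# Hypothetical toy data standing in for the real datasets.
subj_clf = NaiveBayes().fit(
    ["i loved this film", "what a boring awful mess",
     "a wonderful moving story", "i hated the acting",
     "the film was released in 1999", "the story follows a detective in paris",
     "the director shot it in london", "it runs for two hours"],
    ["subj"] * 4 + ["obj"] * 4,
)
pol_clf = NaiveBayes().fit(
    ["loved it wonderful moving", "a great fun film",
     "boring awful mess", "hated it terrible"],
    ["pos", "pos", "neg", "neg"],
)

review = ["the film was released in 2001", "i loved this wonderful story"]
print(filter_subjective(review, subj_clf))         # ['i loved this wonderful story']
print(classify_review(review, subj_clf, pol_clf))  # pos
```

Filtering before polarity classification means the factual sentence contributes no evidence to the polarity decision, which is the intuition behind evaluating the classifiers again on the subjective-only dataset; the fallback simply avoids classifying an empty document.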