The University of Sheffield
Department of Computer Science

Weiwei Xu MSc Dissertation 2015/16

The TWITTERATI: Text Analysis of Twitter and other Social Media

Supervised by M. Hepple

Abstract

The rapid development of social media produces a huge amount of data, and the need to classify this data automatically has drawn many researchers to sentiment analysis. This dissertation evaluates the performance of different classifiers under different conditions. It introduces three sentiment analysis approaches: lexicon-based classification, Naive Bayes classification and SVM classification. We build a classifier for each approach, test all three on the processed movie reviews introduced by Pang and Lee (2004), and then compare and evaluate them. In our results, lexicon-based classification has the lowest accuracy, at 65.2%. Naive Bayes classification with trigrams follows at 82.5% (1800 reviews used for training, 200 reviews classified). The best result comes from SVM classification with a binary weighting scheme, which reaches 90% (C=1, 1800 reviews used for training, 200 reviews classified). Our results also indicate that the Naive Bayes classifier maintains its high efficiency only when it works with a corpus of limited size, whereas the SVM classifier performs better the more data is used to train it.
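To make the evaluation setup concrete, the following is a minimal Python sketch of a lexicon-based classifier and the accuracy measure used to compare classifiers. It is an illustration under stated assumptions, not the implementation used in this dissertation: the word lists, the tie-breaking rule, the function names and the four example reviews are all hypothetical.

```python
# Toy sentiment lexicon (hypothetical; not the lexicon used in the dissertation).
POSITIVE = {"good", "great", "excellent", "enjoyable", "brilliant"}
NEGATIVE = {"bad", "boring", "awful", "dull", "terrible"}


def lexicon_classify(text):
    """Label a review 'pos' or 'neg' by counting lexicon hits.

    Assumption: ties (including reviews with no lexicon words) go to 'pos'.
    """
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"


def accuracy(classifier, labelled_reviews):
    """Fraction of (text, label) pairs that the classifier labels correctly."""
    correct = sum(classifier(text) == label for text, label in labelled_reviews)
    return correct / len(labelled_reviews)


# Tiny made-up test set, for illustration only.
reviews = [
    ("a great and enjoyable film", "pos"),
    ("boring plot and awful acting", "neg"),
    ("not good at all", "neg"),  # negation defeats simple word counting
    ("brilliant cast , dull script , terrible ending", "neg"),
]

print(accuracy(lexicon_classify, reviews))
```

On this toy set the negated review is misclassified, which hints at why simple word counting can trail the trained classifiers. The same accuracy function could be applied to any classifier that maps a review text to a label.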