The University of Sheffield
School of Computer Science

Giorgio Rumore
Undergraduate Dissertation 2017/18

Visualising document collections using Word Clouds

Supervised by M. Stevenson

Abstract

Every document collection covers several topics. Topic models aim to identify these topics by analysing the words in the documents and grouping the most used ones into topics. This project aims to develop a topic browser with an intuitive and attractive interface, built on topic models. The topics will be identified using latent Dirichlet allocation (LDA), implemented with the Gensim toolkit. Existing topic browsers tend to lack an intuitive interface. Therefore, word clouds and word storms will be implemented in the system, distinguishing this topic browser from those already available.