The University of Sheffield
Department of Computer Science

Ishita Agarwal MSc Dissertation 2014/15

Summarising Reader Comments in Online News

Supervised by R. Gaizauskas

Abstract

Online news websites are rapidly replacing the traditional passive model of news consumption with one in which users actively engage in a dialogue over news articles by expressing their opinions in the form of comments. The insights and perspectives provided by user comments form an integral part of the online news reading experience. Most existing news websites only allow users to view comments either in temporal order or as a threaded discussion. As news articles can easily accumulate hundreds of comments, this presentation makes it inconvenient and time-consuming for a new user to read and understand the previous comments. To improve the quality of the online news reading experience, we built a system that generates an extractive summary of the user comments.

Our system extracts relevant information from the comments to form a summary by first clustering the comments and then selecting the most representative comment from each cluster. We used K-Means and LDA for comment clustering, and the most representative comment in each cluster was identified using the TF-IDF ranking algorithm. The resulting summaries were evaluated using the ROUGE metric. Our final system gives significantly better results than the baseline system, which forms a summary from the first comment of each of the first 10 threads. We used data from "The Guardian" for all our experiments.
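
The cluster-then-select pipeline described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it covers only the K-Means variant (LDA and ROUGE evaluation are omitted), it assumes that "TF-IDF ranking" means scoring each comment by its cosine similarity to its cluster centroid in TF-IDF space, and it uses a deterministic farthest-first seeding for reproducibility. All function names (`tfidf_vectors`, `kmeans`, `summarise`) and the example comments are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(docs):
    # Build unit-length sparse TF-IDF vectors (dicts) with smoothed IDF.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    # Dot product of sparse vectors; equals cosine similarity for unit vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    # Sum of the member vectors, rescaled to unit length.
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    # Assumption: farthest-first seeding instead of random initialisation.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids


def summarise(comments, k):
    # Pick, per cluster, the comment most similar to the cluster centroid.
    vectors = tfidf_vectors(comments)
    labels, centroids = kmeans(vectors, k)
    summary = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            best = max(members, key=lambda i: dot(vectors[i], centroids[j]))
            summary.append(comments[best])
    return summary


# Made-up comments on two topics, used only to exercise the sketch.
example_comments = [
    "The rail fares are far too high for commuters",
    "Commuters pay high rail fares every year",
    "Rail fares keep rising for commuters",
    "The new stadium will bring jobs to the city",
    "Building the stadium creates jobs in the city",
    "City jobs depend on the stadium project",
]
summary = summarise(example_comments, k=2)
```

With two clearly separated topics, the sketch returns one comment per topic. A real system would choose the number of clusters from the data and would need stop-word removal and other preprocessing that this example leaves out.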