The University of Sheffield
Department of Computer Science

Rachel Sharp Undergraduate Dissertation 2015/16

Using Metadata in Comment Summarization

Supervised by R. Gaizauskas

Abstract

Most current work on comment summarization is extractive: sentences are selected from a set of comments and used to form a summary. Typically the comments are clustered by topic and then ranked within each cluster by informativeness. Presenting comments to readers in this way can reduce the time they spend finding informative comments.

Comment metadata is data that describes comments, such as the thread structure of replies. Thread structure can improve how comments are clustered and ranked. In this project I build a baseline comment summarization system and extend it using comment metadata. I find that using thread structure in the clustering step gives strong results, and I implement a version of LDA (Latent Dirichlet Allocation) that uses thread structure when calculating document-topic probabilities.
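
As a rough illustration of the cluster-then-rank idea described above, the following minimal Python sketch groups comments by the thread they belong to and orders each group by a crude informativeness score. Everything in it is an assumption made for illustration: the toy comments, the stopword list, the use of thread roots as clusters, and the distinct-token score are hypothetical stand-ins, not the clustering or ranking methods used in this project (which are based on topic models such as LDA).

```python
from collections import defaultdict

# Hypothetical toy data: (comment_id, parent_id or None, text).
COMMENTS = [
    (1, None, "The new bus routes cut my commute by twenty minutes"),
    (2, 1, "Same here"),
    (3, 1, "The routes skip the hospital, which hurts elderly passengers"),
    (4, None, "Ticket prices rose again this year"),
    (5, 4, "Prices rose faster than wages for three years running"),
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "my", "by", "which", "this", "than", "for", "a", "an", "of"}


def thread_root(comment_id, parent_of):
    """Follow reply links upward until a top-level comment is reached."""
    while parent_of[comment_id] is not None:
        comment_id = parent_of[comment_id]
    return comment_id


def informativeness(text):
    """Crude proxy (an assumption): count of distinct non-stopword tokens."""
    tokens = {t.strip(".,").lower() for t in text.split()}
    return len(tokens - STOPWORDS)


def cluster_and_rank(comments):
    """Group comments by thread root, then rank each group by the proxy score."""
    parent_of = {cid: parent for cid, parent, _ in comments}
    clusters = defaultdict(list)
    for cid, _, text in comments:
        clusters[thread_root(cid, parent_of)].append((cid, text))
    # Sort each cluster from most to least informative; ties keep input order.
    return {
        root: sorted(members, key=lambda m: informativeness(m[1]), reverse=True)
        for root, members in clusters.items()
    }


ranked = cluster_and_rank(COMMENTS)
for root, members in ranked.items():
    print(root, [cid for cid, _ in members])
```

A summary could then be formed by taking the top-ranked comment from each group. Treating every thread as its own cluster is the simplest possible use of thread metadata; a topic model would instead allow a single thread to span several topics.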