The University of Sheffield
Department of Computer Science

David Martin Undergraduate Dissertation 2000/01

"An Automatic Text Summariser"

Supervised by M. Hepple

Abstract

The purpose of this project was to implement an automatic text summariser. The system was to be capable of reading in a document in ASCII text format and producing a sentence-extract summary representative of the source text. The summary length was to be variable, and the sentences chosen for it were to be the n most valuable sentences in the input document.
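
As an illustration of what selecting the n most valuable sentences can look like, the following is a minimal sketch only. The scoring function (summed word frequency), the naive sentence splitter and all function names are assumptions made for this example; they are not taken from the system described in this dissertation.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, freqs):
    # Hypothetical score: sum of the document-wide frequencies of the sentence's words.
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(freqs[w] for w in words)


def summarise(text, n):
    # Return the n highest-scoring sentences, kept in their original document order.
    sentences = split_sentences(text)
    freqs = Counter(re.findall(r"[a-z]+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], freqs),
        reverse=True,
    )
    chosen = sorted(ranked[:n])  # restore source order for readability
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = "Cats chase mice. Dogs bark. Cats and mice and cats play."
    for sentence in summarise(doc, 2):
        print(sentence)
```

Any other scoring function can be substituted for `score_sentence` without changing the selection step, which is what makes it straightforward to compare several scoring algorithms on the same input.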

Several different sentence-selection algorithms were to be implemented, allowing thorough analysis and evaluation of the different summaries produced. Recall, precision, F-measure and content similarity measures were used to assess the output of the summariser and to justify the results.
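
For reference, precision, recall and F-measure for a sentence extract can be computed as in the sketch below. It assumes that both the automatic summary and the manual annotation are represented as sets of sentence indices, and it uses the standard weighted F-measure; the function name and the `beta` parameter are illustrative and not taken from the project's code.

```python
def precision_recall_f(selected, reference, beta=1.0):
    # selected:  sentence indices chosen by the summariser
    # reference: sentence indices chosen by the human annotator
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)

    # Precision: fraction of selected sentences that the annotator also chose.
    precision = overlap / len(selected) if selected else 0.0
    # Recall: fraction of the annotator's sentences that were selected.
    recall = overlap / len(reference) if reference else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0

    # Weighted harmonic mean; beta = 1 weights precision and recall equally.
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    p, r, f = precision_recall_f({0, 2, 5}, {0, 1, 2, 3})
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In this example two of the three selected sentences appear in the four-sentence reference, giving a precision of 2/3, a recall of 1/2 and an F-measure of 4/7.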

The evaluation compared the automatically generated summaries with manually annotated ones. All of the scoring algorithms were found to be valid ways of selecting sentences for an extract: each scored significantly better than the control method, which selected sentences from the text at random.