The University of Sheffield
Department of Computer Science

Matthew Hersee Undergraduate Dissertation 2000/01

"Automatic Detection of Plagiarism"

Supervised by M. Hepple

Abstract

Plagiarism is a problem affecting institutions across the globe, and higher education institutions are no exception. The Internet is already having a substantial impact on plagiarism, making it easier for students to copy work from a wide range of sources. Given the rate at which the Internet is growing, this is not a problem that will go away.

Most plagiarism detection tools rely on simple pattern matching to detect plagiarism. The purpose of this dissertation is to explore whether Stylometry can be used to build an automated plagiarism detection tool.

The area of Stylometry is relatively new. It proposes that every author has an underlying 'style' in which he or she 'has' to write. This 'style' is determined by unconscious habitual features and is hence consistent throughout a given document. Changes in an author's 'style' can be illustrated using the Cusum (cumulative sum) technique. The project seeks to explore the following question:

"Can deviations in the 'style' of a document be used to determine mixed authorship in that document, and hence signify that plagiarism has occurred?"

A variety of interpretations of Cusum data are made in an attempt to find a satisfactory plagiarism detection system.
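
As a rough illustration of the kind of Cusum data referred to above, the sketch below computes a cumulative sum of deviations from the mean over the sentence lengths of a text. It is a minimal sketch under stated assumptions, not the method used in this dissertation: the choice of sentence length as the measured feature, the naive sentence splitter, and the function names `split_sentences`, `cusum` and `sentence_length_cusum` are all hypothetical choices made for the example.

```python
import re
from itertools import accumulate


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def cusum(values):
    """Return the running sum of each value's deviation from the overall mean."""
    if not values:
        return []
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))


def sentence_length_cusum(text):
    """Return (words per sentence, Cusum of those lengths) for a text."""
    lengths = [len(s.split()) for s in split_sentences(text)]
    return lengths, cusum(lengths)


if __name__ == "__main__":
    sample = "One two. Three four five six! Seven?"
    lengths, series = sentence_length_cusum(sample)
    print(lengths)
    print(series)
```

Because the deviations from the mean sum to zero, the series always returns to (approximately) zero at the final sentence. Under the stylometric assumption described above, a second series computed from a habitual feature of the same sentences would be compared against this one, and a region where the two diverge would be a candidate for mixed authorship.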