The University of Sheffield
Department of Computer Science

Richard Allen MSc Dissertation 2000/01

"Automatic Detection of Plagiarism"

Supervised by R. Gaizauskas

Abstract

Plagiarism is unfair, especially in a student environment where those who put little effort into a piece of work can come out with top grades. Something needs to be done, and the only practical solution is automatic plagiarism detection. Paraphrasing is one means of plagiarising, and it often takes a simple form. The program developed in this dissertation, REPEAT, attempts to detect paraphrasing as a stepping-stone to the main goal of plagiarism detection. It uses a new approach that combines recent work with older concepts, applying them in a different way. Its results are thoroughly tested and evaluated, both to maximise accuracy and to expose weaknesses in the approach. Performance is shown to be high, and the approach has the potential to do well if developed further. The technique gets some way using only a simple measure on which to base the decision. Machine learning is proposed as a way to improve the performance of REPEAT, but the investigation concludes that more training data is required. A new idea produces promising results. REPEAT uses a simple but effective parsing algorithm to detect features such as word insertion, word deletion and synonym substitution.