The University of Sheffield
Department of Computer Science

Haochen Sun MSc Dissertation 2014/15

Automatic Plagiarism Detection against Large Text Collections

Supervised by M. Hepple

Abstract

Plagiarism is the practice of taking the words or ideas of another and presenting them as one's own; it is regarded as dishonest and unacceptable behaviour in many fields.

There are many types of plagiarism and many methods to combat it. This report introduces two approaches to plagiarism detection: a traditional approach based on information retrieval (IR) and a novel SimHash-based approach. It then describes improvements that make the SimHash-based approach perform better.