The University of Sheffield
Department of Computer Science

Murtadha Sahbudin MSc Dissertation 2015/16

Automatic Plagiarism Detection against Large Text Collection

Supervised by M. Hepple

Abstract

Plagiarism is a major issue in many professions, especially in academia and research. External plagiarism detection is the approach in which a set of suspicious documents is compared against an external collection of source documents. In this project, we use an IR-based method, supported by pre-processing, to select candidate source documents. We then perform in-depth text analysis to detect plagiarised passages shared between each suspicious document and its candidate sources. To achieve this, we investigate two pre-processing methods, Rapid Automatic Keyword Extraction (RAKE) and trigram collocations; implement IR-based retrieval of candidate source documents using Lucene; and finally implement a text alignment method that detects plagiarised passages using the Jaccard coefficient. The corpus and evaluation follow the PAN PC 2011 standards.
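The abstract names the Jaccard coefficient as the score used for text alignment. The following is a minimal sketch of that score, assuming that passages are compared as sets of lowercase word trigrams. The tokenisation and the function names (`word_ngrams`, `jaccard`, `passage_similarity`) are hypothetical illustrations, not the dissertation's actual implementation.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams in `text`.

    Assumption: tokens are runs of ASCII letters/digits; the real system
    may tokenise differently (e.g. with stemming or stop-word removal).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Texts shorter than n tokens yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def passage_similarity(suspicious, source, n=3):
    """Score a suspicious passage against a candidate source passage."""
    return jaccard(word_ngrams(suspicious, n), word_ngrams(source, n))


if __name__ == "__main__":
    s = "the quick brown fox jumps over the lazy dog"
    t = "a quick brown fox jumps over a sleeping dog"
    print(passage_similarity(s, t))
```

In this example, each passage produces 7 trigrams and 3 of them are shared, so the score is 3/11 (about 0.27). A pair of passages would then be flagged as aligned if its score exceeds some threshold. No threshold is assumed here because the abstract does not state one.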