The University of Sheffield
Department of Computer Science

Jennifer Parak MSc Dissertation 2014/15

Cracking CAPTCHAs

Supervised by R. Clayton

Abstract

Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) are widely used on the Internet to keep it safe from bots and spammers. A CAPTCHA presents a challenge-response test based on a hard Artificial Intelligence problem, one that is supposed to be difficult for computers to solve but easy for humans. This research project focused on cracking CAPTCHAs. From the literature review we concluded that research on text-based CAPTCHAs was exhausted; we therefore turned our attention to breaking image-based CAPTCHAs and explored research in the field of image recognition. By attempting to crack this type of CAPTCHA, we were simultaneously attempting to create a program that can solve a hard computer vision problem: image labelling. This project should be seen as a proof of concept, exploring whether the new Google image-based CAPTCHAs are vulnerable to such an attack. In-depth research was carried out in order to (1) examine approaches that have been used to classify images, (2) point out the problems and difficulties of cracking an image recognition CAPTCHA, (3) identify features of a 'strong' image recognition CAPTCHA and (4) use the methods identified to break a simple image recognition CAPTCHA. We developed and pre-trained a Convolutional Neural Network and tested it on unseen test data from the STL-10 dataset. With an accuracy of 70%, the resulting model is capable of deciphering image-based CAPTCHAs, and we integrated it into a simple Graphical User Interface for passing CAPTCHA challenges. Based on our research, we derived certain features that could ensure the security of image recognition CAPTCHAs.