The University of Sheffield
School of Computer Science

Aftab Chaudhary Undergraduate Dissertation 2017/18

Web-based tools for browsing multichannel conversational speech database

Supervised by J. Barker

Abstract

A speech corpus is a large collection of audio recordings of spoken language. Most speech corpora also include text transcriptions of the spoken words, together with the time at which each word occurs in the recording. Research can be carried out either by recording new data or by using existing data, which may be available in a variety of formats. The majority of existing speech corpora are in English, although corpora in other languages are also available. Storing a large speech corpus in one place can be advantageous, because specific sections can be extracted smoothly and efficiently, but this requires an appropriate method.

The aim of this project is to design and develop a web-based browsing application that extracts information from the different levels of text transcription in a speech corpus, using regular expressions and word phrases, and displays the results as both audio and text.

Node.js, a JavaScript runtime environment, will be used to develop a simple web application. The application will be tested to measure both the accuracy of its search results and its performance.
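
As an illustration only, the kind of search summarised in this abstract could be sketched in Node.js as follows. The transcript layout (one entry per word with start and end times in seconds), the sample data and the function name searchTranscript are all assumptions made for this sketch; they are not taken from the project's implementation or from any particular corpus format.

```javascript
// Hypothetical time-aligned transcript: one entry per spoken word.
// The field names (word, start, end) are assumptions for illustration.
const transcript = [
  { word: "could", start: 0.0, end: 0.31 },
  { word: "you", start: 0.31, end: 0.45 },
  { word: "pass", start: 0.45, end: 0.8 },
  { word: "the", start: 0.8, end: 0.92 },
  { word: "salt", start: 0.92, end: 1.4 },
  { word: "pass", start: 2.1, end: 2.45 },
  { word: "the", start: 2.45, end: 2.55 },
  { word: "pepper", start: 2.55, end: 3.05 },
];

// Search the word sequence with a regular expression and return, for each
// match, the matched text and the time span of the words it covers.
function searchTranscript(words, pattern) {
  // Join the words with single spaces, recording the character offset
  // at which each word starts in the joined text.
  const offsets = [];
  let text = "";
  words.forEach((w, i) => {
    if (i > 0) text += " ";
    offsets.push(text.length);
    text += w.word;
  });

  // String.prototype.matchAll requires a global regular expression.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);

  const hits = [];
  for (const m of text.matchAll(re)) {
    if (m[0].length === 0) continue; // ignore empty matches
    const from = m.index;
    const to = m.index + m[0].length; // exclusive end offset
    // First word that ends after the match begins.
    const first = words.findIndex((w, i) => offsets[i] + w.word.length > from);
    // Extend to the last word that starts before the match ends.
    let last = first;
    while (last + 1 < words.length && offsets[last + 1] < to) last++;
    hits.push({ text: m[0], start: words[first].start, end: words[last].end });
  }
  return hits;
}

console.log(searchTranscript(transcript, /pass the \w+/));
```

Each hit carries a start and end time, so a web front end could use them to play back only the matching portion of the audio alongside the matched text. A plain word phrase can be searched in the same way by passing it as a literal regular expression.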