The University of Sheffield
Department of Computer Science

James Carmichael MSc Dissertation 2000/01

"The Implementation of a Speech Interface for The THISL Information Retrieval Engine"

Supervised by S. Renals

Abstract

This study discusses techniques used in the implementation of a speech user interface (SUI) for the THISL Spoken Document Retrieval (SDR) system. The SUI is designed to function as part of a multi-modal interface rather than independently. The objective is to make task performance more efficient by implementing speech commands that carry out operations which would otherwise require multiple keystrokes and/or menu selections in a GUI. The user is informed, via various combinations of audio-visual prompts, about the system state and the available options. The difficulties involved in achieving dynamic transition between rule and dictation grammars are discussed, along with the obstacles encountered in achieving robust recognition. The application, developed using the IBM implementation of the Java Speech Application Programming Interface (JSAPI), has all the functionality of its predecessor, with the exception of displaying ASR-generated text, and adds speech recognition and synthesis capabilities. Its usage, however, is restricted to systems that already have the IBM ViaVoice speech recognition and synthesis engine installed.