The University of Sheffield
Department of Computer Science

COM4511 Speech Technology

Summary This module introduces the principles of the emergent field of speech technology, studies typical applications of these principles and assesses the state of the art in this area. Students will learn the prevailing techniques of automatic speech recognition (based on statistical modelling); will see how speech synthesis and text-to-speech methods are deployed in spoken language systems; and will discuss the current limitations of such devices. The module will include project work involving the implementation and assessment of a speech technology device. Students should be aware that there are limited places available on this course.
Session Spring 2021/22
Credits 15
Assessment
  • Blackboard quizzes (threshold)
  • Practical work (graded)
Lecturer(s) Prof. Thomas Hain & Dr Anton Ragni
Aims
  • to teach the principles and application of speech technology, covering speech recognition and synthesis; and
  • to provide experience in building and using speech technology devices.
Objectives By the end of this module, students should:
  • appreciate the difficulties of machine perception in general and speech perception in particular;
  • understand the different types of speech technology in use today;
  • understand the prevailing techniques for modelling speech in automatic speech recognition;
  • have seen how these techniques are deployed in spoken language systems;
  • appreciate the difficulties of producing synthetic speech and understand the principles of speech synthesisers and text-to-speech systems; and
  • have experience in implementing and assessing a speech technology device.
Content
  • Introduction to speech technology
  • Pattern processing fundamentals for speech
  • Hidden Markov Models and Deep Neural Networks for speech processing
  • Towards a state-of-the-art ASR system
  • Acoustic modelling
  • Language modelling
  • Search
  • Adaptation
  • Speech recognition application examples
  • Speaker identification
  • Speech synthesis
  • Spoken dialogue systems
Restrictions This module is only open to students who have taken COM3502 or COM4502.
Teaching Method Two formal lectures and a one-hour practical session per week, for 10 weeks.
Practical work will consist of a project involving the implementation and assessment of a speech technology device. This will include some programming.
Feedback Students will receive feedback in the weekly practical sessions.
Recommended Reading
  • Jurafsky and Martin (2009), Speech and Language Processing
  • Jelinek (1997), Statistical Methods for Speech Recognition
  • Huang, Acero and Hon (2001), Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
  • Clark, Fox and Lappin (Eds.) (2010), The Handbook of Computational Linguistics and Natural Language Processing
  • Woelfel and McDonough (2010), Distant Speech Recognition
  • Taylor (2009), Text-to-Speech Synthesis