The University of Sheffield
Department of Computer Science

COM4511 Speech Technology

Summary This module introduces the principles of the emergent field of speech technology, studies typical applications of these principles and assesses the state of the art in this area. Students will learn the prevailing techniques of automatic speech recognition (based on statistical modelling); will see how speech synthesis and text-to-speech methods are deployed in spoken language systems; and will discuss the current limitations of such devices. The module will include project work involving the implementation and assessment of a speech technology device. Students should be aware that there are limited places available on this course.
Session Spring 2023/24
Credits 15
Assessment
  • Blackboard quizzes (threshold and graded)
  • Practical work (graded)
Lecturer(s) Prof. Thomas Hain & Dr Anton Ragni
Resources
Aims
  • to teach the principles and application of speech technology, covering speech recognition and synthesis; and
  • to provide experience in building and using speech technology devices.
Learning Outcomes By the end of this course the students should:
  • appreciate the difficulties of machine perception in general and speech perception in particular;
  • understand the different types of speech technology in use today;
  • understand the prevailing techniques for modelling speech in automatic speech recognition;
  • see how these techniques are deployed in spoken language systems;
  • appreciate the difficulties of producing synthetic speech and understand the principles of speech synthesisers and text-to-speech systems; and
  • have experience in implementing and assessing a speech technology device.
Content
  • Introduction to speech technology
  • Pattern processing fundamentals for speech
  • Hidden Markov Models and Deep Neural Networks for speech processing
  • Towards a state-of-the-art ASR System
  • Acoustic modelling
  • Language modelling
  • Search
  • Adaptation
  • Speech recognition application examples
  • Speaker identification
  • Speech synthesis
  • Spoken dialogue systems
Restrictions This module is only open to students who have taken COM3502 or COM4502.
Teaching Method Two formal lectures per week for 10 weeks, and a one-hour practical session per week for 6 weeks.
Practical work will consist of a project involving the implementation and assessment of a speech technology device. This will include some programming.
Feedback Students will receive feedback in the weekly practical sessions.