The University of Sheffield
Department of Computer Science

COM3110 Text Processing

Summary This module introduces fundamental concepts and ideas in natural language text processing, covers techniques for handling text corpora, and examines representative systems that require the automated processing of large volumes of text. The course focuses on modern quantitative techniques for text analysis and explores important models for representing and acquiring information from texts. Students should be aware that there are limited places available on this course.
Session Autumn 2021/22
Credits 10
Assessment
  • Assignments [LO1 and LO3]
  • Formal examination [LO2 and LO3]
Lecturer(s) Prof. Rob Gaizauskas, Dr Carolina Scarton & Dr Temitope Adeosun
Aims The aims of this module are:
  • to develop an understanding of the fundamentals of how text is represented and processed in a computer;
  • to acquire familiarity with standard computational techniques for handling text corpora;
  • to develop an understanding of the basic problems and principles underlying text processing applications.

Learning Outcomes By the end of this unit, a candidate should be able to:

  1. code in a programming language well-suited to text handling
  2. identify and explain key techniques that are relevant to performing a number of text processing tasks
  3. implement systems able to analyse large volumes of textual data, and to perform basic text processing tasks
Content
  • Programming for text processing
  • Text processing topics, such as:
    • Text Encoding and Text Compression
    • Vector-based Representations for Words and Documents
    • Information Retrieval
    • Information Extraction
    • Sentiment Analysis
    • Summarisation
Teaching Method
  • There will be two lectures per week, with no more than 20 lectures overall.
  • In some weeks, a third session will be available for lab classes or tutorials.
Feedback Students can discuss their lab exercise code during lab sessions. They will receive feedback comments on their marked assignment work later on in the term (prior to the exam).
Recommended Reading