The University of Sheffield
Department of Computer Science

COM4115 Text Processing

Summary

This module introduces fundamental concepts and ideas in natural language text processing, covers techniques for handling text corpora, and examines representative systems that require the automated processing of large volumes of text. The course focuses on modern quantitative techniques for text analysis and explores important models for representing and acquiring information from texts.
Session

Autumn 2024/25

Credits 15
Assessment
  • Assignment [LO1 and LO3]
  • Formal examination [LO2 and LO3]
Lecturer(s) Dr Zhixue Zhao & Prof. Rob Gaizauskas
Aims

The aims of this module are: 

  • to develop an understanding of the fundamentals of how text is represented and processed in a computer;
  • to acquire familiarity with standard computational techniques for handling text corpora;
  • to develop an understanding of the basic problems and principles underlying text processing applications;
  • to explore, for one or more topics in text processing, refinements beyond the most basic approaches.
Learning Outcomes

By the end of this unit, a candidate should be able to:
  1. code in a programming language well-suited to text handling
  2. identify and explain key techniques that are relevant to performing a number of text processing tasks
  3. implement systems able to analyse large volumes of textual data, and to perform basic and, in selected cases, more advanced text processing tasks
Content
  • Programming for text processing
  • Text processing topics, such as:
    • Text Encoding and Text Compression
    • Vector-based Representations for Words and Documents
    • Information Retrieval
    • Information Extraction
    • Sentiment Analysis
    • Summarisation
Restrictions

Not permitted for students who have already taken COM3110.

Optional modules within the department have limited capacity. We will always try to accommodate all students but cannot guarantee a place. 

Teaching Method
  • There will be 2 lectures per week, with no more than 20 lectures overall.
  • In some weeks, a third session will be available for lab classes or tutorials.
Feedback

Students can discuss their lab exercise code during lab sessions. They will receive feedback comments on their marked assignment work later in the term, before the exam.