The University of Sheffield
Department of Computer Science

COM3004 Data Driven Computing

Summary This module is an introduction to machine learning and pattern processing, with a clear emphasis on applications. The module is themed around the notion of data as a resource: how it is acquired, how it is prepared for analysis and, finally, how we can learn from it. The module takes a practical, Python-based approach to help students develop an intuitive grasp of the sophisticated mathematical ideas that underpin this challenging but fascinating subject.
Session Autumn 2024/25
Credits 20
Assessment

Assignments [LO3 and LO4]
Formal examination [LO1, LO2 and LO3]

Lecturer(s) Dr Ning Ma, Dr Po Yang & Dr Xingyi Song
Resources
Aims This unit aims to:
  • provide an accessible introduction to key concepts in machine learning and pattern processing,
  • demonstrate the application of machine learning in a number of recent research areas,
  • develop an appreciation of the difficulties involved when trying to extract meaning from naturally occurring data with particular reference to data preprocessing, feature extraction, classifier design and efficient learning,
  • prepare students for specialised data-driven subjects at level 3/4, such as natural language processing, speech processing and computational biology.
Learning Outcomes

By the end of the module the student will be able to:

  1. Demonstrate how to extract features from data for use by machine learning (ML) techniques.
  2. Employ appropriate machine learning techniques to model and analyse complex datasets.
  3. Demonstrate the ability to apply ML in various areas of Computer Science (e.g. natural language processing, audio/speech processing, biological applications and vision processing), taking into account sustainability issues.
  4. Apply Python programming skills to perform data analysis, numerical modelling and visualisation for practical data analytics applications.
  5. Critically analyse the benefits of different ML techniques in a given scenario. 
Content This module will cover:
  • Motivation and introduction to data driven computing, including sustainability issues
  • Multivariate data and probability distributions 
  • Classification, including Bayes’ decision theory
  • Non-parametric classifiers, including nearest-neighbour classifier
  • Feature selection
  • Feature generation
  • Introduction to deep learning and neural networks
  • Unsupervised learning and clustering 

Restrictions This module cannot be taken with COM2004.
Teaching Method Lectures and laboratory classes.
Feedback Feedback is given following the assignment, and during labs/lectures on the weekly formative exercise questions.