The University of Sheffield
Department of Computer Science

COM3004 Data Driven Computing

Summary This module serves as an introduction to machine learning and pattern processing, with a clear emphasis on applications. The module is themed around the notion of data as a resource: how it is acquired, how it is prepared for analysis, and finally how we can learn from it. The module takes a practical, Python-based approach to help students develop an intuitive grasp of the sophisticated mathematical ideas that underpin this challenging but fascinating subject.
Session Autumn 2023/24
Credits 20
Assessment

Assignments [LO3 and LO4]
Formal examination [LO1, LO2 and LO3]

Lecturer(s) Dr Matt Ellis, Dr Po Yang & Dr Xingyi Song
Resources
Aims This unit aims to:
  • provide an accessible introduction to key concepts in machine learning and pattern processing,
  • demonstrate the application of machine learning in a number of recent research areas,
  • develop an appreciation of the difficulties involved when trying to extract meaning from naturally occurring data with particular reference to data preprocessing, feature extraction, classifier design and efficient learning,
  • prepare students for specialised data-driven subjects at level 3/4 such as natural language processing, speech processing and computational biology.
Learning Outcomes

By the end of the unit, a student will be able to:

  1. demonstrate how to extract features from data for use by machine learning (ML) techniques,
  2. demonstrate the ability to analyse and model data using ML techniques,
  3. demonstrate the ability to apply ML in various areas of Computer Science, e.g. in natural language processing, audio/speech processing, biological applications and vision processing,
  4. demonstrate the ability to use Python for scientific computing.
Content Introduction
  • overview: classification and feature handling
  • Python programming
Multivariate data
  • review: linear algebra/probability
  • normal distribution
Classification
  • Bayes decision theory
  • risk and ROC (receiver operating characteristic)
  • parameter estimation - maximum likelihood estimation
  • curse of dimensionality and naive Bayes classifier
Linear classifiers
  • perceptron
  • XOR problem
Instance-based approaches
  • nearest neighbour and k-nearest neighbour
  • template matching and edit distance
Feature selection
  • discriminability
  • feature selection algorithms
Feature generation
  • dimensionality reduction
  • principal components analysis

Introduction to deep learning
  • training neural networks
  • regularisation
  • convolutional neural networks
  • recurrent neural networks
Unsupervised learning and approaches to clustering
  • sequential clustering
  • hierarchical clustering
  • hard and soft k-means clustering
Density estimation and mixture modelling

Restriction This module cannot be taken with COM2004.
Teaching Method Lectures, problem classes and laboratory classes.
Feedback Immediate feedback is given in problem classes. After each assignment stage, feedback is given through a debriefing lecture and individual marking.