COM2004 Data Driven Computing

School of Computer Science

Summary	This module is intended to serve as an introduction to machine learning and pattern processing, but with a clear emphasis on applications. The module is themed around the notion of data as a resource: how it is acquired, how it is prepared for analysis and, finally, how we can learn from it. The module will employ a practical Python-based approach to help students develop an intuitive grasp of the sophisticated mathematical ideas that underpin this challenging but fascinating subject.
Session	Autumn 2024/25
Credits	20
Assessment	Assignments [LO3 and LO4]; formal examination [LO1, LO2 and LO3].
Lecturer(s)	Dr Po Yang, Dr Tong Liu & Dr Xingyi Song
Resources	Blackboard; unconfirmed practical marks (when available); exam papers, past 2 years (where applicable)
Aims	This unit aims to: provide an accessible introduction to key concepts in machine learning and pattern processing; demonstrate the application of machine learning in a number of recent research areas; develop an appreciation of the difficulties involved in extracting meaning from naturally occurring data, with particular reference to data preprocessing, feature extraction, classifier design and efficient learning; and prepare students for specialised data-driven subjects at level 3/4, such as natural language processing, speech processing and computational biology.
Learning Outcomes	By the end of the module the student will be able to: (LO1) demonstrate how to extract features from data for use by machine learning (ML) techniques; (LO2) employ appropriate machine learning techniques to model and analyse complex datasets; (LO3) demonstrate the ability to apply ML in various areas of Computer Science (e.g. natural language processing, audio/speech processing, biological applications and vision processing), taking into account sustainability issues; (LO4) apply Python programming skills to perform data analysis, numerical modelling and visualisation for practical data analytics applications.
Content	This module will cover: motivation and introduction to data driven computing, including sustainability issues; multivariate data and probability distributions; classification, including Bayes' decision theory; non-parametric classifiers, including the nearest-neighbour classifier; feature selection; feature generation; introduction to deep learning and neural networks; unsupervised learning and clustering.
Restrictions	This module cannot be taken with COM3004.
Teaching Method	Lectures and laboratory classes.
Feedback	Feedback is given following the assignment, and during labs/lectures on the weekly formative exercise questions.