COM2004 Data Driven Computing
||This module is an introduction to machine learning and
pattern processing, with a clear emphasis on applications.
The module is themed around the notion of data as a
resource: how it is acquired, how it is prepared for
analysis, and finally how we can learn from it. The module
takes a practical, Python-based approach to help students
develop an intuitive grasp of the sophisticated mathematical
ideas that underpin this challenging but fascinating
subject.
||Assignment (50%) and examination (50%).
||Prof. Jon Barker
||This unit aims to:
- provide an accessible introduction to key concepts in
machine learning and pattern processing,
- demonstrate the application of machine learning in a
number of recent research areas,
- develop an appreciation of the difficulties involved
when trying to extract meaning from naturally occurring
data with particular reference to data preprocessing,
feature extraction, classifier design and efficient
- prepare students for specialised data-driven
subjects at level 3/4 such as natural language
processing, speech processing and computational biology.
||By the end of the unit, a student will be able to:
- demonstrate how to extract features from data for use
by machine learning (ML) techniques,
- demonstrate the ability to analyse and model data
using ML techniques,
- demonstrate the ability to apply ML in various areas
of Computer Science, e.g. in natural language
processing, audio/speech processing, biological
applications and vision processing,
- demonstrate the ability to use Python for scientific
- overview: classification and feature handling
- Python programming
- review: linear algebra/probability
- normal distribution
Instance-based approaches
- Bayes decision theory
- risk and ROC (receiver operating characteristic)
- parameter estimation: maximum likelihood estimation
- curse of dimensionality and naive Bayes classifier
- nearest neighbour and k-nearest neighbour
- template matching and edit distance
- feature selection algorithms
Unsupervised learning and approaches to clustering.
- dimensionality reduction
- principal components analysis
Density estimation and mixture modelling.
Case study: analysis of how these techniques have been
applied in a real system.
||Lectures, problem classes and laboratory classes.
||Immediate feedback is given in problem classes. After each
assignment stage, feedback is given through a debriefing
lecture and individual marking.
- Python Programming, https://en.wikibooks.org/wiki/Python_Programming