COM6012 Scalable Machine Learning
Summary |
This module focuses on technologies and algorithms that can be applied to data at very large scale (e.g. population level). On the theoretical side, it covers the parallelization of algorithms and algorithmic approaches such as stochastic gradient descent. A significant practical element covers deploying scalable ML in practice, using Spark, programming languages such as Python/Scala, and high-performance computing platforms/clusters. |
Session |
Spring 2024/25 |
Credits |
15 |
Assessment |
- Formal examination
- Assignment
|
Lecturer(s) |
Dr Haiping Lu, Mr Tahsinur Khan & Dr Shuo Zhou |
Resources |
Unconfirmed practical marks when available |
Aims |
This unit aims to provide a deeper understanding of the fundamental technologies underlying data analytics at scale. In particular, it will provide an advanced understanding of:
- parallelization of algorithms and algorithmic approaches such as stochastic gradient descent
- practical skills relating to the deployment of scalable ML
|
Learning Outcomes |
By the end of the unit, a student will be able to:
- understand the theoretical issues and wider context relating to ML at scale;
- understand the practical parallelization of algorithms and algorithmic approaches, using techniques such as stochastic gradient descent;
- deploy a practical implementation of ML at scale using Spark and programming languages such as Python/Scala;
- deploy onto high-performance computing platforms/clusters.
|
Content |
Introduction
- Spark overview
- Scala programming
Spark & HPC
- Spark DataFrame/Dataset
- Machine learning pipeline
- High performance computing
Parallelization & optimization in Spark
- Parallelization
- Optimization
Scalable logistic regression & applications
Scalable GLM & applications
Scalable decision trees & applications
Scalable neural networks
Scalable matrix factorization for collaborative filtering & applications
Scalable KMeans clustering & applications
Scalable PCA for dimensionality reduction & applications
Other topics
|
Restrictions |
Optional modules within the department have limited capacity. We will always try to accommodate all students but cannot guarantee a place. |
Teaching Method |
Lectures, laboratory classes. |
Feedback |
Immediate feedback on exercises in laboratory classes; after each coursework stage, feedback through a debriefing lecture and individual marking. |
|