COM6012 Scalable Machine Learning

School of Computer Science

Summary	This module will focus on technologies and algorithms that can be applied to data at a very large scale (e.g. population level). From a theoretical perspective, it will focus on the parallelization of algorithms and on algorithmic approaches such as stochastic gradient descent. There will also be a significant practical element to the module, focusing on deploying scalable ML in practice with frameworks such as Spark, programming languages such as Python/Scala, and high-performance computing platforms/clusters.
Session	Spring 2024/25
Credits	15
Assessment	Formal examination; assignment
Lecturer(s)	Dr Haiping Lu, Mr Tahsinur Khan & Dr Shuo Zhou
Resources	Unconfirmed practical marks when available
Aims	This unit aims to provide a deeper understanding of the fundamental technologies underlying data analytics at scale. In particular, it will provide: an advanced understanding of the parallelization of algorithms and of algorithmic approaches such as stochastic gradient descent; and practical skills relating to the deployment of scalable ML.
Learning Outcomes	By the end of the unit, a student will be able to: understand the theoretical issues and wider context relating to ML at scale; understand the practical parallelization of algorithms and algorithmic approaches using techniques such as stochastic gradient descent; deploy a practical implementation of ML at scale using Spark and programming languages such as Python/Scala; and deploy onto high-performance computing platforms/clusters.
Content	Introduction; Spark overview; Scala programming; Spark & HPC; Spark DataFrame/Dataset; Machine learning pipeline; High-performance computing; Parallelization & optimization in Spark (parallelization, optimization); Scalable logistic regression & applications; Scalable GLM & applications; Scalable decision trees & applications; Scalable neural networks; Scalable matrix factorization for collaborative filtering & applications; Scalable KMeans clustering & applications; Scalable PCA for dimensionality reduction & applications; Other topics
Restrictions	Optional modules within the department have limited capacity. We will always try to accommodate all students but cannot guarantee a place.
Teaching Method	Lectures, laboratory classes.
Feedback	Immediate feedback on exercises in laboratory classes. After each coursework stage, feedback through a debriefing lecture and individual marking.