COM4521 Parallel Computing with Graphical Processing Units (GPUs)

School of Computer Science

Summary	Accelerator architectures are discrete processing units that supplement a host processor, with the objective of providing higher performance at lower energy cost. This performance comes from a design that favours a large number of parallel compute cores, at the expense of significant software challenges. This module looks at accelerated computing, from multi-core central processing units (CPUs) to graphics processing unit (GPU) accelerators with many TFLOPS of theoretical performance. The module will give insight into how to write high-performance code, with specific emphasis on programming NVIDIA GPUs using CUDA. A key aspect of the module is understanding the implications of program code for the underlying hardware, so that the code can be optimised.
Session	Spring 2024/25
Credits	15
Assessment	Coursework and two multiple-choice quizzes.
Lecturer(s)	Dr Robert Chisholm
Resources	Blackboard; unconfirmed practical marks (when available); exam papers from the past 2 years (where applicable)
Aims	To introduce modern accelerator architectures, explain the difference between data and task parallelism, and raise awareness of how the practical and theoretical performance of architectures differ. To give practical knowledge of how GPU programs operate and how they can be used for high-performance applications. To develop an understanding of the importance of benchmarking and profiling in recognising the factors that limit performance, and in addressing these through optimisation.
Learning Outcomes	By the end of this course students will be able to: compare and contrast parallel computing architectures; implement programs for GPUs and multi-core architectures; apply benchmarking and profiling to GPU programs to understand their performance; identify and address limiting factors, and apply optimisation to improve code performance.
Content	Introduction to accelerated computing; Introduction to programming in C; Pointers and memory; Optimising C programs; Multi-core programming with OpenMP; Introduction to accelerated computing; Introduction to CUDA; GPU memory systems; Caching and shared memory; Synchronisation and atomics; Parallel primitives; Asynchronous programming; Profiling and optimisation of GPU programs
Restrictions	This module involves a large amount of practical programming, so only students with a strong programming background should take it. Optional modules within the department have limited capacity; we will always try to accommodate all students but cannot guarantee a place.
Teaching Method	Weekly lectures will introduce students to the background of CPU and GPU architectures and programming techniques. Lectures will highlight key design principles for parallel and GPU programming, giving students the insight needed to analyse problems constructively and understand the implications of parallel computing. Lab sessions will facilitate hands-on learning of practical skills through targeted exercises.
Feedback	Students will receive continuous feedback through lab sessions and Google discussion groups. Feedback will also be given on the marked quizzes and on the main assignment.