COM4521 Parallel Computing with Graphical Processing Units (GPUs)
Accelerator architectures are discrete processing units which supplement a base processor with the objective of providing advanced performance at lower energy cost. Performance is gained by a design which favours a high number of parallel compute cores, at the expense of imposing significant software challenges. This module looks at accelerated computing from multi-core CPUs to GPU accelerators with many TFLOPS of theoretical performance. The module will give insight into how to write high performance code, with specific emphasis on GPU programming using NVIDIA CUDA. A key aspect of the module will be understanding the implications of program code for the underlying hardware so that the code can be optimised. Students should be aware that there are limited places available on this course.
Coursework and two multiple-choice quizzes.
Mr Robert Chisholm
- To introduce modern accelerator architectures, explain the difference between data and task parallelism, and raise awareness of how the practical and theoretical performance of architectures differs.
- To give practical knowledge of how GPU programs operate and how they can be utilised for high performance applications.
- To develop an understanding of the importance of benchmarking and profiling in order to recognise factors limiting performance and to address these through optimisation.
By the end of this course students will be able to:
- Compare and contrast parallel computing architectures
- Implement programs for GPUs and multicore architectures
- Apply benchmarking and profiling to GPU programs to understand performance
- Identify and address limiting factors and apply optimisation to improve code performance
- Introduction to accelerated computing
- Introduction to programming in C
- Pointers and Memory
- Optimising C programs
- Multi-core programming with OpenMP
- Introduction to Accelerated Computing
- Introduction to CUDA
- GPU memory systems
- Caching and Shared Memory
- Synchronisation and Atomics
- Parallel Primitives
- Asynchronous programming
- Profiling and Optimisation of GPU programs
This module involves a large amount of practical programming. Only students with a strong programming background should participate. The maximum number of students allowed on the module is 35.
Weekly lectures will introduce students to the background on CPU and GPU architectures and programming techniques. Lectures will highlight key design principles for parallel and GPU programming to give students the necessary insight to be able to constructively look at problems and understand the implications of parallel computing.
Lab sessions will facilitate hands-on learning of practical skills through targeted exercises.
Students will receive continuous feedback from lab sessions and Google discussion groups. Feedback will also be given on the marked quizzes and on the main assignment.