The University of Sheffield
Department of Computer Science

COM6521 Parallel Computing with Graphical Processing Units (GPUs)

Summary

Accelerator architectures are discrete processing units which supplement a base processor with the objective of providing increased performance at lower energy cost. Performance is gained by a design that favours a high number of parallel compute cores at the expense of imposing significant software challenges. This module looks at accelerated computing from multi-core central processing units (CPUs) to graphics processing unit (GPU) accelerators with many teraflops of theoretical performance. The module will give insight into how to write high-performance code, with specific emphasis on GPU programming with NVIDIA CUDA GPUs. A key aspect of the module will be understanding the implications of program code on the underlying hardware so that it can be optimised. You should be aware that there are limited places available on this course.

Session Spring 2023/24
Credits 15
Assessment

Coursework and two multiple choice quizzes

Lecturer(s) Mr Robert Chisholm
Resources
Aims
  • To introduce modern accelerator architectures, explain the difference between data and task parallelism, and raise awareness of how the practical and theoretical performance of architectures differ.
  • To give practical knowledge of how GPU programs operate and how they can be utilised for high performance applications.
  • To develop an understanding of the importance of benchmarking and profiling in order to recognise factors limiting performance and to address these through optimisation.
Learning Outcomes 

By the end of this course, students will be able to:

  • Compare and contrast parallel computing architectures
  • Implement programs for GPUs and multicore architectures
  • Apply benchmarking and profiling to GPU programs to understand performance
  • Identify and address limiting factors and apply optimisation to improve code performance
Content
  • Introduction to accelerated computing
  • Introduction to programming in C
  • Pointers and Memory
  • Optimising C programs
  • Multi-core programming with OpenMP
  • Introduction to CUDA
  • GPU memory systems
  • Caching and Shared Memory
  • Synchronisation and Atomics
  • Parallel Primitives
  • Asynchronous programming
  • Profiling and Optimisation of GPU programs
Restrictions This module involves a large amount of practical programming. Only students with a strong programming background should participate. The maximum number of students allowed on the module is 65.
Teaching Method

Weekly lectures will introduce students to the background on CPU and GPU architectures and programming techniques. Lectures will highlight key design principles for parallel and GPU programming to give students the necessary insight to be able to constructively look at problems and understand the implications of parallel computing.

Lab sessions will facilitate hands-on learning of practical skills through targeted exercises.

Feedback Students will receive continuous feedback from lab sessions and Google discussion groups. Feedback will also be given on marked quiz assignments and for the main assignment.