The University of Sheffield
Department of Computer Science

COM4521 Parallel Computing with Graphical Processing Units (GPUs)

Summary Computing architectures are rapidly changing towards scalable parallel computing devices with many cores. Performance is gained through new designs that favour a high number of parallel compute cores, at the expense of imposing significant software challenges. This module looks at parallel computing from multi-core CPUs to GPU accelerators with many TFLOPS of theoretical performance. The module will give insight into how to write high-performance code, with specific emphasis on programming NVIDIA GPUs with CUDA. A key aspect of the module will be understanding the implications of program code for the underlying hardware so that the code can be optimised. Students should be aware that there are limited places available on this course.
Session Spring 2021/22
Credits 15
Assessment

Coursework and two multiple choice quizzes.

Lecturer(s) Dr Paul Richmond
Aims
  • To introduce modern accelerator architectures, explain the difference between data and task parallelism, and raise awareness of how the practical performance of architectures differs from their theoretical performance.
  • To give practical knowledge of how GPU programs operate and how they can be utilised for high performance applications.
  • To develop an understanding of the importance of benchmarking and profiling in order to recognise factors limiting performance and to address these through optimisation.
Objectives

By the end of this course students will be able to:

  • Compare and contrast parallel computing architectures
  • Implement programs for GPUs and multi-core architectures
  • Apply benchmarking and profiling to GPU programs to understand performance
  • Identify and address limiting factors and apply optimisation to improve code performance
Content
  • Introduction to accelerated computing
  • Introduction to programming in C
  • Pointers and Memory
  • Optimising C programs
  • Multi-core programming with OpenMP
  • Introduction to Accelerated Computing
  • Introduction to CUDA
  • GPU memory systems
  • Caching and Shared Memory
  • Synchronisation and Atomics
  • Parallel Primitives
  • Asynchronous programming
  • Profiling and Optimisation of GPU programs
Restrictions This module involves a large amount of practical programming. Only students with a strong programming background should participate. The maximum number of students allowed on the module is 30.
Teaching Method

Weekly lectures will introduce students to the background of CPU and GPU architectures and programming techniques. Lectures will highlight key design principles for parallel and GPU programming, giving students the insight needed to look at problems constructively and understand the implications of parallel computing.

Lab sessions will facilitate hands-on learning of practical skills through targeted exercises.

Feedback Students will receive continuous feedback from lab sessions and Google discussion groups. Feedback will also be given on marked quiz assignments and for the main assignment.
Recommended Reading
  • Jason Sanders, Edward Kandrot, "CUDA by Example: An Introduction to General-Purpose GPU Programming", Addison-Wesley 2010.
  • Brian Kernighan, Dennis Ritchie, "The C Programming Language (2nd Edition)", Prentice Hall 1988.