GPU Algorithm Design

Lecture and Exercises (LSF, Moodle)

  • Prof. Dr. Robert Strzodka
  • Thursday, 9:30 - 11:00, 11:00 - 12:45
  • 4 SWS and 6 ECTS
  • INF 350 / OMZ R U012
  • Start 20223-??-??


  • Most recent developments in GPUs
  • On-the-fly data transformations
  • Data locality optimizations
  • Hierarchical algorithms
  • SIMD utilization
  • Precision, accuracy and numerical schemes
  • Numerical efficiency vs. parallel efficiency
  • Data representation


The lecture is partly based on the book Programming Massively Parallel Processors by David B. Kirk and Wen-mei W. Hwu. Both the 2nd edition from 2013 and the 3rd edition from 2017 can be accessed online at the Heidelberg University Library (direct link to book).

We assume the audience is already familiar with CUDA programming and thus with the contents of the first 6 chapters of that book. This knowledge can be obtained in the lecture GPU Computing or from the book itself. The lectures will draw on the subsequent chapters on Parallel Patterns. After the patterns we will cover more advanced topics in parallelism and numerical computation, often discussing parallelization strategies for seemingly sequential algorithms.

The lecture presents the most up-to-date developments in parallel computing on GPUs and is therefore an ideal basis for a thesis or research that uses GPUs. While GPUs will be the device of choice for implementations and exercises, most of the algorithmic reasoning also applies to other many-core processors.

In the first weeks there will be regular exercises; at mid-term each group will choose a larger scientific project. The students will present their results in the last lecture. For the exercises and the projects we will provide access to high-performance GPUs with the newest functionality.


The lectures Parallel Algorithm Design and Advanced Parallel Algorithms can be attended in parallel in the same semester. Parallel Algorithm Design has fewer prerequisites and covers more topics in breadth, while Advanced Parallel Algorithms requires familiarity with CUDA programming and covers fewer topics in depth.


There is no advance registration; simply attend the first meeting.