GPU Algorithm Design

Lecture and Exercises (LSF, Moodle)

  • Prof. Dr. Robert Strzodka
  • Thursday, 9:30 - 11:00, 11:00 - 12:45
  • 4 SWS and 6 ECTS
  • INF 350 / OMZ R U012
  • Start 2023-10-26


  • Most recent developments in GPUs
  • Advanced algorithms for ultimate hardware utilization
  • On-the-fly data transformations
  • Data locality optimizations
  • Hierarchical algorithms
  • SIMD utilization
  • Precision, accuracy and numerical schemes
  • Numerical efficiency vs. parallel efficiency
  • Data representation


All leading systems in the TOP500 list of the fastest computers use some many-core device as their basis, in most cases a dedicated GPU. However, only the Linpack benchmark used for the ranking runs that fast, while most scientific applications perform 100x-1000x slower. This course teaches GPU algorithm design principles that enable full performance in many application domains.

The lecture is partly based on the book Programming Massively Parallel Processors by David B. Kirk and Wen-mei W. Hwu, available online at the Heidelberg University Library (direct link 3rd edition, direct link 4th edition). However, we will go well beyond it.

We assume the audience is already familiar with CUDA programming and thus with the contents of the first 6 chapters of that book. This knowledge can be obtained in the lecture GPU Computing or from the book itself. The lectures begin with the chapters on Parallel Patterns that follow those first 6 chapters. After the patterns we will cover more advanced topics in parallelism and numerical computation, often discussing parallelization strategies for seemingly unparallelizable, sequential algorithms. The focus is on finding surprising algorithms and clever techniques to reach ultimate performance.

The lecture presents the most up-to-date developments in parallel computing on GPUs and is therefore an ideal basis for a thesis or research utilizing GPUs. While GPUs will be the device of choice for implementations and exercises, most of the algorithmic reasoning also applies to other many-core processors.

In the first weeks there will be regular exercises; at mid-term, each group will choose a larger scientific project. The students will present their results in the last lecture. For the exercises and the projects we will provide access to high-performance GPUs with the newest functionality.


There is no advance registration; simply attend the first meeting.