GPU Cluster Computing

Relatively inexpensive scientific PC clusters have become very popular in recent years and already dominate the TOP 500 list of the fastest computers. Each node of such a cluster can be enhanced with a powerful graphics card forming a GPU-cluster. In this way the peak performance of the system increases enormously without putting much strain on the space or cooling constraints. However, programming of parallel computers is already a demanding task. The inclusion of GPUs components into the nodes of a parallel computer not only requires a different programming model for these devices but also creates a heterogeneous hardware system. This project explores the efficient utilization of such heterogeneous systems for scientific computing.

The focus is both on high performance and high productivity. We enhance the FEM solver package FEAST with GPU functionality in a minimally invasive fashion, with less than 1% of the code basis being affected [1]. Moreover, applications based on FEAST can benefit from the GPU acceleration without any code changes. We explore the large-scale scalability [2] and the practical benefits and limits [3] of this approach in detail.


  1. Bandwidth in a typical GPU-node. Algorithms executing on the GPU-cluster must be able to tolerate the enormous discrepancy between the bandwidth on the co-processor board and the bandwidth from board to board that has to pass through the main memory of the hosts.



2. Displacements and van Mises stress of an object under load, computed with FeastSolid on a heterogeneous 16 node cluster using GPUs as scientific co-processors; no code changes, equal accuracy, 2.6x speedup.




Dominik Göddeke, Robert Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, Sven H.M. Buijssen, Matthias Grajewski, and Stefan Turek. Exploring weak scalability for FEM calculations on a GPU-enhanced cluster. Parallel Computing, Special issue: High-performance computing using accelerators, 33(10–11):685–699, November 2007. 
Dominik Göddeke, Robert Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, Hilmar Wobker, Christian Becker, and Stefan Turek. Using GPUs to improve multigrid solver performance on a cluster. International Journal of Computational Science and Engineering (IJCSE), 4(1):36–55, 2008. 
Dominik Göddeke, Hilmar Wobker, Robert Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, and Stefan Turek. Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU. International Journal of Computational Science and Engineering (IJCSE), 4(4):254–269, November 2009.