Relatively inexpensive PC clusters for scientific computing have become very popular in recent years and already dominate the TOP500 list of the fastest computers. Each node of such a cluster can be enhanced with a powerful graphics card, forming a GPU-cluster. In this way the peak performance of the system increases enormously with little additional demand on space or cooling. However, programming parallel computers is already a demanding task. Including GPU components in the nodes of a parallel computer not only requires a different programming model for these devices but also creates a heterogeneous hardware system. This project explores the efficient utilization of such heterogeneous systems for scientific computing.
The focus is on both high performance and high productivity. We enhance the FEM solver package FEAST with GPU functionality in a minimally invasive fashion, with less than 1% of the code base affected. Moreover, applications based on FEAST can benefit from GPU acceleration without any code changes. We explore the large-scale scalability and the practical benefits and limits of this approach in detail.
1. Bandwidth in a typical GPU-node. Algorithms executing on the GPU-cluster must be able to tolerate the enormous discrepancy between the bandwidth on the co-processor board and the bandwidth between boards, since board-to-board traffic has to pass through the main memory of the hosts.
2. Displacements and von Mises stress of an object under load, computed with FeastSolid on a heterogeneous 16-node cluster using GPUs as scientific co-processors; no code changes, equal accuracy, 2.6x speedup.
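
For readers unfamiliar with the stress measure named in the second caption, the following is a minimal sketch of the standard von Mises equivalent stress for a symmetric 3x3 Cauchy stress tensor. It is only an illustration of the textbook formula and is not taken from FeastSolid; the function name `von_mises_stress` and the nested-list tensor layout are assumptions made for this example.

```python
import math


def von_mises_stress(s):
    """Von Mises equivalent stress of a symmetric 3x3 stress tensor.

    `s` is a hypothetical nested-list layout: s[i][j] is the stress
    component sigma_ij. This is the textbook formula, not FeastSolid code.
    """
    sxx, syy, szz = s[0][0], s[1][1], s[2][2]
    sxy, syz, szx = s[0][1], s[1][2], s[2][0]
    # Differences of the normal stresses plus the shear contribution.
    normal_part = 0.5 * ((sxx - syy) ** 2 + (syy - szz) ** 2 + (szz - sxx) ** 2)
    shear_part = 3.0 * (sxy ** 2 + syz ** 2 + szx ** 2)
    return math.sqrt(normal_part + shear_part)


# Uniaxial tension: the equivalent stress equals the applied stress.
uniaxial = [[100.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(von_mises_stress(uniaxial))
```

Under purely hydrostatic loading (equal normal stresses, no shear) the function returns zero, which is why this measure is commonly used to judge how far a material is from yielding rather than how strongly it is compressed.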