Both Graphics Processor Units (GPUs) and Reconfigurable Computing (RC) devices offer high performance on data-parallel applications. For regular grids, even early GPUs could be used to execute Finite Element PDE solvers. Without floating point support, however, their use was restricted to applications with low accuracy requirements, such as multimedia processing. The same was true for most RC devices, which concentrated on integer processing. Very efficient image processing solvers could be implemented in this way, but floating point operations were either not available or too expensive in terms of reconfigurable resources. Now that GPUs and RC devices include optimized floating point units, they can perform more accurate computations. The precision is usually still restricted to the single precision floating point format. Mixed Precision Methods, however, can perform most of the computations in single precision and still obtain a result with double precision accuracy.
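A common way to realize such a mixed precision method is iterative refinement (defect correction). The defect r = b - Ax and the update x <- x + c are computed in double precision. The correction equation Ac = r only needs to be solved approximately, so it can run in single precision on the co-processor. The sketch below is a minimal illustration under several assumptions. Single precision is emulated on the CPU by rounding every intermediate result to binary32 with `struct`. A few Jacobi sweeps stand in for the single precision GPU or RC solver; a practical code would use a stronger inner solver. The matrix is assumed to be diagonally dominant so that Jacobi converges. The function names are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_f32(A, r, sweeps):
    """Hypothetical stand-in for the single precision co-processor solver:
    approximately solve A c = r with Jacobi sweeps, rounding every
    operation to single precision. Assumes A is diagonally dominant."""
    n = len(r)
    A32 = [[to_f32(a) for a in row] for row in A]
    r32 = [to_f32(v) for v in r]
    c = [0.0] * n
    for _ in range(sweeps):
        new = []
        for i in range(n):
            s = r32[i]
            for j in range(n):
                if j != i:
                    s = to_f32(s - to_f32(A32[i][j] * c[j]))
            new.append(to_f32(s / A32[i][i]))
        c = new
    return c


def mixed_precision_solve(A, b, tol=1e-14, max_outer=50, inner_sweeps=20):
    """Defect correction: double precision outer loop, single precision inner solves."""
    n = len(b)
    x = [0.0] * n
    b_max = max(abs(v) for v in b)
    for _ in range(max_outer):
        # Defect computed in double precision (cheap: one matrix-vector product).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_max:
            break
        # Expensive part: approximate correction in single precision.
        c = jacobi_f32(A, r, inner_sweeps)
        # Update accumulated in double precision.
        x = [x[i] + c[i] for i in range(n)]
    return x
```

Each outer step reduces the error by roughly the accuracy of the inner solve (about single precision, times the Jacobi contraction). A handful of outer steps therefore reaches an error near double precision round-off, while a single precision solve on its own stalls at about 1e-7 relative error.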
The main drawback of using these parallel co-processors in scientific computing is that their programming models are more complex than those of microprocessors. RC devices are usually configured with structural hardware description languages. These languages allow the designer to exploit the full potential of the available parallelism but require a high design effort. GPUs have an easier, temporal programming model, but their potential for scientific computations is obscured by an application programming interface targeted at graphics applications and computer games. If most of the peculiarities of the graphics system are hidden under an abstraction layer, the GPU can be used as a general vector and array processor. This makes it possible to solve various PDE problems in parallel on the GPU without dealing with the details of graphics programming. To convey the processing paradigm, the GPU can even be characterized without any reference to graphics terminology. The abstraction has also been successfully extended to GPU cluster computing.
1. GPUs have a natural 2D memory layout, which makes them well suited to scientific computing on 2D and 3D domains. The darker the color, the faster the subsequent access.
2. When the traditional graphics pipeline is used for scientific computing in GPGPU fashion, the most important aspects are the spatially coherent access to data in textures and the massively data-parallel processing in the Fragment Processor. All other aspects can be hidden behind library abstractions.
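In the GPGPU pattern summarized above, the Fragment Processor runs the same small program for every output pixel. That program reads neighbouring texels from input textures and writes to a separate render target. Input and output are then swapped ("ping-pong") for the next pass. The sketch below emulates this pattern on the CPU. Nested Python lists stand in for textures, and sequential loops stand in for the parallel fragment processing. The names `fragment_kernel`, `render_pass` and `jacobi_ping_pong` are hypothetical. So is the choice of a Jacobi step for a 2D Poisson problem -Δu = f with constant right-hand side, where `h2f` denotes h²f. The kernel reads only the cell and its four neighbours, which is the spatially coherent access the caption refers to.

```python
def fragment_kernel(src, i, j, h2f):
    """Hypothetical 'fragment program': one Jacobi update for -Laplace(u) = f
    at cell (i, j), reading only from the input 'texture' src."""
    return 0.25 * (src[i - 1][j] + src[i + 1][j] + src[i][j - 1] + src[i][j + 1] + h2f)


def render_pass(src, dst, h2f):
    """Evaluate the kernel for every interior cell and write into a separate
    output array. The cells are independent, so the loop order is irrelevant
    (on a GPU they would be processed in parallel)."""
    for i in range(1, len(src) - 1):
        for j in range(1, len(src[0]) - 1):
            dst[i][j] = fragment_kernel(src, i, j, h2f)


def jacobi_ping_pong(u, h2f, passes):
    """Run several passes, swapping the roles of input and output each time."""
    src = [row[:] for row in u]  # boundary values are copied into both buffers
    dst = [row[:] for row in u]
    for _ in range(passes):
        render_pass(src, dst, h2f)
        src, dst = dst, src  # the last output becomes the next input
    return src


# Usage: Laplace problem (f = 0) with boundary value 1 on a 6 x 6 grid.
grid = [[1.0 if i in (0, 5) or j in (0, 5) else 0.0 for j in range(6)] for i in range(6)]
result = jacobi_ping_pong(grid, 0.0, 200)
```

Writing to a buffer other than the one being read is what makes every cell update independent. This mirrors the restriction that a fragment program cannot read its own render target.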