★Massively parallel graph matching, coloring and coarsening★ Algebraic multigrid is one of the main tools in science and industry for solving large sparse systems of linear equations. However, an efficient parallelization of all levels of the algebraic multigrid method on a GPU-cluster requires many innovations in parallel numerical schemes, graph algorithms and work scheduling.
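To illustrate why graph coloring matters for this kind of parallelization, here is a minimal Python sketch of a round-based coloring in the style of Jones and Plassmann. It is an illustrative assumption, not necessarily the scheme used in this work: the function names `jones_plassmann_coloring` and `grid_graph`, the random priorities and the small grid test graph are all hypothetical. In each round, every uncolored vertex that outranks all of its uncolored neighbours picks the smallest free color; these vertices are never adjacent, so on a GPU they could be processed concurrently.

```python
import random


def jones_plassmann_coloring(adjacency, seed=0):
    """Color an undirected graph in rounds (illustrative sketch).

    adjacency: dict mapping each vertex to an iterable of its neighbours.
    Returns a dict mapping each vertex to a non-negative color index.
    """
    rng = random.Random(seed)
    # Random priority per vertex; the vertex id breaks ties deterministically.
    priority = {v: (rng.random(), v) for v in adjacency}
    colors = {}
    uncolored = set(adjacency)

    while uncolored:
        # Vertices that outrank every uncolored neighbour form an independent
        # set, so this round could run as one parallel step.
        winners = [
            v for v in uncolored
            if all(priority[v] > priority[u]
                   for u in adjacency[v]
                   if u != v and u in uncolored)
        ]
        for v in winners:
            used = {colors[u] for u in adjacency[v] if u in colors}
            c = 0
            while c in used:  # smallest color not used by a colored neighbour
                c += 1
            colors[v] = c
        uncolored.difference_update(winners)

    return colors


def grid_graph(n):
    """Hypothetical test graph: n x n grid with 4-neighbour connectivity."""
    adjacency = {}
    for i in range(n):
        for j in range(n):
            neighbours = []
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < n:
                    neighbours.append(a * n + b)
            adjacency[i * n + j] = neighbours
    return adjacency


if __name__ == "__main__":
    coloring = jones_plassmann_coloring(grid_graph(4), seed=1)
    print("colors used:", max(coloring.values()) + 1)
```

Vertices of the same color share no edge, so operations such as a Gauss-Seidel-type smoother sweep could update one color class at a time with all vertices of that class handled in parallel.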
Neither the solvers with the best numerical convergence nor those with the best parallel efficiency are the best choice for the fast solution of PDE problems in practice. The fastest solvers require a delicate balance between numerical and hardware characteristics. By balancing both aspects, we can parallelize strong sequential preconditioners with large parallel speedup and hardly any loss in numerical performance. In this way GMG can also solve ill-conditioned systems.
★Acceleration of unmodified legacy code on GPU-clusters★ A single GPU already offers two levels of parallelism, but, as with CPUs, the demand for higher performance and larger problem sizes leads to the use of GPU-clusters, in which every cluster node is equipped with GPUs. This adds two further levels: intra-node and inter-node parallelism. The main challenges for these heterogeneous systems are the enormous bandwidth discrepancy between the two finer and the two coarser levels of parallelism, and the integration of all four levels into legacy code.