Balanced Geometric Multigrid+
Summary
A problem specific configuration of a multigrid solver is often the most efficient tool for the solution of sparse linear equation systems. However, neither solvers with best numerical convergence nor solvers with best parallel efficiency are the best choice for an accurate and fast solution in practice. The fastest solvers require a delicate balance between their numerical and hardware characteristics. Balancing both aspects we can even parallelize strong sequential preconditioners with large parallel speedup and hardly any loss in numerical performance. In this way GMG can also solve ill-conditioned systems.
Besides extracting massive parallelism from initially sequential preconditioners there is the second big challenge of fine-grained data access and placement, which is crucial for high performance solvers. We present innovative on-the-fly data sorting techniques for Gauss-Seidel and tridiagonal preconditioners for maximum bandwidth utilization [2]. The entire approach is combined with mixed-precision execution and evaluated extensively on anisotropic grids. In a second step we have parallelized even stronger preconditiner and applied them to yet more ill-conditioned problems, including strong operator anisotropies [1].
Although geometric multigrid is not as flexible as algebraic multigrid, we conclude that with advanced smoothers GMG can also solve many ill-conditioned problems efficiently.
Here we discusses the parallel solution of ill-conditioned problems of about 1 million unknowns on a single GPU. This is an critical builiding block in our library for solving much larger distributed memory problems with GPU-cluster computing.
Figures
On the left data flow and placement in shared memory in the standard cyclic reduction. Hats stand for updated equations. On the right data placement in our bank-conflict free implementation of cyclic reduction with on-the-fly data rearrangement. The data flow (arrow connections) is exactly the same as on the left, but we have omitted most arrows to improve clarity. In both figures circles in the same column occupy the same memory address, i.e. in-place updates are performed.
Total runtime efficiency of the GMG solvers with strong smoothers, on the CPU (left) and the GPU (right), for an elliptic problem with operator and domain anisotropies with sizes (2^L+1) x (2^L+1), logarithmic scale.
Current people
- Robert Strzodka (co-PI)
Publications
- Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed Precision MultigridIEEE Transactions on Parallel and Distributed Systems (TPDS), Special Issue: High Performance Computing with Accelerators, 22(1), 22-32, IEEE Computer Society, 2011
@article{39, author = {Göddeke, Dominik and Strzodka, Robert}, title = {Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed Precision Multigrid}, year = {2011}, journal = {IEEE Transactions on Parallel and Distributed Systems (TPDS), Special Issue: High Performance Computing with Accelerators}, volume = {22}, number = {1}, pages = {22-32}, month = jan, publisher = {IEEE Computer Society}, url = {http://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/GoSt11CR.pdf} doi = {10.1109/TPDS.2010.61} }
- Mixed Precision GPU-Multigrid Solvers with Strong Smoothers, 131-147, CRC Press, 2010
@inbook{38, author = {Göddeke, Dominik and Strzodka, Robert}, title = {Mixed Precision GPU-Multigrid Solvers with Strong Smoothers}, year = {2010}, pages = {131-147}, publisher = {CRC Press}, isbn = {9781439825365}, url = {http://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/GoSt10MG.pdf} }
- Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulationsInternational Journal of Parallel, Emergent and Distributed Systems (IJPEDS), Special issue: Applied parallel computing, 22(4), 221-256, Taylor & Francis, 2007
@article{30, author = {Göddeke, Dominik and Strzodka, Robert and Turek, Stefan}, title = {Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations}, year = {2007}, journal = {International Journal of Parallel, Emergent and Distributed Systems (IJPEDS), Special issue: Applied parallel computing}, volume = {22}, number = {4}, pages = {221-256}, publisher = {Taylor \& Francis}, url = {http://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/GoStTu07mixedPrec.pdf} }
- Pipelined Mixed Precision Algorithms on FPGAs for Fast and Accurate PDE Solvers from Low Precision Components, 259-268, 2006
@conference{27, author = {Strzodka, Robert and Göddeke, Dominik}, title = {Pipelined Mixed Precision Algorithms on FPGAs for Fast and Accurate PDE Solvers from Low Precision Components}, year = {2006}, pages = {259-268}, url = {http://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/StGo06PipeCG.pdf} }
- Mixed Precision Methods for Convergent Iterative Schemes, D-59-60, 2006
@conference{25, author = {Strzodka, Robert and Göddeke, Dominik}, title = {Mixed Precision Methods for Convergent Iterative Schemes}, year = {2006}, pages = {D-59-60}, month = may, url = {http://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/StGo06convIter.pdf} }
- Accelerating Double Precision FEM Simulations with GPUs, 2005
@conference{18, author = {Göddeke, Dominik and Strzodka, Robert and Turek, Stefan}, title = {Accelerating Double Precision FEM Simulations with GPUs}, year = {2005}, month = sep, url = {http://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/GoStTu05double.pdf} }