Our research focuses on significant improvements in performance and accuracy in application-specific computing through global optimization across the entire spectrum of numerical methods, algorithm design, software implementation, and hardware acceleration.
These layers typically have contradictory requirements, and their integration poses many challenges. For example, numerically superior methods often expose little parallelism, bandwidth-efficient algorithms entangle the processing of space and time into unmanageable software patterns, high-level language abstractions create data layout and composition barriers, and high performance on today's hardware imposes strict requirements on parallel execution and data access. High performance and accuracy for the entire application can only be achieved by balancing these requirements across all layers.
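The tension between numerical quality and parallelism can be illustrated with a textbook pair of solvers. The sketch below is only an illustration and is not taken from our codes: the helper names (`poisson_1d`, `jacobi_step`, `gauss_seidel_step`, `count_sweeps`) and the small 1D Poisson test problem are assumptions chosen for brevity. Jacobi updates every unknown from the previous iterate, so all rows could be processed in parallel, while Gauss-Seidel reuses values updated earlier in the same sweep, which typically converges faster but makes the sweep sequential.

```python
def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix of a 1D Poisson problem (illustrative test case)."""
    return [
        [2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def jacobi_step(A, b, x):
    """One Jacobi sweep: each row reads only the old x, so rows are independent."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel sweep: row i reads entries already updated in this sweep."""
    n = len(b)
    x = list(x)  # work on a copy so the caller's iterate is not modified
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


def residual_norm(A, b, x):
    """Maximum norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


def count_sweeps(step, A, b, tol=1e-8, max_sweeps=10000):
    """Iterate from x = 0 until the residual drops below tol; return (sweeps, x)."""
    x = [0.0] * len(b)
    for k in range(1, max_sweeps + 1):
        x = step(A, b, x)
        if residual_norm(A, b, x) < tol:
            return k, x
    return max_sweeps, x


if __name__ == "__main__":
    A, b = poisson_1d(10), [1.0] * 10
    sweeps_jacobi, _ = count_sweeps(jacobi_step, A, b)
    sweeps_gs, _ = count_sweeps(gauss_seidel_step, A, b)
    print("Jacobi sweeps:      ", sweeps_jacobi)
    print("Gauss-Seidel sweeps:", sweeps_gs)
```

On this small problem Gauss-Seidel should need noticeably fewer sweeps than Jacobi, yet the loop in `gauss_seidel_step` carries a dependency from one row to the next, whereas the list comprehension in `jacobi_step` has none. Balancing such trade-offs against the hardware is the kind of cross-layer decision described above.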
Particular attention is given to parallel algorithms and hardware (GPU, many-core, FPGA, custom) in relation to:
- Data representation (mixed-precision, compression, redundancy)
- Data access (layout, spatial and temporal locality)
- Data structure (unstructured grids, graphs, adaptivity)
- Numerical methods (ILU, Krylov, GMG, AMG)
- Programming abstractions (CUDA, Thrust, PSTL, C++2x, UPC++)