Highlights
- Developed a new linear algebra backend for nekRS: the Ginkgo high-performance, hardware-portable math library
- Developed a low-communication mixed-precision distributed Algebraic Multigrid (AMG) method for Ginkgo
- Developed an asynchronous Schwarz method to overlap the local and global solvers
- Achieved performance improvements by using Ginkgo in nekRS
Keywords: Aeronautics, Energy, Environment/Climate/Weather, Manufacturing & Engineering, Mechanical Engineering, CFD, nekRS, Ginkgo library, GPU, mixed-precision Algebraic Multigrid (AMG)

Challenge
A computationally demanding part of CFD simulations is the solution of the discretized governing partial differential equations (PDEs). The iterative linear solvers typically used for this step often account for a large fraction of the total simulation time. This is partly because the characteristics of the iterative solvers do not match the hardware architecture well: modern processors provide abundant compute throughput, especially for low-precision arithmetic, but deliver only a small fraction of it on memory-bound workloads. Furthermore, the upcoming Exascale-level supercomputers in Europe will be composed of thousands of GPUs, with each GPU itself consisting of thousands of lightweight cores. Leveraging these systems efficiently in production runs for applications in research and industry requires all components of the simulation software stack to map well to the hardware and to scale up to thousands of latest-generation GPU processors. State-of-the-art iterative solvers, on the other hand, typically consist of memory-bound, sparse, high-precision operations that achieve low performance on these architectures, and their bulk-synchronous structure limits their scalability on supercomputers composed of thousands of GPUs and millions of lightweight GPU cores.
Research Topic
The goal of the Inno4Scale experiment AceAMG was to increase the performance of the nekRS Computational Fluid Dynamics (CFD) software on the upcoming Exascale supercomputer JUPITER. As a first step, we integrated the high-performance library Ginkgo into the nekRS software stack as an optional backend, giving nekRS access to the wide range of solvers and preconditioners available in Ginkgo. As a second step, we designed low-synchronization, low-communication, mixed-precision numerical methods that are suitable for many-GPU hardware setups. In particular, we focused on the development of a low-synchronization (asynchronous), mixed-precision Algebraic Multigrid (AMG) solver, with the intention of increasing performance and reducing the memory footprint. Finally, we aimed to implement the low-communication, mixed-precision AMG solver in the Ginkgo library as a high-performance solution for nekRS simulations.
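To illustrate what such a backend exposes, the sketch below sets up a Ginkgo solver on a GPU executor and applies it to a linear system. It is a minimal, illustrative example rather than the actual nekRS integration: the input files, the choice of a CG solver with a block-Jacobi preconditioner, and the stopping criteria are placeholders, and factory parameters may vary slightly between Ginkgo versions.

```cpp
#include <fstream>
#include <ginkgo/ginkgo.hpp>

int main()
{
    using vec = gko::matrix::Dense<double>;
    using mtx = gko::matrix::Csr<double, int>;

    // The executor selects the backend: CUDA, HIP, SYCL/DPC++, or OpenMP.
    auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());

    // Placeholder input files; in nekRS the operators and right-hand sides
    // come from the discretized flow equations instead.
    auto A = gko::share(gko::read<mtx>(std::ifstream("A.mtx"), exec));
    auto b = gko::read<vec>(std::ifstream("b.mtx"), exec);
    auto x = gko::read<vec>(std::ifstream("x0.mtx"), exec);

    // A CG solver with a block-Jacobi preconditioner and standard
    // iteration-count and residual-norm stopping criteria.
    auto solver =
        gko::solver::Cg<double>::build()
            .with_preconditioner(
                gko::preconditioner::Jacobi<double, int>::build().on(exec))
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(1000u).on(exec),
                gko::stop::ResidualNorm<double>::build()
                    .with_reduction_factor(1e-8)
                    .on(exec))
            .on(exec)
            ->generate(A);

    solver->apply(b.get(), x.get());
}
```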
Solution
We have reengineered the linear solvers to (1) map well to the GPU hardware architecture by exploiting low precision in mixed-precision algorithms such as mixed-precision Algebraic Multigrid (AMG) methods, (2) reduce the total volume of data communicated between GPUs, and (3) avoid global synchronizations across all GPUs by employing the asynchronous, non-blocking communication features available in MPI.
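To make points (2) and (3) concrete, the following is a minimal sketch of a halo exchange between neighboring ranks/GPUs that sends the boundary data in single precision and uses non-blocking MPI calls so that local work can overlap the transfer. The function, buffers, and single-neighbor setup are hypothetical simplifications, not the communication layer actually used in Ginkgo or nekRS.

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical halo exchange with one neighboring rank: values are down-cast
// to single precision before sending, halving the communicated volume, and
// non-blocking MPI calls let local computation overlap the data transfer.
// No global collective or barrier across all GPUs is involved.
void halo_exchange_low_precision(const std::vector<double>& send_values,
                                 std::vector<double>& recv_values,
                                 int neighbor_rank, MPI_Comm comm)
{
    // Pack the outgoing halo in single precision: half the bytes on the wire.
    std::vector<float> send_buf(send_values.begin(), send_values.end());
    std::vector<float> recv_buf(recv_values.size());

    MPI_Request requests[2];
    MPI_Irecv(recv_buf.data(), static_cast<int>(recv_buf.size()), MPI_FLOAT,
              neighbor_rank, /*tag=*/0, comm, &requests[0]);
    MPI_Isend(send_buf.data(), static_cast<int>(send_buf.size()), MPI_FLOAT,
              neighbor_rank, /*tag=*/0, comm, &requests[1]);

    // ... overlap: smooth/update the interior unknowns here while the
    // messages are in flight ...

    // Complete only the two point-to-point messages; no synchronization
    // across all GPUs is required.
    MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);

    // Unpack back to double precision for the subsequent local solve.
    recv_values.assign(recv_buf.begin(), recv_buf.end());
}
```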
Following this reengineering approach, we developed a distributed, low-communication, mixed-precision Algebraic Multigrid method in the Ginkgo math library that is performance-portable across GPUs from AMD, Intel, and NVIDIA and integrates smoothly into the nekRS Computational Fluid Dynamics package. For numerically challenging problems, such as the pebble-bed reactor application relevant for energy simulations, the Ginkgo solver can accelerate the linear solve by up to 8x in large production runs.
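For context, the sketch below shows how an algebraic multigrid V-cycle can be composed in Ginkgo and used as a CG preconditioner, following the pattern of Ginkgo's public multigrid examples. It illustrates only the structure such a preconditioner plugs into, not the low-communication, mixed-precision distributed variant developed in this experiment, and the factory parameter names may differ between Ginkgo versions.

```cpp
#include <fstream>
#include <ginkgo/ginkgo.hpp>

int main()
{
    using vec = gko::matrix::Dense<double>;
    using mtx = gko::matrix::Csr<double, int>;

    auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());

    // Placeholder system; in nekRS this would be, e.g., the coarse pressure
    // operator assembled by the flow solver.
    auto A = gko::share(gko::read<mtx>(std::ifstream("A.mtx"), exec));
    auto b = gko::read<vec>(std::ifstream("b.mtx"), exec);
    auto x = gko::read<vec>(std::ifstream("x0.mtx"), exec);

    // Damped Jacobi smoothing, expressed as two iterations of IR with a
    // scalar-Jacobi inner solver.
    auto smoother = gko::share(
        gko::solver::Ir<double>::build()
            .with_solver(gko::preconditioner::Jacobi<double, int>::build()
                             .with_max_block_size(1u)
                             .on(exec))
            .with_relaxation_factor(0.9)
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(2u).on(exec))
            .on(exec));

    // One AMG V-cycle with Pgm (parallel graph match) coarsening, used as a
    // preconditioner, hence a single iteration as its stopping criterion.
    auto amg = gko::share(
        gko::solver::Multigrid::build()
            .with_mg_level(gko::multigrid::Pgm<double, int>::build().on(exec))
            .with_pre_smoother(smoother)
            .with_post_uses_pre(true)
            .with_max_levels(9u)
            .with_min_coarse_rows(64u)
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(1u).on(exec))
            .on(exec));

    // CG preconditioned with the AMG V-cycle.
    auto solver =
        gko::solver::Cg<double>::build()
            .with_preconditioner(amg)
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(500u).on(exec),
                gko::stop::ResidualNorm<double>::build()
                    .with_reduction_factor(1e-8)
                    .on(exec))
            .on(exec)
            ->generate(A);

    solver->apply(b.get(), x.get());
}
```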