Highlights
- Successful reduction of time-to-solution by a factor of two for simulations of highly turbulent Rayleigh-Bénard convection through parallel-in-time integration
- Successful coupling of the pySDC, Dedalus and PyTorch software packages
- Understanding the impact of noise arising when attempting to accelerate parallel-in-time algorithms via distributed Fourier Neural Operators
Keywords: Environment, Climate, Weather
Technologies used: ML, FNO, GPU, CPU, SDC, PinT


Challenge
Spatial parallelization by decomposing a computational domain eventually saturates as sub-domains become too small and communication costs start to dominate. Parallel-in-time integration (PinT) has been shown to extend strong scaling limits. However, the performance of PinT relies on the availability of cheap but sufficiently accurate coarse predictors that provide starting values for later points in time and thereby enable rapid convergence. Numerical coarse propagators are difficult and labour-intensive to build, but recent machine learning techniques for solving partial differential equations have shown promise as efficient coarse models for PinT.
Until now, this approach had only been demonstrated for simple benchmark problems with limited parallel scaling. The challenge for NeuralPinT was to devise an accurate and efficient Fourier Neural Operator (FNO) solver for the equations modelling Rayleigh-Bénard convection and to use it to improve the speedup delivered by PinT. This required developing a suitable FNO architecture for the problem, enabling its evaluation on sub-domains without substantial loss of accuracy, and finally integrating it with the PinT algorithm.
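To illustrate how a coarse predictor drives a PinT method, the following minimal sketch implements Parareal, the simplest PinT algorithm, for the scalar test equation u' = λu. It is not NeuralPinT's method (the project builds on pySDC), and the test equation, the backward-Euler coarse propagator `coarse_step`, the RK4 fine propagator `fine_step` and all parameters are hypothetical choices. In the setting described above, a trained FNO would take the place of `coarse_step`.

```python
# Minimal Parareal sketch for u' = lam * u (hypothetical test problem, not NeuralPinT code).

def coarse_step(u, dt, lam):
    """Cheap coarse propagator G (hypothetical): one backward Euler step."""
    return u / (1.0 - lam * dt)


def fine_step(u, dt, lam, substeps=20):
    """Accurate fine propagator F (hypothetical): classical RK4 with many substeps."""
    h = dt / substeps
    for _ in range(substeps):
        k1 = lam * u
        k2 = lam * (u + 0.5 * h * k1)
        k3 = lam * (u + 0.5 * h * k2)
        k4 = lam * (u + h * k3)
        u = u + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return u


def parareal(u0, t_end, n_slices, n_iter, lam):
    """Return the final iterate and the list of all iterates (slice-boundary values)."""
    dt = t_end / n_slices
    # Iteration 0: a serial coarse sweep provides starting values at all slice boundaries.
    U = [u0]
    for n in range(n_slices):
        U.append(coarse_step(U[n], dt, lam))
    history = [U]
    for _ in range(n_iter):
        # These evaluations only use the previous iterate, so each time slice
        # could be handled by a different processor.
        F_old = [fine_step(U[n], dt, lam) for n in range(n_slices)]
        G_old = [coarse_step(U[n], dt, lam) for n in range(n_slices)]
        # Serial correction sweep: U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)
        U_new = [u0]
        for n in range(n_slices):
            U_new.append(coarse_step(U_new[n], dt, lam) + F_old[n] - G_old[n])
        U = U_new
        history.append(U)
    return U, history


# Example: compare every iterate with the serial fine solution.
lam, t_end, n_slices = -1.0, 2.0, 8
dt = t_end / n_slices
reference = [1.0]
for n in range(n_slices):
    reference.append(fine_step(reference[-1], dt, lam))
U, history = parareal(1.0, t_end, n_slices, n_slices, lam)
errors = [max(abs(a - b) for a, b in zip(it, reference)) for it in history]
```

Because the fine solves within one iteration are independent, they can run concurrently; only the cheap coarse sweep stays serial. After as many iterations as there are time slices, Parareal reproduces the serial fine solution, but a useful speedup requires convergence in far fewer iterations, which is why the accuracy of the coarse predictor matters.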
Research Topic
Because modern high-performance computing systems are massively parallel, numerical algorithms for solving partial differential equations must also become increasingly parallel to translate compute power into application performance. For time-dependent problems, conventional serial time-stepping is becoming a performance bottleneck, since strong scaling of spatial parallelization alone eventually yields diminishing returns due to communication overhead. Parallel-in-time integration (PinT) algorithms could unlock an additional direction of concurrency but have not yet reached the maturity needed for routine use in HPC.
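How much concurrency PinT can add, and why the coarse predictor must be cheap, can be seen from a common idealised cost model for Parareal, used here as a stand-in since the SDC-based algorithms in pySDC have a different cost structure. With P time slices on P processors, K iterations, α the cost of the coarse propagator over one slice relative to the fine propagator over the same slice, and communication ignored, the speedup over serial fine time-stepping is S = 1/((K+1)α + K/P). The function name and example numbers below are illustrative assumptions, not NeuralPinT measurements.

```python
def parareal_speedup(n_slices, n_iter, alpha):
    """Idealised Parareal speedup over serial fine time-stepping (communication ignored).

    With T_F / T_G the cost of the fine / coarse propagator over one time slice
    and alpha = T_G / T_F:
      serial fine run:  n_slices * T_F
      Parareal:         n_slices * T_G                    (initial coarse sweep)
                      + n_iter * (T_F + n_slices * T_G)   (parallel fine solves + serial coarse sweep)
    Dividing gives S = 1 / ((n_iter + 1) * alpha + n_iter / n_slices).
    """
    return 1.0 / ((n_iter + 1) * alpha + n_iter / n_slices)


# Example: 64 time slices, 3 iterations, two hypothetical coarse-to-fine cost ratios.
for alpha in (0.1, 0.01):
    print(f"alpha = {alpha}: estimated speedup {parareal_speedup(64, 3, alpha):.1f}")
```

With these hypothetical numbers, reducing α from 0.1 to 0.01 raises the estimate from about 2.2 to about 11.5, while the K/P term caps the speedup at P/K no matter how cheap the coarse predictor becomes.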
Solution
NeuralPinT aimed to show that Fourier Neural Operators (FNOs) are a promising approach for constructing coarse propagators for PinT algorithms. Their training is more automated than the design of numerical coarse solvers and, once trained, they can execute faster than numerical coarse propagators, thus reducing the serial bottleneck caused by the coarse predictor. Moreover, FNOs, like most neural networks, run very efficiently on GPUs, whereas numerical solvers sometimes struggle there due to low computational intensity. Combining FNOs as coarse predictors with numerical solvers as fine methods could therefore help make better use of modern, heterogeneous HPC systems.
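The core building block of an FNO is a spectral convolution: transform the input to Fourier space, keep only the lowest modes, multiply each by a learnable complex weight and transform back; an FNO layer adds a pointwise linear term and a nonlinearity. The sketch below shows a single 1D layer in plain Python so that it is self-contained. It is an illustrative assumption, not NeuralPinT's architecture: the naive O(N²) DFT stands in for an FFT (in PyTorch, `torch.fft.rfft` and `torch.fft.irfft`), the weights are fixed rather than trained, `tanh` stands in for the activation function, and the names `spectral_conv` and `fno_layer` are hypothetical.

```python
import cmath
import math


def dft(u):
    """Naive discrete Fourier transform (O(N^2)); a real FNO would use an FFT."""
    n = len(u)
    return [sum(u[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(u_hat):
    """Naive inverse discrete Fourier transform."""
    n = len(u_hat)
    return [sum(u_hat[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def spectral_conv(u, weights):
    """Keep the lowest len(weights) Fourier modes, scale each by a weight, transform back."""
    n, modes = len(u), len(weights)
    assert modes <= n // 2, "need modes <= N/2 so mirrored modes do not overlap"
    u_hat = dft(u)
    out_hat = [0j] * n
    for k in range(modes):
        out_hat[k] = weights[k] * u_hat[k]
        if k > 0:
            # Mirror with the complex conjugate so the output stays real-valued.
            out_hat[n - k] = out_hat[k].conjugate()
    return [z.real for z in idft(out_hat)]


def fno_layer(u, weights, w_local, bias):
    """One 1D FNO-style layer (hypothetical): tanh(spectral_conv(u) + w_local * u + bias)."""
    v = spectral_conv(u, weights)
    return [math.tanh(vi + w_local * ui + bias) for vi, ui in zip(v, u)]


# Example: with unit weights on 4 modes, the mode-6 component of u is filtered out.
n = 16
x = [j / n for j in range(n)]
u = [math.sin(2 * math.pi * xj) + 0.5 * math.sin(12 * math.pi * xj) for xj in x]
smoothed = spectral_conv(u, [1.0] * 4)
```

Because the weights act on a fixed number of Fourier modes rather than on individual grid points, the same layer can in principle be evaluated on grids of different resolution.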