Highlights

  • Developed highly parallel, efficient, asynchronous multi-GPU Lagrangian particle tracking algorithm
  • Implemented in Julia with API to OpenFOAM, adaptable for other CFD solvers
  • Scaling runs on MareNostrum 5 reveal good weak and strong scaling
  • Collection of test cases for validation
  • Test simulation of 81 m³ cloud chamber with up to 256 billion cloud droplets

Snapshot of droplets in the 81 m³ cloud chamber. Two side walls and the top are removed for illustration purposes, and only a selection of droplets is shown. Droplets are drawn with a size proportional to their diameter, multiplied by an additional scaling factor to make them visible.

Challenge

The application of Euler-Lagrange simulations is constrained by the number of particles that can be processed with reasonable effort, in terms of both computational time and available hardware resources. This limitation arises from the difficulty of achieving efficient parallel scaling for the Lagrangian part of the simulation, which is hampered by, for example, non-uniform particle distributions, communication overhead, and idle times. CPUs are restricted by the number of parallel threads, while GPUs are limited predominantly by the amount of available memory. Even though powerful HPC systems are available, a literature survey revealed no simulations involving more than 1 billion particles. For a wide range of applications, such as the aforementioned large cloud chamber or real-world clouds, the ability to efficiently handle a substantially higher number of particles offers researchers much more comprehensive insight into the underlying mechanisms and complex interactions governing these systems.


Research Topic

As an application for Euler-Lagrange simulations with billions of particles, we chose cloud chamber simulations. Cloud chambers are laboratory facilities able to create and maintain an artificial cloud. They are used to investigate clouds in a controlled and repeatable manner, which is cumbersome, if not nearly impossible, in the free atmosphere. In our simulation, particles represent individual cloud droplets that grow and shrink depending on the conditions they experience. These simulations are used for designing cloud chambers, as well as for designing and interpreting the experiments conducted in them. Furthermore, they make it possible to extract parameterisations for large-scale cloud simulations and for weather and climate models.


Solution

SCALE-TRACK’s novel idea is to process particles in chunks using a structure-of-arrays layout, to execute the Eulerian and Lagrangian phases asynchronously on CPUs (Eulerian) and GPUs (Lagrangian), and to use a parallel decomposition in the Lagrangian phase that is independent of the Eulerian one. This method significantly reduces waiting times and synchronisation barriers. A memory-efficient representation of particle properties drastically reduces memory requirements on the GPUs. The Lagrangian phase algorithm is implemented in the Julia programming language, making extensive use of its capabilities for CUDA and for parallelisation using coroutines and threads, while an OpenFOAM-based solver is employed for computing the Eulerian phase.
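
The control flow described above can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the SCALE-TRACK implementation: the class `ParticleChunk`, the functions `make_chunks`, `advance_chunk`, `eulerian_step` and `run`, the one-dimensional particles, the simple velocity relaxation, and the choice to let particles see the fluid field of the previous step are all assumptions made for illustration. Python threads stand in for the GPU workers, so the sketch shows the structure of the overlap rather than any real speed-up.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


class ParticleChunk:
    """Structure-of-arrays storage: one contiguous array per property."""

    def __init__(self, x, v, d):
        self.x = array("d", x)  # positions (1-D for brevity)
        self.v = array("d", v)  # velocities
        self.d = array("f", d)  # diameters, stored in reduced precision


def make_chunks(n_particles, chunk_size):
    """Split the particle population into fixed-size chunks (hypothetical setup)."""
    chunks = []
    for start in range(0, n_particles, chunk_size):
        n = min(chunk_size, n_particles - start)
        x = [float(start + i) for i in range(n)]
        chunks.append(ParticleChunk(x, [1.0] * n, [1e-5] * n))
    return chunks


def advance_chunk(chunk, fluid_velocity, dt):
    """Stand-in for a Lagrangian kernel: relax velocity towards the fluid, then move."""
    for i in range(len(chunk.x)):
        chunk.v[i] += 0.5 * (fluid_velocity - chunk.v[i])
        chunk.x[i] += chunk.v[i] * dt
    return len(chunk.x)


def eulerian_step(fluid_velocity):
    """Stand-in for the CFD solver: returns the fluid field of the next step."""
    return fluid_velocity + 1.0


def run(n_particles=10, chunk_size=4, n_steps=3, dt=0.1):
    chunks = make_chunks(n_particles, chunk_size)
    fluid_velocity = 1.0
    processed = 0
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(n_steps):
            # Submit the Lagrangian work for every chunk without waiting for it.
            futures = [
                pool.submit(advance_chunk, c, fluid_velocity, dt) for c in chunks
            ]
            # While the chunks are being processed, compute the next Eulerian field.
            next_fluid_velocity = eulerian_step(fluid_velocity)
            # Single synchronisation point per step: collect the chunk results.
            processed += sum(f.result() for f in futures)
            fluid_velocity = next_fluid_velocity
    return chunks, processed, fluid_velocity


if __name__ == "__main__":
    chunks, processed, fluid_velocity = run()
    print(len(chunks), processed, fluid_velocity)
```

Two points carry over from the text: each property lives in its own contiguous array, so a narrower type for one property (here the diameter) shrinks the memory footprint of every particle, and the number of chunks is chosen independently of how the Eulerian solver would partition its mesh.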


Schematic execution timeline for one GPU driven by a host CPU and a slave CPU. Asynchronous overlap of Eulerian and Lagrangian calculations is clearly visible.