Highlights

  • GPU radial basis expansion and kernel linearizations were mathematically formulated and implemented, achieving a 10x speedup over a CPU-only implementation
  • GPU-based electrostatics realized a 5x speedup over the CPU code
  • On-the-fly reparameterization of accurate many-body dispersion interactions was implemented, reducing the computational scaling from O(N³) to O(N), where N is the number of atoms
  • GPU implementation of X-ray diffraction prediction/forces allowed for structural agreement with experiment and produced speedups of over 100x compared to CPU implementations
  • GPU local property prediction generated speedups of ~300x compared to CPU implementations, allowing for complex physics and phenomena prediction
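The radial basis expansion in the first highlight lends itself to GPU execution because it can be written as one dense array operation rather than a double loop. A minimal NumPy sketch of this pattern (function name, Gaussian basis choice, and parameters are illustrative, not the project's actual API):

```python
import numpy as np

def gaussian_radial_expansion(r, centers, sigma):
    """Expand neighbor distances r (shape [n_pairs]) in Gaussian radial
    basis functions via broadcasting -- one dense [n_pairs, n_basis]
    operation instead of a nested loop, which maps directly onto GPUs."""
    diff = r[:, None] - centers[None, :]      # shape [n_pairs, n_basis]
    return np.exp(-0.5 * (diff / sigma) ** 2)

r = np.array([0.8, 1.5, 2.3])                 # toy neighbor distances
centers = np.linspace(0.5, 3.0, 6)            # basis-function centers
phi = gaussian_radial_expansion(r, centers, sigma=0.5)
print(phi.shape)                              # (3, 6)
```

The same broadcast-then-elementwise structure is what a dedicated GPU kernel would evaluate in parallel over all pairs.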


Challenge

The central challenge of the XCALE project was adapting various parts of atomistic machine learning simulations, which had been developed in the context of pre-existing CPU-based codes, to make them compatible with efficient execution on GPUs. This is necessary to adapt to the changing paradigm of accelerator-based high-performance computing and to make use of (pre)exascale supercomputers. For this reason, not only did the codes have to be ported to GPUs but, in some cases, the algorithms also had to be adapted to GPU-friendly logic. This mostly involved re-casting some of the algorithms in a way that enables linear-algebraic operations (i.e., matrix-matrix, vector-vector and matrix-vector operations). In addition, all the usual challenges of writing GPU kernels, such as handling memory and communication, also had to be addressed. One example was the challenge of porting X-ray diffraction, which required computing Fourier transforms for the experimental derivatives and evaluating a Gaussian-convolved pair distribution function, with derivatives, for each of the 1 billion atoms. This posed a severe scaling bottleneck in the CPU code, which was resolved by porting to GPUs.
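The Gaussian-convolved pair distribution function mentioned above can be sketched in a few lines; the broadcast over a [n_pairs, n_grid] array replaces the per-atom loop that limited CPU scaling (grid, Gaussian width, and normalization are illustrative assumptions, not the project's actual implementation):

```python
import numpy as np

def gaussian_convolved_pdf(distances, r_grid, sigma=0.1):
    """Sum of Gaussians of width sigma centered on each pair distance,
    evaluated on r_grid. Broadcasting over [n_pairs, n_grid] exposes the
    data parallelism that a GPU kernel exploits."""
    diff = r_grid[None, :] - distances[:, None]
    gauss = np.exp(-0.5 * (diff / sigma) ** 2)
    return gauss.sum(axis=0) / (sigma * np.sqrt(2.0 * np.pi))

distances = np.array([1.0, 1.0, 1.6])          # toy pair distances
r_grid = np.linspace(0.0, 3.0, 301)
g = gaussian_convolved_pdf(distances, r_grid)
print(r_grid[np.argmax(g)])                    # peak near 1.0
```

Because each grid point depends only on a reduction over pair distances, the forces (derivatives with respect to atomic positions) follow the same parallel pattern.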


Research Topic

Machine-learned interatomic potentials (or simply, machine learning potentials) are a fast and accurate way to explore the potential structures of complex materials. They enable the prediction of atomic properties, energies, and forces by learning from quantum mechanical calculations, without an explicit model of the physics. An improvement is realized by adding physics-inspired interactions between atoms; however, such additions come at a computational cost. The goal of XCALE was to develop new physics-inspired methods and improve their scalability via GPU acceleration, to enable accurate computational materials science across scales.
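The learning step described above can be illustrated with kernel ridge regression on precomputed descriptors, a common formulation for machine learning potentials. The data, kernel, and parameters below are toy stand-ins, not the project's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 8))        # toy per-structure descriptors
y_train = np.sin(X_train.sum(axis=1))     # stand-in for QM reference energies

def rbf_kernel(A, B, gamma=0.1):
    """Gaussian kernel matrix -- a pure matrix operation, hence GPU-friendly."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Ridge fit: solve (K + lambda*I) alpha = y for the regression weights.
K = rbf_kernel(X_train, X_train)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(K)), y_train)

# Predicting energies for new structures is again a matrix-vector product.
y_pred = rbf_kernel(X_train[:5], X_train) @ alpha
print(np.abs(y_pred - y_train[:5]).max())  # small training error
```

Both training and prediction reduce to dense linear algebra, which is why such models benefit so strongly from GPU acceleration.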


Solution

Whenever the existing algorithms and their CPU implementations were poorly suited for GPU execution, the solution was to redesign them prior to porting so that vectorization could be exploited. This was the case for the SOAP basis expansion and construction, and for generalized Hamiltonian dynamics. Whenever the existing algorithm was already GPU-friendly, the team ported the existing CPU implementation directly. In both cases, the GPU code was explicitly optimized by writing dedicated kernels and by taking appropriate care of memory and communication bottlenecks.
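Re-casting an algorithm for vectorization, as described above, typically means replacing nested scalar loops with a single matrix product that GPU libraries are optimized for. A schematic comparison with hypothetical shapes (the loop and matrix forms compute the same result):

```python
import numpy as np

rng = np.random.default_rng(1)
desc = rng.normal(size=(100, 16))      # toy per-atom descriptors
weights = rng.normal(size=(16, 4))     # e.g. a kernel linearization

# Loop form: poorly suited to GPUs (scalar work, irregular access).
out_loop = np.zeros((100, 4))
for i in range(100):
    for j in range(4):
        out_loop[i, j] = desc[i] @ weights[:, j]

# Recast form: one matrix-matrix product, the primitive that GPU
# linear-algebra libraries execute at near-peak throughput.
out_mat = desc @ weights

print(np.allclose(out_loop, out_mat))  # True
```

The payoff of the redesign is that the heavy lifting lands in exactly the operations (matrix-matrix, matrix-vector, vector-vector) that the Challenge section identifies as GPU-friendly.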