Highlights

  • We developed methods to compress plasma simulation data by up to 500x while retaining physical simulation quality
  • The compressed neural network representation of the plasma distribution data can be used to recover a physical simulation after, for example, node failures, increasing resilience
  • The new compression method makes it possible to store hundreds of times more scientific data for exploitation within the same disk footprint
  • The presented concept of representing plasma distributions as neural representations via these methods, or via more advanced autoencoders, may be leveraged in the future to create fast AI-based solvers for high-dimensional partial differential equations

The Multilayer Perceptron (MLP) architectures implemented in the project. Multiple regression, i.e. grouping neighboring velocity distribution functions (VDFs) to be compressed via a single MLP, is used to exploit GPU computing to the fullest.
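
As a concrete illustration of the multiple-regression idea, the sketch below is a minimal PyTorch example, not the project's actual code: the layer sizes and the group size of eight VDFs are assumptions. One small MLP maps velocity-space coordinates to the phase-space density of several neighboring VDFs at once, and the trained weights become the compressed representation.

```python
# Minimal sketch (assumed architecture, not the project's code): one MLP
# regresses several neighboring VDFs at once from velocity coordinates.
import torch
import torch.nn as nn

N_GROUPED_VDFS = 8   # assumed: how many neighboring VDFs share one network
HIDDEN = 64          # assumed hidden width
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Linear(3, HIDDEN), nn.ReLU(),      # (vx, vy, vz) -> hidden features
    nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, N_GROUPED_VDFS),    # one regression output per grouped VDF
).to(device)

# Toy stand-ins for real data: normalized velocity-grid points and the
# corresponding phase-space densities of each grouped VDF.
coords = torch.rand(4096, 3, device=device)
targets = torch.rand(4096, N_GROUPED_VDFS, device=device)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(coords), targets)
    loss.backward()
    opt.step()

# The compressed representation is simply the network parameters.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params} weights stand in for {targets.numel()} grid values")
```

Grouping several VDFs into one network turns many small fits into one larger batched regression, which is what keeps the GPU saturated; the compression ratios quoted above come from real velocity grids far larger than this toy data.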

Challenge

The ion-kinetic magnetosphere simulation Vlasiator solves the Vlasov equation for plasma ions, a six-dimensional partial differential equation. Solving it requires an immense number of data points to be propagated in time, and the calculations are correspondingly heavy and time-consuming, even on supercomputers. The simulation state can be stored to disk so that the simulation can be recovered after node or network failures, continued in a later run, or analysed scientifically. These save states, or restart files, can take up terabytes of disk space, and storage quota quickly becomes a limiting factor. Most of the disk footprint of the restart files comes from the 6D discretized grid of ion velocity distribution functions, which can take arbitrary shapes in space plasmas. Compressing these files could provide a solution, but lossless compression of floating-point data is not efficient enough to meaningfully shrink the disk footprint.
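
For reference, the equation being propagated is the standard Vlasov equation for the ion distribution function f(r, v, t), with q and m the ion charge and mass and E and B the electromagnetic fields:

```latex
\frac{\partial f}{\partial t}
  + \mathbf{v} \cdot \nabla_{\mathbf{r}} f
  + \frac{q}{m}\left(\mathbf{E} + \mathbf{v} \times \mathbf{B}\right) \cdot \nabla_{\mathbf{v}} f = 0
```

Because f depends on three position and three velocity coordinates, its discretization is the 6D grid whose storage dominates the restart files.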


Research Topic

The goal of the project was first to investigate, and then to develop and implement, lossy compression methods (using AI/ML technologies) for the very large data sets generated by six-dimensional plasma simulations. This would significantly increase a simulation's capacity to store data, meaning that failed simulations could more often be recovered with intact physical results. The resulting method would also need to be fast and applicable to the largest supercomputers in the world.


Solution

The plasma distribution functions in Vlasiator are noise-free and exhibit correlations that can be expected to allow for compressed representations of the distributions. This project aimed to develop such representations using modern AI and ML methods. The solution proposed, and subsequently implemented, was to use a Multilayer Perceptron to compress the distribution data using GPUs, with a CPU-supported, more conventional octree method as a fallback option for compression. These compression methods were developed as open-source libraries, and Vlasiator support for using these libraries to write and read compressed restart files was implemented as a proof of concept.
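
To make the restart read/write path concrete, the sketch below shows one possible shape of it in PyTorch. The function names, architecture, and serialization are illustrative assumptions, not the actual API of the released libraries or of Vlasiator: only the MLP weights of each group are written to the compressed restart file, and the full velocity grid is re-evaluated when the file is read back.

```python
# Hypothetical sketch of the compressed restart round trip: store MLP weights,
# reconstruct VDF values by evaluating the network on the velocity grid.
import io
import torch
import torch.nn as nn

N_GROUPED_VDFS = 8   # assumed grouping, matching the sketch above

def make_mlp() -> nn.Module:
    # Same illustrative architecture as in the compression sketch.
    return nn.Sequential(
        nn.Linear(3, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, N_GROUPED_VDFS),
    )

def serialize_weights(model: nn.Module) -> bytes:
    """The bytes that would be written to the compressed restart file for one group."""
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getvalue()

def reconstruct_vdfs(weights: bytes, grid_coords: torch.Tensor) -> torch.Tensor:
    """Rebuild the MLP from stored weights and evaluate it on every velocity-grid point."""
    model = make_mlp()
    model.load_state_dict(torch.load(io.BytesIO(weights)))
    with torch.no_grad():
        return model(grid_coords)            # shape: (n_grid_points, N_GROUPED_VDFS)

# A 50^3 velocity grid in normalized coordinates, shared by the grouped cells.
axes = torch.linspace(-1.0, 1.0, 50)
grid = torch.cartesian_prod(axes, axes, axes)
blob = serialize_weights(make_mlp())          # untrained here; stands in for a fitted model
vdfs = reconstruct_vdfs(blob, grid)
print(f"{len(blob)} stored bytes reconstruct {vdfs.numel()} grid values")
```

Because reconstruction is just a forward pass over the velocity grid, a failed run can be resumed from a file that holds only network weights plus the uncompressed field data, which is the resilience benefit listed in the highlights.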


First results of compressing a production-scale Vlasiator magnetospheric simulation: cross-sections of plasma density before (left) and after (right) compression and reconstruction. Minor differences are visible, especially in the smooth upstream region, indicating that some further adaptation of the method is needed. The plasma distribution data was compressed by a ratio of 356x, reducing the restart file size from 2 TB to 37 GB, of which approx. 20 GB is uncompressed electromagnetic field data.