Highlights

  • Advanced parallelization of hierarchical uncertainty quantification (UQ)
  • Newly designed multi-index UQ method for parameter inference
  • Hardware-aware optimization of earthquake simulations, exploiting fused ensemble simulation
  • Large-scale Bayesian inference for earthquake parameters

(a) Map view of the GNSS and seismic stations that constrain the inversion. The stations are located within 100 km of the epicenter (red star) of the $M_W$ 7.1 Ridgecrest mainshock. The solid red curve marks the fault trace (F1) ruptured by the mainshock; the black curves represent secondary fault traces. Pink rectangles indicate stations where we use only the static displacement data, orange rectangles indicate stations where high-rate displacement measurements are available, and blue triangles mark the locations of strong-motion seismic stations. The red shaded rectangle highlights the region where we assume spatial variation of the model parameters to be inferred (γ0, γ1, α0, α1). (b) Co-seismic horizontal static displacements at the GNSS stations. Blue arrows show the mean displacement output by the models in MLDA. The radius of each blue circle equals twice the standard deviation across all models. Yellow arrows denote the static displacement data.

Challenge

Bayesian inference delivers insights of great value to scientists, combining data and mathematical models to infer and understand otherwise inaccessible real-world properties. Computing the Bayesian posterior, i.e. the distribution of likely underlying parameters given observations, comes at great computational cost and a corresponding environmental impact. To ensure accurate results, the uncertainty quantification algorithm requires many simulation runs. Each of these “forward” runs, e.g. in seismology, can easily span hundreds of compute nodes in an HPC cluster. We therefore need to be extremely efficient both on the statistical side, i.e. requiring few runs, and on the simulation side, i.e. making the most efficient use of modern clusters down to the hardware level. This challenge arises particularly in seismology: we have sensors at the earth’s surface as well as satellite data, but need to infer the complex behavior of subsurface properties, even though a single simulation run for a single setting is already extremely compute intensive.
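For reference, the posterior mentioned above can be written with Bayes' rule. The notation is an assumption introduced here for illustration and is not taken from the project itself: θ denotes the unknown parameters, d the observations, π_0 the prior, and L the likelihood, whose evaluation requires one forward simulation run.

```latex
\pi(\theta \mid d)
  \;=\; \frac{L(d \mid \theta)\,\pi_0(\theta)}
             {\int L(d \mid \theta')\,\pi_0(\theta')\,\mathrm{d}\theta'}
  \;\propto\; L(d \mid \theta)\,\pi_0(\theta)
```

Sampling methods such as Markov chain Monte Carlo need only the unnormalized right-hand side, but every evaluation of L at a new θ costs one forward run, which is why the number of runs dominates the overall cost.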


Research Topic

We aim to quantify the physical properties on and near seismic faults, and their impact on earthquake events, using measurements of ground motion obtained from seismographs or satellite data. We use Bayesian inference, a powerful tool for inferring likely statistical distributions of such properties, although its high computational cost is often prohibitive. We reduce this computational effort by improving the uncertainty quantification (UQ) algorithms, exploiting hierarchies of approximate models, and by optimizing ensembles of earthquake simulations for state-of-the-art supercomputers.


Solution

We address these challenges from multiple angles. First, we employ hierarchical methods that exploit approximate simulation models for efficiency. We introduce prefetching as a new approach to parallelizing hierarchical UQ methods, allowing us to scale applications by an additional factor. We have also developed Multi-index Delayed Acceptance, an extension of Multilevel Delayed Acceptance (MLDA) that makes use of higher-dimensional model hierarchies. In addition, we have improved SeisSol by exploiting optimization opportunities that arise from fusing multiple simulation runs. To enable the use of SeisSol in hierarchical UQ, we have designed a suitable hierarchy of approximate models for earthquake simulations, mixing numerical simulation at varying model resolution with data-driven surrogate models. Combining these advancements, we perform large-scale parameter inference of seismic events.
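To illustrate how a hierarchy of approximate models can save expensive runs, the following is a minimal sketch of basic two-stage delayed acceptance, the two-level idea that Multilevel Delayed Acceptance builds on. It is not the project's implementation: the two log-densities are hypothetical one-dimensional stand-ins for a cheap surrogate and an expensive simulation, the function names are invented for this example, and coarse subchains, prefetching and multi-index hierarchies are all omitted.

```python
import math
import random


def log_post_fine(theta):
    # Hypothetical "expensive" model: unnormalized standard normal log-density.
    return -0.5 * theta * theta


def log_post_coarse(theta):
    # Hypothetical cheap approximation: slightly shifted and wider.
    return -0.5 * ((theta - 0.3) / 1.2) ** 2


def delayed_acceptance(n_steps, step_size=1.0, seed=0):
    """Two-stage delayed-acceptance Metropolis with a symmetric random walk.

    Returns the chain and the number of fine-model evaluations made
    inside the loop.
    """
    rng = random.Random(seed)
    theta = 0.0
    lc, lf = log_post_coarse(theta), log_post_fine(theta)
    samples = []
    fine_evals = 0
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_size)
        lc_prop = log_post_coarse(proposal)
        # Stage 1: screen the proposal with the coarse model only.
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lc_prop - lc:
            # Stage 2: evaluate the fine model and correct for the
            # coarse-model error so the chain targets the fine posterior.
            lf_prop = log_post_fine(proposal)
            fine_evals += 1
            if math.log(1.0 - rng.random()) < (lf_prop - lf) - (lc_prop - lc):
                theta, lc, lf = proposal, lc_prop, lf_prop
        samples.append(theta)
    return samples, fine_evals


if __name__ == "__main__":
    chain, n_fine = delayed_acceptance(10000)
    print("fraction of steps needing a fine evaluation:", n_fine / len(chain))
```

Proposals rejected in the first stage never reach the fine model, so the number of expensive evaluations is lower than the number of chain steps, while the second-stage correction keeps the fine posterior as the target distribution.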


Panels of the Bayesian likelihood from the pretrained surrogate model and the Bayesian posterior from the MLDA inversion. The diagonal panels, from top to bottom, show the 1D marginal posterior probability distributions of the off-fault plastic cohesion (γ0, γ1) and the on-fault direct-effect parameter in SVW-RS friction (α0, α1). The super-diagonal panels show the 2D marginal Bayesian posterior of each parameter pair from the MLDA inversion. The blue dots are all the effective samples in the eight Markov chains, shaded in blue in proportion to their local density. The sub-diagonal panels show the 2D marginal Bayesian likelihoods computed by the surrogate model, color-coded in gray. The red dots are the 52 samples used for pretraining the surrogate model; the less transparent dots correspond to samples with higher likelihood.