This Innovation Study presents a novel algorithmic approach aimed at improving the scalability of Graph Neural Network (GNN) frameworks on European supercomputers. By integrating a newly designed Compressed Binary Matrix (CBM) storage format and an optimised Compressed Binary Matrix Multiplication (CBMM) kernel into existing GNN architectures, the study targets significant performance improvements for critical High Performance Computing (HPC) applications. The CBM format provides superior compression rates for binary matrices while streamlining matrix operations, enabling accelerated computation in a wide range of domains, including natural language processing, social network analysis, biology, and physics.

Graph Neural Networks (GNNs) have become indispensable tools for processing graph-structured data, particularly in artificial intelligence applications. However, the push towards exascale computation poses challenges for the conventional sparse storage formats used in GNN frameworks. To address this, this study develops a novel CBM storage format, paired with the CBMM kernel, that exploits the structure of binary matrices. By leveraging the efficiency of CBM representations, CBM4Scale aims to improve GNN-based AI applications on European supercomputers.

The focus of this study is to develop a compressed binary matrix storage format that exploits the sparsity and structural regularities present in graph data. Unlike common sparse formats such as COO (Coordinate List) and CSR (Compressed Sparse Row), which store the non-zero entries of each row independently, the CBM format encodes row-to-row differences (deltas), reducing storage requirements while minimising computational overhead. An optimised CBMM kernel performs matrix multiplication directly on this compressed representation, delivering high performance across a variety of HPC architectures.
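To make the delta idea concrete, the following is a minimal, hypothetical sketch in Python. The function names (`cbm_encode`, `cbm_matvec`), the use of symmetric differences between consecutive rows, and the incremental matrix-vector routine are illustrative assumptions, not the study's actual CBM format or CBMM kernel, which are considerably more elaborate.

```python
# Illustrative sketch only: encodes a binary matrix as row-to-row deltas
# and reuses the previous row's partial sum during a matrix-vector product.

def cbm_encode(rows):
    """Encode a binary matrix as a list of per-row deltas.

    Each delta lists the columns where row i differs from row i-1
    (symmetric difference). When consecutive rows share most of their
    non-zeros, the deltas are much smaller than the rows themselves.
    """
    encoded = []
    prev = set()
    for row in rows:
        cur = {j for j, v in enumerate(row) if v}
        encoded.append(sorted(cur ^ prev))  # columns that flip on/off
        prev = cur
    return encoded

def cbm_matvec(encoded, x):
    """Multiply the delta-encoded binary matrix by a dense vector x.

    Instead of recomputing each row's dot product from scratch, the
    running sum is updated only at the flipped columns.
    """
    result = []
    acc = 0.0
    cur = set()
    for delta in encoded:
        for j in delta:
            if j in cur:        # column turned off: subtract its term
                cur.remove(j)
                acc -= x[j]
            else:               # column turned on: add its term
                cur.add(j)
                acc += x[j]
        result.append(acc)
    return result

# Three similar adjacency rows: 10 non-zeros, but only 5 delta entries.
A = [[1, 1, 0, 1],
     [1, 1, 1, 1],
     [1, 1, 1, 0]]
enc = cbm_encode(A)
x = [1.0, 2.0, 3.0, 4.0]
print(cbm_matvec(enc, x))  # → [7.0, 10.0, 6.0], matching the dense A @ x
```

In this toy example the encoding stores 5 delta entries where COO or CSR would store 10 column indices, and the multiplication touches only those 5 entries; the real CBM/CBMM work generalises this reuse to full sparse-dense matrix multiplication on HPC hardware.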

The advances in CBM4Scale are expected to be seamlessly applicable to large-scale heterogeneous HPC architectures, providing commensurate performance gains. Preliminary evaluations predict at least a twofold speedup in both time and energy-to-solution metrics during the training phase, and a threefold speedup during the inference phase, on shared-memory parallel systems compared to traditional sparse storage formats. By significantly enhancing the scalability and efficiency of GNN frameworks, CBM4Scale has the potential to enable unprecedented application scale and to open up novel, previously unattainable use-case scenarios. This innovation study thus demonstrates how the scalability of Graph Neural Networks deployed in AI-driven applications on European supercomputers can be improved.