Highlights
- First large-scale application of spectral prediction models in DFT for protein dimers, yielding a significant reduction in self-consistent field (SCF) iterations
- Eigenvalue predictions from three very different machine learning (ML) and deep learning (DL) methods reach an accuracy equivalent to half-converged SCF calculations
- High-quality DoS predictions demonstrate the reliability of the approach, which relies on an exascale-ready methodology that scales to large biomolecular simulations
Keywords: Computational Chemistry, Drug Discovery, High-Performance Computing, Biophysics, Quantum Mechanics, Deep Learning
Technologies used: GNN, Machine Learning (Random Forest, Kernel Ridge Regression), GPU, Python, BigDFT, SLEPc, FRASE

Challenge
Computational materials science and biomolecular simulations face a major bottleneck: solving large-scale sparse eigenproblems efficiently. Traditional DFT-based quantum simulations are computationally demanding, especially for biological systems, which often require thousands of separate DFT calculations to achieve statistical significance.
Simulating protein-protein and protein-ligand interactions at quantum accuracy is crucial for drug design and biophysics, but it remains largely impractical because of the cost of solving the Kohn-Sham equations for large biomolecular assemblies. The challenge was to develop a method that predicts spectral properties in advance, accelerating SCF convergence without sacrificing accuracy.
Research Topic
The research focused on accelerating electronic structure calculations for large biomolecular systems by integrating machine learning-based spectral prediction into Density Functional Theory (DFT) simulations. Specifically, it aimed to improve the solution of large-scale sparse eigenproblems, enabling efficient simulations of protein-protein interactions. Using a database of 660 protein dimer calculations, we developed a predictor that estimates eigenvalues and the density of states (DoS). This advance paves the way for scalable quantum simulations applicable to drug discovery, materials science, and exascale computing.
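Kernel ridge regression is one of the eigenvalue predictors named above, but the inputs to the model are not described here. The sketch below is a minimal NumPy implementation of multi-output kernel ridge regression with a Gaussian kernel, mapping a per-system descriptor vector to a fixed-length vector of eigenvalues. The descriptors, the kernel width `gamma`, the regularization strength `lam`, and the synthetic training data are all hypothetical placeholders, not the project's actual features, hyperparameters, or dataset.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class KernelRidgeSpectrum:
    """Multi-output kernel ridge regression: descriptor vector -> vector of eigenvalues."""

    def __init__(self, gamma=1.0, lam=1e-6):
        self.gamma = gamma  # kernel width (hypothetical default)
        self.lam = lam      # ridge regularization strength (hypothetical default)

    def fit(self, X, Y):
        self.X_train = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        self.y_mean = Y.mean(axis=0)  # centre each eigenvalue index separately
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        # Solve (K + lam * I) C = Y - mean for the dual coefficients C (one column per eigenvalue)
        self.coef = np.linalg.solve(K + self.lam * np.eye(len(K)), Y - self.y_mean)
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K @ self.coef + self.y_mean


# Synthetic stand-in data (hypothetical): 200 "systems", 3 descriptor features, 5 eigenvalues each.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
levels = -0.5 + 0.1 * np.arange(5)  # baseline eigenvalue ladder (arbitrary units)
Y = levels[None, :] + 0.05 * np.sin(2.0 * X[:, [0]]) + 0.02 * X[:, [1]] ** 2

model = KernelRidgeSpectrum(gamma=1.0, lam=1e-6).fit(X[:150], Y[:150])
pred = model.predict(X[150:])
mae = np.mean(np.abs(pred - Y[150:]))
baseline_mae = np.mean(np.abs(Y[:150].mean(axis=0) - Y[150:]))
print(f"KRR MAE: {mae:.2e}  mean-predictor MAE: {baseline_mae:.2e}")
```

Fitting all eigenvalue indices jointly means a single linear solve against the kernel matrix serves every output column. The comparison against a mean predictor is only a sanity check, not the accuracy metric used in the study.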
Solution
To overcome this challenge, we developed a spectral prediction model trained on 660 protein dimer calculations performed with BigDFT's linear-scaling approach. The model predicts eigenvalues before the SCF iterations begin, providing high-quality initial guesses for the electronic structure. By reducing the number of self-consistency steps from tens to just a few, we achieved substantial computational savings while maintaining spectral accuracy. As shown in our results, the predicted DoS closely follows the computed reference, indicating that the method can be systematically applied to large biomolecular datasets.
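A common way to turn a discrete set of eigenvalues into a DoS curve that can be compared with a reference is Gaussian broadening: each eigenvalue contributes a normalized Gaussian of width sigma, and the curves are summed on a shared energy grid. The sketch below assumes a 0.1 eV broadening and scores agreement with the cosine similarity of the two curves; both choices, and the synthetic "reference" and "predicted" eigenvalues, are illustrative assumptions, since the comparison procedure used in the study is not specified here.

```python
import numpy as np


def gaussian_dos(eigenvalues, energies, sigma=0.1):
    """DoS on an energy grid: sum of normalized Gaussians centred on each eigenvalue."""
    eps = np.asarray(eigenvalues, dtype=float)[None, :]
    E = np.asarray(energies, dtype=float)[:, None]
    g = np.exp(-0.5 * ((E - eps) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return g.sum(axis=1)  # each Gaussian integrates to 1, so the area equals the state count


def dos_similarity(dos_a, dos_b):
    """Cosine similarity of two DoS curves on the same grid (1.0 means identical shape)."""
    return float(np.dot(dos_a, dos_b) / (np.linalg.norm(dos_a) * np.linalg.norm(dos_b)))


# Synthetic stand-ins (hypothetical): "converged" eigenvalues and noisy "predicted" ones, in eV.
rng = np.random.default_rng(1)
reference = np.sort(rng.normal(0.0, 2.0, size=300))
predicted = reference + rng.normal(0.0, 0.05, size=300)
grid = np.linspace(-12.0, 12.0, 2401)  # 0.01 eV spacing, wide enough to hold every state

dos_ref = gaussian_dos(reference, grid, sigma=0.1)
dos_pred = gaussian_dos(predicted, grid, sigma=0.1)
print(f"DoS cosine similarity: {dos_similarity(dos_ref, dos_pred):.4f}")
```

Because broadening smooths over small per-eigenvalue errors, a DoS comparison is more forgiving than an eigenvalue-by-eigenvalue error metric; the choice of sigma directly controls that tolerance.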