Over the last 25 years, pioneering computational methods have been developed to estimate the biophysical effects of missense mutations using experimentally derived structures and biophysical measurements. Although many protein structures have yet to be determined experimentally, recent advances such as AlphaFold2 have dramatically improved our ability to model novel proteins accurately. However, the reliability of mutation-effect predictors in the absence of experimental structural data has not been systematically evaluated. We therefore systematically investigated the performance and robustness of widely used structure-based methods for predicting the effects of mutations on protein stability and protein-ligand binding affinity when these methods are given non-experimental models as input.
To assess the robustness of protein stability predictors, we built homology models from templates spanning a range of sequence identities (15% to 95%) and generated AlphaFold2 models largely with the settings used in CASP14. The performance of machine learning-based predictors deteriorated on homology models built from templates with sequence identity below 40%, a regime in which sequence-based tools may be preferable. In contrast, the structure-based approaches performed on AlphaFold2 models with high consistency to their performance on experimental structures.
Building on these findings for protein stability predictors, we further investigated the robustness of approaches that predict the effects of mutations on protein-ligand binding affinity. As before, homology modelling and AlphaFold2 were used to generate non-experimental receptor structures, and AutoDock Vina was used to generate the protein-ligand complexes. We expected performance to deteriorate where structural deviations occurred at the ligand-binding site.
In brief, this work will not only improve the interpretation of results from these in silico biophysical predictions but also guide the development of next-generation methods for protein engineering and drug development.