Eligible Student Poster 49th Lorne Conference on Protein Structure and Function 2024

EFG-CS: Predicting Chemical Shifts from Amino Acid Sequences with Protein Structure Prediction Using Machine Learning and Deep Learning Models (#324)

Xiaotong Gu 1, Yoochan Myung 1, Carlos Rodrigues 2, David Ascher 1
  1. University of Queensland and Baker Institute, Melbourne, VIC, Australia
  2. Baker Institute, Melbourne, VIC, Australia

Nuclear magnetic resonance (NMR) spectroscopy is one of the main methods in structural biology for analysing protein stereochemistry and structure. The chemical shift reflects how nuclei in a molecule resonate at distinct frequencies depending on their local chemical environment. Chemical shifts are widely recognised as sensitive probes of a diverse array of structural parameters. Obtaining chemical shifts from NMR signals can be challenging, and an NMR structure does not necessarily come with all the required chemical shift information. Predictive models are therefore essential for accurately deducing chemical shifts, either from protein structures or, ideally, directly from amino acid sequences. Currently available methods predict chemical shifts from experimental protein structures and are therefore limited by the availability of those structures. Here, we present EFG-CS, a web server that specialises in chemical shift prediction. EFG-CS employs a machine learning-based transfer prediction model for backbone atom chemical shift prediction, using ESMFold-predicted protein structures. Additionally, EFG-CS incorporates a Graph Neural Network-based model to provide more comprehensive side-chain atom chemical shift predictions. Our method has demonstrated reliable performance in backbone atom prediction, achieving comparable accuracy, with root mean square errors (RMSEs) of 0.30 ppm for H, 0.22 ppm for Ha, 0.89 ppm for C, 0.89 ppm for Ca, 0.84 ppm for Cb, and 1.69 ppm for N. Our approach also shows predictive capability for side-chain atoms, achieving RMSE values of 0.71 ppm for Hb, 0.74 to 1.15 ppm for Hd, and 0.58 to 0.94 ppm for Hg, using amino acid sequences alone, without homology information or feature curation.
The web server is freely available, and the chemical shift predictions can be downloaded in tabular format and visualised in 3D.
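
As an illustration of the accuracy metric quoted in this abstract, the following minimal sketch computes a per-atom-type RMSE, sqrt(mean((predicted - observed)^2)), from a table of predicted and observed shifts. It is not the EFG-CS implementation: the function name rmse_by_atom, the (atom_type, predicted_ppm, observed_ppm) record layout, and the example values are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def rmse_by_atom(records):
    """Return {atom_type: RMSE in ppm}.

    `records` is an iterable of (atom_type, predicted_ppm, observed_ppm)
    tuples. This layout is hypothetical and is not the server's export format.
    """
    squared_error_sum = defaultdict(float)
    count = defaultdict(int)
    for atom_type, predicted, observed in records:
        squared_error_sum[atom_type] += (predicted - observed) ** 2
        count[atom_type] += 1
    return {
        atom_type: math.sqrt(squared_error_sum[atom_type] / count[atom_type])
        for atom_type in squared_error_sum
    }


# Made-up values for illustration only; these are not results from the poster.
example = [
    ("H", 8.10, 8.40),
    ("H", 7.90, 7.60),
    ("N", 120.0, 121.0),
    ("N", 118.0, 119.0),
]
result = rmse_by_atom(example)
for atom_type, value in sorted(result.items()):
    print(f"{atom_type}: {value:.2f} ppm")
```

Grouping by atom type before averaging matters because shift ranges differ widely between nuclei, so a single pooled RMSE would be dominated by the nuclei with the largest spread.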