Molecular dynamics is the process of simulating the behaviour of a molecule in a solvent using high-performance computers. It is a computationally expensive task, requiring hours or even days to simulate fractions of a second of the molecule's behaviour. In this work, we aim to develop protein structure fingerprints, a compressed representation of protein structures containing enough information to be used in predictive models, and employ them to accelerate molecular dynamics using machine learning.
Recent advances in artificial intelligence have made it possible to discover highly sophisticated relations within data. Transformer-based models have shown remarkable results in processing natural languages and language-like data, such as protein sequences. Generative adversarial networks (GANs) can create artificial images and audio that are indistinguishable from real ones. Our goal is to use these approaches to construct long molecular dynamics trajectories within a short time, using Transformer-based models to encode proteins and GANs to generate the trajectories.
We employ a dataset from the Protein Data Bank, selecting 8031 high-resolution protein structures. Conventional representations change when a structure is rotated or shifted, which we address by converting proteins to internal coordinates. The resulting structured matrix is unaffected by rotation and shift, but it introduces sensitivity to errors in backbone angles.
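As an illustration of why internal coordinates remove the dependence on rotation and shift, the following minimal Python sketch computes bond lengths, bond angles and torsion angles for a short chain of atoms and checks that they do not change under a rigid transformation. The toy coordinates, the function names and the particular choice of internal coordinates are assumptions made for this example; they are not the representation or code used in this work.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_angle(p0, p1, p2):
    """Angle in radians at p1 between the bonds p1-p0 and p1-p2."""
    u, v = sub(p0, p1), sub(p2, p1)
    cos_a = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.acos(max(-1.0, min(1.0, cos_a)))

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in radians defined by four consecutive atoms."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)  # normals of the two planes
    y = dot(cross(n1, n2), b1) / math.sqrt(dot(b1, b1))
    x = dot(n1, n2)
    return math.atan2(y, x)

def internal_coordinates(atoms):
    """Return (bond lengths, bond angles, torsion angles) for a chain."""
    lengths = [math.dist(a, b) for a, b in zip(atoms, atoms[1:])]
    angles = [bond_angle(*atoms[i:i + 3]) for i in range(len(atoms) - 2)]
    torsions = [dihedral(*atoms[i:i + 4]) for i in range(len(atoms) - 3)]
    return lengths, angles, torsions

def rigid_transform(p, theta, shift):
    """Rotate a point about the z-axis by theta, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])

# Toy chain of six atoms (made-up coordinates, not a real protein).
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5),
         (0.0, 1.3, 2.0), (1.2, 1.8, 2.9), (1.0, 3.2, 3.4)]

# The same chain after an arbitrary rotation and shift.
moved = [rigid_transform(p, 0.7, (5.0, -3.0, 2.0)) for p in atoms]

original = internal_coordinates(atoms)
transformed = internal_coordinates(moved)

# The Cartesian coordinates differ, but the internal coordinates agree.
same = all(math.isclose(a, b, abs_tol=1e-9)
           for xs, ys in zip(original, transformed)
           for a, b in zip(xs, ys))
print("internal coordinates unchanged:", same)
```

Each row of such a representation depends only on neighbouring atoms, which is what makes it independent of the global pose. The same locality also hints at the trade-off noted above: when Cartesian positions are rebuilt from angles, an error in one backbone angle moves every atom that follows it in the chain.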
This work will enhance the efficiency of protein dynamics simulations, offering a valuable tool for studying protein behaviour and interactions in complex environments, with potential applications in drug discovery and molecular biology.