AlphaFold: Accurate protein structure prediction

Introduction

In this blog, we discuss AlphaFold, the artificial intelligence (AI) tool created by Google’s DeepMind to predict protein structure. The software is a deep learning system. Its 3D models of proteins are far more accurate than those produced by earlier methods, which represents a major advance on one of the central challenges in biology.

History of AlphaFold

The CASP (Critical Assessment of Structure Prediction) community forum enables scientists to share their progress on the protein-folding problem. The community also runs a biennial challenge in which research teams test the accuracy of their predictions against experimental results. Teams are given amino acid sequences for proteins whose 3D structures have been determined experimentally but not yet made public. Groups submit their best predictions, which are then scored against the structures once these are released.

AlphaFold placed first among the teams that took part in the protein structure prediction challenge at CASP13 (2018). AlphaFold 2, presented at CASP14 (2020), reached a level of accuracy that many consider sufficient to solve the protein structure prediction problem.

Protein Folding Problem

Protein folding is the process by which amino acid chains spontaneously fold into the three-dimensional (3-D) structures of proteins. A protein’s biological activity depends on its 3-D shape. However, working out how the amino acid sequence determines the 3-D structure is very difficult; this is known as the “protein folding problem”. The problem has three aspects: understanding the thermodynamics of the interatomic forces that determine the stable folded structure, understanding the mechanism and pathway by which a protein quickly reaches its final folded state, and predicting the native structure of a protein from its amino acid sequence.

Researchers have tried numerous computational strategies over the years to predict protein structure, but, except for small, simple proteins, their accuracy has not come close to that of experimental methods, which has limited their usefulness. CASP was established in 1994 to challenge the scientific community to produce the best protein structure predictions. By 2016, predictions for the most challenging proteins still scored only about 40 out of 100 on the global distance test (GDT). At CASP13 in 2018, AlphaFold began applying a deep learning approach from artificial intelligence (AI).
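To give a feel for what a GDT score measures, here is a minimal sketch in Python. It is a simplification, not the official CASP procedure: it assumes the predicted and experimental C-alpha coordinates are already superimposed, whereas the real test searches over many superpositions. The function name gdt_ts and the toy coordinates are made up for illustration; the 1, 2, 4 and 8 ångström cutoffs are the ones commonly used for the GDT_TS variant.

```python
import numpy as np


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed coordinate sets.

    For each cutoff, compute the fraction of residues whose predicted
    position lies within that distance of the reference position, then
    average the fractions and scale to 0-100.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("coordinate arrays must have the same shape")

    # Per-residue distance between prediction and experiment.
    distances = np.linalg.norm(predicted - reference, axis=1)
    fractions = [(distances <= cutoff).mean() for cutoff in cutoffs]
    return 100.0 * float(np.mean(fractions))


# Toy example: four residues, displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = np.zeros((4, 3))
predicted = reference.copy()
predicted[:, 0] = [0.5, 1.5, 3.0, 10.0]

print(gdt_ts(predicted, reference))  # 56.25
print(gdt_ts(reference, reference))  # 100.0
```

A perfect prediction scores 100, so a score of about 40 means that, averaged over the four cutoffs, well under half of the residues were placed close to their experimental positions.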

Algorithm for AlphaFold: Accurate protein structure prediction

DeepMind trained the program on more than 170,000 proteins drawn from a public database of protein sequences and structures. The implementation uses a type of attention network, a deep learning technique that lets the algorithm identify parts of a larger problem and then piece them together to obtain the overall solution. Training used between 100 and 200 GPUs. It took “a few weeks” to train the system on this hardware, and the algorithm then takes “a matter of days” to converge on each structure.
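The idea behind attention can be illustrated with a minimal NumPy sketch of scaled dot-product attention, the generic building block popularized by Transformer models. This is not DeepMind’s implementation, and AlphaFold’s attention modules are considerably more elaborate; the function names, the toy sizes (5 residues, 8 features) and the random inputs are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(queries, keys, values):
    """Generic attention: each query position mixes the values,
    weighted by how similar its query is to every key."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)  # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ values, weights


# Toy input: 5 "residues", each described by 8 made-up features.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))

# Self-attention: queries, keys and values all come from the same input.
output, weights = scaled_dot_product_attention(features, features, features)
print(output.shape)   # (5, 8)
print(weights.shape)  # (5, 5)
```

Each row of weights shows how much one position attends to every other position, which is how the network can relate distant parts of a sequence to each other.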

AlphaFold 1, 2018

AlphaFold 1 is a distance map predictor built as a very deep residual neural network with 220 residual blocks. It processes a representation of dimensionality 64×64×128, corresponding to input features calculated from two 64-amino-acid segments. Each residual block consists of three layers, one of which is a 3×3 dilated convolutional layer. The blocks cycle through dilation values of 1, 2, 4, and 8. The model has 21 million parameters in total. The network uses both 1D and 2D inputs, including evolutionary profiles from different sources and co-evolutionary features.
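The dilation cycle can be made concrete with a short Python sketch. It lists the dilation assigned to each of the 220 blocks and estimates how far along one axis a single output position can “see”. The estimate assumes one stride-1 convolution with a kernel width of 3 per block and ignores the other layers; the function names are hypothetical and this is not DeepMind’s code.

```python
from itertools import cycle, islice

NUM_BLOCKS = 220             # residual blocks, as described above
DILATION_CYCLE = (1, 2, 4, 8)
KERNEL_SIZE = 3              # assumed kernel width of the dilated convolution


def dilation_schedule(num_blocks=NUM_BLOCKS, pattern=DILATION_CYCLE):
    """Dilation used by each block when the pattern repeats in order."""
    return list(islice(cycle(pattern), num_blocks))


def receptive_field(dilations, kernel_size=KERNEL_SIZE):
    """Receptive field along one axis of a stack of stride-1 convolutions.

    A convolution with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


schedule = dilation_schedule()
print(schedule[:8])               # [1, 2, 4, 8, 1, 2, 4, 8]
print(receptive_field(schedule))  # 1651
```

Under these assumptions the receptive field is far wider than the 64-residue segments, so every output position can draw on the whole input crop.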

AlphaFold 2, 2020

The software design of AlphaFold 1 contained a number of modules, each trained separately, which were used to produce the guide potential that was then combined with the physics-based energy potential. AlphaFold 2 replaced this with a system of interconnected sub-networks that form a single, differentiable end-to-end model, trained as an integrated whole.

Protein Structure Database

The AlphaFold Protein Structure Database is a collaboration between DeepMind and EMBL-EBI. At launch, the database contained AlphaFold-predicted structure models for around 365,000 proteins, covering nearly the full UniProt proteomes of humans and 20 model organisms. Proteins with fewer than 16 or more than 2700 amino acid residues are not included in the database, although for humans they are available in the whole-proteome batch file. As of early 2022, DeepMind planned to add more sequences to the collection, with the initial goal of covering most of the UniRef90 set, which contains more than 100 million proteins. As of May 15, 2022, 992,316 predictions were available.
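Before looking up a protein, it can help to check whether its length falls inside the range the database covers. The sketch below encodes only the 16 and 2700 residue limits mentioned above; the function name and the example sequences are hypothetical, and the check says nothing about whether a particular protein is actually in the database.

```python
MIN_LENGTH = 16    # shorter sequences are not included
MAX_LENGTH = 2700  # longer sequences are not included


def in_database_length_range(sequence):
    """Return True if the sequence length is within the covered range."""
    return MIN_LENGTH <= len(sequence) <= MAX_LENGTH


# Made-up sequences, used only to exercise the length check.
short_peptide = "MKTAYIAKQR"   # 10 residues
typical_protein = "M" * 350    # placeholder of 350 residues
very_long_protein = "M" * 3000 # placeholder of 3000 residues

for name, seq in [("short", short_peptide),
                  ("typical", typical_protein),
                  ("very long", very_long_protein)]:
    print(name, len(seq), in_database_length_range(seq))
```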

The AlphaFold Protein Structure Database is available on the EMBL-EBI website (ebi.ac.uk).