
DeepMind uses AI to understand life.

Life at the molecular level, that is.

Last week brought the breakthrough news that Google's DeepMind has essentially solved the protein folding problem with AlphaFold. I was especially interested because this was the area of my PhD.

Function follows structure

Proteins carry out a variety of functions, from DNA replication to catalysis to structuring the cytoskeleton. Each protein is built from a unique sequence of amino acids drawn from 20 different types. Some 200M sequences are currently known, a number growing by about 30M per year. The chain of amino acids folds into a unique 3D structure, and this structure determines the protein's function.

Prediction: the shape of things to come

Some 170,000 protein structures have been determined to date. DeepMind has used this dataset to create an algorithm that predicts the 3D structure of a protein from its amino acid sequence alone, with accuracy comparable to an experimental measurement by a technique such as X-ray crystallography. A reasonably sized protein might take as many as 10^300 different shapes, so that’s quite a prediction!
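To get a feel for where a number like 10^300 can come from, here is a back-of-envelope sketch in Python. The inputs are assumptions chosen for illustration, not figures from DeepMind: a chain of 300 residues, 10 possible conformations per residue, and a hypothetical sampling rate of 10^13 shapes per second.

```python
import math

def log10_conformations(n_residues: int, states_per_residue: int) -> float:
    """log10 of the number of chain conformations, assuming each residue
    independently picks one of `states_per_residue` shapes."""
    return n_residues * math.log10(states_per_residue)

def log10_years_to_enumerate(log10_count: float,
                             samples_per_second: float = 1e13) -> float:
    """log10 of the years needed to try every conformation once.
    The sampling rate is an assumed, illustrative value."""
    seconds_per_year = 365.25 * 24 * 3600
    return (log10_count
            - math.log10(samples_per_second)
            - math.log10(seconds_per_year))

# Assumed inputs: 300 residues, 10 conformations each.
log10_count = log10_conformations(300, 10)
log10_years = log10_years_to_enumerate(log10_count)
print(f"about 10^{log10_count:.0f} conformations")
print(f"about 10^{log10_years:.0f} years to try them all")
```

Working in logarithms keeps the numbers manageable. Whatever the exact inputs, the point is the same: brute-force enumeration is hopeless, so any successful predictor has to do something much smarter than trying every shape.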

This matters because knowing the 3D structure of a protein can explain how it functions and, arguably, how it malfunctions. That in turn could accelerate the rational design of interventions, such as drugs against disease states. With 200M proteins in scope, the potential for scientific discovery is massive.

Now we can look to Google not only in search of pizza, but also for the elixir of life.

Determined structures

25 years ago I calculated the 3D structure of a protein essentially by hand (the cysteine proteinase inhibitor human stefin A, see below), using a simulated annealing protocol with distance and angle constraints obtained from high-resolution Nuclear Magnetic Resonance spectroscopy. This took 2.5 years! At that rate, mapping the universe of 200M proteins would take quite some effort. The task has now been reduced from years to hours!
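For readers curious what simulated annealing against distance constraints looks like in principle, here is a toy sketch in Python. It is not the protocol used for stefin A: the eight "atoms", the made-up helical reference coordinates, the exact target distances (real NMR restraints are bounds, and angle restraints are left out here), the cooling schedule and every function name are assumptions chosen only to illustrate the idea.

```python
import math
import random

def helix_reference(n_atoms):
    """Made-up reference coordinates: points on an idealised helix."""
    return [(2.3 * math.cos(math.radians(100 * i)),
             2.3 * math.sin(math.radians(100 * i)),
             1.5 * i)
            for i in range(n_atoms)]

def distance_restraints(reference):
    """One target distance per atom pair, measured on the reference."""
    n = len(reference)
    return [(i, j, math.dist(reference[i], reference[j]))
            for i in range(n) for j in range(i + 1, n)]

def restraint_energy(coords, restraints):
    """Sum of squared deviations from the target distances."""
    return sum((math.dist(coords[i], coords[j]) - target) ** 2
               for i, j, target in restraints)

def anneal(n_atoms, restraints, steps=20000, t_start=5.0, t_end=0.01,
           step_size=0.5, seed=0):
    """Metropolis simulated annealing from random starting coordinates."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-5.0, 5.0) for _ in range(3)]
              for _ in range(n_atoms)]
    energy = restraint_energy(coords, restraints)
    initial_energy = energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose a random displacement of one atom.
        atom = rng.randrange(n_atoms)
        old_position = coords[atom]
        coords[atom] = [x + rng.gauss(0.0, step_size) for x in old_position]
        new_energy = restraint_energy(coords, restraints)
        delta = new_energy - energy
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability so the search can escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
        else:
            coords[atom] = old_position
    return coords, initial_energy, energy

restraints = distance_restraints(helix_reference(8))
coords, e_start, e_end = anneal(8, restraints)
print(f"restraint energy: {e_start:.1f} -> {e_end:.3f}")
```

The high starting temperature lets the atoms move almost freely, and the slow cooling gradually locks them into an arrangement that satisfies the restraints. Running it with several different seeds would give a family of slightly different solutions, which is the toy analogue of the family of structures described in the caption below.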

Family of 17 solution structures showing the backbone atoms of the cysteine proteinase inhibitor human stefin A. The protein has a well-defined global fold consisting of five anti-parallel β-strands wrapped around a central five-turn α-helix. There are two flexible regions in this structure which are two of the components of the “tripartite wedge” that docks into the active site of the target proteinase. These regions, which are shown to be mobile in solution, are the five N-terminal residues and the second binding loop. In the bound conformation they form a turn and a short helix, respectively.