Life at the molecular level, that is.
Last week saw the breakthrough news that Google's DeepMind has essentially solved the protein folding problem with AlphaFold. This was of particular interest to me, since it was the area of my PhD.
Function follows structure
Proteins carry out a variety of functions from DNA replication to catalysis to structuring the cytoskeleton. Each protein is built up from a unique sequence formed from 20 different amino acids. Some 200M sequences are currently known, growing by about 30M per year. The chain of amino acids folds into a unique 3D structure. This structure determines its functionality.
Prediction: the shape of things to come
Some 170,000 protein structures have been determined to date, and DeepMind has used this dataset to create an algorithm which can predict the 3D structure of a protein based only on its sequence of amino acids, to the same level of accuracy as if actually measured using a technique such as X-ray crystallography. A reasonably sized protein might take as many as 10^300 different shapes, so that’s quite a prediction!
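To get a feel for how large 10^300 is, here is a rough back-of-the-envelope sketch. The sampling rate of one shape per picosecond is purely an assumption for illustration and does not come from DeepMind; the age of the universe is taken as roughly 13.8 billion years.

```python
import math

# Figure quoted in the text: roughly 10^300 possible shapes for one protein.
LOG10_CONFORMATIONS = 300

# Assumption (not from the text): one shape is tried every picosecond.
LOG10_SAMPLES_PER_SECOND = 12

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # approximate


def log10_years_to_enumerate(log10_conformations, log10_rate):
    """Return log10 of the years needed to try every shape exactly once."""
    log10_seconds = log10_conformations - log10_rate
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


# Work in log10 throughout so the numbers stay readable.
log10_years = log10_years_to_enumerate(LOG10_CONFORMATIONS, LOG10_SAMPLES_PER_SECOND)
log10_universe_ages = log10_years - math.log10(AGE_OF_UNIVERSE_YEARS)

print(f"log10(years needed)       = {log10_years:.1f}")
print(f"log10(ages of the universe) = {log10_universe_ages:.1f}")
```

Even at that generous rate, trying every shape in turn would take unimaginably longer than the universe has existed, which is why a brute-force search was never an option.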
This is relevant because knowing the 3D structure of a protein helps explain how it functions and, arguably, how it malfunctions. That in turn could accelerate the rational design of interventions, such as drugs against disease states. With 200M proteins in scope, the potential for scientific discovery is massive.
Now we can look to Google not only in search of pizza, but also for the elixir of life.
Determined structures
Twenty-five years ago I calculated the 3D structure of a protein essentially by hand (the cysteine proteinase inhibitor human stefin A, see below), with a simulated annealing protocol using distance and angle constraints obtained from high-resolution Nuclear Magnetic Resonance spectroscopy. This took 2.5 years! Multiplied by 200M proteins, it would take quite some effort to map the universe of proteins. The task has now been reduced from years to hours!
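The arithmetic behind "quite some effort" can be sketched as follows. The 2.5 years and 200M figures come from the text above; the three hours per prediction is only a placeholder assumption, since the text says no more than "hours".

```python
# Figures from the text.
YEARS_PER_STRUCTURE_BY_HAND = 2.5
KNOWN_SEQUENCES = 200_000_000

# Assumption for illustration only: the text just says "hours".
HOURS_PER_PREDICTION = 3.0

HOURS_PER_YEAR = 365.25 * 24

# Total effort if every known sequence were solved the way I did it.
manual_person_years = YEARS_PER_STRUCTURE_BY_HAND * KNOWN_SEQUENCES

# How much faster one prediction is than one hand-calculated structure.
speedup = YEARS_PER_STRUCTURE_BY_HAND * HOURS_PER_YEAR / HOURS_PER_PREDICTION

# Total compute time for all known sequences at the assumed prediction speed,
# run one after another on a single machine.
predicted_compute_years = KNOWN_SEQUENCES * HOURS_PER_PREDICTION / HOURS_PER_YEAR

print(f"By hand:    {manual_person_years:,.0f} person-years")
print(f"Speed-up:   about {speedup:,.0f}x per structure")
print(f"Predicted:  about {predicted_compute_years:,.0f} machine-years in total")
```

By hand, that is 500 million person-years. Under the assumed prediction time the total is still large, but it is compute time that can be spread across many machines rather than across many PhD students.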
