DeepMind, the London-based artificial intelligence lab Google acquired in 2014, has created programs that have mastered the most complex board games: chess, shogi and Go. But the company’s ultimate goal isn’t to win at games; it’s to solve pressing scientific problems. Its AlphaFold algorithm was unveiled in Cancún, Mexico, in early December, where it won a global competition to predict the three-dimensional structure of proteins.
Proteins are the molecular machines of living things. Each one is a long chain of units called amino acids, like beads strung on a wire, that spontaneously folds into a complex and precise shape. The final structure of each protein determines its function. For example, antibodies are shaped like hooks that latch onto microbes. Hemoglobin has a pocket that traps oxygen molecules. Collagen is like a braided cable.
Predicting the structure of any protein from its amino acid sequence is considered one of the holy grails of biology. This is no small task: there are 20 different amino acids, each with slightly different chemical properties, and they are connected by bonds that can take on different lengths and angles. Trying out every possible conformation of a protein before stumbling upon the correct 3D structure would take longer than the age of the universe.
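To get a feel for the scale of that search, here is a rough back-of-the-envelope calculation. Every number in it is an illustrative assumption rather than a figure from DeepMind or CASP: a chain of 100 amino acids, three possible conformations per amino acid, and a hypothetical search that tries ten trillion conformations per second.

```python
# Back-of-the-envelope estimate of a brute-force conformation search.
# All constants are illustrative assumptions, not measured values.

CONFORMATIONS_PER_RESIDUE = 3      # assumed options per amino acid
SAMPLES_PER_SECOND = 1e13          # assumed conformations tried per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10    # roughly 13.8 billion years


def brute_force_search_years(chain_length,
                             per_residue=CONFORMATIONS_PER_RESIDUE,
                             rate=SAMPLES_PER_SECOND):
    """Years needed to try every conformation of a chain, one by one."""
    total_conformations = per_residue ** chain_length
    return total_conformations / rate / SECONDS_PER_YEAR


years = brute_force_search_years(100)
print(f"Exhaustive search for 100 amino acids: about {years:.1e} years")
print(f"Age of the universe: about {AGE_OF_UNIVERSE_YEARS:.1e} years")
print("Longer than the age of the universe?", years > AGE_OF_UNIVERSE_YEARS)
```

The exact figures matter less than the shape of the curve: each extra amino acid multiplies the number of possibilities, so no realistic speed-up makes trial and error workable.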
Despite the esoteric nature of this field of science, it is difficult to overstate its importance. Certain diseases, such as Alzheimer’s disease, Parkinson’s disease, diabetes or cystic fibrosis, are linked to the accumulation of misfolded proteins, and understanding the relationship between a protein’s sequence and its structure could help prevent them. Almost all drugs act by binding to specific regions of a protein, a process that in turn depends on the precise structure of the target. In addition, with the ability to accurately predict how amino acid chains will fold, scientists could design artificial proteins to, for example, degrade plastics or compounds that pollute organisms or the environment.
In a statement, the DeepMind team called the achievement the “first major milestone” in applying artificial intelligence to scientific progress. But “the problem of protein folding is not yet solved,” warned Paul Bates, an expert in the field at the Francis Crick Institute in the United Kingdom, who attended the AlphaFold talk in Cancún. The DeepMind program gets it right more often and more accurately than other programs, but it doesn’t solve all structures. That’s because the AI learns from a database of known proteins and therefore struggles when it encounters entirely new structures.
The competition AlphaFold won, called Critical Assessment of Structure Prediction (CASP), is held every two years. In it, each team receives new amino acid sequences every few days. These correspond to proteins whose structures have already been determined in the laboratory but have not yet been made public. Participants use their predictive models to get as close as possible to the actual shape of the molecule.
According to The Guardian, the Google team, which was entering the competition for the first time, finished first among 98 entrants, producing the most accurate prediction for 25 of the 43 proteins. For each amino acid sequence, there is usually one correct fold, which corresponds to the conformation of greatest biochemical stability. In the laboratory, the actual shape of a biomolecule can be observed using techniques such as nuclear magnetic resonance or X-ray crystallography, methods similar to those that allowed Rosalind Franklin to see the double helix structure of DNA for the first time.
Artificial intelligence is a major advance over these complex and expensive techniques, although it cannot yet fully replace them. DeepMind trained a neural network on thousands of known proteins so that it learned to link their sequences to their shapes. With this knowledge, the AlphaFold program predicts the distance and angle between each pair of amino acids in the chain and then makes small adjustments to the entire structure to find the most stable conformation.
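The second half of that recipe, nudging a structure until it agrees with the predicted distances, can be illustrated with a toy example. This is a minimal sketch and not DeepMind’s code: the “predicted” distances are simply measured from an invented spiral of eight points, angles are ignored, and the function names (`toy_target_shape`, `random_start`, `mismatch`, `refine`) are hypothetical. Starting from random positions, plain gradient descent moves every point a little on each step so that the distances between points get closer to the targets.

```python
import math
import random


def toy_target_shape(n):
    """An invented spiral of n points, standing in for a real protein."""
    return [[math.cos(i), math.sin(i), 0.5 * i] for i in range(n)]


def pairwise_distances(coords):
    """Table of distances between every pair of points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def random_start(n, seed=0):
    """Random starting positions (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]


def mismatch(coords, target):
    """Sum of squared gaps between current and target distances."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def refine(coords, target, steps=500, step_size=0.01):
    """Gradient descent on the mismatch: many small adjustments."""
    coords = [list(p) for p in coords]  # work on a copy
    n = len(coords)
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(coords[i], coords[j])
                if d == 0.0:
                    continue  # direction undefined for coincident points
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= step_size * grads[i][k]
    return coords


target = pairwise_distances(toy_target_shape(8))
start = random_start(8)
final = refine(start, target)
print("mismatch before:", mismatch(start, target))
print("mismatch after: ", mismatch(final, target))
```

In the real system the target distances come from the neural network rather than from a known shape, and the quantity being minimized is more elaborate, but the idea of many small corrections applied to the whole structure is the same.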
The most immediate medical application of the technology will be in drug design, including anti-cancer drugs, Bates said. “We still don’t have a precise enough model to solve this problem,” he said. A more distant prospect may be modifying the proteins associated with degenerative diseases such as Alzheimer’s. “You can start thinking about the more difficult problems. This gives a starting point.”
The DeepMind participants concluded that “there’s still a lot of work to do before we can have a quantifiable impact on disease treatment, environmental management and other applications,” but added that “the potential is huge.” According to Bates, the ideal algorithm would test every link in the protein chain without external references, but this would require a deep understanding of unat