DeepMind's AlphaFold Will Solve Most Protein Structures by End of 2021

A transformative artificial intelligence (AI) tool called AlphaFold, which has been developed by Google’s sister company DeepMind in London, has predicted the structure of nearly the entire human proteome (the full complement of proteins expressed by an organism). In addition, the tool has predicted almost complete proteomes for various other organisms, ranging from mice and maize (corn) to the malaria parasite.

The more than 350,000 protein structures, which are available through a public database, vary in their accuracy. But researchers say the resource — which is set to grow to 130 million structures by the end of the year — has the potential to revolutionize the life sciences.

Biggest Science Contribution Made by AI

“It’s totally transformative from my perspective. Having the shapes of all these proteins really gives you insight into their mechanisms,” says Christine Orengo, a computational biologist at University College London (UCL).

“This is the biggest contribution an AI system has made so far to advancing scientific knowledge. I don’t think it’s a stretch to say that,” says Demis Hassabis, co-founder and chief executive of DeepMind.

But researchers emphasize that the data dump is a beginning, not an end. They will want to validate the predictions and, more importantly, apply them to experiments that were hitherto impossible. “It’s an amazing first step, that we have all this data on that scale,” says David Jones, a UCL computational biologist who advised DeepMind on an earlier iteration of AlphaFold.

Deluge of data
The approximately 365,000 structure predictions deposited this week should swell to 130 million, nearly half of all known proteins, by the year’s end, says Sameer Velankar, a structural bioinformatician at EMBL-EBI. The database will be updated as new proteins are identified and predictions improved. “This is not a resource you expect to have access to,” says Kathryn Tunyasuvunakool of DeepMind, and she is eager to see what scientists come up with.

Researchers are already using AlphaFold and related tools to help make sense out of experimental data generated using X-ray crystallography and cryo-electron microscopy. Marcelo Sousa, a biochemist at the University of Colorado Boulder, used AlphaFold to make models from X-ray data of proteins that bacteria use to evade an antibiotic called colistin. The parts of the experimental model that differed from the AlphaFold prediction were typically regions that the software had assigned with low confidence, Sousa notes, a sign that AlphaFold is accurately predicting its limits.
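The kind of check Sousa describes can be sketched in a few lines. This is a minimal illustration, not his actual workflow: the function name, the sample numbers, and the confidence cutoff of 70 are all assumptions made for the example. It assumes each residue has a confidence score on a 0 to 100 scale (AlphaFold reports a per-residue score of this kind, called pLDDT) and a measured distance, in ångströms, between its predicted and experimental positions.

```python
from typing import List, Tuple


def split_by_confidence(
    confidence: List[float],
    deviation: List[float],
    cutoff: float = 70.0,  # assumed threshold, chosen for illustration only
) -> Tuple[float, float]:
    """Return the mean deviation of low- and high-confidence residues."""
    if len(confidence) != len(deviation):
        raise ValueError("need one confidence score and one deviation per residue")

    # Partition the deviations by whether the residue's confidence is below the cutoff.
    low = [d for c, d in zip(confidence, deviation) if c < cutoff]
    high = [d for c, d in zip(confidence, deviation) if c >= cutoff]

    def mean(values: List[float]) -> float:
        # An empty group has no meaningful average, so report NaN.
        return sum(values) / len(values) if values else float("nan")

    return mean(low), mean(high)


# Made-up values for six residues (hypothetical data, not from any real structure).
confidence = [95.0, 92.0, 88.0, 45.0, 38.0, 90.0]
deviation = [0.4, 0.6, 0.5, 4.0, 6.0, 0.5]

low_mean, high_mean = split_by_confidence(confidence, deviation)
print(f"low-confidence mean deviation:  {low_mean:.2f}")
print(f"high-confidence mean deviation: {high_mean:.2f}")
```

If the disagreement with experiment is concentrated in the low-confidence group, as in these invented numbers, that is the pattern Sousa reports: the model is wrong mainly where it says it might be.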

Written by Brian Wang

38 thoughts on “DeepMind's AlphaFold Will Solve Most Protein Structures by End of 2021”

  1. Things will get interesting when they push it towards proteins that are not present in nature but possible. Beyond simple stuff like left handed versions of aminos.

  2. To say that you can reconstruct an animal from a detailed list and description of all of its proteins is somewhat like saying you can reconstruct all the food in a five star restaurant from knowing what ingredients they stock.

  3. Well. None of these structures contain water. The entire protein folding field including x-ray crystallography is an academic exercise because we don't understand hydration. So save your oohs and ahs. No one can even compute a snowflake which is just composed of water molecules. When I mentioned this to a world class computational physicist, he frowned and said "water is special". Well yes.

  4. "determining a protein’s 3D shape from its amino-acid sequence. . . . In some cases, AlphaFold’s structure predictions were indistinguishable from those determined using ‘gold standard’ experimental methods such as X-ray crystallography and, in recent years, cryo-electron microscopy (cryo-EM)."

    They didn't just take an amino acid sequence and see how it folded. They had a target design, and they showed they can predict that from an amino acid base. The target design could be anything – a natural protein for medicine, or some desired molecular structure for, say, nanomachine purposes.

  5. I agree that, even if they make nanotech with proteins, it's soft nanotech. But, seems to me they can make nano-machines with this software.

    The protein folding software from both AlphaFold and the Institute for Protein Design uses a target design, and the software accurately shows how amino acids fold up into the "desired" shape.

  6. I mean the first time I saw Eric Drexler's book, I thought of space exploration as advanced technology. So the back cover, with Marvin Minsky talking about revolutionary technology, didn't make sense to me.

  7. Very cool that you have a signed copy of Eric Drexler's "Engines of Creation." My original copy has pages falling out. I found a new copy in my local library bookstore. I got it for like a dime or something like that!

    I can't help telling my story of discovering Eric Drexler's "Engines of Creation." I used to hike like four miles to the local shopping center. This was in Poway, San Diego. There was a small bookstore in there. It wasn't a chain bookstore. This was before Barnes & Noble. Bookstores were small then.

    This small bookstore's science section was of course small. It was one stack, with one shelf of science books. Most of the science stack was nature books, like butterflies and birds. I don't remember any interesting Astronomy books, much less Physics books. There was Rudy Rucker's "Mind Tools," which I actually picked up before checking out a book with the curious title of "Engines of Creation."

    This was in 1988. I know I put the book down wondering what cells have to do with making the space future happen. I went on to read Alvin Toffler's books: Future Shock, The Third Wave, and Powershift. And Astronomy books and some space exploration books. I came back two years later, and this time something told me to pick this up.

    I'm tempted to say that Alvin Toffler talking about the information age maybe made me buy the book. And the rest, as they say, is history!

  8. You do realize that plants tend to moderate the activity of genes by having more or fewer copies of them, right? So the total amount of genetic code in a plant cell is hardly a good guide to the complexity of the plant.

    It's quite common for plants to just double their number of chromosomes and be perfectly healthy, because they're simple enough that doesn't majorly screw things up.

  9. IIRC, ab initio protein design is actually a bit easier than predicting the shape of natural proteins. Because you can use only highly predictable sequences in your design, like stacking a brick wall instead of predicting the shape of an undressed stone wall from the order the rocks were stacked. Evolution doesn't select for predictable proteins, just ones that work.

  10. My copy of Engines of Creation is autographed, because I was part of the mass buy to get Drexler a larger initial print run. So I might have heard something about it.

    Yes, this software might help with achieving wet nanotech. Of course, some argue "life" already provides us with that.

  11. The issue we're dancing around here is the reversibility of these calculations.

    Some calculations are trivially reversible. Calculating 2×3=6 is not significantly different from calculating 6/3=2.
    Some calculations are NOT. That's the whole principle of prime factorisation in cryptography. It's much, much harder to calculate prime factors than to multiply the factors together and get the result.

    So, given that they appear to have solved the protein chain to protein structure calculation, the question remains: Can they do desired structure to protein chain required to produce it?

    Then you do DNA required to produce said protein chain (which IS a solved problem AFAIK) and now you can produce trillions of said structures for feasible cost.

    And that gives you soft, squishy, nanotech.

  12. I know that David Baker is aware and has even tried his hand at making nano-machines through protein folding. I still feel that they are dragging their feet about it; it looks like they want to solve it with one hand tied behind their back. They're doing their protein folding on a gaming system!

  13. I'm pretty sure that people were saying the same thing when it first became possible to decode genomes.

  14. Have the regular readers of a site that discusses things like molecular structures ever heard of nanotechnology in the year of our Lord 2021?

    Is that what you're asking?

  15. This is also one of those cases where it can (not guaranteed) be much faster to "check" an answer than to generate one.

    If we have a folded protein shape that MIGHT be correct, we can (in some cases, not all) feed that folded shape into predicting what something like the x-ray diffraction pattern would be. Then we stick some of the protein in an xray diffraction machine and see what pattern comes out.

    It's much easier and faster to check "is this the pattern that this predicted shape would generate?" than "what shape would generate this pattern?"

  16. Do these computed structures contain water? That’s been the elephant in the room. Proteins sit in water and we don’t have any good ideas about how to deal with that.

  17. just wait until you get a quantum machine on every desk and coffee table… classic 20th century 10101 machines – silly things.

  18. Then they perhaps should redo the parts for which the measure of confidence is low, because a shape that we are only 50% sure is the actual case isn't useful. We need 98% confidence that this particular state exists in vivo.

    Note that I understand that some proteins have multiple more-or-less stable configurations, and the point of many proteins is to cycle between those configurations. However, a configuration which may actually exist, but then again maybe not, cannot be counted upon to predict its biochemical behaviour.

    I'm not dismissing this work, but saying it will "solve" protein shapes by the end of this year is a big claim. If we do not have accuracy in our predictions, then the problem simply cannot be said to be solved.

  19. So, what we need is something like the game program Spore. You give it your DNA (and anyone else's you want involved) tell it what kind of critter you want, say 6 feet tall, specified skin color, specified hair color, symmetrical, intelligence, perfect organs, incredible immune system, maybe a bigger heart and two livers, self-repairing DNA, a biological interface for man-machine interfaces (it probably gets trickier as we get to things that have to be coded at a lower and lower level since there may be little to nothing we can use to start these with).

    Then the quantum computers and the AIs get busy and model the whole thing in VR. Like it? Good, now hit "print." Every parent wants their children to have more than they did, and go farther, right?

  20. They have a measure of confidence for each part of the folded structure, so you know what to trust.

  21. Prediction: Couple years after this is done, someone grows a "synthetic" human in a lab, successfully, that can walk, be athletic, and learn. Or a real human.

    My obviously flawed reasoning: it feels inevitable that this leads to the above breakthroughs.

  22. Productivity growth per person per year has been at 1.5% to 2.5% since the industrial revolution started.

    All the big sustained high growth states have been catch-up. Once you reach the current best practice it's been remarkable how the limit for an 1820s cotton mill, a 1920s diesel mechanic, and a 2020s software engineer has been much the same number.

    Maybe we've hit the human advancement capability limit centuries ago?

  23. Maybe this can be used to make better medical tests. It will also allow for treatments that don't even exist yet. The future is spectacular.

  24. The problem is that the predictions "vary in their accuracy," which is to say that they are (at least currently) useless. When we have a set of predictions which are consistently accurate, we can say we know the proteome.

  25. This is why I don't believe in truly exponential growth. Just S-curves that look exponential for a while then flatten themselves.

    Sooner or later and for every technological domain, we will hit a wall, either of diminishing returns or of human capability to process the data and produce more knowledge.

    In cases like this the people 'in the know' simply aren't enough to process all the gathered data and imagine all the potential uses. AIs and machines can only perform mechanical parsing of the data and make inductive/deductive inferences, but are unable to produce truly new applications, which still require human semantics and understanding of the problems.

    After we get into this technological plateau, advance is much more leisurely.

  26. hah. bring some of those crypto-miners on board…. what was that unsubstantiated news… 10 – 20% of all new graphics chips in the last 1 – 2 years have gone to bitcoin 'production'….

  27. The shortage of people knowing this stuff will become direr than ever.

    Seems that everywhere we look, there are mountains of knowledge waiting to be discovered or known by someone, to do something with it.

    Even we are adding some mountains of our own making, with the accumulated artifacts of our technology surpassing our human ability and lifespans to digest them all.

    Seems the future people are guaranteed interesting careers, because there is so much to know and do, that the ever scarce people with desire and ability can easily become the experts, on one of the ever more varied parcels of knowledge with useful applications.

  28. Agreed. Even ol' Kepler's data set has been quantified as so overwhelming that even bringing in citizen-researchers over the last 5+ years has barely been able to narrow the estimated 40×10^9 earth-size 'rockies' that could be orbiting sun-likes and reds in the Milky Way.

  29. Yes, but where's the AI that will sort the unfathomable reams of data so the post-docs can use it… estimates indicate that thousands of times more data is produced on these big projects (exoplanet, sub-atomic, health sciences, nuclear/energy, etc.) than can be analysed and brought into practical discussion.

  30. Humans are slightly more complicated than a fruit fly, and less complicated than a small, mostly tasteless plant. Isn't that a lovely thought. If ever aliens visit our planet, then they will address the cress first.
