DeepMind quits playing games with AI, ups the protein stakes with machine-learning code

Researchers at DeepMind are using AI software to study how proteins fold, with the hope that it will help scientists design new drugs more quickly. The human body produces several thousand different proteins, each with a unique form. Although there are only 20 amino acids, these can be organised in an astronomical number …

diodesign
(Written by Reg staff) Silver badge

Re: It's good somebody's doing this

That doesn't, TTBOMK, use machine learning; instead it uses the spare processor cycles of a lot of computers. Completely different project.

C.

really_adf

Re: It's good somebody's doing this

>Because it's not like it's ever been done before.

Err, Folding@home is mentioned (and linked) in the article.

But on that subject, I'd be very interested in a more detailed analysis of how the approaches compare, if the task is as comparable as it sounds. There are hints in the article that it's both good (it won the competition) and bad (its accuracy).

Anonymous Coward

Re: It's good somebody's doing this

>Because it's not like it's ever been done before.

Yes, it's been tried many times, but how the proteins encoded by those DNA sequences end up fully folded into working enzymes is still not fully understood; these are hugely complex processes, akin to climate calculations.

Anonymous Coward

Re: It's good somebody's doing this

>I'd be very interested in some more detailed analysis of how the approaches compare

They both suck. I say this as someone who has done both molecular biology (though mostly DNA) and some AI, and who has a chemistry degree with a very heavy theoretical physics slant.

Our knowledge of what is going on in a solution which contains all kinds of "stuff" leaves a LOT to be desired. The _EXACT_ structures of some proteins, as determined by crystallography, are _NOT_ their natural configurations in solution. To add insult to injury, a real protein in eukaryotes is not just 20 amino acids. It is 20 amino acids plus a lot of glycoside tails attached in various places. Without those it does not fold correctly, and this is one of the reasons why we are not producing a lot of stuff in bacteria at present: while we can program bacteria to synthesize a particular protein, it ends up "wrong" due to the lack of glycosylation. Where those tails are, and what they are, is in the realm of black voodoo. We can read the amino acid sequence from the DNA (that is what is usually done), but we often have only guesses about where the glycoside tails go, because the chemical methods for analysing them are fairly imprecise.

Both approaches produce approximations.

IMO the truth is somewhere in between. Most approaches based on a model of forces and molecular interactions take a ridiculously long time to converge initially: it takes them ages to get to a point where they are actually working on something resembling the real structure. Compared to that, the AI cruises through this phase; it is not much different from a game for it (in fact, the training is similar). Once it is done with this phase, though, it does not know what to do. No amount of training can replace a molecular forces model, and that is the fundamental reason for the low accuracy.

Spoobistle

Grail seekers

As anyone who has tried to make proteins will tell you, even when you have the right sequence of amino acids there is plenty of scope for the folding to go wrong. The energy differences between folded, misfolded and disordered proteins are small. Many proteins don't simply fold spontaneously; they need prior modification or "chaperone proteins" to get it right. There are cellular processes that modify the transcribed gene sequence before it is translated into amino acids. Etc etc.

To me this looks like the "big physics" approach: if you can't answer a question, throw a bigger machine at it. It works for some questions, but I suspect the result will be rather like previous attempts to computerise biology. It will get some good hits in the short term, then people will realise that most are actually rather similar to known cases: the "low-hanging fruit" syndrome.

Meanwhile, rather like the back and forth between theoretical astrophysicists and astronomers, the biologists in the wet labs will need to collect more information on the hard cases mentioned: fibrous, disordered and membrane proteins.

There's no Holy Grail, just a lot more work.

Chemist

Re: Grail seekers

"There's no Holy Grail, just a lot more work."

It would be nice to think there was, but having spent 10+ years modeling proteins, I tend to agree: it's hard.

Incidentally, the most important property of a sequence is probably that it folds quickly and cleanly. Efficient function is a nice-to-have.

John Mangan

I'm confused.

"Template-based modelling, however, only works if there is another well-known protein that is comparable. If there isn’t, developers have to turn to free modelling to construct formations from scratch. This is where DeepMind's neural-network software comes in, and generates new structures from a given set of amino acids, and is scored on its accuracy."

If there is no template, and the protein hasn't been X-rayed etc., how do you score its accuracy? Did I miss something in the article?

VeganVegan

Re: I'm confused.

I believe they assign proteins whose structures were determined recently but had not yet been disclosed to anyone.
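To make the scoring concrete: once the withheld experimental structure is released, each prediction can be compared with it residue by residue. Below is a minimal sketch of a simplified GDT_TS-style score, assuming the predicted and experimental C-alpha coordinates are already superimposed and listed in the same residue order. The function name `gdt_ts_like` is hypothetical, and the real GDT_TS also searches over superpositions for each cutoff, so treat this as an illustration only.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def gdt_ts_like(predicted: Sequence[Coord], experimental: Sequence[Coord],
                cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS-style score (hypothetical helper, not the official tool).

    For each distance cutoff (in angstroms), take the percentage of residues whose
    predicted C-alpha lies within that cutoff of the experimental C-alpha, then
    average those percentages. Assumes both coordinate lists are already
    superimposed; no structural fitting is performed here.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue deviation between the predicted and experimental positions.
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    n = len(deviations)
    # Percentage of residues within each cutoff.
    percentages = [100.0 * sum(d <= c for d in deviations) / n for c in cutoffs]
    return sum(percentages) / len(cutoffs)


experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 0.0, 9.0)]
print(gdt_ts_like(predicted, experimental))  # 56.25
```

In the toy example the per-residue deviations are 0.5, 1.5, 3.0 and 9.0 Å, so 25%, 50%, 75% and 75% of residues fall within the 1, 2, 4 and 8 Å cutoffs, averaging 56.25. A perfect prediction scores 100.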

Chemist

Re: I'm confused.

"Template-based modelling, however, only works if there is another well-known protein that is comparable"

However, there are many in-between cases. One of the best I know of is the prediction of the beta-propeller structure in integrins from other beta-propellers, where the sequence match is rather poor.

Biting the hand that feeds IT © 1998–2018