Was this quake AI a little too artificial? Nature-published research accused of boosting accuracy by mixing training, testing data

An academic paper published in Nature has been criticized by a data scientist – who found a glaring schoolboy error in the study when he tried to reproduce the machine-learning research. The paper in question, published in August last year, describes how neural networks can be trained to predict the location of aftershocks …

  1. Anonymous Coward

    Thou shalt not question thine computer model

    Especially in 'Climb it Seance'

    1. Paul Kinsler

      Re: Thou shalt not question thine computer model ...Especially in 'Climb it Seance'

      (1) This isn't a dispute in any way related to climate science, even by computer modelling, as your "Climb it Seance" implies. Further, you should know by now that it's *Brexit* that has to be dragged into every discussion nowadays, the climate thing being somewhat passé.

      (2) I encourage reading of the Nature referee's comments on the github link. That said, I think perhaps the comment (and a reply) might well have been quite a useful addition to the literature; even if a comment might be slightly off the mark, the debate can be instructive.

      1. W.S.Gosset Bronze badge

        Re: Thou shalt not question thine computer model ...Especially in 'Climb it Seance'

        (1) Well, actually, it is, in the sense that the fundamental flaw ("the model is heavily overfitting to the training data") has a direct analogue in greenhouse global warming modelling.

        .

        (for anyone interested (zzzz...) :

        The forcing factor is the most important factor in AGW's core model, yet it is essentially an arbitrary nexus with a number attached (and a rationale applied after the fact for what it "must" mean), and that number is quite literally whatever is needed to bring the model's output in line with its "training" dataset, "to recover the input numbers".

        In econometrics/statistics/modelling, this is called "tautology". Tautology by itself essentially destroys the validity of any model.

        One characteristic of a tautology-affected (aka "over-fitted") model is its inability to forecast, its poor power/accuracy outside of its "training" dataset. And this is demonstrated by the climate model's appalling forecast results.

        Amusingly, there was a conference a few years ago specifically to debate changing the forcing factor -- by fiat, not measurement! Well, if it was real, you couldn't even propose doing so without laughing. "We should change pi! It would make my results so much better if you changed pi!"

        )

  2. Anonymous Coward

    We are select

    "We are earthquake scientists" does not sit well with "we do not accept criticism of our data handling from a data expert."

    Nature needs to remind itself what "interdisciplinary" means when choosing peers for a review.

    1. Paul Kinsler

      Re: We are select

      I cannot find anywhere in the article, or the github content, where the phrase "we do not accept criticism of our data handling from a data expert" appears.

      Can you locate it for me? I would like to understand better the context or intent of your "does not sit well" phrasing.

  3. Peter 26

    Raj's response to the authors' response

    I'd like to see a response from Raj to the authors' comments. Can he explain why they are wrong?

    "The network is mapping modeled stress changes to aftershocks, and this mapping will be entirely different for the example in the training data set and the example in the testing data sets, although they overlap geographically," the pair said.

    "There’s no information in the training data set that would help the network before well on the testing data set - instead, the network is being asked in the testing data set to explain the same aftershocks that it has seen in the training data set, but with a different mainshocks. If anything, this would hurt [the] performance on the testing data set,” DeVries and Meade, wrote back to Shah.

    1. DavCrav Silver badge

      Re: Raj's response to the authors' response

      "Can he explain why they are wrong?"

      Here's where they are wrong:

      "[...]admitted that their model was trained and tested on a subset of the same data[...]"

      And then I ignored everything they said after that as special pleading and 'but we know what we're doing, we're scientists'.

      He doesn't have to prove them wrong. They are wrong by assumption. They have to prove that they are right, to others' satisfaction. And someone else needs to be able to reproduce their results using the same data, which apparently cannot be done, at least easily.

    2. unimaginative
      FAIL

      Re: Raj's response to the authors' response

      The problem is that they fail to address his actual criticism, which is not that it is learning about a specific mainshock's relationship with aftershocks, but that it is learning what that relationship is in specific regions, so it is not generalisable to other places.

      In fact, his own testing shows that if you run it properly it's no better than existing techniques.

      A secondary issue is that they fail to mention that a computationally much lighter ML method does as well as deep learning.

      He highlights other methodological issues too.

      They do not seem to get it even after he explains.

      1. ibmalone Silver badge

        Re: Raj's response to the authors' response

        Writing on a train, so responding to comments here rather than the paper and letters themselves, but you'd think learning the aftershock pattern for a specific region is still a useful thing to be able to do. Possibly the authors are saying that this is an aim of the paper that is being missed?

        It's not that uncommon for training/validation/test sets to be created by partitioning a larger data set. Ideally you also test on a completely independent testing set, but in a lot of contexts it's rare for it to have nothing in common with the original data.
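
        As an illustration of that kind of partitioning, here is a minimal sketch, assuming scikit-learn and entirely made-up features, labels and mainshock IDs, of splitting by mainshock so that no event's aftershock samples land on both sides of the split:

        # Minimal sketch: partition one data set so that no mainshock's samples
        # appear in both the training and the testing split. All data here are
        # random placeholders, not the paper's actual inputs.
        import numpy as np
        from sklearn.model_selection import GroupShuffleSplit

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 12))                 # stress-change features (made up)
        y = rng.integers(0, 2, size=1000)               # aftershock / no-aftershock labels
        mainshock_id = rng.integers(0, 50, size=1000)   # which mainshock each sample came from

        # Split on the group label rather than on individual samples, so samples
        # that share a mainshock (and hence a region) stay on one side of the split.
        splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
        train_idx, test_idx = next(splitter.split(X, y, groups=mainshock_id))

        # No mainshock is shared between the two sides.
        assert set(mainshock_id[train_idx]).isdisjoint(mainshock_id[test_idx])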

  4. Anonymous Coward

    It's a great example of silos in science. The earthquake scientists don't seem to care that they got their data science wrong, and apparently it doesn't affect what they were trying to say. The data scientists apparently don't know enough earthquake science to understand the main aim of the paper, but they do understand the data science, and that's good enough for their comments to be valid. The result is a flawed paper that can't stand deep scrutiny by truly independent reviewers. The Nature referee seems to think flawed science is OK as long as none of the likely readers are going to be bothered by it. I wonder what their background is? There are lots of similar examples in biological fields: papers that rely on dodgy statistics because the authors never bothered to consult an actual statistician.

    1. Paul Johnston
      Thumb Down

      Rather apt!

      Reminds me of

      To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.

      R.A. Fisher

    2. Muscleguy Silver badge

      My daughter is in Bioinformatics and is trying to get everyone to pay attention to something which matters if you are not doing human stuff, and might matter even then, but you have to look. She is Bio and Info from the ground up, not a convert, so she understands the biological significance of the issue and how it impacts the data analysis.

      The problem is that the computer side don't see the biological significance and the Biologists don't rate the bioinfo implications. Oh, and they don't want to have to re-analyse all their data, this time doing it properly.

      The sex of the person making the point is also, sadly, an issue. She has however run the maths and has the proof worked out.

      1. W.S.Gosset Bronze badge

        very common

        She may or may not be relieved to know this is an extremely common sort of problem with research.

        To paraphrase Bismarck: Research papers are like Sausages, 99% of them you can't swallow after you've seen how they were made.

        .

        A hobby of mine is pulling the original research any time a loud announcement is made implying that people "need to change". E.g. meat vs bowel cancer, destroy your energy infrastructure, plastic bags kill turtles, etc. So far, every single "research" paper has fallen to pieces on close inspection. Disturbingly often, in fact, their own results flatly contradict what their Conclusion/Summary claims they show! You don't even need to crawl the methodology; merely observe the two radically different messages in the one paper.

        One exception: the old ozone-layer stuff. That was solidly performed, including itself pointing out where it was weak.

  5. Grooke

    Funny, second time this month that I read about Nature refusing to publish anything that might indicate a previous article was flawed.

    https://slate.com/technology/2019/06/science-replication-conservatives-liberals-reacting-to-threats.html

    It's a shame because it really puts the trust in peer-review in jeopardy.

    1. Paul Kinsler

      "second time this month"

      Just on a point of information the journal "Science" (as referred to in your link) is not the journal "Nature"; they even have different publishers. Or are you referring to some other "second" Nature article?

    2. W.S.Gosset Bronze badge

      > puts the trust in peer-review in jeopardy

      I'm afraid that went out the window quite a while back. The revelations over the last decade+ of how badly it's being hijacked procedurally (and often also socially) mean that it is no longer any real indicator of solidity/quality/reasonableness, merely of conformance to (local) paradigm. Google "grievance studies hoax" for an indication of how little it means.

      You really do have to crawl each methodology yourself.

      And that's just what *I* have seen. I think back to the old-hands rolling their eyes re peer review 3 decades ago, and really the gamesmanship must have been starting a lot earlier.

  6. Anonymous Coward

    Someone should tell this to Amazon (and the cops)

    "Typically this is done with a random sample of your data – the test set – which you never expose to the model," he added. "This ensures your model has not learned from this data and provides a strong measure to ascertain generalizability."

  7. Claptrap314 Silver badge

    Death of Science

    The scientific method is defined as, "Hey, I have an idea! I couldn't prove it wrong, would you try?"

    Less and less of that out there every year, it seems.

  8. Neoc

    Call me Mr Silly, but why not simply remove the suspect data from the result set and see how the model's prediction rate is affected? Or just re-run on a new data set that doesn't contain the suspect data? The fact that this was not even considered makes me look askance at the original paper. After all, one of the requirements of a scientific experiment is that it should be reproducible (and preferably modifiable).
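
    For what it's worth, that check is easy to sketch. The snippet below (hypothetical names throughout; it assumes numpy, scikit-learn's roc_auc_score, an already-fitted model with a predict_proba method, and a region label per sample) scores the full test set and then only those test samples whose regions never appear in the training data:

    # Minimal sketch: compare the score on the full test set with the score on
    # the subset of test samples whose regions were never seen in training.
    # All names are hypothetical placeholders, not the paper's code.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def score_with_and_without_overlap(model, X_test, y_test, test_regions, train_regions):
        """Return (AUC on the full test set, AUC on non-overlapping samples only)."""
        full_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

        # Keep only the test samples from regions the model never saw in training.
        unseen = ~np.isin(test_regions, np.unique(train_regions))
        clean_auc = roc_auc_score(y_test[unseen], model.predict_proba(X_test[unseen])[:, 1])

        return full_auc, clean_auc

    # If clean_auc drops well below full_auc, the headline figure was leaning on the overlap.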

  9. Pat Att

    Nature losing its crown?

    What with its take on gender science and now this, Nature is in danger of losing its reputation as the most prestigious journal.

    1. W.S.Gosset Bronze badge

      Re: Nature losing its crown?

      Heh, that reminds me. At the recent inter-Nation conflab re All Things Major, Australia was formally asked to assure all nations that it was correctly incorporating gender science in, and in fact guiding, its climate change research and actions.
