Amazing peer-reviewed AI bots that predict premature births were too good to be true: Flawed testing bumped accuracy from 50% to 90%+

A surprising number of peer-reviewed premature-birth-predicting machine-learning systems are nowhere near as accurate as first thought, according to a new study. Gilles Vandewiele, a PhD student at Ghent University in Belgium, and his colleagues discovered the shortcomings while investigating how well artificial intelligence …

  1. Oliver Mayes

    50% success rate?

    So it's as accurate as flipping a coin, or am I misunderstanding the numbers?

    1. katrinab Silver badge
      Meh

      Re: 50% success rate?

      If you always answer that it won't be premature, you will be right 88.6% of the time. So I'm not sure how they measure 50% accuracy.
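
      A quick way to see that baseline in action (a toy sketch; the 11.4% prematurity rate is just the figure implied by the 88.6% above):

      ```python
      # Majority-class baseline: always answering "not premature" is right
      # whenever the true label is negative. The 11.4% positive rate below
      # is only the figure implied by the 88.6% claim above.

      def accuracy(predictions, labels):
          return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

      labels = [1] * 114 + [0] * 886       # 1 = premature, 11.4% of 1000 cases
      always_negative = [0] * len(labels)  # the "do nothing" classifier

      print(accuracy(always_negative, labels))  # 0.886
      ```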

      1. This post has been deleted by its author

      2. ibmalone Silver badge

        Re: 50% success rate?

        You could predict premature in cases where it's not and drop your accuracy further. Flipping a coin will always get you 50% accuracy in the long run (kind of a special case: with anything other than a 50:50 prediction, the resulting accuracy depends on the population balance, but 50:50 has the same accuracy rate for positives and negatives independently).

        Of course the extreme case is predict the wrong answer each time, for 0% accuracy. That's generally not the baseline because you assume that this means you simply had your outcomes the wrong way around to start with, but it could occur by chance (extremely unlikely in all but the smallest datasets, unless you really have trained to some signal that's magically inverted in the testing data).
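
        That independence from the class balance is easy to check: if the true positive rate is p and you guess "positive" with probability q, expected accuracy is p*q + (1-p)*(1-q), and only q = 0.5 makes p drop out. A sketch (the rates here are made up):

        ```python
        # Expected accuracy of label-blind guessing: guess "positive" with
        # probability q when the true positive rate is p.
        # Only q = 0.5 makes the result independent of p.

        def guess_accuracy(p, q):
            return p * q + (1 - p) * (1 - q)

        for p in (0.114, 0.3, 0.5):
            coin = round(guess_accuracy(p, 0.5), 12)   # 0.5 whatever p is
            never = round(guess_accuracy(p, 0.0), 12)  # "always negative" tracks 1 - p
            print(p, coin, never)
        ```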

        1. DCFusor Silver badge

          Re: 50% success rate?

          Bayes, people. 50% is much better than a coin flip if the priors aren't ...

      3. Anonymous Coward
        Anonymous Coward

        Re: 50% success rate?

        Most likely reporting (something like) ROC.

        https://en.wikipedia.org/wiki/Receiver_operating_characteristic
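
        ROC AUC has a handy reading: it's the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, so 0.5 is exactly chance level. A minimal sketch with invented scores:

        ```python
        # ROC AUC via its rank interpretation: the probability that a random
        # positive outscores a random negative (ties count as half).
        # The scores below are invented for illustration.

        def roc_auc(scores, labels):
            pos = [s for s, y in zip(scores, labels) if y == 1]
            neg = [s for s, y in zip(scores, labels) if y == 0]
            wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
            return wins / (len(pos) * len(neg))

        labels  = [1, 1, 1, 0, 0, 0]
        useless = [0.9, 0.2, 0.5, 0.9, 0.2, 0.5]  # same spread in both classes
        decent  = [0.9, 0.8, 0.6, 0.7, 0.3, 0.1]  # positives mostly rank higher

        print(roc_auc(useless, labels))  # 0.5
        print(roc_auc(decent, labels))
        ```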

  2. Il'Geller

    The basis of neural networks is Frank Rosenblatt's perceptron, which had never worked before because it was built using n-gram parsing. However, the use of AI parsing meant that neural networks began to work.

    1. Mage Silver badge
      Facepalm

      Perceptron

      It's not real AI; neural networks also have nothing to do with how real brains work. It's really nothing more than a misleading label.

      May contain nuts

  3. Mage Silver badge
    Big Brother

    Breast Cancer scans

    Should we believe Google's claims to do it better than humans?

    Also, ALL so-called AI is really pattern matching driven by a HUMAN-curated database.

    Even if it is as good as the experts, how do you train experts to produce the human-curated data of the future, as things change or new cases arise, if the experts are replaced by these systems?

    1. ibmalone Silver badge

      Re: Breast Cancer scans

      That is certainly a limit if you are training computers to do things that people are doing and comparing to the human output (matching a hand-drawn segmentation or a radiologist's classification), but you can also train to predict final outcomes (as in this case, actually). Which is sort of what those human experts would be doing to start with in many cases.

      Potentially a computer can learn from a much bigger dataset than a human could ever hope to in their lifetime (or, at a lower threshold, ever be able to fully absorb and distinguish); there's no philosophical reason they can't do some tasks better, and pattern recognition is a good candidate. On the other hand, they also seem to need a larger quantity of task-specific data; humans are still better at generalising from small numbers of examples using their existing knowledge.

      1. Anonymous Coward
        Anonymous Coward

        Re: Breast Cancer scans

        I think the key issue in so many of these cases is 'potentially'. You can train a computer very easily to repeat a task like pattern recognition, but if you don't have a good conceptual model of what the pattern is and how the computer is spotting it, the chances are you are just painting the bullseye on the side of the barn after firing the arrow.

  4. Anonymous Coward
    Anonymous Coward

    Just because something is peer reviewed, it doesn't mean that it works or that the results are reproducible, as Bayer research scientists found....

    http://blogs.nature.com/news/2011/09/reliability_of_new_drug_target.html

    also covered by Reuters....

    https://www.reuters.com/article/us-science-cancer/in-cancer-science-many-discoveries-dont-hold-up-idUSBRE82R12P20120328

    1. Cuddles Silver badge

      "Just because something is peer reviewed it doesn't mean to say it works and results reproducible"

      Indeed. However, ideally peer review at least acts as a decent filter to stop obviously shoddy work getting out in the first place. Scientific publications get a lot of flak for not being reproducible or not giving significant enough results, but that's not actually a problem most of the time, because the whole point is to put the results out there and let other people look into them and try to reproduce them, extend the work, or whatever.

      In this case, though, it appears to be sheer incompetence on the part of both the reviewers and the editorial staff. This is exactly the sort of thing peer review is there to catch: obviously unbelievable results caused by poor experimental method. Either the papers clearly describe their own failings and shouldn't even have made it through review in the first place, or they don't describe the work in enough detail to justify publication. Many papers don't hold up given time for reproductions or further related work, but it's rare to see such clear sloppiness through the entire experimental and publication process.

    2. Schultz Silver badge
      Boffin

      Why most published research is wrong

      Here is the actual paper, 'Why Most Published Research Findings Are False': https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124

      Unsurprisingly, this kind of problem is most prevalent in fields where scientists fight about big pots of money. Say, medically relevant research.

      In my field, scientists tend to measure hard numbers: there are no big financial incentives, and wrong results will (eventually) be falsified, ruining your reputation. But in fast-moving, fashionable, and well-funded fields (AI, etc.) the incentives are all wrong.

  5. Alan Johnson

    Concept flawed

    The idea of training and evaluating a system on only 38 examples as anything other than a rough proof of concept, one meant to help decide whether it is worthwhile to investigate further, is ill-conceived from the start.

    1. mj.jam

      Re: Concept flawed

      The total of 300 feels like just about enough data for that, although it depends on whether they need to do any hyperparameter search, in which case they would need a third data set. The problem they will run into is that they may keep tweaking their model, trying it on the test data set, until they find one that works. Or keep repartitioning their data until they get something. Everything is likely to be highly over-fitted, since all they have done is add a manual phase to the process.
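
      That kind of test-set leakage is easy to demonstrate. A toy sketch (nothing here is from the actual papers): oversampling a class by duplication before the train/test split puts exact copies of test cases into the training set, and even pure-noise features then score brilliantly:

      ```python
      import random

      # Leakage sketch: oversampling by duplication BEFORE the train/test
      # split puts exact copies of test cases into the training set. A
      # 1-nearest-neighbour classifier then aces the test set even though
      # the feature is pure noise. All numbers are synthetic.

      random.seed(0)

      def one_nn_accuracy(train, test):
          hits = 0
          for x, y in test:
              nearest = min(train, key=lambda t: abs(t[0] - x))
              hits += nearest[1] == y
          return hits / len(test)

      # 40 noise samples: the feature says nothing about the label.
      data = [(random.random(), random.randint(0, 1)) for _ in range(40)]

      # Wrong: duplicate first, split second -> copies straddle the split.
      leaky = data * 5
      random.shuffle(leaky)
      cut = int(0.8 * len(leaky))
      leaky_acc = one_nn_accuracy(leaky[:cut], leaky[cut:])

      # Right: split first, oversample only the training portion.
      random.shuffle(data)
      train, test = data[:32], data[32:]
      clean_acc = one_nn_accuracy(train * 5, test)

      print(leaky_acc, clean_acc)  # leaky_acc is inflated by the duplicated test cases
      ```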

  6. David 135

    The sample was not necessarily too small to train some systems, but many would be liable to over-fit the training data and not generalise. The big problem was ensuring the testing was right. Unfortunately this isn't even uncommon. When I was doing my Masters degree, I reviewed the available literature on predicting foreign currency movements, and an outright majority of the published papers I found that used machine learning for the task made elementary mistakes in their testing procedure, not dissimilar to this, leading to unbelievably high prediction scores. I really hoped to find one with a process that could reliably predict next-day currency movements, but unsurprisingly that virtually unlimited pot of gold was not real, and my final paper primarily served to debunk a dozen or so other papers by showing that their results were not reproducible and explaining the identifiable flaws in their processes.
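
    One classic mistake of that kind, sketched here on synthetic data (not taken from any particular paper): scoring forecasts on price levels instead of returns. On a random walk, the know-nothing "tomorrow equals today" forecast already achieves a near-perfect R^2, while the direction of the next move stays a coin flip:

    ```python
    import random

    # On a random walk (a rough toy model of an FX rate), the naive
    # "tomorrow equals today" forecast scores a near-perfect R^2 on price
    # LEVELS, despite carrying zero information about the next move.
    # Level-based scores can therefore look spectacular while predicting
    # nothing. Entirely synthetic data.

    random.seed(1)
    prices = [100.0]
    for _ in range(999):
        prices.append(prices[-1] + random.gauss(0.0, 1.0))

    actual  = prices[1:]
    persist = prices[:-1]  # forecast: no change from today

    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, persist))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    # Fraction of days the rate actually went up: about a coin flip.
    up_rate = sum(a > f for a, f in zip(actual, persist)) / len(actual)

    print(round(r2, 3), round(up_rate, 3))
    ```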

    The simple lesson is that too many people don't understand the necessary processes for doing machine learning properly - including many academics writing papers about it.

    1. Claptrap314 Silver badge

      It's called "motivated reasoning". These "academics" need to publish or perish, and a null result for your efforts doesn't read very well.

      1. Anonymous Coward
        Anonymous Coward

        It's worse than just "publish or perish". Under a lot of models used for promotion and tenure, there's no disincentive for producing bad studies at all. A lot of models simply look at the number of papers published and perhaps how prestigious the outlet is. Models rarely take into account any type of paper quality; when they do, it's usually in the form of the number of citations. In that case, having your paper debunked still increases your ratings! There might be one that takes into account the number of papers that you've had retracted, but it's certainly not popular.

  7. This post has been deleted by its author
