Don't trust deep-learning algos to touch up medical scans: Boffins warn 'highly unstable' tech leads to bad diagnoses

Be wary of medical scans enhanced by AI algorithms: the software is prone to making tiny errors that could lead to incorrect diagnoses, a study has warned. Some scientists argue that deep-learning code could reduce the time spent conducting medical scans if the algorithms can automatically improve image quality for medics and …

  1. Whitter


    A rather long read to establish the idea that AIs aren't very good at handling types of data that were not in their training set.

    1. Warm Braw Silver badge

      Re: Hmm...

      AI isn't good at any problem in which it's required to show its working. That lack of accountability is a critical deficit in many of the applications for which it is proposed - health and justice being prime examples.

  2. Chris G Silver badge

    It always appears to me that the most enthusiasm for AI comes from those people who are developing and selling it, the same as with anything else.

    Various forms of machine learning obviously have the potential to be of extreme value in medicine and everywhere else but these systems are only as good as the people who wrote the code and developed the teaching system.

    One of the biggest problems is that people tend to relax and let the gadget do all the work when really it should only be an aid. Relying on what is essentially new and unproven tech that needs years, if not decades, more development should not be done where health and life are at risk.

    1. Pascal Monett Silver badge

      And now we're adding automated image modifications, even though we haven't the faintest idea of how that actually works because it's done in a black box that is called "AI".

      Great idea, what could possibly go wrong with something that transforms medical images needed to make a diagnosis in ways we don't understand ?

      Oh, and if you're expecting the US Government to "make sure things are up to scratch before approval for the open market", you need to cut down on the weed, my good sir. The US Government is no longer in any state to actually do something useful or needed for its citizens.

  3. gnasher729 Silver badge

    Problems with photo copiers

    Years ago it was found that some photo copiers tried to enhance images, and in the process sometimes changed letters and digits in copied images. So they looked at a rather low quality digit 8, decided it was a rather low quality digit 3, and replaced it with a nice looking 3. It seems the same thing happens again.

    1. Anonymous Coward

      Re: Problems with photo copiers

      This one?

      1. roytrubshaw

        Re: Problems with photo copiers

        If I remember correctly, this was about the compression algorithm.

        They used one of the JPEG 'lossy' algorithms and when they uncompressed the scan to print it, it resulted in the changes described.

        It seems photocopiers are now just single purpose scanners and printers.

  4. Circadian

    What the fucketty-fuck?

    They are using AI systems to alter images? AI is barely capable of recognising images (actually isn’t...), and some idiots are proposing using AI to “touch up” images that peoples’ lives depend on? Adding or removing details at the whim of an algorithm that is not transparent in its operation. Those bastards really only care about the money...

    1. Anonymous Coward

      Re: What the fucketty-fuck?

      I made the same point about an all-singing-and-dancing medical document system I once worked on: if you provide the ability to change documents already within the system, you have effectively created a medical malpractice mechanism. It doesn't matter how much A or I is involved: the original data must be clearly accessible and any subsequent amendments obviously signposted.

    2. ibmalone Silver badge

      Re: What the fucketty-fuck?

      It's actually a very active area of research. One large application area is site harmonisation; different scanners and manufacturers produce images that look different. To make analysis easier (my area is research, but people also want to use this stuff for diagnostics) you'd want them to look the same. It's quite possible to train something like a generative adversarial model to produce synthetic images that effectively have the scanner variations regressed out while leaving the image detail. People also try to do this for imputation of missing data or remove artefacts from images (such as motion blur and ringing in MRI). Classical techniques like deconvolution can do the same thing in theory, and methods like deep neural nets effectively approximate arbitrary functions, so why not?

      A lot of work goes into validation on unseen data sets, however I do usually remain a bit sceptical. I think people often forget you can't put information back into images that has been lost or washed out by noise. You're basically smearing uncertainty around at that point, and possibly into modes that don't look like uncertainty.

      1. ibmalone Silver badge

        Re: What the fucketty-fuck?

        Ah, a thumb down. Is something I said incorrect?

        If you doubt that this is an active area of research, here's the MICCAI 2019 programme. Take a flick through the poster sessions, any session you like.

        If you disagree about function approximation with neural nets, meet the universal approximation theorem.

        Maybe someone simply dislikes one of the above and would rather shoot the messenger?

        Or, if instead you think that I'm wrong to at times be sceptical of what these methods put out, then I agree that anyone who thinks humans are always a gold standard has been watering down their gold. Sometimes we're not even as good as pigeons. However, a lot of the time robustness is not really considered and validations are frequently quite narrow in focus. What to do when things fail, or how learning methods interact with other analyses (especially when proposed for pre-processing), is often not clear. Of course, people generally have to follow the grant money, and that's where it is right now.

    3. Cuddles Silver badge

      Re: What the fucketty-fuck?

      "They are using AI systems to alter images? AI is barely capable of recognising images (actually isn’t...), and some idiots are proposing using AI to “touch up” images that peoples’ lives depend on?"

      Doing it in a healthcare setting seems particularly stupid, but the whole idea is just bizarre no matter where it's used. The plan is to take images that need to be analysed for any small variations or anomalies, and first put them through software that will edit out any small variations or anomalies. It's just nuts. There's no such thing as "enhancing" an image. The only things you can do are either remove information that is already in it, or add extra information that is not in it. That's fine if all you want to do is mess around with things to make it subjectively more pleasing to the human eye, but it can only ever be actively harmful when it's the raw information content of the image that is of interest.

      Machine learning and image recognition may well have a place in the world, although for the most part they're not really ready for prime time just yet. But the idea of using machine learning to edit images that are then passed on to a completely different system (whether human or otherwise) to actually analyse is just utterly insane.

    4. Paul Hovnanian Silver badge

      Re: What the fucketty-fuck?

      "They are using AI systems to alter images?"

      That made me think of the Google Deep Dream program. This takes an image or video, applies recognition to it based upon its training and searches for features that it thinks it sees. It then takes the original and modifies it slightly to make it conform to what it thought it saw. Do this iteratively and the program has quite the, umm, imagination.

      And no. I don't think I'd want my oncologist examining my x-rays with whatever that computer must have been high on..

  5. Giovani Tapini Silver badge

    Falls nicely into the area that computers and maths will remain bad at...

    In the same way that CAPTCHA uses difficult images with extra lines over disordered / incomplete letters (similar to the photocopier example above)

    or searching images for fire extinguishers will find a UK postbox with a shovel leaning on it and decide it's also an extinguisher.

    Back in the real world, image quality is another variable, never mind the subjects in the images, fog, focus, object movement, changes in materials etc... It is hard for me to imagine AI keeping up with all the variables, never mind having a quality interpretation for both detecting and interpreting them.

    I also have a concern that if AI did indeed get good enough, we effectively stop training the medical staff that teach and validate the AI results. This will create a new negative feedback loop. Particularly, as again noted above, the system cannot even show its working out either to train people, or to correct its results.

    AI has masses of potential, but in my opinion we are looking for quick wins when there shouldn't be one....

    1. Mike 16 Silver badge

      Re: Falls nicely into the area that computers and maths will remain bad at...


      I also have a concern that if AI did indeed get good enough, we effectively stop training the medical staff that teach and validate the AI results.


  6. Anonymous Coward

    Human vetting

    Anon for obv reasons

    Ages ago I worked on a system for analysing images taken from slides of potentially cancerous cells - algorithms gave a numeric output as an estimate of whether it might be cancerous.

    We developed a whole lot of algorithms "manually" (configurable front end to various parameters used) for different tissue types and stains used.

    Algorithms developed in consultation with the medics as they were explaining what the key features were in the diagnosis for the images.

    Working with the medics we would tweak algorithms and parameters to get a system that gave "risk" outputs similar to what the medics would give themselves.

    This system was used, not to replace medics, but to aid them.

    For a given slide image medics would see the "risk" rating given by our system, and if their initial rating was low risk but algorithm showed high risk, they would use this as spur to re-examine in case they had missed something.

    It worked well (I'm assuming it is still in use; I have left that role). In quite a few cases medics would change their risk estimate based on a "second look" prompted by the algorithm, or consult a colleague for a second opinion. Humans are not perfect and get tired and, crucially, because a slide can have a lot of cells and a human is altering magnification and deciding which areas to look at more closely, it is easier for a human to miss something than for a computer, which "looks" at all areas of the slide equally.

    Diagnosis from medical images is not an exact science, but can be helped. This non AI method, developing algorithms based on the features the histologists would look for themselves, might have taken longer but did have advantage of knowing why it gave the results it did, compared to "black box" of an AI. It also meant once key sets of algorithms developed, easy to create new ones for different tissue / stain combinations.

    Its accuracy was generally better than skilled medics (it did struggle on the occasional real outlier slide, but to be fair so did the medics, but not quite as much), but I should emphasise that the medics still did ALL the diagnosis; this system was just an aid (and a very useful one).
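
    The "second look" rule described in this comment could be sketched roughly as below. This is only an illustration: the 0-to-1 risk scale, the `low` and `high` thresholds and the names `SlideReview` and `needs_second_look` are all assumptions, not details of the system the commenter worked on.

```python
from dataclasses import dataclass


@dataclass
class SlideReview:
    slide_id: str
    medic_risk: float      # medic's initial estimate, assumed 0.0 (low) to 1.0 (high)
    algorithm_risk: float  # algorithm's estimate on the same assumed scale


def needs_second_look(review, low=0.3, high=0.7):
    """Flag slides the medic rated low risk but the algorithm rated high risk.

    The thresholds are hypothetical placeholders.
    """
    return review.medic_risk < low and review.algorithm_risk >= high


reviews = [
    SlideReview("A", 0.1, 0.9),   # medic low, algorithm high: prompt a re-examination
    SlideReview("B", 0.2, 0.25),  # both low: no prompt
    SlideReview("C", 0.8, 0.85),  # both high: no prompt, medic already concerned
]

# The algorithm never overrides the medic; it only lists slides worth revisiting.
flagged = [r.slide_id for r in reviews if needs_second_look(r)]
print(flagged)
```

    The point of the design is that the output is a prompt to look again, not a diagnosis.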

    1. MadAsHell

      Re: Human vetting

      Key here is invisible-at-first decision support. Let the humans do their assessment and then show them the machine's verdict, possibly reducing the number of false-negatives (generally a good thing, but that's debatable too, e.g. DCIS for Breast Cancer).

      Years ago I used to mark undergrad essays for Medical Faculty - strict two marker system, and second marker was not allowed to see first marker's grades. Any papers differing by more than <x> percent had to be remarked - by both IIRC. That's a sensible basis of QA.

      This isn't the first time ProcNatAcadSci (PNAS as they are now) has alerted the world to software issues affecting medical imaging data: see Even the raw data from fMRI is transformed so much before any human can see it, that there is already plenty of potential for garbage in some cases. Applying dodgy neural network AI to the first-pass images is like feeding noise into a positive-feedback loop. It works, but the sound ain't pretty.
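
      The blind two-marker check mentioned above might look something like this in code. It is a sketch under assumptions: the comment leaves the threshold as <x> percent, so `max_gap` here is a made-up placeholder, and `needs_remark` is a hypothetical name.

```python
def needs_remark(first_mark, second_mark, max_gap=10.0):
    """Return True when two independent marks differ by more than max_gap points.

    max_gap stands in for the unspecified <x> percent; 10.0 is an arbitrary choice.
    """
    return abs(first_mark - second_mark) > max_gap


# Each essay is marked twice; neither marker sees the other's grade beforehand.
marks = {
    "essay-1": (62.0, 65.0),
    "essay-2": (48.0, 71.0),
}

to_remark = sorted(name for name, (a, b) in marks.items() if needs_remark(a, b))
print(to_remark)
```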

      1. ibmalone Silver badge

        Re: Human vetting

        "This article has corrections..." It's worth looking up Tom Nichols's later thoughts on it and the correction they submitted to PNAS. When you dig into the paper the truth is: 1. it's not really about a software error; they found one, but the real problem is about default choices for analyses and common practice. 2. actually the defaults for the package that originated the method being looked at are okay (could be better); the worst cases are with home-brew type choices and people tweaking to get a 'better' result.

        Whenever I've come across PNAS articles I've noticed a tendency to push the more headline-grabbing aspects of the work, sometimes at the cost of accuracy (not just in this particular area). This is not a criticism of any of the authors on that paper though, Nichols does excellent stuff, the method they're describing is a good one and we need these reminders about statistical techniques every so often.

  7. steviebuk Silver badge

    I like

    Robert Miles' recent video. Where AI has been given tasks and the agents exploit loopholes etc :)

    "A robotic arm trained using hindsight experience replay to slide a block to a target position on a table achieves the goal by moving the table itself."

    One agent was asked to generate short computer programs with an input and an output. The system learned it could find where the target output was stored and edit it. It deleted what was in the file so it was empty and then just produced its code with no output. The evaluation engine then checks, sees the original target file has no output, and sees that the agent produced no output. The two match, so it says "Good job".

    And the list of tasks and what the agents did :)

    So the AI could work out. "I can get a reward by just killing all my patients. If there is nothing to diagnose then I can't get it wrong". Similar to one AI on that list that would kill itself at the end of every level, so that it could then never fail the next level.

    "Agent kills itself at the end of level 1 to avoid losing in level 2"
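
    The target-file loophole described above can be reproduced in a toy form. This is a minimal sketch, not the setup from the video: `evaluate`, `honest_agent` and `exploiting_agent` are hypothetical names, and the scorer is deliberately naive so that the exploit works.

```python
import os
import tempfile


def evaluate(target_path, agent_output):
    """Naive scorer: reward 1.0 if the agent's output equals the stored target."""
    with open(target_path) as f:
        target = f.read()
    return 1.0 if agent_output == target else 0.0


def honest_agent(target_path):
    # Tries to solve the task and gets it wrong.
    return "41\n"


def exploiting_agent(target_path):
    # Empties the target file, then produces no output at all.
    open(target_path, "w").close()
    return ""


with tempfile.TemporaryDirectory() as tmp:
    target_path = os.path.join(tmp, "target.txt")
    with open(target_path, "w") as f:
        f.write("42\n")

    honest_score = evaluate(target_path, honest_agent(target_path))
    exploit_score = evaluate(target_path, exploiting_agent(target_path))

print(honest_score, exploit_score)
```

    The scorer only checks that two things match, so an agent with write access to the target can make them match without doing the task.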

  8. Kevin McMurtrie Silver badge

    So difficult to please!

    You teach the AI what a good picture looks like then say it's wrong when it fabricates more of those pictures.

  9. Anonymous Coward

    This has been going on for a while

    Anon for obvious reasons - I have worked in the analysis of the gait of children and adults with cerebral palsy for more than 30 years now. When this first started, the computer systems would collect the data, calculate the trajectories and present the numbers together with a measure of the accuracy of each calculated point in space and time.

    Then the motion capture manufacturers started selling their systems to the movie industry and better looking (smoother) data is more important than accurate data. So now virtually all clinical gait data looks good but there is no measure of the accuracy any longer - we're just told that we have to trust it because it looks good.


  10. merwi

    Compressed sensing not great either

    For a long time data compression and expansion algorithms have been in use to tweak medical images, especially CT, so that "sparse data" eg from low kV studies can be used to produce what superficially seems to be an adequate or pleasing image. For the more recent graduates, this is all they are used to, and they may not be aware of the weaknesses of these techniques. The smoothing and iterative algorithms however result in homogenisation of the data, and this results in the loss of fine detail eg hairline fractures, which have become very difficult to see on CT. Since unfortunately iterative algorithms have now become the industry standard, there is a considerable amount of pathology which is not being seen, both in bone work and in HRCT (chest images).
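
    How smoothing can wash out a fine feature is easy to show on a toy signal. The sketch below uses a plain moving average on a 1D list, which is an assumption chosen for simplicity and is not the iterative reconstruction used in real CT scanners; `moving_average` is a hypothetical helper.

```python
def moving_average(signal, window):
    """Smooth a 1D signal with a centred moving average (shorter windows at the edges)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


# A flat background with one narrow, one-sample-wide feature,
# standing in for something like a hairline fracture.
signal = [0.0] * 21
signal[10] = 1.0

smoothed = moving_average(signal, window=5)

peak_before = max(signal)
peak_after = max(smoothed)
print(peak_before, peak_after)
```

    The total intensity is unchanged, but the feature is spread over five samples at a fifth of its original height, which is much harder to tell apart from background noise.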

  11. Citizen99
    Black Helicopters

    I remember a BBC (of course) programme describing how a system of what was essentially mathematical interpolation between mineral deposits locations was successfully used to find the best place to dig for more of the said mineral.

    The argument was then extrapolated, by (cough) inference, to justify the manufacture of data for input to climate modelling.

Biting the hand that feeds IT © 1998–2020