Time to rethink machine learning: The big data gobble is OFF the menu

Machine learning (ML) may well be The Next Big Thing™, but it has yet to register in mainstream enterprise adoption. While breathless prognosticators proclaim 50 per cent of organisations lining up to magically transform themselves in 2017 with ML, more canny observers put the number closer to 15 per cent. And that's being …

  1. Ian Bush

    I mentally replaced ML with HPC throughout and the article was still close to coherent and relevant. IMO the ultimate issue here is valuing programmers (both in industry and academia) with relevant specialist skills and rewarding them - that would encourage others to follow down their path.

    1. Nolveys
      Childcatcher

      valuing programmers...and rewarding them

      I understand the words individually but together they make no sense. It works if you change "programmers" to "sales people". Maybe it's like last year when we gave the programmers coupons for 25% off of cans of bondo for their late 90's vehicles in lieu of Christmas bonuses?

      Of course we did the bondo thing because the sales people were complaining about all the ugly vehicles with holes in them. We later came up with a better solution to the problem by renting a dirt lot on the other side of the highway for the programmers to park in. I think that exercise is important and having the programmers cross the dirt lot, a six-lane highway, our parking lot and our very impressive lobby (we got marble floors in there last year, by the way) improves output. We have had a few programmers hit by cars, but they probably weren't team players.

      Now when clients come to our offices they only see brand-new BMWs and Audis parked in the front lot. Projecting a successful image is the most important thing, after all.

      Well, I've got to go. I'm implementing a new program whereby we can monitor how much off-hours time programmers spend sleeping in their cubicles so we can charge rent.

      - Dylan "Booya" Skilling, CIO

      1. Mephistro
        Coffee/keyboard

        @ Nolveys

        RFLMAO++

    2. thx1138v2

      What programmers?

      Why not replace the programmers with AI/ML? Then you can have all you want with no need to "value" them. I admit that might take a bit of marketing but that's the name of the game really - not computing. [/sarc]

  2. Michael H.F. Wilkinson Silver badge

    It is not the amounts of data that matter, it is the labelling

    Copious amounts of data are easy to get, rather harder to turn into information. In order to train most AI or ML systems you need copious data with a reliable ground truth. The latter is very, very hard to come by, and requires lots of very, very careful, and usually dull work in labelling data items as belonging to different classes. If the ground truth on which you train your method is suspect, you will end up with over-fitting problems, because the ML/AI method will faithfully try to reproduce erroneous human decisions. For deep learning methods like convolutional neural networks (CNNs) to yield their (often impressive) results, you need hundreds of thousands, or preferably millions of accurately labelled data items. CNNs have been around for quite a long time, but only the advent of large, labelled databases of images and the like made the methodology take off. Labelling hundreds of thousands of data items automatically would be ideal, but isn't always possible. Usually some poor sod has to do lots and lots of unglamorous work.

    Apart from these problems (which are daunting enough), there is the problem of getting all the parameter choices (learning rates, numbers and types of layers in deep networks, etc) right.
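
    The point about a method faithfully reproducing erroneous labels can be sketched with a toy example. Everything below is an assumption made for illustration (the one-dimensional data, the 20 per cent of flipped labels, and both "models"); it is not taken from the article or from any real data set. A model that memorises its training data (1-nearest-neighbour) is compared with a simple threshold rule, both fitted to the same noisy labels and then scored against the true ones.

```python
# Toy illustration only: data, noise level and models are all hypothetical.

def true_label(x):
    """Ground truth: class 1 for x >= 50, class 0 otherwise."""
    return 1 if x >= 50 else 0

xs = list(range(100))
truth = [true_label(x) for x in xs]

# Simulate careless labelling: flip every fifth label (20% label noise).
noisy = [1 - y if x % 5 == 0 else y for x, y in zip(xs, truth)]

def memoriser_predict(x):
    """1-nearest-neighbour: copy the noisy label of the closest training point."""
    nearest = min(xs, key=lambda t: abs(t - x))
    return noisy[nearest]  # xs are 0..99, so the value doubles as the index

def fit_threshold():
    """Pick the threshold t whose rule 'x >= t' disagrees least with the noisy labels."""
    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in zip(xs, noisy))
    return min(range(101), key=errors)

def accuracy(preds, labels):
    """Fraction of predictions that match the given labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

threshold = fit_threshold()
memoriser_acc = accuracy([memoriser_predict(x) for x in xs], truth)
threshold_acc = accuracy([1 if x >= threshold else 0 for x in xs], truth)

print(f"memoriser accuracy vs true labels: {memoriser_acc:.2f}")
print(f"threshold rule accuracy vs true labels: {threshold_acc:.2f}")
```

    In this sketch the memorising model scores 0.80 against the true labels, because it copies every one of the 20 flipped labels, while the simple threshold rule scores 0.99. Real data sets are rarely this tidy, so treat the numbers as a demonstration of the mechanism rather than as a benchmark.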

    1. Andrew Commons

      Re: It is not the amounts of data that matter, it is the labelling

      Exactly. It is very easy to get many GBs of security logs but labelling them is a huge issue. So relatively ancient KDD Cup data is used over and over because it's among the few instances of tagged data available.

    2. Charlie Clark Silver badge
      Thumb Up

      Re: It is not the amounts of data that matter, it is the labelling

      One upvote isn't enough!

      Apart from that ML is going to be pervasive but just not in the way that suits vendors and crystal ball readers like Gartner.

      1. Anonymous Coward

        Re: It is not the amounts of data that matter, it is the labelling

        It's not just labelling, it's accuracy and reliability.

        Most data sets are inaccurate - sensors will not always return results matching a similar sensor next to them. People don't always bother to say they have moved, married, lost an arm (great for biometrics) etc.

        Assuming asking questions of the data will get a reasonable result is the first hurdle not mentioned to the execs getting sold "solutions".

        If you can ever get as far as trusting the data, can you trust the questions? Organisations are highly likely to state that customers prefer x over y just because of differences in behaviour or measures collected.

        The results become self fulfilling, or entirely random.

        Engineers have been working this out for ages, in the science of climate for example, to normalise data - it's hard, never mind attempting predictions of it.

        As far as I am concerned most of the "solutions" are snake oil and should be treated as a big data dump.

        It's hard enough just getting sense out of my own systems management data, which I have a broad familiarity with - it's perfectly normal to be advised of root causes that are just another symptom - but at least I have some chance of recognising that...

  3. Detective Emil

    I get this feeling of déjà vu …

    Google Ngrams shows that Artificial Intelligence enjoyed a vogue in the late eighties, neural networks in the mid nineties. The graphs don't show mentions of Machine Learning rising much, because Google hasn't added to the Big data behind them since 2008. But, if it does, and if I run this query again in 2010, I'm sure the pattern will repeat.

    1. thx1138v2

      Re: I get this feeling of déjà vu …

      Of course if you can run it again in 2010 with your time machine you'll have no need for the results assuming your time machine also works properly in forward mode.

  4. Destroy All Monsters Silver badge

    Suddenly sanityland

    Unfortunately, we can't stop here.

  5. Anonymous Coward

    Asay is back... with some new buzzwords to sell you.

    But I prefer to buy from Steve Bong, he's much more reliable.

  6. Doctor Syntax Silver badge

    Objectives?

    One thing these articles seem to lack is exactly what all this unicorn dust is going to do for businesses that can't be done better and cheaper by other approaches.

    I have this vision that, after spending multimillion currency units on analysing multi-petabytes of data, some data scientist rushes into marketing to announce "We can sell more ice-cream in hot weather.".

    1. Anonymous Coward

      Re: Objectives?

      As the article points out:

      >Beyer agrees, acknowledging his own "dirty secret". "So many [so-called ML] problems could be solved by just applying simple regression analysis."

      Which is especially telling, considering some ML algorithms are just complex regression analysis.
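
      For a sense of how little machinery "simple regression analysis" needs, here is a minimal sketch of an ordinary least-squares line fit in plain Python, borrowing the ice-cream joke from the parent comment. The temperature and sales figures are invented purely for illustration, and the helper name `fit_line` is hypothetical.

```python
# Minimal ordinary least-squares fit of a straight line, standard library only.
# The figures below are made up for illustration; they are not real sales data.

temps_c = [15, 20, 25, 30, 35]       # daily high temperature, degrees C
sales = [200, 250, 300, 350, 400]    # ice creams sold that day

def fit_line(xs, ys):
    """Return (slope, intercept) minimising the squared error of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(temps_c, sales)
forecast = slope * 40 + intercept    # extrapolate to a 40 degree day

print(f"sales ~ {slope:.1f} * temp + {intercept:.1f}")
print(f"forecast at 40C: {forecast:.0f}")
```

      With these made-up numbers the fit is an exact line (slope 10, intercept 50), so the forecast for a 40C day comes out at 450. Real data would be noisier, but the closed-form fit itself stays this short.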

    2. thx1138v2

      Re: Objectives?

      More ice cream in hot weather is ALWAYS a winner. I'll sign up for that.

    3. Mephistro
      Coat

      Re: Objectives?

      "We can sell more ice-cream in hot weather."

      "... and following advice from our ML systems, we're going to double production and send half of our ice creams to the Atacama desert."

  7. ntevanza

    What's your data maturity level?

    Do you know what data you have?

    Can you point to where your data are and who's responsible for them?

    Do you retire data?

    Do you know what data you don't have?

    Are your data available, reliable and compliant for the people who need them, assuming you know who they are?

    Does your org understand the difference between privacy and confidentiality?

    Does your management know what it means to pose an empirical question?

    Is your management reasonably free from all of the following biases: survivor, confirmation, belief, recentness, egotism?

    Does your org understand the difference between a marketing trend and an empirical finding?

    Is your org capable of changing its mind in the face of empirical evidence?

    Does your org tolerate short term failure and uncertainty?

    If you answer no to any of these questions, it's best to learn to walk first.

    1. thx1138v2

      Re: What's your data maturity level?

      AI/ML can answer all of those questions, silly. I'm too busy selecting my lottery numbers at the moment... Hey, how about...

  8. Anonymous Coward

    @Matt Asay

    Dude,

    You clearly don't know anything about Big Data.

    It's not hard. It's pretty simple and straightforward.

    The issue though is that many who claim to know 'big data' really don't and were never taught how to think in terms of big data and data processing. It's a lack of education.

    ML and AI are more complex, yet the same problem continues because people who are jumping on the bandwagon are doing so because they are following the money and hype.

    I get paid to fix problems that others create.

    Concepts that were addressed over five years ago are still not well known today.

    The problem isn't how hard, but that those who are asked to do the work, don't have the requisite education and training.

    Too much hype in the market and everyone is trying to jump on the next big thing and get in early while they still don't know how to do the work needed to get done today.

    Posted Anon, for the obvious reasons.

  9. Scott Broukell
    Meh

    My thoughts are pretty much the same as many of the above. In other words one should very much heed the old axiom Garbage In = Garbage Out. Call me a cynic, but I feel the push of hardware vendors to sell ever faster, more expensive, boxen on which to run the new shiny shiny, must-have, ML / AI. An increase in the power consumption doesn't automatically get the promised results.

    Sadly, there will come a time when that old axiom is long forgotten and no matter what output the machine gives it will be taken as shiny shiny silicon truth and as such it will trounce the individual human bean and rise to supreme dominance. <rant over = goes back to building off-grid shack in the woods>

    1. thx1138v2

      Indexing

      I don't remember which one at this point but there was a programming manual with an index that contained the following entries:

      Endless loop - See loop, endless

      Loop, endless - See endless loop

      Add that to ->garbage in ~= garbage out ->

      Ahh, fractal insanity.

      DON'T, whatever you do, let that manual into your shack.

  10. Anonymous Coward

    These data-driven product people: They frequently have $ENORMOLISTOSKILLS

    hmm, "jack of all trades, master of none", does that apply here? Sounds like they should be looking for /teams/, not unicorns.

  11. Anonymous Coward

    Cute robot!

    Where can I get one?

    https://regmedia.co.uk/2017/07/04/tiny_robot_photo_via_shutterstock.jpg

    1. Anonymous Coward

      Re: Cute robot!

      It kinda looks like the ones from "Batteries not included".

  12. FrankH99

    PC Nonsense

    "...the slightest bit of real research reveals that ML is very hard and not something the average Python engineer is going to spin up in her spare time."

    Now I'm no expert, but I suspect that the average Python engineer would struggle to spin it up in HIS spare time.
