I mentally replaced ML with HPC throughout and the article was still close to coherent and relevant. IMO the ultimate issue here is valuing programmers (both in industry and academia) with relevant specialist skills and rewarding them - that would encourage others to follow down their path.
Time to rethink machine learning: The big data gobble is OFF the menu
Machine learning (ML) may well be The Next Big Thing™, but it has yet to register in mainstream enterprise adoption. While breathless prognosticators proclaim 50 per cent of organisations lining up to magically transform themselves in 2017 with ML, more canny observers put the number closer to 15 per cent. And that's being …
COMMENTS
-
-
Wednesday 5th July 2017 14:40 GMT Nolveys
valuing programmers...and rewarding them
I understand the words individually but together they make no sense. It works if you change "programmers" to "sales people". Maybe it's like last year when we gave the programmers coupons for 25% off of cans of bondo for their late 90's vehicles in lieu of Christmas bonuses?
Of course we did the bondo thing because the sales people were complaining about all the ugly vehicles with holes in them. We later came up with a better solution to the problem by renting a dirt lot on the other side of the highway for the programmers to park in. I think that exercise is important and having the programmers cross the dirt lot, a six-lane highway, our parking lot and our very impressive lobby (we got marble floors in there last year, by the way) improves output. We have had a few programmers hit by cars, but they probably weren't team players.
Now when clients come to our offices they only see brand-new BMWs and Audis parked in the front lot. Projecting a successful image is the most important thing, after all.
Well, I've got to go. I'm implementing a new program whereby we can monitor how much off-hours time programmers spend sleeping in their cubicles so we can charge rent.
- Dylan "Booya" Skilling, CIO
-
-
Wednesday 5th July 2017 09:30 GMT Michael H.F. Wilkinson
It is not the amounts of data that matter, it is the labelling
Copious amounts of data are easy to get, rather harder to turn into information. In order to train most AI or ML systems you need copious data with a reliable ground truth. The latter is very, very hard to come by, and requires lots of very, very careful, and usually dull work in labelling data items as belonging to different classes. If the ground truth on which you train your method is suspect, you will end up with over-fitting problems, because the ML/AI method will faithfully try to reproduce erroneous human decisions. For deep learning methods like convolutional neural networks (CNNs) to yield their (often impressive) results, you need hundreds of thousands, or preferably millions, of accurately labelled data items. CNNs have been around for quite a long time, but only the advent of large, labelled databases of images and the like made the methodology take off. Labelling hundreds of thousands of data items automatically would be ideal, but isn't always possible. Usually some poor sod has to do lots and lots of unglamorous work.
Apart from these problems (which are daunting enough), there is the problem of all the parameter choices (learning rates, numbers and types of layers in deep networks, etc.) to get right.
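A minimal sketch of the labelling point, under assumptions that are not from the comment itself: a toy one-dimensional problem, a hypothetical 1-nearest-neighbour classifier standing in for "the ML/AI method", and 30% of the training labels flipped to simulate unreliable ground truth. It is only meant to illustrate how a method that fits its training labels perfectly also reproduces their errors on clean test data.

```python
import random

def true_label(x):
    # Assumed ground truth for the toy problem: class 1 above 0.5, else class 0.
    return 1 if x > 0.5 else 0

def make_training_set(n, flip_fraction, seed=0):
    # Draw n points in [0, 1) and flip the labels of a fixed fraction of them.
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    flipped = set(rng.sample(range(n), round(n * flip_fraction)))
    return [
        (x, (1 - true_label(x)) if i in flipped else true_label(x))
        for i, x in enumerate(xs)
    ]

def predict_1nn(train, x):
    # Return the label of the closest training point (1-nearest-neighbour).
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, test_xs):
    # Score predictions against the true labels, not the (possibly wrong) training labels.
    hits = sum(predict_1nn(train, x) == true_label(x) for x in test_xs)
    return hits / len(test_xs)

test_xs = [i / 1000 for i in range(1000)]
clean = make_training_set(200, flip_fraction=0.0)
noisy = make_training_set(200, flip_fraction=0.3)

clean_acc = accuracy(clean, test_xs)
noisy_acc = accuracy(noisy, test_xs)

# The classifier agrees with every noisy training label, mistakes included.
memorised = all(predict_1nn(noisy, x) == y for x, y in noisy)

print(f"accuracy with clean labels: {clean_acc:.3f}")
print(f"accuracy with 30% flipped labels: {noisy_acc:.3f}")
print(f"noisy training labels reproduced exactly: {memorised}")
```

The classifier trained on flipped labels matches its own training set perfectly, which is exactly why it does worse on correctly labelled test points.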
-
-
Wednesday 5th July 2017 11:51 GMT Anonymous Coward
Re: It is not the amounts of data that matter, it is the labelling
It's not just labelling, it's accuracy and reliability.
Most data sets are inaccurate - a sensor will not always return results matching those of a similar sensor next to it. People don't always bother to say they have moved, married, lost an arm (great for biometrics) etc.
Assuming that asking questions of the data will get a reasonable result is the first hurdle not mentioned to the execs getting sold "solutions".
If you can ever get as far as trusting the data, can you trust the questions? Organisations are highly likely to start concluding that customers prefer x over y just because of differences in behaviour or measures collected.
The results become self-fulfilling, or entirely random.
Engineers have been working on this for ages, in climate science for example, to normalise data - it's hard, never mind attempting predictions from it.
As far as I am concerned most of the "solutions" are snake oil and should be treated as a big data dump.
It's hard enough just getting sense out of my own systems management data, which I have a broad familiarity with - it's perfectly normal to be advised of root causes that are just another symptom - but at least I have some chance of recognising that...
-
-
Wednesday 5th July 2017 09:34 GMT Detective Emil
I get this feeling of déjà vu …
Google Ngrams shows that Artificial Intelligence enjoyed a vogue in the late eighties, neural networks in the mid nineties. The graphs don't show mentions of Machine Learning rising much, because Google hasn't added to the Big Data behind them since 2008. But, if it does, and if I run this query again in a few years' time, I'm sure the pattern will repeat.
-
Wednesday 5th July 2017 10:29 GMT Doctor Syntax
Objectives?
One thing these articles seem to lack is exactly what all this unicorn dust is going to do for businesses that can't be done better and cheaper by other approaches.
I have this vision that, after spending multimillion currency units on analysing multi-petabytes of data, some data scientist rushes into marketing to announce "We can sell more ice-cream in hot weather."
-
Wednesday 5th July 2017 11:12 GMT ntevanza
What's your data maturity level?
Do you know what data you have?
Can you point to where your data are and who's responsible for them?
Do you retire data?
Do you know what data you don't have?
Are your data available, reliable and compliant for the people who need them, assuming you know who they are?
Does your org understand the difference between privacy and confidentiality?
Does your management know what it means to pose an empirical question?
Is your management reasonably free from all of the following biases: survivor, confirmation, belief, recentness, egotism?
Does your org understand the difference between a marketing trend and an empirical finding?
Is your org capable of changing its mind in the face of empirical evidence?
Does your org tolerate short term failure and uncertainty?
If you answer no to any of these questions, it's best to learn to walk first.
-
Wednesday 5th July 2017 12:25 GMT Anonymous Coward
@Matt Asay
Dude,
You clearly don't know anything about Big Data.
It's not hard. It's pretty simple and straightforward.
The issue though is that many who claim to know 'big data' really don't and were never taught how to think in terms of big data and data processing. It's a lack of education.
ML and AI are more complex, yet the same problem continues because people who are jumping on the bandwagon are doing so because they are following the money and hype.
I get paid to fix problems that others create.
Concepts that were addressed over five years ago are still not well known today.
The problem isn't how hard it is, but that those who are asked to do the work don't have the requisite education and training.
Too much hype in the market and everyone is trying to jump on the next big thing and get in early while they still don't know how to do the work needed to get done today.
Posted Anon, for the obvious reasons.
-
Wednesday 5th July 2017 14:54 GMT Scott Broukell
My thoughts are pretty much the same as many of the above. In other words one should very much heed the old axiom Garbage In = Garbage Out. Call me a cynic, but I feel the push of hardware vendors to sell ever faster, more expensive, boxen on which to run the new shiny shiny, must-have, ML / AI. An increase in the power consumption doesn't automatically get the promised results.
Sadly, there will come a time when that old axiom is long forgotten and no matter what output the machine gives it will be taken as shiny shiny silicon truth and as such it will trounce the individual human bean and rise to supreme dominance. <rant over = goes back to building off-grid shack in the woods>
-
Wednesday 5th July 2017 16:42 GMT thx1138v2
Indexing
I don't remember which one at this point but there was a programming manual with an index that contained the following entries:
Endless loop - See loop, endless
Loop, endless - See endless loop
Add that to ->garbage in ~= garbage out ->
Ahh, fractal insanity.
DON'T, whatever you do, let that manual into your shack.
-