Time to rethink machine learning: The big data gobble is OFF the menu • The Register Forums

Wednesday 5th July 2017 09:12 GMT Ian Bush

I mentally replaced ML with HPC throughout and the article was still close to coherent and relevant. IMO the ultimate issue here is valuing programmers (both in industry and academia) with relevant specialist skills and rewarding them - that would encourage others to follow down their path.

5 0 Reply

Wednesday 5th July 2017 14:40 GMT Nolveys

valuing programmers...and rewarding them

I understand the words individually but together they make no sense. It works if you change "programmers" to "sales people". Maybe it's like last year when we gave the programmers coupons for 25% off of cans of bondo for their late 90's vehicles in lieu of Christmas bonuses?

Of course we did the bondo thing because the sales people were complaining about all the ugly vehicles with holes in them. We later came up with a better solution to the problem by renting a dirt lot on the other side of the highway for the programmers to park in. I think that exercise is important and having the programmers cross the dirt lot, a six-lane highway, our parking lot and our very impressive lobby (we got marble floors in there last year, by the way) improves output. We have had a few programmers hit by cars, but they probably weren't team players.

Now when clients come to our offices they only see brand-new BMWs and Audis parked in the front lot. Projecting a successful image is the most important thing, after all.

Well, I've got to go. I'm implementing a new program whereby we can monitor how much off-hours time programmers spend sleeping in their cubicles so we can charge rent.

- Dylan "Booya" Skilling, CIO

8 0 Reply
Wednesday 5th July 2017 16:57 GMT Mephistro

@ Nolveys

RFLMAO++

1 0 Reply
Wednesday 5th July 2017 16:28 GMT thx1138v2

What programmers?

Why not replace the programmers with AI/ML? Then you can have all you want with no need to "value" them. I admit that might take a bit of marketing but that's the name of the game really - not computing. [/sarc]

4 0 Reply

Wednesday 5th July 2017 09:30 GMT Michael H.F. Wilkinson

It is not the amounts of data that matter, it is the labelling

Copious amounts of data are easy to get, rather harder to turn into information. In order to train most AI or ML systems you need copious data with a reliable ground truth. The latter is very, very hard to come by, and requires lots of very, very careful, and usually dull work in labelling data items as belonging to different classes. If the ground truth on which you train your method is suspect, you will end up with over-fitting problems, because the ML/AI method will faithfully try to reproduce erroneous human decisions. For deep learning methods like convolutional neural networks (CNNs) to yield their (often impressive) results, you need hundreds of thousands, or preferably millions, of accurately labelled data items. CNNs have been around for quite a long time, but only the advent of large, labelled databases of images and the like made the methodology take off. Labelling hundreds of thousands of data items automatically would be ideal, but isn't always possible. Usually some poor sod has to do lots and lots of unglamorous work.

Apart from these problems (which are daunting enough), there is the problem of getting all the parameter choices (learning rates, numbers and types of layers in deep networks, etc.) right.
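The point about a suspect ground truth can be made concrete with a toy sketch. Everything in it is an assumption chosen for illustration (the 1-D data, the threshold at 50, flipping every tenth label, and a 1-nearest-neighbour classifier standing in for "the ML method"); it is not taken from the article or from any real data set. It only shows a model that memorises its training labels reproducing the labelling mistakes on fresh inputs.

```python
# Toy illustration: a classifier that trusts its labels completely
# (1-nearest-neighbour) reproduces labelling errors on new inputs.
# All data here is synthetic and hypothetical.

def true_label(x):
    # Assumed ground truth: class 1 at or above 50, class 0 below.
    return 1 if x >= 50 else 0

def predict_1nn(train, x):
    # train is a list of (feature, label) pairs; copy the label
    # of the closest training point.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, test_points):
    # Score predictions against the true labels, not the noisy ones.
    hits = sum(predict_1nn(train, x) == true_label(x) for x in test_points)
    return hits / len(test_points)

# Carefully labelled training set.
clean_train = [(x, true_label(x)) for x in range(100)]

# Same features, but every tenth item has been mislabelled.
noisy_train = [(x, 1 - y if x % 10 == 0 else y) for x, y in clean_train]

# Fresh inputs, each sitting right next to one training point.
test_points = [x + 0.25 for x in range(100)]

clean_acc = accuracy(clean_train, test_points)
noisy_acc = accuracy(noisy_train, test_points)
print(f"clean labels: {clean_acc:.2f}, noisy labels: {noisy_acc:.2f}")
```

In this contrived setup each mislabelled training item costs exactly one test prediction, which is about as literal a reading of "faithfully reproducing erroneous human decisions" as one can get. Real models and real noise are messier, so treat the numbers as illustrative only.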

13 0 Reply

Wednesday 5th July 2017 09:45 GMT Andrew Commons

Re: It is not the amounts of data that matter, it is the labelling

Exactly. It is very easy to get many GBs of security logs but labelling them is a huge issue. So relatively ancient KDD Cup data is used over and over because it's among the few instances of tagged data available.

0 0 Reply
Wednesday 5th July 2017 11:28 GMT Charlie Clark

Re: It is not the amounts of data that matter, it is the labelling

One upvote isn't enough!

Apart from that, ML is going to be pervasive, just not in the way that suits vendors and crystal ball readers like Gartner.

5 0 Reply
Wednesday 5th July 2017 11:51 GMT Anonymous Coward

Re: It is not the amounts of data that matter, it is the labelling

It's not just labelling, it's accuracy and reliability.

Most data sets are inaccurate - sensors will not always return results matching a similar sensor next to them. People don't always bother to say they have moved, married, lost an arm (great for biometrics), etc.

Assuming that asking questions of the data will get a reasonable result is the first hurdle not mentioned to the execs getting sold "solutions".

If you can ever get as far as trusting the data, can you trust the questions? Organisations are highly likely to start assuming customers prefer x over y just because of differences in behaviour or in the measures collected.

The results become self-fulfilling, or entirely random.

Engineers have been working on this for ages, in climate science for example, to normalise data - it's hard, never mind attempting predictions from it.

As far as I am concerned most of the "solutions" are snake oil and should be treated as a big data dump.

It's hard enough just getting sense out of my own systems management data, which I have a broad familiarity with - it's perfectly normal to advise root causes that are just another symptom - but at least I have some chance of recognising that...

5 0 Reply

Wednesday 5th July 2017 09:34 GMT Detective Emil

I get this feeling of déjà vu …

Google Ngrams shows that Artificial Intelligence enjoyed a vogue in the late eighties, neural networks in the mid nineties. The graphs don't show mentions of Machine Learning rising much, because Google hasn't added to the Big data behind them since 2008. But, if it does, and if I run this query again in 2010, I'm sure the pattern will repeat.

5 0 Reply

Wednesday 5th July 2017 16:31 GMT thx1138v2

Re: I get this feeling of déjà vu …

Of course if you can run it again in 2010 with your time machine you'll have no need for the results assuming your time machine also works properly in forward mode.

0 0 Reply

Wednesday 5th July 2017 10:08 GMT Destroy All Monsters

Suddenly sanityland

Unfortunately, we can't stop here.

2 0 Reply

Wednesday 5th July 2017 10:14 GMT Anonymous Coward

Asay is back... with some new buzzwords to sell you.

But I prefer to buy from Steve Bong, he's much more reliable.

3 0 Reply

Wednesday 5th July 2017 10:29 GMT Doctor Syntax

Objectives?

One thing these articles seem to lack is exactly what all this unicorn dust is going to do for businesses that can't be done better and cheaper by other approaches.

I have this vision that, after spending multimillion currency units on analysing multi-petabytes of data, some data scientist rushes into marketing to announce "We can sell more ice-cream in hot weather.".

7 0 Reply

Wednesday 5th July 2017 11:45 GMT Anonymous Coward

Re: Objectives?

As the article points out:

>Beyer agrees, acknowledging his own "dirty secret". "So many [so-called ML] problems could be solved by just applying simple regression analysis."

Which is especially telling, considering some ML algorithms are just complex regression analysis.
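For anyone wondering what "just applying simple regression analysis" amounts to, here is a minimal sketch using the ice-cream example from the parent comment. The temperatures and sales figures are made up purely for illustration, and `fit_line` is a hypothetical helper name, not something from the article; the fit itself is ordinary least squares on one variable, written out in plain Python.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical, made-up observations: daily high in Celsius vs cones sold.
temps_c = [15, 18, 21, 24, 27, 30]
cones_sold = [110, 140, 165, 200, 225, 260]

slope, intercept = fit_line(temps_c, cones_sold)
print(f"Each extra degree sells roughly {slope:.1f} more cones")
```

A dozen lines and no cluster required, which is rather the point being made.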

7 0 Reply
Wednesday 5th July 2017 16:33 GMT thx1138v2

Re: Objectives?

More ice cream in hot weather is ALWAYS a winner. I'll sign up for that.

3 0 Reply
Wednesday 5th July 2017 17:07 GMT Mephistro

Re: Objectives?

"We can sell more ice-cream in hot weather."

"... and following advice from our ML systems, we're going to double production and send half of our ice creams to the Atacama desert."

2 0 Reply

Wednesday 5th July 2017 11:12 GMT ntevanza

What's your data maturity level?

Do you know what data you have?

Can you point to where your data are and who's responsible for them?

Do you retire data?

Do you know what data you don't have?

Are your data available, reliable and compliant for the people who need them, assuming you know who they are?

Does your org understand the difference between privacy and confidentiality?

Does your management know what it means to pose an empirical question?

Is your management reasonably free from all of the following biases: survivor, confirmation, belief, recentness, egotism?

Does your org understand the difference between a marketing trend and an empirical finding?

Is your org capable of changing its mind in the face of empirical evidence?

Does your org tolerate short term failure and uncertainty?

If you answer no to any of these questions, it's best to learn to walk first.

9 1 Reply

Wednesday 5th July 2017 16:35 GMT thx1138v2

Re: What's your data maturity level?

AI/ML can answer all of those questions, silly. I'm too busy selecting my lottery numbers at the moment... Hey, how about...

1 0 Reply

Wednesday 5th July 2017 12:25 GMT Anonymous Coward

@Matt Asay

Dude,

You clearly don't know anything about Big Data.

It's not hard. It's pretty simple and straightforward.

The issue though is that many who claim to know 'big data' really don't and were never taught how to think in terms of big data and data processing. It's a lack of education.

ML and AI are more complex, yet the same problem continues because people who are jumping on the bandwagon are doing so because they are following the money and hype.

I get paid to fix problems that others create.

Concepts that were addressed over five years ago are still not well known today.

The problem isn't how hard it is, but that those who are asked to do the work don't have the requisite education and training.

Too much hype in the market and everyone is trying to jump on the next big thing and get in early while they still don't know how to do the work needed to get done today.

Posted Anon, for the obvious reasons.

3 1 Reply

Wednesday 5th July 2017 14:54 GMT Scott Broukell

My thoughts are pretty much the same as many of the above. In other words one should very much heed the old axiom Garbage In = Garbage Out. Call me a cynic, but I feel the push of hardware vendors to sell ever faster, more expensive, boxen on which to run the new shiny shiny, must-have, ML / AI. An increase in the power consumption doesn't automatically get the promised results.

Sadly, there will come a time when that old axiom is long forgotten and no matter what output the machine gives it will be taken as shiny shiny silicon truth and as such it will trounce the individual human bean and rise to supreme dominance. <rant over = goes back to building off-grid shack in the woods>

2 0 Reply

Wednesday 5th July 2017 16:42 GMT thx1138v2

Indexing

I don't remember which one at this point but there was a programming manual with an index that contained the following entries:

Endless loop - See loop, endless

Loop, endless - See endless loop

Add that to ->garbage in ~= garbage out ->

Ahh, fractal insanity.

DON'T, whatever you do, let that manual into your shack.

1 0 Reply

Wednesday 5th July 2017 16:49 GMT Anonymous Coward

These data-driven product people: They frequently have $ENORMOLISTOSKILLS

hmm, "jack of all trades, master of none", does that apply here? Sounds like they should be looking for /teams/, not unicorns.

0 0 Reply

Wednesday 5th July 2017 22:48 GMT Anonymous Coward

Cute robot!

Where can I get one?

https://regmedia.co.uk/2017/07/04/tiny_robot_photo_via_shutterstock.jpg

1 0 Reply

Friday 7th July 2017 05:30 GMT Anonymous Coward

Re: Cute robot!

It kinda looks like the ones from "Batteries not included".

0 0 Reply

Thursday 6th July 2017 13:26 GMT FrankH99

PC Nonsense

"...the slightest bit of real research reveals that ML is very hard and not something the average Python engineer is going to spin up in her spare time."

Now I'm no expert, but I suspect that the average Python engineer would struggle to spin it up in HIS spare time.

0 0 Reply
