Former Yahoo! Hadoop honcho uncloaks from stealth

Yet another big data startup has uncloaked, one that traces its roots to the Hadoop MapReduce project started and open sourced by Yahoo! At the Structure Data 2012 conference in New York on Wednesday, Todd Papaioannou, formerly vice president of cloud architecture at the internet media company and the head honcho through the …


This topic is closed for new posts.
Anonymous Coward

Just THINK, they could do real-time firing of executives by a disembodied voice

after anal-yizing the stock price with Hadoop, and take HR completely out of the picture.

Who needs people when machines can do it ALL!

As El Reg would say, "RoTM".

(Just thinking about Carol, the great stand-up comedienne (she really did miss her vocation in life), who got fired by a pseudo-robot...)

Did I miss something?

I would like to offer Mr. Morgan a sincere and happy THANK YOU for a non-annoying headline that doesn't have exclamation points after every word.


Hadoop Streaming

The concept of moving the process to the data goes away when you think in terms of real time.

(You don't write to disk, and when you remove the disk from Hadoop, you're left with what he is calling a compute engine.)

This is patently obvious.

Just saying!


I hate this meme, really I do...

but...#firstworldproblems, much?

"Twitter takes too long to update!" ++spilt milk and screaming ADHD is what I get as a rationale for the billions that will be pissed away on this.

While that (sadly) may be commercially viable enough of a reason to invest in "being able to process all the datas in a hadoop cluster in real time," I am really scratching my head trying to find a single useful application of the technology.

Hadoop batching and the like today gets us the ability to comb those structures somewhere around the 15 minute mark for largish datasets. So your news webpage would be 15 minutes behind. Egads!

15 minute lag time isn't going to be the end of the world for medical storage, geographical data storage, astronomical data, particle physics data, or anything else I can come up with. In fact, there are only two things I can think of that might be non-ADHD related that this innovation might enable.

Predicting earthquakes (highly unlikely, even with the proposed technological improvements outlined here), and tracking everyone, everywhere, in real time. Not just online, but using image recognition, GPS, voice transmissions, etc. Sifting the mountains of “publicly available” information in real time to track a few billion people, find out what they buy, who they interact with, what they believe, how they vote and so forth.

Are we really proposing handing that technology over to our ever increasingly paranoid and unscrupulous governments in exchange for a quicker goddamned Twitter update?

What hath our obsession with instant gratification wrought?



From what I've seen Storm seems to tackle this very problem rather well! It was developed by a bunch of people who were analysing data coming out of Twitter. It was open sourced shortly after Twitter acquired the company. For a rather young open source project it surprisingly seems to have rather good documentation!


REAL time

Real-time used to mean 'fast enough to affect the process creating the input'. For the process of sifting through our e-dustbins looking for Tesco bills I would think a couple of days is 'real' enough.

AndyD 8-)#


