Hadoop 2 stampedes onto world's mega compute clusters

The Apache Software Foundation has branded the Hadoop data analytics platform with version 2 and sent the elephant-logoed system stampeding out into the wild. The second version of the open-source technology comes with a refreshed compute engine via the YARN data processing and service engine, and the addition of high- …

COMMENTS


Thursday 17th October 2013 12:00 GMT Steve Loughran

sort of

You could look at a big chunk of the grid schedulers (Condor, Platform, Mesos) and say "quelle difference?", but there are some differences. It is:

* designed to place work close to the data: your code can ask for specific machines & racks, and the scheduler tries to place it there; if you say "best effort", it will place the work as close as it can network-wise. This lets us run Hadoop without high-cost SAN networks, which makes storing petabytes of data affordable.

* designed for algorithms that have to handle failure. MapReduce does this by splitting up the work, retrying failed jobs, recognising slow machines and re-issuing their work, and even blacklisting the slow boxes. Those slow ones are the enemy, as stragglers slow everything down. Apache Tez can do checkpoints, then roll back to them. Streaming algorithms need to replay the streams, which is a different problem.
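To make those two points concrete, here is a rough Python sketch of the ideas, not the real scheduler code. Every name in it (`Node`, `place`, `find_stragglers`, the `slowdown` threshold) is made up for illustration, and comparing progress against the median is an assumption. `place` tries a requested machine, then a requested rack, then anywhere if "best effort" is allowed. `find_stragglers` flags tasks that lag well behind the rest so their work could be re-issued.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical toy model: none of these names come from YARN or MapReduce.


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    free_slots: int


def place(nodes, wanted_nodes=(), wanted_racks=(), best_effort=True):
    """Pick a node: requested machine first, then requested rack, then anywhere."""
    free = [n for n in nodes if n.free_slots > 0]
    for n in free:
        if n.name in wanted_nodes:
            return n, "node-local"
    for n in free:
        if n.rack in wanted_racks:
            return n, "rack-local"
    if best_effort and free:
        # No locality left to honour: take any machine with capacity.
        return free[0], "off-rack"
    return None, "unplaced"


def find_stragglers(progress, slowdown=0.5):
    """Return ids of tasks whose progress is below `slowdown` times the median."""
    if not progress:
        return []
    cutoff = median(progress.values()) * slowdown
    return sorted(t for t, p in progress.items() if p < cutoff)


cluster = [Node("n1", "r1", 0), Node("n2", "r1", 2), Node("n3", "r2", 1)]
node, level = place(cluster, wanted_nodes={"n1"}, wanted_racks={"r1"})
print(node.name, level)  # n2 rack-local (n1 is full, so fall back to its rack)
print(find_stragglers({"t1": 0.9, "t2": 0.85, "t3": 0.2}))  # ['t3']
```

A real scheduler also has to decide how long to wait for a local slot before relaxing the request, and a real straggler detector would compare estimated finish times rather than raw progress; the sketch leaves both out.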

If you do go back to the 1980s-era massively parallel designs, some of the architectures do look familiar. It's the scale that's different: a scale that makes failures a fact of life that everything has to handle, rather than a disaster that needs someone to be paged and your on-site HDD replacements (for which you pay a lot) wheeled out. Even so, there are lessons there that we should learn from. After all, aren't VMs and their hypervisors just descendants of VM/360, which had billing in from the outset too?
