Chewing on big data using the MapReduce protocol, and the open source Hadoop stack that implements it, is all the rage these days. But there is more than one way to stuff an elephant. The Hadoop tool created by Yahoo! (and named after a stuffed elephant) is now managed by the Apache Software Foundation, and it is the tool of …
Huh? Doesn't IBM Marketing do their homework before they do their spin?
""In the current Hadoop distro, it is one job at a time," Hertzler tells El Reg. "You need to add distributed cluster logic to manage multiple MapReduce jobs at the same time on the same cluster." Or, use multiple Hadoop clusters, as Yahoo! does. "But Symphony is already a distributed workload manager and knows how to distribute data and work around a cluster.""
I guess IBM doesn't know about the Fair Scheduler.
The only feature Hadoop scheduling doesn't offer is the ability to suspend a job so that it can be completed at a later time.
Of course here's the problem... only a portion of IBM's 'Blue Stack' customers are going to be willing to go with IBM's idea of Grid computing. Too many companies have either committed to Hadoop or have committed to using Hadoop as part of a PoC (Proof of Concept).
I had thought IBM learned their lesson from their earlier mistakes when getting into this space.
Platform is not IBM, Einstein
Perhaps the suggestion to do homework is best prescribed as "physician, heal thyself."
Bravo, your anger and ego got the better of you on this post.
Sorry, I read this as something IBM is behind because of the 'IBM's Big Table' inclusion...
Yes, I had a Homer Simpson moment and I'm big enough of a man to admit it.
However, you can replace IBM with Symphony and my comment stands...
"The one thing that the Symphony MapReduce release will also have is a price tag that is significantly higher that the free and open source stack. Without the MapReduce functionality, Symphony 5 costs $250,000 for a 100-node cluster, and scales up to millions of dollars for licenses. ®"
Now I feel like a Stooge!
Y! run different clusters for scale
Whoever said that Y! run >1 cluster so they can submit multiple jobs is ill informed. Yahoo! have multiple clusters because the current scale of the HDFS filestore tops out at 25-30 PB, and putting Platform's code on top of that will not remove that limitation.
The MR engine can schedule multiple jobs, and can even prioritise work from different people. It too has a scale limit of about 4K servers, and even that requires tuned jobs to avoid overloading the central Job Tracker.
If you do want to know more about what Hadoop's limits are, you are welcome to get in touch with me, a committer on the Apache Hadoop project. Otherwise you will end up repeating marketing blurb from people who have a vested interest in discrediting a project that is tested at Y! and Facebook scale, is free, and has shown up fundamental flaws in the "classic" Grid frameworks, namely that their reliance on high-cost SAN storage limits their storage capacity, and hence their ability to work with Big Data problems. It is good that the Platform people now have a story for working with lower-cost storage than GPFS, by using Hadoop's own filesystem, but I'm not sure why you then need to pay the premium for Platform over the free version of Hadoop. That of course is the other flaw in the classic frameworks...