The Hadoop project at the Apache Software Foundation is beating its chest for delivering the v1.0 version of the open source MapReduce data analysis tool, its Hadoop Distributed File System (HDFS), and other related code. While software version and release numbers can sometimes be arbitrary, they are often also symbolic, and in …
Another Triumph for Computing in SLOOW Motion (or Why the choice of a toy elephant is appropriate)
Running at 1/30th the speed of equivalent C/C++ implementations, Hadoop creates a new standard for doing it slow across large numbers of machines. Don't plan on using this elephant right away. From experience, all of the how-tos and interfaces are now out of date and you will have to wait for a new set of e-books to buy. Once up, it will take daily care and feeding to keep it from crashing. I for one would like to see them fix their installs so they actually work out of the box rather than working to bump release numbers...
Really? Having used this in production with a customer for the last 12 months, I found the deployment pretty trivial; the pains came from other tools we were using. Deployment/config is pretty straightforward compared with past experiences I've had with things like WebLogic and WebSphere. Using a decent packaged distro does take the pain out of it; we used Cloudera's CDH3. As for performance, burning through several TB of log data can be done in minutes on a small cluster, i.e. less than £35,000 worth of kit. I guess your mileage may vary...
Trivial is about right
Setting up Hadoop + Nutch and setting them free to crawl Enterprise targets is more or less easy... if you can get past the old Nutch documentation, which is much improved as of last year.
Happy to see Hadoop hitting 1.0!
Apparently You Missed the Point...
The point is that Hadoop's appearance of speed relies on a huge hardware investment because of the choice of implementation language. Current commercial implementations using C++ are hitting a 30x speed improvement over the Java implementation of Hadoop while maintaining compatibility with the original API. Yes, when fully deployed both systems will give answers: Hadoop in an extended coffee break and the C++ versions in seconds. Apache continuing to push the Java-based product will eventually be the death of the project as the commercial products burn past them.
Cassandra and Hadoop aren't related
Cassandra is independent of Hadoop; it's not an add-on of any sort. The only similarity is that Cassandra and HBase are both column-family-based table stores.
You misread it
Mahout is the add-on; Cassandra is an alternate data store. In case you're behind the times, yes, Cassandra has been usable with Hadoop's MapReduce and other APIs for quite some time now, even though HBase is the most common data store.
I may have misread the Mahout point; however, Cassandra and Hadoop aren't interoperable. Hadoop can't use Cassandra to natively store files, and similarly Cassandra doesn't use Hadoop to store or process its data natively. If the writer had looked down the Apache site, they would have seen that Cassandra is a separate project from Hadoop.