The Hadoop project at the Apache Software Foundation is beating its chest for delivering the v1.0 version of the open source MapReduce data analysis tool, its Hadoop Distributed File System (HDFS), and other related code. While software version and release numbers can sometimes be arbitrary, they are often also symbolic, and …

COMMENTS

This topic is closed for new posts.

Wednesday 4th January 2012 20:34 GMT Dave 124

Another Triumph for Computing in SLOOW Motion (or Why the choice of a toy elephant is appropriate)

Running at 1/30th the speed of equivalent C/C++ implementations, Hadoop sets a new standard for doing it slowly across large numbers of machines. Don't plan on using this elephant right away. From experience, all of the how-tos and interfaces are now out of date, and you will have to wait for a new set of e-books to buy. Once up, it will take daily care and feeding to keep it from crashing. I for one would like to see them fix their installs so they actually work out of the box rather than working to bump release numbers....

1. Wednesday 4th January 2012 21:18 GMT Justin 3
  
  Really? Having used this in production with a customer for the last 12 months, I found the deployment pretty trivial; the pains came from other tools we were using. Deployment/config is pretty straightforward compared to past experiences I've had with things like WebLogic and WebSphere. Using a decent packaged distro does take the pain out of it; we used Cloudera's CDH3. As for performance, burning through several TB of log data can be done in minutes on a small cluster, i.e. less than £35,000 worth of kit. I guess your mileage may vary...
  
  1. Wednesday 4th January 2012 23:50 GMT multipharious
    
    Trivial is about right
    
    Setting up Hadoop + Nutch and setting them free to crawl enterprise targets is more or less easy... if you can get past the old Nutch documentation, which is much improved as of last year.
    
    Happy to see Hadoop hitting 1.0!
    
  2. Thursday 5th January 2012 21:18 GMT Dave 124
    
    Apparently You Missed the Point...
    
    The point is that the appearance of speed in Hadoop relies on a huge hardware investment because of the choice of implementation language. Current commercial implementations using C++ are hitting a 30x speed improvement over the Java implementation of Hadoop while maintaining compatibility with the original API. Yes, when fully deployed both systems will give answers: Hadoop in an extended coffee break and the C++ versions in seconds. Apache continuing to push the Java-based product will eventually be the death of the project as the commercial products burn past it.
    
Wednesday 4th January 2012 20:34 GMT Justin 3

Cassandra and Hadoop aren't related

Cassandra is independent of Hadoop; it's not an add-on of any sort. The only similarity is that Cassandra and HBase are both column-family-based table stores.

1. Thursday 5th January 2012 09:28 GMT foxyshadis
  
  You misread it
  
  Mahout is the add-on; Cassandra is an alternate data store. In case you're behind the times, yes, Cassandra has been usable with Hadoop's MapReduce and other APIs for quite some time now, even though HBase is the most common data store.
  
  1. Thursday 5th January 2012 16:47 GMT Justin 3
    
    I may have misread the Mahout point; however, Cassandra and Hadoop aren't interoperable: Hadoop can't use Cassandra to natively store files, and similarly Cassandra doesn't use Hadoop to store or process its data natively. If the writer had looked down the Apache site, they would have seen that Cassandra is a separate project from Hadoop.
    

Apache lets fly Hadoop 1.0 data muncher

COMMENTS

Another Triumph for Computing in SLOOW Motion( or Why the choice of a toy elephant is appropriate)

Trivial is about right

Apparently You Missed the Point...

Cassandra and Hadoop aren't related

You misread it

Other stories you might like

Apache OFBiz zero-day pummeled by exploit attempts after disclosure

Four in five Apache Struts 2 downloads are for versions featuring critical flaw

Critical Apache ActiveMQ flaw under attack by 'clumsy' ransomware crims

Microsoft extends life support for aging Apache Cassandra 3.11 database

Mirai botnet loves exploiting your unpatched TP-Link routers, CISA warns

China outlines plan for National Integrated Government Affairs Big Data System

UK.gov finds billions in cash for big data contracts

Apache Superset: A story of insecure default keys, thousands of vulnerable systems, few paying attention

Airbus pulls up hard, no longer buying 29.9% stake in Atos-owned Evidian

Ex-BigQuery exec and Motherduck CEO: For some users, the answer is to think small

Native Americans urge Apache Software Foundation to ditch name

Apache Iceberg promises to change the economics of cloud-based data analytics

About Us

Our Websites

Your Privacy