Let's get this straight. Red Hat should package up its own commercial Hadoop distribution or buy one of the three key Hadoop disties before they get too expensive. But don't hold your breath, because Red Hat tells El Reg that neither option is the current plan. Red Hat is going to partner with Hadoop distributors and hope they …
"The genius of Hadoop [...] is that it moves compute jobs to the storage for execution rather than trying to move data sets off disk arrays to compute nodes. You send little routines out to big blocks of data, which is a hell of a lot faster"
The same blazingly obvious stroke of genius that SQL databases have been designed around since the mid-'70s, and likely other DBs before that.
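For anyone who wants the idea in the quote spelled out, here is a minimal toy sketch of "send the routine to the data". It is not Hadoop code: the `NODES` dict, `run_on_node` and `local_summary` are hypothetical names invented for illustration, and each "node" is just a Python list standing in for a block of data on a remote disk.

```python
# Toy model only: each "node" holds its own partition of the data.
# NODES, run_on_node and local_summary are hypothetical names.
NODES = {
    "node-a": [3, 8, 15, 4],
    "node-b": [23, 42, 7],
    "node-c": [16, 1],
}


def run_on_node(node, routine):
    """Pretend to ship `routine` to `node` and run it next to the data."""
    return routine(NODES[node])


def local_summary(rows):
    # Runs where the data lives; only a tiny (count, sum) pair travels back.
    return (len(rows), sum(rows))


def global_average():
    # Gather the small partial results and combine them centrally.
    partials = [run_on_node(node, local_summary) for node in NODES]
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count


if __name__ == "__main__":
    print(global_average())
```

The point is only that three small tuples cross the "network" instead of nine rows. A SQL engine pushing an aggregate down to where the rows are stored follows the same pattern.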
Why do I read stuff by this author? Surely the minimum a journalist should be accomplished in, beyond being actually literate, is to have a working knowledge of the subject so as not to swallow then blow back bullshit press releases. To have actually got one's hands dirty with some, you know, actual database work, perhaps even (gasp!) in the realm of so-called unstructured/big data, in addition to any prior reading on the subject, would be Most Excellent.
This sucks, it really does.
The reason Hadoop is fast is that it omits many required features of an enterprise RDBMS.
This isn't genius, it's actually an idiot trying to push a buzzword.
For a search engine, great; for an enterprise application such as HR, lunacy.
I am beginning to think it's not idiots pushing a buzzword but penniless hustlers pushing developers down the wrong route. When the application doesn't work, they will push the buzzword: "You should have gone with Oracle." It's the same idiots pushing MariaDB, which doesn't have partitioning, and CUBRID, which doesn't have indexing.
Sure, DBMSs such as DB2 pureScale and SQL Server scale by just adding more CPUs, but sharding is a low-cost technique for achieving extremely high transaction rates and numbers of concurrent users. DB2 pureScale and SQL Server are better suited to customers who don't want to alter an existing application. Sharding requires an application to be modified or written from scratch to take advantage of the technique. Sharding can also be done with DB2 pureScale and SQL Server at the application layer, but why would you want to pay for something when you can have it for free?
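To make "sharding at the application layer" concrete, here is a rough sketch, assuming a fixed shard count and a simple hash of the key. In-memory SQLite databases stand in for the separate database servers, and `SHARD_COUNT`, `shard_for`, `save_user`, `load_user` and `count_users` are all made-up names, not anything from DB2, SQL Server or PostgreSQL.

```python
import sqlite3
import zlib

SHARD_COUNT = 4  # assumption: a fixed number of shards

# One in-memory SQLite database stands in for each shard server.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")


def shard_for(user_id):
    # Stable hash, so the same key always routes to the same shard.
    return shards[zlib.crc32(user_id.encode("utf-8")) % SHARD_COUNT]


def save_user(user_id, name):
    db = shard_for(user_id)
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, name)
        )


def load_user(user_id):
    row = shard_for(user_id).execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def count_users():
    # Cross-shard queries have to fan out and be merged by the application.
    return sum(
        db.execute("SELECT COUNT(*) FROM users").fetchone()[0] for db in shards
    )
```

This is also why the application has to be modified: every read and write goes through the routing function, and anything spanning shards (like `count_users`) becomes the application's problem rather than the DBMS's.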
The proponents of Hadoop need to go back to university and learn about databases.
With SQL you say what you want and the DBMS figures out how to do it.
With Hadoop you spend a lot of time specifying how to do what you want.
A little change in the DB and you could end up with massive rewrites.
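A small sketch of that difference, using SQLite for the SQL side and plain Python for a hand-rolled map/shuffle/reduce. This is not the Hadoop API; the `spend` table, the sample rows and both function names are invented for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (department, amount)
ROWS = [("hr", 100), ("it", 250), ("hr", 50), ("it", 25), ("ops", 75)]


def totals_sql(rows):
    """Declarative: say what you want, let the engine plan it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE spend (dept TEXT, amount INTEGER)")
    db.executemany("INSERT INTO spend VALUES (?, ?)", rows)
    result = dict(
        db.execute("SELECT dept, SUM(amount) FROM spend GROUP BY dept")
    )
    db.close()
    return result


def totals_by_hand(rows):
    """Procedural: spell out every step of how to get it."""
    # Map: emit a (key, value) pair per record.
    mapped = [(dept, amount) for dept, amount in rows]
    # Shuffle: group the values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: collapse each group to a single value.
    return {key: sum(values) for key, values in groups.items()}
```

Both return the same totals. Add a column or split the table, though, and the SQL changes by a clause while the hand-written version has its map, shuffle and reduce steps revisited.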
Sure, Hadoop is fast, and so is MyISAM, because both omit required features of a DBMS.
Data loss and data corruption are things that will worry most people.
With a decent DBMS you can get a transaction-safe backup whilst online.
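As a small-scale illustration of an online backup, here is a sketch using the backup call in Python's built-in `sqlite3` module (Python 3.7+). SQLite is obviously not an enterprise DBMS, and the `ledger` table and `total` helper are hypothetical; the point is only that the copy is a consistent snapshot taken while the source connection stays open for business.

```python
import sqlite3

# The "live" database, with some committed data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, amount INTEGER)")
with live:
    live.executemany(
        "INSERT INTO ledger (amount) VALUES (?)", [(10,), (20,), (30,)]
    )

# Take the backup without closing or locking out the live connection.
snapshot = sqlite3.connect(":memory:")
live.backup(snapshot)

# The live database carries on taking writes after the backup.
with live:
    live.execute("INSERT INTO ledger (amount) VALUES (?)", (40,))


def total(db):
    return db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger"
    ).fetchone()[0]
```

The snapshot holds exactly what was committed at backup time, and later writes to the live database do not leak into it.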
Create a massively parallel partitioned cluster of DB nodes and you have the speed you need.
PostgreSQL allows you to shard data and create a massively parallel server for free.
It's this fact that the established players find hard to swallow.