Right problem - wrong solution
I could not agree more with Mike Olson of Cloudera about the problem, though I do not agree with his conclusion. Mike says "...web logs that can't easily be digested using existing relational system." We agree that web logs can't easily be digested using a relational system. We are also glad Mike is prepared to mention the elephant in the room (which seems fair, given who he works for): Netezza is a relational database in a very fast box. I think of this as lawnmowers and duct tape. Strap enough lawnmowers together and they will go pretty quickly. They won't handle very well, the safety record is spotty, and they burn a lot of two-stroke and keep running out of gas, but they do go fast in a straight line.
Again, Mike is right when he says customers want to be able to merge this data with other data types, such as customer and transaction data. The answer, though, is not to process some of it in the cloud and then some of it in an appliance. The right answer is to put it all in a common store that can process both. Don't get me wrong: Hadoop is a great technology, and Cloudera is doing a great job and adds a lot of value. For the example Mike offers, though, I suggest you put all the data in a column-oriented database management system (CDBMS). It need not be SAND's, but put the data into the right technology. Take the example Mike described above: you have to push your data into the cloud, analyze some of it, pass some of it down into an appliance, analyze some more of it, and then start all over again. If it were that easy, and we didn't have to be concerned about network bandwidth, security, and performance, then we could consider whether we want our data segregated or combined in one place.