What was seen as a major hole in Google's MapReduce database technology has been plugged, not once but twice. In the same week. Californian start-up Aster Data and its more established rival Greenplum have both launched SQL integration for MapReduce. The lack of SQL tools was one of the main criticisms levelled at MapReduce in …
Has anyone looked at Oracle clusters or DB2 Sysplex recently?
They obviously use some variation of MapReduce internally to spread the load over several machines; it's just not exposed in the API.
And as I said at the time, Google's BigTable does support an SQL variant, among several options for accessing the database.
Great, now miss out the SQL
That's great. Now you can make it faster by not putting the data in the SQL database in the first place: skip the query optimization needed to get it out again and shove it into your MapReduce directly.
As Jeremy recently discovered:
Remember, folks, SQL databases are for data where:
1. The query can be expressed in SQL syntax.
2. That query can be improved by one or more indexes.
3. The cost of building the index is less than the time it saves your queries.
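Point 3 is a break-even calculation. The sketch below is a hypothetical cost model (the function name and parameters are illustrative, not from any database): it charges the index a one-off build cost and ignores the extra work of maintaining it as rows change.

```python
def index_pays_off(build_cost_s: float, queries: int,
                   scan_time_s: float, indexed_time_s: float) -> bool:
    """Return True if building an index saves more time than it costs.

    Hypothetical cost model: the index costs build_cost_s seconds once
    (maintenance on writes is ignored), and each of `queries` queries
    then takes indexed_time_s instead of scan_time_s.
    """
    saved_per_query = scan_time_s - indexed_time_s
    return build_cost_s < queries * saved_per_query
```

For example, a 60 s index build that cuts a 2 s scan to 0.1 s pays off over 100 queries (about 190 s saved) but not over 10 (about 19 s saved). If the data keeps changing, the build cost effectively recurs, which tilts the sum further against the index.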
If it can't be expressed in SQL, then the query can't be reduced, and you end up reducing the data instead, which is the point of MapReduce.
Test: given a changing set of points in 3D, run a query against a snapshot of that data that returns the nearest point to (x,y,z). Feel free to express that in SQL...
select min(id) keep (dense_rank first order by ((x-p_x)*(x-p_x) + (y-p_y)*(y-p_y) + (z-p_z)*(z-p_z)) ) from points
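Outside the database, the same query is a single linear scan. This is a minimal sketch, assuming each row of `points` is an `(id, x, y, z)` tuple and the query point is passed in separately (the SQL above doesn't say which of `x` and `p_x` is the column); the names are illustrative. Like `MIN(id) KEEP (DENSE_RANK FIRST ...)`, it breaks distance ties by taking the lowest id.

```python
from typing import Iterable, Optional, Tuple

# (id, x, y, z): assumed layout of a row in the hypothetical `points` table
Point = Tuple[int, float, float, float]


def squared_distance(p: Point, q: Tuple[float, float, float]) -> float:
    """Squared Euclidean distance; the sqrt is unnecessary for comparisons."""
    _, x, y, z = p
    qx, qy, qz = q
    return (x - qx) ** 2 + (y - qy) ** 2 + (z - qz) ** 2


def nearest_id(points: Iterable[Point], q: Tuple[float, float, float]) -> Optional[int]:
    """Brute-force scan: smallest id among the points closest to q.

    Ties on distance go to the lowest id. Returns None for no points.
    """
    best: Optional[Tuple[float, int]] = None
    for p in points:
        key = (squared_distance(p, q), p[0])  # compare distance first, then id
        if best is None or key < best:
            best = key
    return None if best is None else best[1]
```

Every call touches every point, so each query costs O(n) however the data is stored.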
Google has built its own query language, called Sawzall, on top of MapReduce. It doesn't look much like SQL (it looks more like a "normal" programming language), but it seems quite nice.
"select min(id) keep (dense_rank first order by ((x-p_x)*(x-p_x) + (y-p_y)*(y-p_y) + (z-p_z)*(z-p_z)) ) from points"
I.e. brute force. (x,y,z) is the point you query this DB against, so keeping an index doesn't help you: x,y,z change with every query, so the distances would have to be recalculated each time.
MapReduce. There's no point putting it in the DB either; just bung the raw data into a MapReduce job.
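The nearest-point query splits naturally into map and reduce steps. This is a single-process sketch, not the Hadoop or Google MapReduce API: the split layout, the `(id, x, y, z)` record format and the function names are all hypothetical. Each map call finds the closest point in its own split of the raw data, and the reduce step keeps the better of two candidates.

```python
from functools import reduce
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, float, float, float]  # (id, x, y, z), hypothetical record layout
Candidate = Optional[Tuple[float, int]]  # (squared distance, id); None for an empty split


def map_split(split: Sequence[Point], q: Tuple[float, float, float]) -> Candidate:
    """Map phase: each worker scans only its own split of the raw data."""
    best: Candidate = None
    for pid, x, y, z in split:
        d = (x - q[0]) ** 2 + (y - q[1]) ** 2 + (z - q[2]) ** 2
        if best is None or (d, pid) < best:
            best = (d, pid)
    return best


def reduce_candidates(a: Candidate, b: Candidate) -> Candidate:
    """Reduce phase: keep the closer candidate (lower id on ties).

    Associative and commutative, so partial results can be combined in any order.
    """
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


def nearest_mapreduce(splits: List[Sequence[Point]],
                      q: Tuple[float, float, float]) -> Optional[int]:
    # On a cluster the map calls would run in parallel on different machines;
    # here they run one after another to keep the sketch self-contained.
    partials = [map_split(s, q) for s in splits]
    best = reduce(reduce_candidates, partials, None)
    return None if best is None else best[1]
```

Each query still scans every point. The gain is that the scan is divided across machines and no index has to be built or kept up to date as the points change. Because `reduce_candidates` is associative, partial results can also be combined close to where they were computed, before the final reduce.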