Nooooooohhh!!!!!!1!!!ELEVEN
Anyone but Oracle!
Three letters: ZFS
Hadoop, HBASE, and Cloudera all seem really promising - the last thing any of them need is Oracle coming in "to help"
Oracle isn't the biggest enterprise software vendor, but in 2010 it grew faster than its big-enterprise peers, including Microsoft and IBM, to claim third place. Being ever so ambitious, it's unlikely that Oracle chief executive Larry Ellison will be content to take the bronze. But it's equally unlikely that relational databases …
The Hadoop sub-project HBase is where the real concern should be. It is a column-oriented database that just uses Hadoop for file storage. HBase directly competes with Exadata.
Hadoop is just a data storage engine. It can do Map-Reduce, but the primary originator of Map-Reduce, Google, has begun to move beyond it already. Google Instant no longer uses Map-Reduce.
...Oracle v12 will be called 12c. :-)
8i and 9i = internet
10g and 11g = grid
12c .. = cloud computing
The Oracle database product is more than just an RDBMS. It is a full-blown, very able and scalable data processing platform. It has been for many years now, thanks to assimilating technologies that were traditionally the domain of the application layer.
Today you can put the entire app layer into Oracle - and it will often perform better and scale faster than using a separate J2EE or .Net app layer... and cheaper. Fewer moving parts also make everything simpler.
But 11g is lacking in terms of true cloud and distributed computing. Supporting Hadoop and similar technologies... That would be of major benefit to most, if not all, Oracle customers.
Sure, you may not like Larry and his suits and their approach to Open Source, pricing, marketing and what not.
But that aside, technically Oracle is a damn fine and sexy data processing platform and superior to a lot of other products. It makes evolutionary sense for it to consume something like Hadoop.
I don't think the author of this article should be allowed to write about Apache Hadoop - it's painful to read. I hope nobody actually believes a word this person says.
1. The only official release of Apache Hadoop comes from the Apache Software Foundation. The latest of the 0.20 releases, 0.20.203, came out yesterday with lots of bug fixes from Yahoo! and Cloudera in it.
2. Any other so-called "distribution" of Hadoop is not "a distribution" unless it is just the Apache release packaged for easy installation (as Thomas Koch does for Debian). Otherwise it is a derivative work, containing code that is not in the Apache release.
3. Such derivative works can be open source (Cloudera) or closed source (EMC, IBM).
4. Any closed source derivative work forces the distributor to maintain their branch indefinitely.
5. Any derivative work forces the developer to test at the same scale as Y! and Facebook (thousands of machines, tens of PB of storage), or they cannot claim that it scales up.
6. Any closed source derivative work will only support bug fixes and patches at a rate determined by the closed source developer team, and provided at a cost determined by the price of that developer team.
7. Apache only provides support for the official Apache release. If you use Cloudera or EMC: go talk to them about problems.
8. People who are not part of the Apache developer and user community do not get their needs addressed in the Apache releases, because we are unaware of them.
9. We, the Apache developer team, have no need to take on random patches from developers of closed source derivative works unless we can see tangible benefits.
10. Finally, any derivative work that pulls out large amounts of the Hadoop codebase (e.g. Brisk, EMC Enterprise HD) cannot call itself a version of Hadoop. It is not. We, the Apache community, define the interfaces and what "100% compatible" means. When someone like EMC declares their derivative work is "certified 100% compatible", that is a meaningless statement. Only the official Apache Hadoop release is, implicitly, 100% compatible with Apache Hadoop.
11. We reserve the right to change the semantics and interfaces to meet the community needs, on the schedule that suits the development community.
12. The rules of using the term "Hadoop" are defined in the Apache license, and it is not legal to say "a distribution of Hadoop" if it is in fact a derivative work. This is why Cloudera call their software "Cloudera’s Distribution including Apache Hadoop". EMC, Brisk and others are sailing close to the wind here.
13. The fact that Oracle is now subpoenaing Apache in the Oracle/Google lawsuit means that the relationship between Oracle and Apache has reached a low point - even after Apache left the Java Community Process due to Oracle's unwillingness to meet its legal requirements to provide the Technology Compatibility Kit without imposing Field of Use restrictions.
14. Because of (#13), it's hard to see a team of Oracle developers being trusted or welcome in the Hadoop community. You can't serve subpoenas on the ASF and then say "we'd like to help develop a technology of yours that threatens our entire business model and margins". They won't be trusted.
I have a term for the EMC-style not-quite-Hadoop products that use the same interfaces but offer unknown semantics and a cost model on a par with the vendor's existing enterprise product line. It is "Enterprisey Hadoop". This is not Apache Hadoop supported in the Enterprise, it is some derivative work that pretends to be Hadoop but misses the point about affordable scalability through commodity hardware and an open source codebase.
SteveL, Apache Hadoop Committer. All comments are personal opinions only, etc.
Your point 13.
If Oracle is indeed not (quote) "meet(ing) its legal requirements" (unquote) in terms of providing the ASF with a TCK license, then the ASF has legal recourse and should be talking via lawyers in court to the Oracle suits?
As this is not the case (did I miss something in the press about ASF suing Oracle over TCK licensing?), you are in fact blowing smoke up orifices... just as the Oracle suits are...
The ASF doesn't have the money for a lawsuit:
https://blogs.apache.org/foundation/entry/statement_by_the_asf_board1
"Through the JSPA, the agreement under which both Oracle and the ASF participate in the JCP, the ASF has been entitled to a license for the test kit for Java SE (the "TCK") that will allow the ASF to test and distribute a release of the Apache Harmony project under the Apache License. Oracle is violating their contractual obligation as set forth under the rules of the JCP by only offering a TCK license that imposes additional terms and conditions that are not compatible with open source or Free software licenses"
Oh yeah... I forgot that in today's all-powerful and important blogosphere, claims like that in https://blogs.apache.org/foundation/entry/statement_by_the_asf_board1 can be taken as the legal judgement a court of law would hand down, made by one or more (highly qualified) judges.
Sheez... but there's a lot of smoke in here...