16 posts • joined Thursday 7th April 2011 18:47 GMT
The shape of the pie
This would be a good article if it had included at least a little arithmetic following mention of the fact that Oracle Database licenses need to be capitalized in this "solution." The recipe Oracle would have in mind for you includes Enterprise Edition, Real Application Clusters, Partitioning and In-Memory Database Cache, so the cost per core is US$105,000 * 0.5 = $52,500 (0.5 is the "core factor"). Indeed, I threw in the In-Memory Database Cache option because Oracle markets the current Exadata Database Machine as the "Exadata X3 In-Memory Database Machine."
One only needs to activate 32 of the 128 compute-grid processor cores in the X3-2 model to equal the hardware cost of an entire rack of Exadata (which is US$1,680,000).
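The licensing arithmetic can be sketched in a few lines of Python. This is only a rough illustration that uses the per-core list price and core factor quoted in this comment; the function name `license_cost` is made up for the example, and real quotes would involve discounts and support fees that are not modeled here.

```python
# Rough sketch of the per-core licensing arithmetic quoted in the comment.
# Both figures come from the comment itself; nothing here is an Oracle price list.
LIST_PRICE_PER_CORE = 105_000  # USD: EE + RAC + Partitioning + In-Memory Database Cache
CORE_FACTOR = 0.5              # the "core factor" cited in the comment


def license_cost(active_cores: int) -> float:
    """License cost in USD for a given number of activated cores."""
    return active_cores * LIST_PRICE_PER_CORE * CORE_FACTOR


if __name__ == "__main__":
    for cores in (1, 32, 128):
        print(f"{cores:>3} cores: US${license_cost(cores):,.0f}")
```

Activating all 128 cores quadruples the 32-core figure, which is the point of the comparison with the hardware price.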
Re: "Ramp-up" or selling
Of course I've read it. It's cookie cutter.
The problem I have is the fact that **specific** numbers of units were cited in the Q2 call. The projection was 300 for Q3. Those words came out of Mr. Ellison's mouth on Dec 20, just short of one third into the quarter (Q3). Let's forget all that hubris and bravado back at OOW 2011 (Oct 2011) where Mr. Ellison said 3,000 this year. We all know that isn't happening.
Don't get me wrong. This isn't an ENRON feeling, but since the stock has dropped about 8% in the last 48 hours and there is all this rosy speak of Engineered systems to hype the stock, doesn't it seem reasonable to harken back to those ***specific*** numbers cited? Just the 300. I expected to hear about the 300 for Q3. Instead we get some bizarre rant about the good folks at Workday which, of course, doesn't need any prophylactic disclaimer because nobody in their right mind cares about those sorts of rants anyway.
So, in short, this get-out-of-jail-free card based on the standard forward-looking disclaimer is a ruse.
"Ramp-up" or selling
"Exalytics in-memory database appliance was the fastest-selling product "
Actually Mr. Hurd said "the fastest ramp of [sic] any engineered system that we've released." Not to be a nit, but there is a huge difference between "fastest-selling" and an accelerated ramp-up. From 1 to 2 in 2 days is 100% in 48 hours. That's an awfully fast ramp-up too.
We all have short memories.
In the Q2 call there were ***specific*** unit counts specified by Mr. Ellison. He said 200 units in Q2, 300 in Q3 and 400 in Q4. Offering some nebulous percentage gain from an unspecified baseline is not the same as answering, "Um, Messrs. Ellison and Hurd, did you book revenue and ship on 300 Exa[data|logic] units--to customers--in Q3?"
Remember Oct 2011: http://youtu.be/jAmgVbuFZwY
"80 cores is a lot more than the typical Exadata Database Machine. A full X2-2 rack comes with eight 2-socket database servers. Significantly less cores."
...a full rack X2-2 model has a total of 96 Xeon 5600 cores for the RAC grid (and a lot more in the storage grid). The only Exadata model that comes with E7 CPUs (to stay on the NEC comparison) is the X2-8 model which only comes with 160 cores. So this quoted statement is wrong.
ESG Paper did not use a database. Synthetic workload.
Actually that ESG paper being referred to specifically states their testing was performed with the FIO tool, not an actual Oracle database. That matters. I reiterate: SLOB. http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/
Oracle IOPS workload choices matter
Would be interesting to see what they can get with SLOB - The Silly Little Oracle Benchmark:
Oracle was actually quite prescient in *not* publishing that HP Proliant DL980+Violin result back in 2010.
It turned out, shortly after the decision to hold back on publishing, IBM produced a result of close to 75% of the tpmC achieved by the 8-socket DL980. IBM, however, required only half the number of processor sockets and their (awesome) MAX5 kit:
8 socket Xeon (Nehalem EX and E7 alike) scalability proven by TPC-C has become sort of a holy grail. Likewise, the TPC-H spread between 4 and 8 sockets (E7) is troubling as well. I have a blog entry teed up on that matter.
Exadata: 2 Grids, 2 sets of roles.
>The Exadata storage nodes compress database files using a hybrid columnar algorithm so they take up less space and can be searched more quickly. They also run a chunk of the Oracle 11g code, pre-processing SQL queries on this compressed data before passing it off to the full-on 11g database nodes.
Exadata cells do not compress data. Data compression is done at load time (in the direct path) and compression (all varieties, not just HCC) is code executed only on the RAC grid CPUs. Exadata users get no CPU help from the 168 cores in the storage grid when it comes to compressing data.
Exadata cells can, however, decompress HCC data (but not the other types of compressed data). I wrote "can" because cells monitor how busy they are and are constantly notified by the RAC servers about their respective CPU utilization. Since decompressing HCC data is murderously CPU-intensive the cells easily go processor-bound. At that time cells switch to "pass-through" mode shipping up to 40% of the HCC blocks to the RAC grid in compressed form. Unfortunately there are more CPUs in the storage grid than the RAC grid. There is a lot of writing on this matter on my blog and in the Expert Oracle Exadata book (Apress).
Also, while there are indeed 40Gb/s QDR InfiniBand paths to/from the RAC grid and the storage grid, there is only 3.2GB/s of usable bandwidth per HCA for application payload between these grids. Therefore, the aggregate maximum data flow between the RAC grid and the cells is 25.6GB/s (3.2 x 8). There are 8 IB HCAs in either X2 model, so the figure holds for both. In the HP Oracle Database Machine days that figure was 12.8GB/s.
With a maximum of 25.6 GB/s for application payload (Oracle's iDB protocol as it is called) one has to quickly do the math to see the mandatory data reduction rate in storage. That is, if only 25.6 GB/s fits through the network between these two grids yet a full rack can scan combined HDD+FLASH at 75 GB/s then you have to write SQL that throws away at least 66% of the data that comes off disk. Now, I'll be the first to point out that 66% payload reduction from cells is common. Indeed, the cells filter (WHERE predicate) and project columns (only the cited and join columns need to be shipped). However, compression changes all of that.
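The mandatory reduction rate falls straight out of the two bandwidth figures. Here is a minimal sketch using only the numbers quoted above (3.2 GB/s per HCA, 8 HCAs, 75 GB/s combined scan rate); the function and variable names are invented for illustration.

```python
# Minimal sketch: how much scanned data must be discarded in storage so that
# the remainder fits through the iDB link. All figures are the ones quoted
# in the comment; the names are hypothetical.
USABLE_GBPS_PER_HCA = 3.2  # usable application payload per IB HCA
HCA_COUNT = 8              # HCAs on the RAC grid
SCAN_RATE_GBPS = 75.0      # full-rack combined HDD+FLASH scan rate


def required_reduction(scan_gbps: float, link_gbps: float) -> float:
    """Fraction of scanned data that must be filtered/projected away."""
    return max(0.0, 1.0 - link_gbps / scan_gbps)


idb_gbps = USABLE_GBPS_PER_HCA * HCA_COUNT
reduction = required_reduction(SCAN_RATE_GBPS, idb_gbps)

if __name__ == "__main__":
    print(f"iDB aggregate bandwidth: {idb_gbps:.1f} GB/s")
    print(f"required payload reduction: {reduction:.1%}")
```

The result lands just under 66%, which matches the rounded figure in the text.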
If scanning HCC data on a full rack Exadata configuration, and that data is compressed at the commonly cited compression ratio of 10:1, then the "effective" scan rate is 750GB/s. Now use the same predicates and cite the same columns and you'll get 66% reduced payload--or 255GB/s that needs to flow over iDB. That's about 10x over-subscription of the available 25.6 GB/s iDB bandwidth. When this occurs, I/O is throttled. That is, if the filtered/projected data produced by the cells is greater than 25.6GB/s then I/O wanes. Don't expect a 10x query speedup just because the product only has to perform 10% of the I/O it would in the non-compressed case (given an HCC compression ratio of 10:1).
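The over-subscription can be worked through the same way. The sketch below uses the figures from the text (75 GB/s physical scan, 10:1 HCC ratio, 34% of data surviving filtration and projection, 25.6 GB/s of iDB bandwidth). The "throttled" physical scan rate it derives is a simplification that treats iDB as the only bottleneck, which is an assumption of this example and not a measured number.

```python
# Sketch of iDB over-subscription when scanning HCC data. Figures are from the
# comment; the throttling model (iDB as the sole bottleneck) is an assumption.
def hcc_scan_model(physical_gbps: float, compression_ratio: float,
                   surviving_fraction: float, idb_gbps: float) -> dict:
    effective = physical_gbps * compression_ratio  # decompressed data rate
    payload = effective * surviving_fraction       # what the cells must ship
    oversub = payload / idb_gbps                   # demand vs. available iDB
    # Physical scan rate at which the shipped payload just fits through iDB.
    throttled = min(physical_gbps,
                    idb_gbps / (compression_ratio * surviving_fraction))
    return {"effective": effective, "payload": payload,
            "oversubscription": oversub, "throttled_physical": throttled}


model = hcc_scan_model(physical_gbps=75.0, compression_ratio=10.0,
                       surviving_fraction=0.34, idb_gbps=25.6)

if __name__ == "__main__":
    for name, value in model.items():
        print(f"{name}: {value:.1f}")
```

Under this simplified model the cells can only drive roughly a tenth of their physical scan capability before iDB saturates, which is why the 10:1 compression ratio does not translate into a 10x speedup.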
That is how the product works. So long as your service levels are met, fine. Just don't expect to see 75GB/s of HCC storage throughput with complex queries because this asymmetrical MPP architecture (Exadata) cannot scale that way (for more info see: http://bit.ly/tFauDA )
The big announcement
After one works out how to do high-bandwidth physical I/O (Exadata) the next steps are to increase the methods available to avoid physical I/O. Exadata has some I/O elimination capabilities in the Storage Index feature, which is code that executes entirely in the storage servers. So, having said that, the next way to avoid physical I/O is to have a massive cache. But where? The X2-8 model finally supports 2TB RAM now that it is based on the E7 Xeon and supports larger DIMMs. So where would it be?
Your guess is as good as mine, but I'll go ahead: The big news will likely be servers connected to the InfiniBand fabric but *not* a part of the Real Application Clusters cluster. They will serve the sole purpose of caching data.
That's just a guess.
Glue or no Glue
What @Allison Park is referring to is the lack of sophisticated electronics that properly extend from 4 sockets to 8 with Nehalem EX and Xeon E7 (formerly Westmere EX). The Sun x4800 (aka G5, X2-8) is a glueless approach to extending beyond the direct-touch 4 sockets (3 QPI links) to 8 sockets. In the IBM case the electronics are called the EXA chipset (vestiges of Sequent). The value add (heavy lifting) done by IBM for this architecture should not be dismissed by anyone who lacks deep understanding. I have deep understanding and hands-on engineering experience with the Sun x4800 from my Exadata engineering past. I know the throughput and scalability challenges that server suffers. IBM's achievements in this space are astounding. Consider, for instance, it is actually less CPU stall time to locate and fetch a cacheline from the EXA chipset than from local memory in the Nehalem EX offering. That doesn't actually surprise me. In spite of the fact that I've seen unknowledgeable people dismiss these "glue" offerings elsewhere on the web, I have seen and experienced the benefit. For example, one of the first "glue" systems to integrate with Intel x86 chips was the Sequent IQ-Link. The card connected to the Intel P6 (Orion) bus which the Pentium Pro MCM was attached to. The Sequent "glue" processor in that case was 510-pin Gallium Arsenide. It was not until 3 years later that even Intel had a 500+ pin processor. That processor was able to return a line to the Pentium Pro CPU faster than was possible from local memory via Intel's own memory controller.
Sorry for the "memory lane" but glue matters. On the other hand, the jury is out on whether T Series processors matter.
By my assessment the HPDBS (DL980 + Violin solution) is likely not positioned as an Exadata killer for bandwidth-sensitive DW/BI workloads. It simply doesn't have enough high-bandwidth storage plumbing. On the other hand, a single-rack Exadata only supports a scalable read:write ratio of 20:1 (their data sheet: 1,000,000 RIOPS : 50,000 WIOPS). Actually, that 50,000 WIOPS is a gross number accounting neither for redundant writes (ASM redundancy) nor the larger sequential writes that a transaction processing system also must concurrently sustain. In other words, mileage varies (downward trend).
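A rough sketch of how redundant writes skew that ratio, taking the data-sheet IOPS figures quoted above at face value. The copy counts (2 for ASM normal redundancy, 3 for high redundancy) are an assumption of this example and not something stated in the comment, and the concurrent sequential writes mentioned above are not modeled at all.

```python
# Rough sketch: read:write ratio before and after accounting for mirrored
# writes. IOPS figures are the data-sheet numbers quoted in the comment; the
# mirror copy counts are an assumption (2 = ASM normal, 3 = ASM high).
READ_IOPS = 1_000_000
GROSS_WRITE_IOPS = 50_000


def read_write_ratio(read_iops: int, gross_write_iops: int,
                     mirror_copies: int = 1) -> float:
    """Reads per application-visible write, given N physical copies per write."""
    net_write_iops = gross_write_iops / mirror_copies
    return read_iops / net_write_iops


if __name__ == "__main__":
    for label, copies in (("gross", 1), ("normal redundancy", 2),
                          ("high redundancy", 3)):
        ratio = read_write_ratio(READ_IOPS, GROSS_WRITE_IOPS, copies)
        print(f"{label}: {ratio:.0f}:1")
```

Each extra copy of a write lowers the sustainable application write rate, so the ratio only moves in one direction.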