EMC blew NetApp and others away with a CIFS benchmark in January. Now NetApp has returned that favour with interest, blowing EMC away in an NFS benchmark. The two benchmarks are the CIFS and NFS versions of the SPECsfs2008 file access benchmarks. What gives them special piquancy is that this is the first public performance match …
"... seems to be no inherent limitation in using a ... flash cache ..."
Not sure I agree with that statement.
For one thing, cache, by definition, is a staging area; stale data isn't expected to hang around very long; i.e., once a particular chunk of data isn't considered "useful" any more (by whatever criteria the array controller uses to make such a determination), it's supposed to be flushed to disk. In so doing, one can keep the size of the cache -- which is expensive -- small (on a relative basis, compared to overall persistent storage capacity).
So let's say the server(s) connected to the controller keep reading and updating a very large sequential data set. The data set could be anything, perhaps a chunk of petroleum geo-prospecting data. This data set is large enough, in fact, to use almost the whole cache. Since the servers crunching the data keep updating the set, the chunk never gets flushed to disk; it is constantly held in cache. Since almost the entire cache is occupied by a single data set, I/O for other data sets ends up being heavily constrained. This might not be a problem, if the array is dedicated to a server cluster oriented toward a single task.
However, in the HPC/supercomputing arena, one is often dealing with multiple large data sets being processed in parallel, and competing for processor time and storage resources. In combination, these data sets could dwarf the size of the cache. Your cache becomes a bottleneck, because it isn't large enough to handle the competing storage access demands.
Better to build the whole array out of Flash, and (if necessary) use on-controller RAM for cache, methinks, than to use a flash cache with mechanicals hanging off the tail-end. Much more expensive, sure, but probably a lot less of a performance bottleneck for large sequential storage requests.
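The bottleneck scenario described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions: it models a plain LRU cache, not the predictive algorithm of any real array controller, and the names `lru_hit_rate` and `cyclic_scan` and the block counts are hypothetical, chosen purely for illustration. It shows how a repeatedly scanned data set that is even slightly larger than the cache can drop the hit rate to zero.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, cache_blocks):
    """Return the fraction of accesses served from a plain LRU cache.

    Assumption: simple LRU eviction, no prefetching or prediction.
    """
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


def cyclic_scan(n_blocks, passes):
    """Yield block numbers for repeated sequential passes over a data set."""
    for _ in range(passes):
        for block in range(n_blocks):
            yield block


if __name__ == "__main__":
    cache_size = 100  # hypothetical cache size, in blocks
    for data_set_size in (100, 101, 200):
        rate = lru_hit_rate(cyclic_scan(data_set_size, 10), cache_size)
        print(f"data set {data_set_size} blocks: hit rate {rate:.2f}")
```

With these made-up sizes, the data set that exactly fits is served almost entirely from cache after the first pass, while the ones that do not fit never hit at all under LRU. A predictive or scan-resistant policy would behave differently, which is the point the replies below argue.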
Caching wins for general adoption
I actually expect very large sequential workloads to be great on NetApp's PAM cards. The predictive algorithm should allow just the active parts to exist in the read cache buffer. Where EMC's SSDs come into their own is for highly random reads or for extremely high write rates, both of which are a challenge for a predictive cache engine. Both have a place, but with cache sizes ever increasing my bet is on the flashcache / PAM card / readzilla model for general adoption.
Didn't the benchmarks prove otherwise??
"Your cache becomes a bottleneck, because it isn't large enough to handle the competing storage access demands.
Better to build the whole array out of Flash, and (if necessary) use on-controller RAM for cache, methinks, than to use a flash cache with mechanicals hanging off the tail-end. "
The FlashCache on the NetApp system was only a fraction of the dataset used in the benchmark (~4.5% of the dataset, ~1.2% of the exported capacity), yet it had 172% more IOPs, delivering them with ~half the latency (198% faster). The system also had 488% the exported capacity at a fraction of the cost of the all-SSD system...
As opposed to the 'large sequential' workload, where predictive cache algorithms can really shine, I see only one use case for SSDs:
Unpredictably random read workloads, where the whole dataset fits into the SSD-provided space. I know one such application; NetApps with SSD shelves were deployed after extensively testing multiple vendors' solutions.
I'd imagine the Celerra + Symmetrix VMAX combo would cost a lot more dough than the FAS6240?
Long live VAX
I thought VAX computers died in the 90's.
".....HP leads the field with a 333,574 IOPS score achieved by a four-node BL860c cluster, using Itanium CPUs, not X86 ones....." With these willy-waving benchmarks, what's more interesting than the chips in the "gateway" cluster at the front-end is what storage is actually doing the hard work in the background. In the HP case, that was sixteen low-end MSAs with three add-on disk shelves each, packing a grand total of 1472 SAS disks for 25.7TB useable diskspace. I'm sure that setup was a means to provide a cheap way to have the maximum number of spindles, but whilst the HP score is very impressive, I'm not sure it's what you'd actually want in production. As the HP IOPS figure ramped up, so did the response time, peaking at 4.8s. The EMC Celerra had a peak of 4.6s. The NetApp score is slightly better, the response time hitting 3.6s at peak IOPS, but would you really want that kind of lag in production? Isn't it more likely you'd have your NFS/CIFS gateway in front of a generic, monolithic array?
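To put the spindle count in perspective, here is a rough back-of-the-envelope calculation using only the two HP figures quoted above (333,574 IOPS and 1472 disks). The helper name `iops_per_disk` is hypothetical, and the result is only a crude indicator: SPECsfs operations do not map one-to-one onto disk I/Os, and controller and gateway caching are ignored.

```python
def iops_per_disk(total_iops, disk_count):
    """Average benchmark operations per second attributed to each disk."""
    if disk_count <= 0:
        raise ValueError("disk_count must be positive")
    return total_iops / disk_count


if __name__ == "__main__":
    # Figures as quoted in the comment above for the HP submission.
    hp_per_disk = iops_per_disk(333_574, 1472)
    print(f"HP: about {hp_per_disk:.1f} benchmark ops/s per disk")
```

The same function could be applied to the EMC and NetApp submissions, but their disk counts are not given in this thread, so they are left out here.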
EMC much more expensive than NetApp
Sent to me and posted anonymously:-
"You missed a critical metric in the comparison - cost per IOP. The EMC lab queen system is likely 10x more expensive than the NetApp kit. For it to be only slightly better in performance at that cost is laughable and you should point it out since your article implies they are similar cost configurations."
Thanks for this ... Chris.
Small correction - 3270 was not using Flash Cache :)
Hi, Dimitris from NetApp here.
The 3270 wasn't using Flash Cache at all, we just wanted to have a result without it.
The 3210 and 6240 both were, and it helps with latency a lot as you can see.
What is also important: All the NetApp configs were a fraction of the price of the EMC ones and provided many times the amount of space.
More details here http://bit.ly/awIYXz and here http://bit.ly/bJZpRD