Pillar Data Systems is topping off its lineup of Axiom storage systems with a box supporting a healthy chunk more capacity and performance than its predecessor. The modus operandi of the Larry Ellison-funded unstart-up is storage efficiency, and this round it's slapping some guarantees on the new Axiom 600 disk array, promising …
They're a company to take seriously
We've run our business on 40TB of their storage for the past couple of years, NAS flavour. The hardware's bombproof: we haven't had so much as a spindle fail, such is the love they wrap them in within the Bricks, and the electronics have been noticeable only insofar as we haven't noticed them. The volume management and QoS are likewise invisible `just works' stuff, and I'd extrapolate from our NAS experience to say the SAN products are probably plug in, beat with a massive workload and forget.
The NAS product is a lightning-fast NFS server, but obviously has a much larger software stack covering filesystems and the upper-layer protocols. The NFS works perfectly, and hasn't crossed our radar since day one, and the filesystem performance, snapshotting and the like all do what you'd expect.
We've spent a bit of time chasing around a couple of NLM issues, but that's a protocol that suffers from poor specification and our environment --- interworking between Solaris and Linux clients --- has a history of stressing even the native Solaris implementations on past Suns and Auspexes (Auspex never `deep ported' lockd, instead running it on the Solaris host processor at hideously slow speeds). The Pillar version is running on the slammers and has performance and resilience to spare, but has some corner cases where it behaves differently to the reference implementations. But the fixes have usually been rapid, and it's not really got in the way of production loads. We're supporting multiple Solaris Oracle platforms (primarily a couple of Niagaras) off it, plus 1000 users of Cyrus IMAP, home directories for those 1000 users, a 200-user Clearcase environment, and it's barely raising a sweat.
I'd be lying if I said it was trouble-free, because nothing is and we're at the right-hand end of the complexity curve amongst their customers. But of the new products we've brought into our network it's been one of the easiest. And when we have had issues, the company is one of the most customer-focussed we've dealt with, with top-quality people at the end of the phone directly and short, effective escalation routes. The management team are straightforward and decent and the price / performance is great. If you're shopping for storage, they're well worth looking at.
> 40TB of their storage for the past couple of years
> we haven't had so much as a spindle fail,
40TB a couple of years ago would be between 53 and 80 spindles. You're saying that in 2 years not one out of those 53-80 spindles has failed? Yeah, right. A (SATA) disk is a disk and the odds of you not having at least 1 failure are so high I can't be bothered to 1x10^-40 it.
More likely the failure LED on a few of the drives has also failed.
> The NFS works perfectly,
> We've spent a bit of time chasing around a couple of NLM issues
> but has some corner cases where it behaves differently to the reference implementations
> the fixes have usually been rapid
It works perfectly but you've spent time looking into problems with it and they've had to fix it because the implementation of NFS doesn't follow the spec.
> I'd be lying if I said it was trouble-free
> The hardware's bombproof
> multiple Solaris Oracle platforms (primarily a couple of Niagaras)
(primarily a couple of 3 year old CPUs in Sun Fire systems)? How big are the Oracle instances? What SGA are you using? How many IOPs are you driving from the instance to the storage?
> plus 1000 users of Cyrus IMAP
UNIX email implementations tend to be very kind to storage. E.g. ISPs have large scale qmail implementations supporting millions of users on not much hardware. Now if it were Exchange (2003) it would be more interesting but then again it's only 1000 users.
> home directories for those 1000 users
1000 users doing what though? If you are a hospital, for example, then 1000 users may generate hardly any traffic at all.
Then again it's just 1000 users. Or a few dedicated spindles. That's why Windows servers are up to the task.
> and it's barely raising a sweat.
From your workload I'm not surprised. I'd bet a Dell PowerEdge with lots of memory and 80-odd spindles wouldn't either. Now if only they could get the disk failure rates to 0...
And how much money has Larry lost so far? Paris?
``You're saying that in 2 years not one out of those 53-80 spindles has failed?''
96 spindles active, 104 once you include the hotspares. It's not that surprising: it equates to an MTBF of something over 200 years. The lower end of `server grade' SATA is often quoted as being a million hours' MTBF (that's the way Apple are sliding around the precise nature of the disk in the Time Capsule, for example).
A million hours is 114 years; that we should be getting a bit better than that in a machine room kept at 18 centigrade, in enclosures that were powered on in 2006 and have never been powered off since, is hardly surprising. Even if it is surprising, it's hardly the 10^-40 you suggest.
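For anyone who wants to check that arithmetic, here is a rough sketch in Python. It assumes a constant failure rate (exponential lifetimes), independent drives, 96 active spindles and a two-year window; those are simplifications, and the function names are mine, purely for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, averaging in leap years


def mtbf_in_years(mtbf_hours):
    """Convert a quoted MTBF in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def prob_no_failures(spindles, years, mtbf_hours):
    """Chance of seeing zero failures across the whole set of drives.

    Assumes each drive fails independently at a constant rate of
    1/mtbf_hours, so the failure count is Poisson distributed and
    P(0 failures) = exp(-total_drive_hours / mtbf_hours).
    """
    total_drive_hours = spindles * years * HOURS_PER_YEAR
    return math.exp(-total_drive_hours / mtbf_hours)


if __name__ == "__main__":
    print(f"1,000,000 hours = {mtbf_in_years(1_000_000):.0f} years")
    # 96 active spindles over 2 years at the quoted million-hour MTBF
    print(f"P(no failures) = {prob_no_failures(96, 2, 1_000_000):.3f}")
```

Under those assumptions the chance of a clean two years comes out at a bit under one in five: lucky, perhaps, but many orders of magnitude away from 10^-40.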
A cold machine room and stable power (batteries and a genny) makes quite a difference, by the way: we have around five hundred spindles in our server estate and we typically see under ten failures per year. This year so far we've had a new SAS disk fail within the first few months, a five year old FC disk in an EMC went and a ten year old 9GB drive in a Sun multi-pack decided to finally expire. The only drive that's failed in `normal' timescales was a two year old 36GB SCSI drive in a V240. So our experienced MTBF with the Pillar is better than our general experience, but not outside the bounds of expectation.
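The comparison behind that last sentence can be sketched the same way. This is only a back-of-envelope estimate: it treats MTBF as drive-years of service divided by failures, takes "under ten" as exactly ten to get a lower bound, and counts the hot spares in the Pillar total. The helper names are hypothetical.

```python
def drive_years(spindles, years):
    """Total accumulated service time across a set of drives."""
    return spindles * years


def implied_mtbf_years(spindles, failures_per_year):
    """Crude MTBF estimate: drive-years of service per observed failure."""
    return spindles / failures_per_year


if __name__ == "__main__":
    # General estate: about 500 spindles, under 10 failures a year,
    # so the MTBF is at least this figure.
    print("estate MTBF, at least:", implied_mtbf_years(500, 10), "years")
    # Pillar: 104 spindles (hot spares included) for 2 years, no failures yet,
    # so all we can say is that the MTBF looks longer than this.
    print("Pillar drive-years without a failure:", drive_years(104, 2))
```

With zero failures the Pillar number is only a lower bound rather than a measurement, which is why it counts as better than the estate figure without being out of line with it.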
By the way, if you can find a spec of NLM (not NFS, but NLM) which actually covers the lock reclamation process in a way which is compatible with fielded implementations from Sun and the Linux community, could you let me know? NFSv3 works between vendors because there's a spec (RFC1813) which pre-dates released implementations. NFSv2 doesn't work well between vendors because there isn't a spec, and endless fun ensues if you (for example) attempt to do mutual exclusion by creating lock files: the differences between traditional Unix filesystem semantics and NFS semantics are a matter of experiment, and vary from implementation to implementation.
NLMv4 (associated with NFSv3) is a horror story because it's relegated to an appendix of RFC1813 which only describes the differences compared to NLMv3, which itself has a very limited semantic specification, and it implements the flock() mechanism which itself has subtle differences between vendors. Sun and NetApp quietly introduced and recommended llock, which you can use in a lot of scenarios and which completely bypasses NLM (ask NetApp why they recommend llock). Auspex simply used the SunOS/Solaris lock manager unchanged, and in their more successful boxes also used the SunOS filesystem. The Linux one was, last time I tried it, broken following client crash and restart.
Why? As the Wireshark page for it says, ``Keep in mind, there is no standard for the NLM protocol, the only thing that exists for this protocol is an interface specification describing the packet format. This is one reason why there have historically been so many problems with this protocol.''
I'm not going to fish around in our cacti instance for all the numbers, but as a rough guide our load is peaking at around 100MBytes/sec five-minute average of NFS activity, and over a (seven day) week we average about 25MB/sec. We peak at 4K NFS ops/sec, and average 1.5K/sec over the week. The usual response time reported by the clients (via sar and the like) is about 3ms, although write bound loads are slightly faster (going to mirrored RAM) and random reads are spindle-bound (as you'd expect).
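As a sanity check on those figures, dividing throughput by operation rate gives a rough average payload per NFS operation. The sketch below assumes 1MB means 10^6 bytes and that the throughput and ops peaks coincide, neither of which is guaranteed; plenty of NFS operations (getattr, lookup and so on) carry no data at all, so this understates the size of the actual reads and writes. The function name is made up for illustration.

```python
def bytes_per_op(mbytes_per_sec, ops_per_sec):
    """Average bytes moved per NFS operation, taking 1 MB as 10**6 bytes."""
    return mbytes_per_sec * 1_000_000 / ops_per_sec


if __name__ == "__main__":
    # Peak: roughly 100 MB/sec at 4K ops/sec
    print(f"peak:    {bytes_per_op(100, 4000):.0f} bytes/op")
    # Weekly average: roughly 25 MB/sec at 1.5K ops/sec
    print(f"average: {bytes_per_op(25, 1500):.0f} bytes/op")
```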
Mine's the one with the ``I must not get engaged in fruitless discussions about fileservers.'' note in the pocket.