"Distributed" RAID on 3PAR systems
I've been a 3PAR customer for most of 2007, and one of the things I really like about their systems is the "virtualized" disks. The physical disks are split up into 256MB chunks, and the RAID configuration is applied to the chunks rather than to the disks. Each disk has reserved chunks set aside for redundancy purposes; on my 300GB FC drives that is ~18GB. Long story short, when a drive fails, the entire array participates in the rebuild process (all spindles, depending on the size of the volume; a 10GB volume will be spread across up to 39 disks). There are no "spare" disks; all disks are in use. So rebuilds are very quick compared to traditional RAID systems.
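To put rough numbers on the chunk layout and on why spreading the rebuild helps, here is a back-of-the-envelope sketch in Python. It is not how 3PAR actually schedules a rebuild. The function names, the 50 MB/s per-disk write rate, and the count of 38 surviving disks are assumptions of mine for illustration, and it treats 1GB as 1024MB and ignores reads and host I/O entirely.

```python
CHUNK_MB = 256  # chunk size mentioned above


def chunks_per_disk(disk_gb, reserved_gb):
    """Return (total chunks, reserved chunks) for one disk, taking 1GB = 1024MB."""
    total = disk_gb * 1024 // CHUNK_MB
    reserved = reserved_gb * 1024 // CHUNK_MB
    return total, reserved


def rebuild_hours(data_gb, writers, mb_per_sec_per_disk):
    """Hours to rewrite data_gb when the writes are split evenly across `writers` disks.

    Simplified model: only write bandwidth is counted, and every writer is
    assumed to sustain the same (hypothetical) rate.
    """
    seconds = data_gb * 1024 / (writers * mb_per_sec_per_disk)
    return seconds / 3600


if __name__ == "__main__":
    total, reserved = chunks_per_disk(300, 18)
    print(f"{total} chunks per disk, {reserved} of them reserved")

    data_gb = 300 - 18  # worst case: every non-reserved chunk must be rebuilt
    print(f"single hot spare:   {rebuild_hours(data_gb, 1, 50):.2f} hours")
    print(f"38 surviving disks: {rebuild_hours(data_gb, 38, 50) * 60:.1f} minutes")
```

Under those assumptions a 300GB drive holds 1200 chunks with 72 reserved, and the worst-case rebuild drops from roughly 1.6 hours onto a single spare to about 2.5 minutes when the writes are spread over the surviving disks. Real numbers will differ, but the ratio is the point.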
Other advantages are things like being able to run multiple RAID levels on the same physical disks (I run RAID 1+0 with data on the outer regions of the disks and RAID 5+0 on the inner regions).
Of course it's also handy to be able to convert between RAID levels online, with no downtime.
But if you really want to protect against multiple failures, you have this ability today in several higher-end RAID systems by mirroring multiple times, say two RAID 1 volumes mirrored together. I've been told that some of the highest-end HDS systems, for example (the ones where they pay you for any downtime associated with the array), default to something like triple mirroring. If data protection is THAT important, the user should be using such a system (assuming they want the data to be completely available, versus replicating to a second array where there may be downtime in pointing servers over to it in the event of a failure).
My 3PAR array has a "mirror depth" option which may be the same thing; I'm not sure, as I haven't tried it. It also has the ability to automatically lay out the data so that (provided you have enough shelves) it can ensure redundancy in the event an entire shelf goes offline. My array has only 2 shelves, so we can't take advantage of that, but I imagine it can save a lot of planning for folks who want that kind of assurance.
And of course, while not completely risk free, running RAID 1+0 instead of, say, RAID 5 with a hot spare gives you a higher chance of surviving multiple disk failures (as long as they are the right disks), especially if your RAID volume is made up of 10+ disks. The only time I've experienced multiple disk failures on the same array before I could replace the disks was back in 2001, I think, with the IBM 75GXP disks. I had 4 drive failures in the span of 3 days on two different systems. The systems were dedicated 'backup' servers, so nothing important was lost; I just rebuilt the system that lost its marbles and re-synced the data. I must've lost at least 30 75GXP drives, and I even joined the class action lawsuit for a while until the judge decided to exclude people outside the state of California.
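To put a rough number on the "right disks" point for RAID 1+0 versus RAID 5: once one disk in a RAID 1+0 set of mirrored pairs has failed, a second failure only loses data if it hits that disk's mirror partner. A minimal sketch, assuming two-way mirrors, a second failure equally likely on any surviving disk, and no rebuild completed in between (the function names are mine):

```python
from fractions import Fraction


def raid10_survives_second_failure(n_disks):
    """Chance a RAID 1+0 set of two-way mirrors survives a second random failure.

    After the first failure, n_disks - 1 disks remain and only one of them
    (the failed disk's mirror partner) is fatal to lose.
    """
    if n_disks < 2 or n_disks % 2:
        raise ValueError("need an even number of disks, at least 2")
    return Fraction(n_disks - 2, n_disks - 1)


def raid5_survives_second_failure(n_disks):
    """Single-parity RAID 5 cannot survive a second failure before the rebuild finishes."""
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return Fraction(0)


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, raid10_survives_second_failure(n), raid5_survives_second_failure(n))
```

For a 10-disk set that works out to 8 in 9 for RAID 1+0 against zero for RAID 5 while the hot spare is still rebuilding, and the RAID 1+0 odds improve as the set gets wider.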
So fast RAID rebuilds are here, and have been for years, at least with 3PAR. I'm not personally aware of other vendors that offer similar technology.
