RAID failure cases are so ugly
And that's before you even take controller failures into account.
I really liked Figure 2 in the paper, which shows how non-orthogonal the failure cases can be. By that I mean that with RAID you can't just quote a simple fraction of tolerated disk failures; you have to consider which particular combination of disks fails, including clusters of worst-case scenarios. Their figure isn't for standard RAID, but you can still see how their scheme inherits the non-orthogonality of current RAID implementations. (A toy illustration of what I mean is sketched below.)
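Here's my own toy sketch (not from the paper) of a hypothetical four-disk RAID 10 array, where whether a two-disk failure is fatal depends entirely on which two disks die:

    from itertools import combinations

    # Toy RAID 10 layout (my own illustration, not from the paper):
    # four disks arranged as two mirrored pairs. The array survives as
    # long as every mirror pair still has at least one working disk.
    mirrors = [{"d0", "d1"}, {"d2", "d3"}]
    disks = ["d0", "d1", "d2", "d3"]

    def survives(failed):
        # True if each mirror pair has at least one disk outside the failed set
        return all(pair - failed for pair in mirrors)

    for failed in combinations(disks, 2):
        status = "OK" if survives(set(failed)) else "DATA LOSS"
        print(sorted(failed), status)

Four of the six two-disk failure patterns are survivable and two aren't, so "tolerates up to two failures" is meaningless without saying which two.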
I dabble in using Rabin's Information Dispersal Algorithm as an alternative to RAID. I've only recently added some introductory information to my repo to show what it is and how it can be better than RAID. In fact, one of the points I made was that failure analysis is a cinch with IDA. Since IDA doesn't distinguish between data and parity (it splits the data into n shares, any k of which are enough to reconstruct it), your redundancy level is a simple fraction, and if you know the failure rates of the individual disks it's very straightforward to calculate the probability of failure of the cluster as a whole (see the sketch below). You can obviously go nuts and apply a Poisson arrival model or use prior probabilities to examine reliability over time if you want to, but it's not necessary.
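Here's roughly what I mean, as a hedged sketch (the n, k and failure-rate values are made-up numbers, and it assumes independent, identically distributed disk failures):

    from math import comb

    # k-of-n IDA reliability sketch, assuming independent disk failures.
    # n shares live on n disks; any k of them reconstruct the data, so the
    # data is lost only when more than n - k disks have failed.
    def p_data_loss(n, k, p):
        return sum(comb(n, f) * p**f * (1 - p)**(n - f)
                   for f in range(n - k + 1, n + 1))

    # Example: 10 shares, any 7 reconstruct, 2% chance a given disk is dead.
    print(p_data_loss(10, 7, 0.02))  # ~3e-5

Compare that one-liner with enumerating the fatal disk combinations for a nested RAID layout.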
I'm constantly amazed at the number of researchers who persist with the mentality that XOR-based data redundancy systems (i.e., all standard RAID systems) are the way to go, and that non-orthogonal failure cases are acceptable. I get that XOR is cheap, but so is any other O(n) algorithm (like IDA) if it's done in hardware. And it's not even the case that XOR-based systems have to have non-orthogonal failure cases.

There's a scheme called "Online Codes", invented by Petar Maymounkov, which builds on Luby's LT codes and is closely related to Raptor codes. It uses two layers of XOR and gives asymptotic (probabilistic) guarantees about the recoverability of the data for a given number of erasures. It might not be well suited to storage systems (or it might be, if anyone bothered to look into the maths), but it at least shows that orthogonality is possible. (I'm also in the process of implementing this in my repo, since it should be good for multicasting a file across my network/storage array so that individual nodes can then do the IDA bit, and it allows for other nested/hybrid IDA setups. A rough sketch of the two-layer idea is below.)
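Here's a very loose toy of the two-layer idea (my own simplification, with a made-up degree distribution and aux-block count, nothing like Maymounkov's actual parameters): an outer XOR precode that adds auxiliary blocks, plus an inner rateless XOR layer, decoded by peeling degree-1 check blocks.

    import random

    BLOCK = 4  # bytes per block; tiny, just for demonstration

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def xor_all(blocks, idxs):
        out = bytes(BLOCK)
        for i in idxs:
            out = xor(out, blocks[i])
        return out

    def make_checks(composite, n_check, rng):
        # Inner layer: each check block is the XOR of a few random
        # composite blocks (made-up degree distribution).
        checks = []
        for _ in range(n_check):
            deg = rng.choice([1, 2, 2, 3, 3])
            nbrs = set(rng.sample(range(len(composite)), deg))
            checks.append((nbrs, xor_all(composite, nbrs)))
        return checks

    def peel(checks, n):
        # Repeatedly find a check with exactly one unknown neighbour,
        # XOR the known neighbours out of it, and recover that block.
        known = {}
        progress = True
        while progress and len(known) < n:
            progress = False
            for nbrs, val in checks:
                unknown = {i for i in nbrs if i not in known}
                if len(unknown) == 1:
                    i = unknown.pop()
                    for j in nbrs - {i}:
                        val = xor(val, known[j])
                    known[i] = val
                    progress = True
        return known

    rng = random.Random(1)
    message = [bytes([b] * BLOCK) for b in range(8)]

    # Outer layer: two auxiliary blocks, each the XOR of three random
    # message blocks. Each one also yields a "free" relation for peeling,
    # since XOR over its neighbours plus the aux block itself is zero.
    composite = list(message)
    outer = []
    for a in range(2):
        nbrs = set(rng.sample(range(len(message)), 3))
        composite.append(xor_all(message, nbrs))
        outer.append((nbrs | {len(message) + a}, bytes(BLOCK)))

    # Rateless: keep drawing check blocks until peeling recovers everything.
    checks = list(outer)
    while True:
        checks.extend(make_checks(composite, 5, rng))
        known = peel(checks, len(composite))
        if all(i in known for i in range(len(message))):
            break
    print(all(known[i] == message[i] for i in range(len(message))))

The point isn't the (entirely made-up) parameters; it's that the recovery guarantee is a clean probabilistic statement about how many check blocks you've received, not a list of fatal disk combinations.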
I'm actually reminded, as I write this, of the whole debate around neural networks back when the Perceptron was invented. Research into the whole field basically stalled for quite a few years after Minsky and Papert proved that a single-layer Perceptron couldn't encode an XOR rule. It wasn't until multi-layer neural networks (and practical ways of training them) came along that this particular problem was overcome and progress started to be made again. I wonder if there isn't a similar artificial plateau effect happening these days with RAID systems?