1 post • joined Wednesday 30th June 2010 15:03 GMT
The Navy notwithstanding...
Today's CPUs are quite good at telling when they're doing something wrong. They leave plenty of 'breadcrumbs' for determining which half is at fault. If a truly benign divergence happens, in which both replicated halves are still machine-correct, then the half that has been running the OS and the applications error-free the longest is kept running. Fortunately, experience has shown that these occurrences are rare.
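To make that tie-break concrete, here is a rough sketch in Python. It is purely illustrative and not Stratus code: the `Half` record, its fields, and `choose_survivor` are hypothetical names, and the "both halves report errors" case is my own placeholder, since the rule above doesn't cover it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Half:
    """One of the two replicated halves (hypothetical model)."""
    name: str
    self_check_ok: bool        # assumption: outcome of the CPU's own error detection
    error_free_seconds: float  # assumption: time running OS and apps without error


def choose_survivor(a: Half, b: Half) -> Optional[Half]:
    """Pick which half keeps running after the two halves diverge."""
    # Usual case: exactly one half has flagged itself as wrong.
    if a.self_check_ok != b.self_check_ok:
        return a if a.self_check_ok else b
    # Both halves report errors: not covered by the rule described above.
    if not a.self_check_ok:
        return None
    # Benign divergence: both are machine-correct, so keep the half
    # with the longest error-free run.
    return a if a.error_free_seconds >= b.error_free_seconds else b
```

For example, if both halves pass their self-checks and one has been up error-free for 90 days versus 2 days for the other, the 90-day half is the one kept running.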
Stratus used to offer triple-redundant servers as well, but field data and long-term analysis showed that the added availability protection was negligible (in the noise, really) compared to the large increase in cost.
(disclaimer: I'm a design engineer at Stratus Technologies, so I do look at this data quite a bit over morning bagel and OJ) :-)