dubious about this bit
"Fault tolerant servers are distinct from the more popular clusters in that they are two completely mirrored systems running two copies of an operating system and their applications are kept in absolute lockstep by a chipset and electronics in Intel's Xeon chips."
Unless someone corrects me, I don't buy this. Unless you could guarantee absolutely identical memory access patterns, there will be variations in cache behaviour between the two machines: different levels of cache will be hit on each side, and given the huge differences in access time between cache levels, the pair would have to run at the speed of whichever is slower at any given moment - which could be very slow.
And then there are different disk access characteristics, so even nominally identical units will have different request servicing times.
And there are interrupts. And network servicing times, which could act as a bottleneck of a different size per processor from moment to moment, due both to differences in the network hardware and to network load.
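To make the interrupt point concrete, here's a toy Python model (all names invented by me, nothing to do with Intel's actual mechanism): two replicas execute the same deterministic instruction stream, but each takes the same interrupt at a different instruction boundary. Unless the hardware forces delivery at exactly the same instruction on both sides, identical inputs still give divergent state.

```python
# Toy model: the "program" increments a counter each step; the
# "interrupt handler" doubles it. Same program, same interrupt,
# different delivery point -> different final state.

def run_replica(steps, interrupt_at):
    state = 0
    for i in range(steps):
        state += 1          # deterministic work
        if i == interrupt_at:
            state *= 2      # interrupt handler side effect
    return state

a = run_replica(steps=10, interrupt_at=3)  # interrupt lands "early"
b = run_replica(steps=10, interrupt_at=7)  # same interrupt lands "late"
# a and b end up different (14 vs 18) despite identical programs and inputs
```

Which is presumably why real lockstep designs have to synchronise interrupt delivery in hardware, not just mirror the CPUs.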
And then there's the circuitry on top that supposedly validates the outputs against each other (validates according to what, anyway, and what happens when a discrepancy is found?)
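My best guess at the answer to my own question - and this is a sketch of how I imagine a two-way comparator works, not anything from Intel's documentation - is fail-stop: with only two copies you can detect a mismatch but not vote on who's right, so on divergence you halt or fence off the pair rather than arbitrate. Something like:

```python
# Hypothetical fail-stop comparator: feed identical inputs to both
# replicas, compare outputs at each I/O boundary, and stop on the
# first mismatch (two copies can detect disagreement, not outvote it).

def lockstep_run(replica_a, replica_b, inputs):
    outputs = []
    for x in inputs:
        out_a, out_b = replica_a(x), replica_b(x)
        if out_a != out_b:
            raise RuntimeError(
                f"divergence on input {x!r}: {out_a!r} != {out_b!r}")
        outputs.append(out_a)
    return outputs

# Two healthy replicas agree and the outputs pass through:
healthy = lockstep_run(lambda x: x * x, lambda x: x * x, [1, 2, 3])
# healthy == [1, 4, 9]; a replica that computed differently would
# trip the RuntimeError instead of emitting a possibly-wrong answer
```

You'd need triple redundancy (as in aerospace voting systems) to actually mask a fault rather than just catch it.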
And this would be bad enough if it were multicore, single socket, but this is supposed to have *dual sockets*.
I suppose big jobs will be main-memory-bound rather than cache-bound, so that price is perhaps already paid (i.e. it runs as if from main memory anyway, so the cache bonus is effectively lost), but I must be missing something. Can anyone who actually knows about this stuff please enlighten me, ta.