RIM: 'Faulty switch took out faulty-switch-proof network'

Some time ago I was developing on a Stratus box that shared a server room and mains connection with a Tandem NonStop box. Both are fault-tolerant machines.

We came in one Monday to find that our Stratus was dead. It turned out that the Tandem's PSU had shorted over the weekend, tripping the mains. That shut down the second half of the Tandem's PSU and left the Stratus running on its backup battery, which went flat after three hours or so.

Exactly the same thing happened again a month later, proving it was no fluke.

Moral: if the backups aren't in different buildings, fed from different substations and standby generators, the system can't be considered fault tolerant - and even then it may not be, owing to other circumstances.


