Failure is not an option
It's a standard feature of any computer system.
For those who don't get the reference - see Failure is not an option
It's never a nice feeling when your computer keels over, wiping out work, sometimes requiring hours of maintenance and basically ruining your day. But spare a thought for the three astronauts currently in the International Space Station who discovered earlier today that one of the three computers in the station's Russian …
The computer which failed is a part of the ISS, and not a laptop.
The laptops interfacing with the ISS systems are Linux based, both the American and Russian ones. Then there are some European and Japanese laptops. The rest are running Windows:
Space:1999 - Opening windows on Moonbase Alpha.
To be fair, there were a couple of scripted/deleted scenes regarding the replacement of the normal windows with ones that allowed them to be opened and refitted afterwards (can't figure out why they had factory-made replacements to hand, nor would I want to be testing the seals by being in Main Mission as the atmosphere leaked away again).
"We confirm that the so called program readiness of one of the three ISS computers was lost, in other words there was a program fault. In order to restore the computer to a working state, system reboot is required.
This fault will in no way affect ISS normal operation. The default cyclogram permits indefinite flight time using two available channels. To ensure reliable docking procedure with the Progress spaceship, the reboot will be performed on 8 Nov 2018."
"ISTR that there was some effort towards a film version of some sort, but don't know what came of it."
The books were a trilogy, but someone thought it would be best to make a trilogy out of each book, then someone pointed out that to be true to the source, things should be done in threes, so each trilogy had its own set of sequels. Then they noticed the other three books. At this point the projected cost of making the films got so astronomical even the first Rama spaceships couldn't keep up. So they canned the entire deal.
Somewhere deep in Hollywierd, there are people thinking of trying a second time, then a third time, just to be consistent.
I worked for an NPP, where everything that related to critical gear was quadrupled, if possible. It's the philosophy:
"One working, One backup, One broken, One in maintenance."
Things like reactor cooling, switchgear feeding power to those cooling pumps, etc... Now THOSE you don't want to fail, as well. And the design was German, not Russian.
I'm pleasantly surprised they used triple redundancy, however.
'But, if one is broken and one is in maintenance....surely that's just single redundancy.'
That's pretty much how it works out in practice. My experience of TMR controllers is that when one unit fails, the general reaction is invariably not "OK, we are just down one level of redundancy, we have time to investigate and plan." It's "What if another one fails? Let's panic and make a poor decision."
Triplication of sensors gives less than triple redundancy in practice. Three simultaneous measurements of a single process parameter are never exactly the same, because every measurement carries some uncertainty. As a result, voted inputs are compared within bounds. On the complete loss of one sensor, if the deviation between the remaining two exceeds those bounds, which one is the correct value? Default reaction is - Shut it down!
So triplication does little to extend uptime significantly, whether you call that resilience or redundancy. At best it will prevent you operating on significantly false input data.
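For anyone who wants to see the voting problem spelled out, here is a rough sketch in Python. It is not how any particular TMR controller (or the ISS) does it; the function name `vote`, the `tolerance` parameter and the status strings are all made up for illustration. It assumes a failed sensor is reported as `None` and that "within bounds" means an absolute difference no larger than `tolerance`.

```python
from statistics import median
from typing import Optional, Sequence, Tuple


def vote(readings: Sequence[Optional[float]],
         tolerance: float) -> Tuple[Optional[float], str]:
    """Vote on up to three redundant readings (hypothetical sketch).

    None marks a sensor that has failed completely.
    Returns (voted_value, status); status is "OK", "DEGRADED" or "SHUTDOWN".
    """
    live = [r for r in readings if r is not None]

    if len(live) == 3:
        # Mid-value select: the median is always one of the three readings.
        mid = median(live)
        agreeing = [r for r in live if abs(r - mid) <= tolerance]
        if len(agreeing) == 3:
            return mid, "OK"
        if len(agreeing) == 2:
            # One reading is out of bounds, but it can be outvoted.
            return mid, "DEGRADED"
        return None, "SHUTDOWN"

    if len(live) == 2:
        a, b = live
        if abs(a - b) <= tolerance:
            return (a + b) / 2, "DEGRADED"
        # Two readings disagree and there is nothing to break the tie.
        return None, "SHUTDOWN"

    # One sensor or none left: nothing to compare against.
    return None, "SHUTDOWN"


if __name__ == "__main__":
    print(vote([100.0, 101.0, 102.0], tolerance=2.0))  # all three agree
    print(vote([100.0, 101.0, 150.0], tolerance=2.0))  # one outlier, outvoted
    print(vote([100.0, None, 110.0], tolerance=2.0))   # one dead, two disagree
```

The last case is the one described above: with one sensor gone and the other two outside the bounds, the voter cannot tell which reading is right, so the only safe answer in this sketch is a trip.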
Biting the hand that feeds IT © 1998–2019