Apollo 13 set off into space 50 years ago today. An ignored change order ensured it did not make it to the Moon...

Apollo 13 was launched 50 years ago today. Now regarded as a "successful failure," the story of the aborted Moon landing began years earlier, with the design of mankind's then most advanced spacecraft. The post-launch explosion that led to what was arguably NASA's finest hour began, according to former director of Flight …

  1. Tom Paine

    Perrow

    Normal Accidents by Charles Perrow was also useful in researching this article,

    +1 for "Normal Accidents", one of the best books on IT failures I've ever read, especially as it's about complex systems in general rather than digital computers in particular (though they make a few appearances in passing.) I once worked at a shop with an office bookshelf staff were encouraged to contribute to: I bought two (used) copies of Normal Accidents for it -- used, because - inexplicably - it's out of print. Can't recommend strongly enough.

    1. Anonymous Coward

      Re: Perrow

      Perrow makes some absolutely vital points. One is that every system made by humans contains a human element, which is at least as fallible as any other part of the system. For instance, the contractor who was responsible for holes in the concrete containment shield of a nuclear power station - at least one of which was large enough to park a car in.

      Another of Perrow's thoughts that stays in the mind for decades is that complex systems are usually able to continue working in spite of one or more failures. It may take several independent failures to cause an accident serious enough to be noticed.

      I don't recall Perrow saying this, but it seems to me that really reliable systems require an attitude of complete and utter commitment to high quality. That is incompatible with the profit motive. (Compare, for example, standard commercial software such as that sold by Microsoft with the Space Shuttle software as described in Charles Fishman's superb article "They Write the Right Stuff" https://www.fastcompany.com/28121/they-write-right-stuff).

      In Robert A Heinlein's famous short story "Blowups Happen" (published in 1940 - not a typo), he depicts the problems of generating electric power from nuclear fission. The core of the story revolves around the difficulty of keeping the highly-trained and conscientious engineers who tend the power station sane. They worry so much about their awful responsibility and the consequences of any error or oversight that they have to be replaced after weeks or months, and need constant psychiatric help.

      Contrast that with how reality turned out! Instead of brilliant, dedicated, careworn geniuses going slowly mad under the unbearable burden of responsibility, we have had nuclear power station accidents due to appalling laziness and negligence, and sometimes even deliberate sabotage to alleviate boredom.

      In Perrow's terms, Heinlein assumed the availability of staff like himself, when such people are actually much rarer and harder to find. He specified human components of a quality that seems unobtainable on the market.

      1. Robert Grant

        Re: Perrow

        I don't recall Perrow saying this, but it seems to me that really reliable systems require an attitude of complete and utter commitment to high quality. That is incompatible with the profit motive. (Compare, for example, standard commercial software such as that sold by Microsoft with the Space Shuttle software as described in Charles Fishman's superb article "They Write the Right Stuff" https://www.fastcompany.com/28121/they-write-right-stuff).

        TL;DR: waterfall works if you have unlimited money and an insanely predetermined scope.

    2. richardcox13

      Re: Perrow

      Looks like it is now in print... with a revised edition published in 1999. A Kindle edition is also available.

  2. IceC0ld

    Lucky 13

    I watched the drama unfold on the TV, a 12 yo, with zero understanding of how they had actually managed to do any of the rocket stuff, but absolutely mesmerised by it all since Apollo 8 and Christmas 1968 when they had orbited the moon, have watched the movie, and I still got goosebumps when things went wrong.

    Now, reading this and discovering just how LITTLE a failure can cascade into one almighty clusterfuck, it is sort of mindblowing just how hard they worked to get the crew back alive, makes you wonder what else occurred that caused loss of life through tiny errors that were compounded by ongoing acceptance of the faults ?

    what was that quote ? if the Saturn V was 'only' 99% good, it would still mean there were 000's of dud parts in the beast

    to ME, the space exploration game IS important, but I can also see that people may call it out as there are still people within the USA without the most basic requirements for the most basic of a reasonable survival, but I would hope that if we DO finally gain access to the STARS, that it WILL make a difference here on earth / home

    FFS, FIFTY years, feels like it was only a few years ago, wondering what it would be like in the 21st Century, now we are 20 years in, and I, for one, wouldn't mind going back there right now :o)

    1. Corwin_X

      Re: Lucky 13

      As an engineer I can tell you it's never the big stuff that's had a lot of spadework put into it - it's always the tiny little things missed that'll bite you on the ass. Challenger for example - a single insulation tile knocked off.

      1. The Dogs Meevonks Silver badge

        Re: Lucky 13

        Close, Challenger was down to a failed O ring on the fuel system for the boosters, Columbia was down to falling debris on launch that damaged the insulation on the wing.

        1. Corwin_X

          Re: Lucky 13

          Yeah sorry - should have said Columbia. Point still stands though: both were caused by tiny little failures in the systems. Hope Elon Musk has more luck!

          1. Gene Cash Silver badge

            Re: Lucky 13

            Elon doesn't really rely on luck. He designs with adequate margin then tests the living hell out of things.

            1. Corwin_X

              Re: Lucky 13

              That's what NASA was doing - or at least thought they were doing - and it still went pearshaped. Many people don't have enough respect for a) Murphy's Law and b) Plan B.

        2. Anonymous Coward

          Re: Lucky 13

          Columbia was done in because they had to change the formula of the foam used to insulate the main tank, in order to make it more "environmentally friendly". The change made it more liable to holding moisture and less able to retain structural integrity.

          1. awavey

            Re: Lucky 13

            Not really the case at all. Foam had fallen off the external tank since the Space Shuttle's first launch, which coincidentally was 39 years ago this past weekend. What should have been recorded as a fault, with the Shuttle grounded until it was fixed properly, fell victim to launch and mission fever and was accepted as just one of those things that happened during a launch. As launches kept succeeding, that was used as a reason to continue ignoring the foam shedding whenever it happened. Any of the 112 launches prior to the loss of Columbia could have had the same outcome had a foam strike hit the leading edge.

        3. Martin Gregorie

          Re: Lucky 13

          Errr, no. The O-ring that caused the Challenger crash was nothing to do with any fuel system. Instead it was an essential structural part of one of the two Solid Rocket Boosters. Its job was to seal the joint between two of the stacked steel cylinders that formed the SRB's structure.

          The whole assembly was cold-soaked overnight and then launched with the air temperature still below the minimum permitted launch temperature. As a result the O-ring was rigid, rather than resilient, and so was unable to do its job of keeping high pressures inside the rocket casing. This let extremely hot gasses escape, eventually in a strong enough jet to cut through the struts holding the SRB in position.

          At that point the SRB pivoted on its remaining attachment, punching a big hole in the external fuel tank and sealing the fate of the crew: there was no emergency crew exit on Challenger.

          The launch decision was a classic management error, not helped at all by the perceived pressure of "it's live on TV - we can't mess up their schedule or disappoint the viewing public" and compounded by "we've launched below minimum temp before, so nothing can possibly go wrong this time".

          There's a very good, and readable, account of the subsequent investigation and accident analysis in Richard Feynman's book "What do YOU care what other people think?".

          1. Corwin_X

            Yeah - engineers can and do make mistakes. But if a serious $hitshow happens it's usually the management! ;-)

          2. Corwin_X

            Re: Lucky 13

            You're obviously a steel eyed rocket man! ;-) :-)

          3. Anonymous Coward

            Re: Lucky 13

            "we can't mess up their schedule or disappoint the viewing public" and compounded by "we've launched below minimum temp before, so nothing can possibly go wrong this time".

            NASA did a statistical analysis of the relationship between partial SRB O-ring failures (they were a common occurrence before the Challenger LOCV) and air temperature at launch - there's a pretty little graph in the Rogers Commission report into the accident. Unfortunately they got the analysis wrong and concluded that there was no relationship - statistical risk analysis is hard at the best of times, and even harder when you're dealing with novel rocket engineering technologies due to the sheer number of unknowns.

            NASA and Thiokol management (Thiokol manufactured the shuttle SRBs) then compounded the error by over-ruling a Thiokol engineer who was advising against an early morning launch due to the unusually low air temperatures on the 28th of January.

            To paraphrase one of the resulting Rogers Commission's recommendations, when an engineer advises you not to launch because of unacceptable levels of risk, you immediately stop what you're doing, you shut the fuck up, and listen very, very, very hard.

            1. Yet Another Anonymous coward Silver badge

              Re: Lucky 13

              A statistical analysis along the lines of "I never wear a safety harness" / "I've driven drunk before" and nothing has ever happened to me

            2. Anonymous Coward

              Where's the modern equivalent?

              http://www.feynman.com/science/the-challenger-disaster/

              "Feynman was always the inquisitive type; he had to have the facts. To find out what happened to the shuttle, he went straight to the people who put the shuttle together. He learned many things from these people that would help him to discover the cause of the explosion; and also information that helped him realize what a risky business flying a shuttle really is.

              NASA officials said that the chance of failure of the shuttle was about 1 in 100,000; Feynman found that this number was actually closer to 1 in 100. He also learned that rubber used to seal the solid rocket booster joints using O-rings, failed to expand when the temperature was at or below 32 degrees F (0 degrees C). The temperature at the time of the Challenger liftoff was 32 degrees F."

              and lots similar elsewhere. Please read some of them, people's lives depend on you reading them.

              https://science.ksc.nasa.gov/shuttle/missions/51-l/docs/rogers-commission/Appendix-F.txt

              "It appears that there are enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life. The estimates range from roughly 1 in 100 to 1 in 100,000. The higher figures come from the working engineers, and the very low figures from management.

              What are the causes and consequences of this lack of agreement? Since 1 part in 100,000 would imply that one could put a Shuttle up each day for 300 years expecting to lose only one, we could properly ask "What is the cause of management's fantastic faith in the machinery?""
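Feynman's 300-year figure quoted above is plain arithmetic and can be checked in a few lines. This is only a sketch: the function names are made up for illustration, and it assumes one flight per day, 365-day years, and independent flights with a constant per-flight loss probability.

```python
# Sanity check of the "one Shuttle a day for 300 years" remark quoted above.
# Assumptions (not from the report): 365-day years, independent flights,
# and a constant per-flight loss probability.

DAYS_PER_YEAR = 365


def expected_losses(p_loss: float, flights: int) -> float:
    """Expected number of vehicles lost over `flights` launches."""
    return p_loss * flights


def prob_at_least_one_loss(p_loss: float, flights: int) -> float:
    """Chance of losing at least one vehicle in `flights` launches."""
    return 1.0 - (1.0 - p_loss) ** flights


flights_in_300_years = 300 * DAYS_PER_YEAR  # one launch per day

management_estimate = expected_losses(1 / 100_000, flights_in_300_years)
engineer_estimate = expected_losses(1 / 100, flights_in_300_years)

if __name__ == "__main__":
    print(f"Flights in 300 years: {flights_in_300_years}")
    print(f"Expected losses at 1 in 100,000: {management_estimate:.3f}")
    print(f"Expected losses at 1 in 100: {engineer_estimate:.0f}")
```

At 1 in 100,000 the expected loss over those 109,500 daily flights comes out at roughly one vehicle, which matches the quote. At 1 in 100 the same schedule would be expected to lose more than a thousand.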

              1. Yet Another Anonymous coward Silver badge

                Re: Where's the modern equivalent?

                >What is the cause of management's fantastic faith in the machinery?""

                For the public to back this they must believe the failure to be 1:100,000

                We want the public to back this

                Therefore we believe the failure rate to be 1:100,000

                1. Anonymous Coward

                  Re: For the public to back this...

                  Feynman again (and again in the context of a Shuttle inquiry report):

                  "For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled."

                  https://www.quora.com/What-did-Richard-Feynman-mean-when-he-said-You-cant-fool-nature (and various others)

              2. Doctor Syntax Silver badge

                Re: Where's the modern equivalent?

                "He learned many things from these people that would help him to discover the cause of the explosion; and also information that helped him realize what a risky business flying a shuttle really is."

                One of my favourite bits of that was about the difficulty of locating the opposite pairs of holes when trying to correct the shape of the sections. He suggested painting marks at 90 degree intervals to make it easier and was told it would be too expensive. Not too expensive to paint the marks but too expensive to change the paperwork. The "quality" management system was an impediment to quality.

            3. Chris G

              Re: Lucky 13

              'Shut the fuck up and listen' was written into CAA rules when I worked on light aircraft air frames.

              I was a very junior engineer but if I wouldn't sign off on a checklist the aircraft couldn't fly until I did.

              As far as I know that was the law. What I do know, is twice in three years I refused to sign off although an owner was creating merry hell about flying to Boulogne for lunch or something because I had genuine concerns about cracks or wear in a part and my boss stood by me.

              1. Doctor Syntax Silver badge

                Re: Lucky 13

                flying to Boulogne for lunch

                "You can fly to somewhere this side of Boulogne and become lunch" might be sufficiently persuasive.

    2. I Am Spartacus

      Re: Lucky 13

      Like you, I was glued to the TV and my little transistor radio. I had watched as Apollo 8 went around the moon in wonder and awe. I woke up at some ungodly hour to watch Armstrong and Aldrin walk on the moon. I was a child of the space race and it never, ever gets tedious.

      I just rewatched the movie, obviously knowing the ending, but never getting bored by it.

      And yes, I highly recommend both of Kevin Fong's podcasts "13 minutes to the moon". The testimony of those who are portrayed on the screen is fantastic, chilling and inspiring.

      1. I ain't Spartacus Gold badge
        Happy

        Re: Lucky 13

        Kevin Fong is a selfish bugger! Being an expert in emergency medicine, he’s buggered off to help deal with the Coronavirus crisis, and not finished the final episode of the podcast.

        How am I going to know if they got them back to Earth safely or not?

        Seriously, I think the Apollo 13 series is even better than the moon landing one. Top work!

        1. deadlockvictim

          Re: Lucky 13

          If you look hard enough, there are spoilers online. But without the Hans Zimmer background music.

  3. Corwin_X

    A masterful analysis - bravo. Looking forward to part two!

  4. Vulch
    Boffin

    Cancellations

    Strictly speaking it was Apollos 15 and 19 that were the second batch of cancellations, with 16-18 being renumbered to become 15-17. In the original schedule Apollo 16 was going to be the first to carry an LRV so the less capable all-walking version of 15 was the one that got the chop.

    1. Gene Cash Silver badge

      Re: Cancellations

      And Saturn V production had been canceled in August 1968, over half a year before even landing on the Moon, and before even Apollo 7. This meant Apollo 20 was canceled so its Saturn V could launch Skylab (AKA the Apollo Applications Project workshop)

      https://www.nasa.gov/feature/50-years-ago-nasa-cancels-apollo-20-mission

  5. keith 9
    Thumb Up

    13 Minutes To The Moon

    This podcast is absolutely epic. Particularly season 1

    Kevin Fong's voice is especially mellifluous, so perfect to get off to sleep to (and then catch up with missed bits the next day).

    1. Yet Another Anonymous coward Silver badge

      Re: 13 Minutes To The Moon

      Particularly the 15mins when the crew-cut "failure is not an option" types in the control room try and convince themselves it is a faulty gauge - after their astronauts have told them they felt a massive bang.

      An explosion would be an inconceivable disaster, therefore we can't conceive of an explosion. Looks like nothing changed for the Space Shuttle program

      1. I ain't Spartacus Gold badge

        Re: 13 Minutes To The Moon

        That’s not really fair though. For example the guy who worked out the Apollo 12 fault did it by saying, all these things can’t go wrong at once, therefore perhaps it’s faulty readings caused by a lightning strike? And they fixed it with one setting change.

        Diagnosing multiple failures in independent systems is hard, and sensor error is one probable cause that needs considering. There reaches a point where it’s almost more likely, because physical damage sufficient to break so many systems is likely to destroy the spacecraft.

  6. Sebastian.Q.Ostragoth

    that movie with Tom Hanks

    Amusingly the DVD of the movie has a second audio track featuring a commentary by Jim Lovell and his wife, of which there is no mention on the DVD box. A mistaken omission I'm sure, but well worth a listen.

  7. Version 1.0 Silver badge
    Angel

    Free Publicity for Major Tom

    Apollo 13 did wonders for David Bowie (icon) after the BBC stopped playing Space Oddity because it seemed to be describing the problems in space. Bowie was popular with his fans before that point but not everywhere, he was seen as "weird" but not a hit artist. However once the BBC "banned" the song from the radio, everyone wanted to hear it and then spent a long time trying to figure out the lyrics.

    1. Terry 6 Silver badge

      Re: Free Publicity for Major Tom

      There's a bit of history to the BBC banning songs into fame.

      Arguably it worked for pirate radio too. If they'd tried to compete and not prevent off shore broadcasters they'd probably have seen them off, or at least left them with just a niche market.

      1. Pangasinan Philippines

        Re: Free Publicity for Major Tom BBC

        Your quote "If they'd tried to compete" would have been true except that the musicians' union imposed strict "needle time" on broadcasts and the pirate radio formula of playing discs all day would not have been allowed.

  8. Anonymous Coward
    Alien

    Well, NASA really needed to name anything...

    ... Odyssey?

    Did they read the original work?

    Just kidding, of course. But coincidences are sometimes astounding.

  9. John Smith 19 Gold badge
    Unhappy

    "So why do we need a change control tracking system again?"

    Two words. "Apollo thirteen."

    Note. Back then the change tracking was probably done manually, i.e. with clerks*. Now we've got a voltage change that was actioned, combined with a thermostat change that was not.

    *Today it's a standard module (or it should be) in any serious ERP system.

    1. Anonymous Coward

      Re: "So why do we need a change control tracking system again?"

      Back in the days of Apollo 13, and for quite a few years after, didn't lots of high-value high-tech stuff (components, equipment, etc) have an ECO/FCO record **on the item itself**?

      Gear gets an ECO/FCO, ECO/FCO label is updated, if the ECO/FCO don't show, the flight doesn't go (and vice versa). (NB this wasn't just for stuff that flies).

      I'm not misremembering the principle.
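The label-on-the-item principle described above can be sketched as a simple set comparison. Everything here is hypothetical: the class, the function names and the change-order identifiers are made up for illustration and are not the actual Apollo paperwork or any real ERP module.

```python
# Minimal sketch of "if the ECO/FCO don't show, the flight doesn't go".
# All names and identifiers below are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Item:
    """A piece of hardware carrying its own record of applied change orders."""
    name: str
    applied_changes: set[str] = field(default_factory=set)


def missing_changes(item: Item, required: set[str]) -> set[str]:
    """Change orders that are required but not recorded on the item."""
    return set(required) - item.applied_changes


def cleared_for_flight(item: Item, required: set[str]) -> bool:
    """An item is cleared only when every required change is on its label."""
    return not missing_changes(item, required)


# Hypothetical change orders: one was actioned, one was not.
REQUIRED = {"ECO-0001", "ECO-0002"}
tank = Item("oxygen tank", applied_changes={"ECO-0001"})

if __name__ == "__main__":
    print("Missing:", sorted(missing_changes(tank, REQUIRED)))
    print("Cleared:", cleared_for_flight(tank, REQUIRED))
```

The point of the design is that the check compares what the item itself records against what is required, so an order that was issued but never actioned shows up as a gap rather than being assumed done.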

      1. David Pearce

        Re: "So why do we need a change control tracking system again?"

        It was the norm. Overlooking the change in DC supply and the affected components is a remarkably obvious error. Experienced engineers know that breaking DC power is a minefield.

        I heard that the shuttle booster O rings were a consequence of US pork barrel contracts forcing the boosters to be built in sections for transport. Building in sections has to be significantly heavier than a simple tube.

      2. My other car WAS an IAV Stryker

        Re: "So why do we need a change control tracking system again?"

        "...the pad power supply was upped from 28 to 65 volts, which required a special harness."

        As a sparky (electrical engineer), I've learned that if the lab/shop supply and the equipment have different interfaces, especially on the power line, they're probably NOT compatible and you should NOT attempt to adapt them without double checking.

        And as part of said double check, I would have noticed: "Oh, the spec was changed recently. Hey supplier, did you comply with this? No? Then I better use the lower power supply. Thanks!"

        Any test kit I built -- or spec'd out for the crack team of harness fabbers -- was designed to be compatible with the existing hardware past and present to support both the vehicles in the field AND the upgrades (prototypes being field tested before full production). The most useful tools are the ones you design yourself.

  10. LenG

    Why O-rings

    I'm working from memory here, so please correct me if I am wrong, but the original idea for the solid fuel boosters (themselves spec'd as they were cheaper than more controllable liquid fuel devices) envisaged a one piece construction with the boosters delivered by barge. Unfortunately the contract was awarded to Morton Thiokol and you can't barge something from Utah so the design was changed to allow air freight. How far back do you really want to trace this accident?

    1. Yet Another Anonymous coward Silver badge

      Re: Why O-rings

      Famously to the width of a horse's arse

      1. Anonymous Coward

        Re: Why O-rings

        Well, twice the width of a horse's ass, IIRC.

        So if Roman horses were fatter, maybe Challenger wouldn't have happened?

  11. Stuart Halliday
    Flame

    WTF

    How did the voltage change from 28V to 65V and no one noticed that the switch component couldn't withstand the increased current?

    This is basic stuff.

    1. Tony W

      Re: WTF

      I find the narrative of the thermostat contacts a bit strange. If the problem was increased current, why did the current increase? A higher voltage would require a lower current to do the same job.

      But if the increased voltage, rather than current, caused the contacts to melt, it must have been because of arcing. In that case the design must have been perilously near the limit even for the lower voltage.

      Was the power DC? I'm not sure if the snap action bimetallic heat sensitive switch (as used in all kettles) had been invented by then, so maybe designing DC thermostats to avoid arcing would have been a problem. That makes the story more comprehensible, but running so near the limit doesn't look like good design. The obvious answer would be to use a low current thermostat to control a relay.

      1. Hubert Cumberdale Silver badge

        Re: WTF

        V=IR. If R stays the same and V goes up, so does I.
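A quick sketch of that point for a purely resistive load, using the 28 V and 65 V figures from the article. The resistance value is made up (the ratios do not depend on it), and this says nothing about how the real heater circuit or thermostat contacts behaved.

```python
# Illustration of V = I * R for a fixed resistive load.
# R_LOAD is a hypothetical value; the ratios below do not depend on it.

R_LOAD = 10.0  # ohms, hypothetical


def current(voltage: float, resistance: float) -> float:
    """Ohm's law: I = V / R."""
    return voltage / resistance


def power(voltage: float, resistance: float) -> float:
    """Power dissipated in a resistor: P = V^2 / R."""
    return voltage ** 2 / resistance


# Same load, supply raised from 28 V to 65 V.
current_ratio = current(65.0, R_LOAD) / current(28.0, R_LOAD)
power_ratio = power(65.0, R_LOAD) / power(28.0, R_LOAD)

if __name__ == "__main__":
    print(f"Current rises by a factor of {current_ratio:.2f}")
    print(f"Dissipated power rises by a factor of {power_ratio:.2f}")
```

With the resistance held constant, the current scales with the voltage (about 2.3 times) and the dissipated power with its square (about 5.4 times).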

        1. Anonymous Coward

          Re: WTF

          Ohm's law indeed. Anyway, I'm no rocket engineer, but as an electrician, the first word that pops into my head is 'fuse'. If the wiring harness got up to 1000 degrees F (that's 500+ C) why didn't some fuse give way?

          1. IGotOut Silver badge

            Re: WTF

            You're presuming there are fuses. Fuses are designed to protect from overload. If you are designing to very high standards, in theory you shouldn't need a fuse. In addition to this, a fuse blowing could be just as catastrophic as having no fuse at all.

            When you are traveling to space, every gram counts.

            1. Yet Another Anonymous coward Silver badge

              Re: WTF

              Also if a fuse blows inside the oxygen tank after launch, all you have to do is pull over to the side of space and take the tank apart to replace it.

              Interesting story:

              One of the power supplies for the original European camera on Hubble had a front panel fuse.

              Why? everybody asked. Who is going to replace the fuse? (This was before resettable polyfuses.)

              Because the PSU was being built by a small European country and required approval by their aviation agency, who said that a flight PSU must have a fuse!

              It was subsequently replaced by a PSU built by a competent larger European company.

              The fused unit probably sits on a shelf in a clean room to this day

              1. Anonymous Coward

                Re: WTF

                Agreed to all of the above, but this was a pre-launch procedure from what I get out of the article.

        2. Anonymous Coward

          Re: WTF

          No, this is why vehicle supply voltages have been creeping up, 400V is common in pure electrics now.

          For a given power, as voltage goes up, current comes down. This saves weight in the wiring loom. The snag is that you need solid state power switching, breaking 400V dc with a mechanical contact is hard.

          1. Hubert Cumberdale Silver badge

            Re: WTF

            At least you shouldn't get arcing in the vacuum of space – that might make using a physical contact more plausible. But remember, with great power comes great current-squared-times-resistance.
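The constant-power case in this sub-thread is the other half of the picture: if the load must deliver a fixed power, the current is I = P / V and the loss in the wiring is I²R. A minimal sketch follows, where the load power and wire resistance are made-up round numbers chosen only to show the scaling.

```python
# For a fixed delivered power, higher voltage means lower current and
# much lower I^2 * R loss in the wiring. P_LOAD and R_WIRE are hypothetical.

P_LOAD = 1000.0  # watts, hypothetical
R_WIRE = 0.1     # ohms of wiring resistance, hypothetical


def line_current(load_power: float, voltage: float) -> float:
    """Current needed to deliver `load_power` at `voltage`: I = P / V."""
    return load_power / voltage


def wiring_loss(load_power: float, voltage: float, wire_resistance: float) -> float:
    """Power wasted in the wiring: I^2 * R."""
    return line_current(load_power, voltage) ** 2 * wire_resistance


loss_28v = wiring_loss(P_LOAD, 28.0, R_WIRE)
loss_400v = wiring_loss(P_LOAD, 400.0, R_WIRE)
loss_ratio = loss_28v / loss_400v

if __name__ == "__main__":
    print(f"Loss at 28 V:  {loss_28v:.2f} W")
    print(f"Loss at 400 V: {loss_400v:.3f} W")
    print(f"Ratio: {loss_ratio:.1f}")
```

So both views in the thread hold, just under different assumptions: fix the resistance and a higher voltage drives more current; fix the power and a higher voltage needs less.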

  12. John Smith 19 Gold badge
    Unhappy

    Fun fact. #1 result from JPL analysis of "lessons learned" in payload design.

    Power supply issues.

    Too much / too little voltage and/or current.

    Startup transients (inrush current). Shutdown transients

    Incompatible connectors.

    And on. And on. It seems everyone takes the PSU for granted. Unwise as it turns out.

    1. Yet Another Anonymous coward Silver badge

      Re: Fun fact. #1 result from JPL analysis of "lessons learned" in payload design.

      It's always connectors or power supplies

  13. Annihilator

    After the fact

    What blows me away is the ability to deduce what happened after the fact, when all the physical evidence was inaccessible - the problems were all in the service module, which gets ditched as soon as they're back in Earth's orbit and subsequently burns up in the atmosphere.

    Incredible science just to figure all that out.
