'Data saturation' helped to crash the Schiaparelli Mars probe

The European Space Agency (ESA) has released results of its early investigations into the crash of the Schiaparelli Mars probe and it sounds like software may have been a part of the problem. "A large volume of data recovered from the Mars lander shows that the atmospheric entry and associated braking occurred exactly as …

  1. Ashley_Pomeroy

    "Oh no, not again"

    1. Anonymous Coward

      Yeah, didn't we see something similar with Ariane 5?

      (Corr: not really. There, the sensors correctly reported a horizontal velocity so large that an overflow occurred when converting from float to integer in useless Ariane 4 code left in the package. An overflow trap was raised (in violation of IEEE 754 default policy), no handler existed for it, and the software then fell through to an exception handler which shut the unit down and dumped diagnostic data that the main computer read as flight data. The backup unit had already failed the same way. See also page 22 of How Java's Floating-Point Hurts Everyone Everywhere from 2004 by Numerical Computation Wrestler Prof. William Kahan et al.)
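A minimal C sketch of the missing range check, for illustration only: convert a double to a 16-bit integer, saturating and flagging out-of-range or NaN input instead of trapping. The type `checked_i16` and function `to_int16_checked` are hypothetical names, and saturation is an assumed recovery policy; this is not the Ariane code, which was written in Ada.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: converted value plus a flag saying whether
 * the input had to be clamped (or was NaN). */
typedef struct {
    int16_t value;
    bool clamped;
} checked_i16;

/* Convert a double to int16_t with no undefined behaviour and no trap:
 * NaN maps to 0, values whose truncation would not fit saturate,
 * and both cases are flagged for the caller. */
checked_i16 to_int16_checked(double x)
{
    if (isnan(x))
        return (checked_i16){ 0, true };
    if (x >= (double)INT16_MAX + 1.0)
        return (checked_i16){ INT16_MAX, true };
    if (x <= (double)INT16_MIN - 1.0)
        return (checked_i16){ INT16_MIN, true };
    /* Here truncation toward zero is guaranteed to fit in int16_t. */
    return (checked_i16){ (int16_t)x, false };
}
```

Clamping keeps the unit running, but the caller still has to decide what a clamped value means for guidance.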

      1. Martin Gregorie

        Yeah, didn't we see something similar with Ariane 5?

        The problem with the Apollo 11 LM's onboard computer looks like a better match.

        There, leaving the rendezvous radar on overloaded the computer's interrupt handler when they got near the lunar surface, but fortunately there was an astronaut on board who was able to manually fly the landing.

        Here, violent gyrations as the parachute opened seem to have overloaded the IMU and caused it to output garbage which upset the computer that managed the landing.

        A faster IMU and improved garbage detection and rejection would both seem like a good idea.

        1. placeforhandle

          Re: Yeah, didn't we see something similar with Ariane 5?

          "The problem with the Apollo 11 LM's onboard computer looks like a better match." - no, no, no and again no. You have done a terrible thing with your post - you should delete it!

          The story of the 1201 and 1202 alarms on the lunar module is a marvellous and wonderful story of how to do it *right*.

          https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html

          I commend it to all readers, it's an amazing thing to read.

          Then go listen to / watch the original footage and hear Buzz(?) call out the alarms and feel the tension.

          1. swm

            Re: Yeah, didn't we see something similar with Ariane 5?

            As I remember (and I listened to this live) the error was not flipping a switch that would synchronize the radar measuring height from the lunar surface and the radar measuring the distance to the lunar orbiter. This switch was not thrown because of a documentation/checklist error. So instead of having about 30% free time the computer had only 2% free time. Whenever an astronaut attempted to query the system it ran out of real time and caused these alerts.

            But the software didn't crash - it just dumped some real-time processes so it could keep computing more important stuff. The fix was for the astronauts not to interrogate the computer. The final landing was manual after the computer had perfectly guided the lander to a particular place/velocity over the lunar surface. The tension during the landing (when ground control realized something was happening but didn't want to interrupt the pilot) came from the fact that, although everything had gone according to plan, there were so many craters that the pilot had to hunt for a place to land. The media commentators did not have a clue during the whole landing.
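A toy C sketch of the load-shedding idea described above: a fixed-size job table where, once it is full, a more important job evicts the least important one and an overload alarm is counted. The names `executive`, `schedule`, `is_running` and `MAX_JOBS` are hypothetical, and this is a deliberate simplification, not the AGC Executive, which handled its 1201/1202 overflows with a software restart that re-established only the essential jobs.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS 4  /* assumed capacity of the hypothetical job table */

typedef struct {
    int id;
    int priority;   /* higher number = more important */
    bool active;
} job;

typedef struct {
    job slots[MAX_JOBS];
    int overload_alarms;  /* how many times the table was full */
} executive;

/* Try to schedule a job. If a slot is free, take it. If the table is full,
 * count an alarm and evict the lowest-priority job only when the newcomer
 * outranks it. Returns true if the new job is now running. */
bool schedule(executive *ex, int id, int priority)
{
    size_t lowest = 0;
    for (size_t i = 0; i < MAX_JOBS; i++) {
        if (!ex->slots[i].active) {
            ex->slots[i] = (job){ id, priority, true };
            return true;
        }
        if (ex->slots[i].priority < ex->slots[lowest].priority)
            lowest = i;
    }
    ex->overload_alarms++;
    if (priority > ex->slots[lowest].priority) {
        ex->slots[lowest] = (job){ id, priority, true };
        return true;
    }
    return false;  /* shed: the new job was not important enough */
}

bool is_running(const executive *ex, int id)
{
    for (size_t i = 0; i < MAX_JOBS; i++)
        if (ex->slots[i].active && ex->slots[i].id == id)
            return true;
    return false;
}
```

The point of the design is that overload degrades the least important work (here, a display query) rather than the guidance jobs.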

      2. Arthur the cat Silver badge
        Facepalm

        an acceleration so strong an overflow occurred when converting from float-to-integer in useless Ariane 4 code left in the package, an overflow trap was raised (in violation of IEEE 754 default policy)

        If my very rusty memory isn't lying to me, Ada requires a trap on overflow, so IEEE 754 policy has nothing to do with it. The problem was caused by the Ariane 4 code leaving that conversion unprotected (to keep processor load down) and then nobody checking whether the 16-bit range was large enough for Ariane 5 conditions. There was no trap handler defined because "of course it can never happen, Ariane 4 can't accelerate that much". If they'd actually put a comment in to that effect, maybe someone would have noticed.

    2. Kane
      Coat

      "Oh no, not again"

      Actually, that was the bowl of petunias. But have an upvote anyway.

      Mine's the one with the Sub-Etha Sens-O-Matic in the back pocket, thank you.

    3. Julifriend

      @Ashley_Pomeroy It was the bowl of petunias that said 'Oh no, not again'.

    4. energystar
      Boffin

      Seems this thing wasn't even AI tested...

      How can a real-time system ignore priorities? Hopefully more detailed info will be made public later.

  2. Anonymous Coward

    Welcome to embedded system engineering

    I have seen it so many times, I no longer even laugh.

    Trying to explain the Postel principle to an embedded engineer is like trying to convince an evangelical fundie that the world is more than 6000 years old.

    YOU HAVE TO CHECK YOUR INPUTS. ALWAYS. IF THE INPUTS DO NOT MAKE SENSE DISCARD, RESET, REPENT.

    And do it again.

    There is nothing easier than validating location and altitude readings. Just compute f*** first derivatives. If it looks like you have accelerated at 10000G and have broken the light barrier you definitely got a duff reading.

    Sigh...
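Taking the "compute the first derivative" suggestion literally, a minimal C sketch: reject a new altitude sample if the vertical speed it implies since the last accepted sample exceeds a bound. The function name and the 300 m/s limit used in the example are illustrative assumptions, not Schiaparelli parameters.

```c
#include <stdbool.h>

/* Accept a new altitude sample only if the vertical speed it implies
 * since the last accepted sample is within +/- max_speed_mps.
 * NaN inputs make the comparisons false, so they are rejected too. */
bool altitude_plausible(double prev_alt_m, double new_alt_m,
                        double dt_s, double max_speed_mps)
{
    if (!(dt_s > 0.0))
        return false;  /* no valid time step, no derivative */
    double rate = (new_alt_m - prev_alt_m) / dt_s;  /* first derivative, m/s */
    return rate <= max_speed_mps && rate >= -max_speed_mps;
}
```

With a 0.1 s sample period, a jump from 2000 m to -100 m implies about 21 km/s and is rejected, while a 10 m drop (100 m/s) passes.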

    1. John H Woods Silver badge

      Re: Welcome to embedded system engineering

      indeed. I'm not sure that there's any reason for deploying parachutes at a negative altitude.

      1. Mark 85

        Re: Welcome to embedded system engineering

        One would think a negative altitude would cause a "reset" "input fresh data" sequence. I'm just surprised that it discarded the chutes and fired the retros, as I don't think the retros would have helped after impact.

        1. Sleep deprived
          Happy

          Re: Welcome to embedded system engineering

          Maybe it tried to fire the retros backwards to climb back to surface.

      2. Destroy All Monsters Silver badge

        Re: Welcome to embedded system engineering

        I'm not sure that there's any reason for deploying parachutes at a negative altitude.

        NEVER SURRENDER!

      3. Blofeld's Cat
        FAIL

        Re: Welcome to embedded system engineering

        " I'm not sure that there's any reason for deploying parachutes at a negative altitude."

        I believe Wile E Coyote has done this on numerous occasions.

        It appears to be the standard failure mode for parachutes according to the rules of comedy. The parachute then floats down and completely covers the lander-shaped hole in the Martian surface.

        1. Anonymous Coward

          Re: Welcome to embedded system engineering

          You mean ESA's main supplier is ACME?

        2. You aint sin me, roit

          Don't look down!

          As Wile E Coyote regularly finds out to his peril, you're OK walking off the cliff... until you look down.

          At which point your Inertial Measurement Unit gets saturated...

      4. julianh72

        Re: Welcome to embedded system engineering

        Re: Deploying a parachute at negative altitude:

        This happens just about every day to Wile E. Coyote: pursues Road Runner, falls off cliff, tugs desperately at rip-cord, hits the ground, making a coyote-shaped hole - and then the parachute pops out of the ground, and settles gracefully down over the coyote.

        I'd like to think that is how Schiaparelli's last moments transpired!

      5. This post has been deleted by its author

      6. This post has been deleted by its author

        1. Destroy All Monsters Silver badge
          Windows

          Re: Welcome to embedded system engineering

          From: http://www.nytimes.com/1985/06/20/us/laser-test-fails-to-strike-mirror-in-space-shuttle.html

          "Several critics quickly cited the failure as evidence of bigger problems to come. They said this mistake, a simple human error capable of upsetting a complex technological effort, was the type that could be the ultimate undoing of the proposed antimissile shield."

          Yeah, I remember when the news was that one could get a functional SDI infrastructure for, like, 1 trillion dollars. Which is, what, 1/42nd of the national debt now (or 1/200th by some reckonings)? And the only thing that even works 30+ years later is anti-Iranian missile bases in Romania (more like first-strike caps on Russia, amIrite?)

          The retardation and belief in technological marvels was just amazing.

      7. Anonymous Coward

        Re: Welcome to embedded system engineering

        > I'm not sure that there's any reason for deploying parachutes at a negative altitude.

        Oversimplified code?

        if (altitude < 3000)
            deploy_chute();
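Making that snippet a little less oversimplified, a C sketch that deploys only on a reading that is both below the trigger altitude and physically possible, and never re-evaluates once deployed. The thresholds, the `chute_state` enum and `update_chute` are hypothetical and only illustrate the guard, not Schiaparelli's actual sequencing.

```c
#include <stdbool.h>

/* Hypothetical thresholds for illustration only. */
#define DEPLOY_BELOW_M   3000.0
#define MIN_VALID_ALT_M     0.0   /* negative altitude is never "valid" */

typedef enum { CHUTE_STOWED, CHUTE_DEPLOYED } chute_state;

/* Deploy only on a reading that is below the trigger altitude AND
 * physically plausible; otherwise keep the current state. A NaN reading
 * fails the plausibility comparison and leaves the chute stowed. */
chute_state update_chute(chute_state s, double altitude_m)
{
    bool plausible = altitude_m > MIN_VALID_ALT_M;
    if (s == CHUTE_STOWED && plausible && altitude_m < DEPLOY_BELOW_M)
        return CHUTE_DEPLOYED;
    return s;
}
```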

    2. Steve K

      Re: Welcome to embedded system engineering

      Unfortunately it is definitely an embedded system now - in the Martian surface...

      1. gregthecanuck
        Pint

        Re: Welcome to embedded system engineering

        Hey Steve - thanks for the laugh. You win LOL of the day.

        Cheers!

  3. Anonymous Coward

    I would never fly anything ESA designed (if they ever get to that stage) if they let this sort of thing completely screw up their system.

    How friggin fragile do they design their software?

    Don't they test with various sensor fail (or temporary fail) issues, as well as take bad readings into account even during the initial design process?

    Just embarrassing for ESA.

    1. SkippyBing

      I wonder if they borrowed their engineers from Thales, makers of a drone that can think it's on the ground if it's a bit cloudy. Certainly displays the same lack of creative thinking when it comes to failure modes.

    2. Lars Silver badge
      Joke

      "Just embarrassing for ESA". Yes, and the solution is, no doubt, for the UK to pull out of the ESA to show how to do it proper. What the hell are you waiting for?

      1. Phil O'Sophical Silver badge

        In this case the British had been there, done that, 12 years ago. Possibly even less dramatically, see: https://www.newscientist.com/article/2112484-beagle-mars-probe-probably-didnt-crash-new-analysis-shows/

        1. Anonymous Coward
          Boffin

          On the other hand it didn't actually return any data. I realise it must have been a success because it was British, but you need a definition of 'success' which includes 'failure'.

    3. Anonymous Coward

      It's not outside the realms of possibility that both the IMU and the navigation system performed to within the written system specs.

      I don't know about software, but from a mechanical engineering point of view, specifications for spaaaaace engineering are very tightly controlled and a lot of written and test evidence has to be supplied to show that the specification is met.

      Difficult to believe, but a lot of engineers can actually say their part of the lander design was a success.

  4. Anonymous Coward
    Mushroom

    Seems like the system could have been a little more robust

    So a disagreement in data between sensors for 1-2 seconds caused the parachute to be discarded and the retros to stop firing?

    Even if the computer made a momentary calculation that the probe had landed, wouldn't it be safer to reduce thrusters and retain the chute until landing had been confirmed?

    1. Anonymous Coward

      Re: Seems like the system could have been a little more robust

      No, the parachute must be discarded earlier so it doesn't risk covering the probe, and the thrusters must also be turned off before landing if you don't want to heat, blow and contaminate the landing site.

      That said, as others pointed out, the software should have been able to cope with unexpected readings - they had the NASA example of Mars Polar Lander, where the vibrations caused by deploying the landing legs were read as if the probe had landed, with the subsequent engine cut-off.

      But they wouldn't be the first developers who don't understand the need to cope with unexpected situations... data never lie, right?
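One common mitigation for that kind of leg-deployment transient, sketched in C under assumed names: ignore the touchdown sensor until the logic is explicitly armed near the ground, discard anything seen before arming, and require several consecutive contact samples. `touchdown_detector`, `td_arm`, `td_update` and `CONFIRM_SAMPLES = 3` are hypothetical choices for illustration.

```c
#include <stdbool.h>

#define CONFIRM_SAMPLES 3   /* assumed persistence requirement */

typedef struct {
    bool armed;       /* set only once the vehicle is near the ground */
    int  consecutive; /* contact samples seen in a row while armed */
} touchdown_detector;

/* Arm the detector and discard anything latched before arming. */
void td_arm(touchdown_detector *d)
{
    d->armed = true;
    d->consecutive = 0;
}

/* Feed one leg-sensor sample; returns true once touchdown is confirmed. */
bool td_update(touchdown_detector *d, bool contact)
{
    if (!d->armed)
        return false;  /* e.g. jolts during leg deployment are ignored */
    d->consecutive = contact ? d->consecutive + 1 : 0;
    return d->consecutive >= CONFIRM_SAMPLES;
}
```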

      1. SkippyBing

        Re: Seems like the system could have been a little more robust

        I would have thought a simple timer would have solved a lot of problems, the time to fall to the Mars surface must be trivially easy to calculate for a space agency. There's no need to even start thinking about the ground until you're within a tolerable margin of that time and then you can start looking for it with a radar altimeter.
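A C sketch of the timer gate suggested above: ground reports are ignored until elapsed time is within a margin of a pre-computed expected descent time. The function name and any numbers are hypothetical; the margin is where the uncertainty about the atmosphere and entry conditions raised further down this thread would have to be absorbed.

```c
#include <stdbool.h>

/* Accept a "ground reached" indication only once elapsed time is within
 * margin_s of the pre-computed expected descent time; earlier reports
 * are treated as spurious. */
bool accept_ground_report(bool sensor_says_ground, double elapsed_s,
                          double expected_descent_s, double margin_s)
{
    bool in_window = elapsed_s >= expected_descent_s - margin_s;
    return sensor_says_ground && in_window;
}
```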

        1. Anonymous Coward

          Re: Seems like the system could have been a little more robust

          I'm slightly surprised it's done using sensors at all. Considering how good boffins are at calculating things like you say - time to fall, it seems odd to me that it's not all done on a timer.

          1. Anonymous Coward

            Re: Seems like the system could have been a little more robust

            "Considering how good boffins are at calculating things like you say - time to fall, it seems odd to me that it's not all done on a timer."

            As I recall, the Russians did actually use drum timers for a lot of things on their spacecraft.

            Many years ago I designed a system which carried out a number of interlocks and tests before engaging an array of instruments and then hitting something with a most enormous zap and waiting to decide if it was necessary to pull the main breakers before uncontrolled rapid ignition of the test system happened. The whole thing was software controlled but, in parallel, I had an old fashioned drum timer with microswitches which was able to abort the whole sequence at each phase if critical conditions were not met. Because sooner or later after I left somebody was going to try to alter the program. The drum timer lived in a transparent plastic box so it could be inspected before each run. It is difficult to do this with software.
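A software analogue of the per-phase interlock described above, sketched in C: each phase has an allowed range for its check reading, and the sequence aborts before the first phase whose check fails. The table layout and names are assumptions for illustration; as the comment says, the value of the drum timer was that it was independent of the software and could be inspected, which code like this cannot replace.

```c
#include <stddef.h>

/* Hypothetical interlock entry: the reading taken before a phase must
 * lie in [min_ok, max_ok] for that phase to be allowed to start. */
typedef struct {
    const char *name;
    double min_ok;
    double max_ok;
} phase_interlock;

/* Walk the phases in order; return the index of the first phase whose
 * interlock fails (the sequence aborts there), or n if all pass.
 * A NaN reading fails the range test and aborts as well. */
size_t run_sequence(const phase_interlock *phases, const double *readings,
                    size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double r = readings[i];
        if (!(r >= phases[i].min_ok && r <= phases[i].max_ok))
            return i;  /* abort before phase i starts */
        /* ...phase i would run here... */
    }
    return n;
}
```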

            1. Destroy All Monsters Silver badge

              Re: Seems like the system could have been a little more robust

              Drum timers: Will survive axially-placed gamma-ray bursts in your galaxy. 100% guaranteed!

              Galactic Equip-R-Us. Call now by Ansible!

        2. a_yank_lurker

          Re: Seems like the system could have been a little more robust

          @SkippyBing - There is a tendency to be overly complex when something simple will work much more reliably. The physics for the descent are well known and the calculations should be doable by an undergraduate without a computer. An example when KISS should be remembered.

        3. Anonymous Coward
          Boffin

          Re: Seems like the system could have been a little more robust

          Yes, solving a great mass of fluid-dynamics equations in advance, when you don't know what the atmosphere will be doing that day or the exact details of the velocity and position of the spacecraft as it enters the atmosphere or the topography of the ground where it will end up is easy. That's why the Apollo LEM, which didn't have the problem of atmosphere to deal with, didn't need all that landing radar stuff. Oh, wait, it did need landing radar.

  5. Anonymous Coward

    Dynamic analysis...

    Appears to have been forgotten or been inadequate. There are loads of software tools out there that could have helped spot this sort of issue.

    You would have thought that everything would have been thrown at a system that's going to fly so far before it gets run "live"...

  6. Dwarf

    Bravo El Reg for the HHG reference - Perfect, unlike the code on the lander.

    I'm left wondering how time-sensitive the "getting your kit out" stage is on landing. I would expect that it would at least court the ground it's just landed on for a bit before deciding to whip it all out.

    After all, if anything unlikely were to happen on landing, then all the sensitive bits are still packed away nice and safe and out of harm's way.

    1. Simon Sharwood, Reg APAC Editor (Written by Reg staff)

      Thanks Dwarf. You can stay and comment again :-)

  7. Anonymous Coward

    It's called Failure Modes and Effects Analysis....

    try it sometime. You'll like the results.

    1. Destroy All Monsters Silver badge
      Windows

      Re: It's called Failure Modes and Effects Analysis....

      As I recall, FMEA and/or FMECA or similar analysis are required on any spacegoing systems at ESA, you can be sure this wasn't dropped on the floor.

      Now, according to Jimbo's Patent Entry On Fault Tree Analysis, FMEA is a bit problematic?

      Early in the Apollo project the question was asked about the probability of successfully sending astronauts to the moon and returning them safely to Earth. A risk, or reliability, calculation of some sort was performed and the result was a mission success probability that was unacceptably low. This result discouraged NASA from further quantitative risk or reliability analysis until after the Challenger accident in 1986. Instead, NASA decided to rely on the use of failure modes and effects analysis (FMEA) and other qualitative methods for system safety assessments. After the Challenger accident, the importance of PRA (Probabilistic Risk Assessment) and FTA (Fault Tree Analysis) in systems risk and reliability analysis was realized and its use at NASA has begun to grow and now FTA is considered as one of the most important system reliability and safety analysis techniques.

      Within the nuclear power industry, the U.S. Nuclear Regulatory Commission began using PRA methods including FTA in 1975, and significantly expanded PRA research following the 1979 incident at Three Mile Island. This eventually led to the 1981 publication of the NRC Fault Tree Handbook NUREG–0492, and mandatory use of PRA under the NRC's regulatory authority.

      NUREG-0492 (from 1981, 209 pages) can be obtained from yonder page or apparently for ~12 Brexit Pounds from amazon.co.uk (someone at amazon.com sells it for USD $183.38 wut?). It's very readable.
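To make the gate arithmetic behind fault tree analysis concrete, a minimal C sketch assuming statistically independent basic events: an AND gate multiplies the probabilities, and an OR gate is one minus the product of the complements. The function names are hypothetical, and real FTA practice (as in the NRC handbook) also has to deal with common-cause and dependent failures, which this ignores.

```c
#include <stddef.h>

/* Probability that ALL of n independent events occur (AND gate). */
double and_gate(const double *p, size_t n)
{
    double prod = 1.0;
    for (size_t i = 0; i < n; i++)
        prod *= p[i];
    return prod;
}

/* Probability that AT LEAST ONE of n independent events occurs (OR gate):
 * 1 minus the probability that none of them occurs. */
double or_gate(const double *p, size_t n)
{
    double none = 1.0;
    for (size_t i = 0; i < n; i++)
        none *= 1.0 - p[i];
    return 1.0 - none;
}
```

For two events with probabilities 0.1 and 0.2, the AND gate gives 0.02 and the OR gate gives 1 - 0.9 × 0.8 = 0.28; gate outputs can then feed higher gates up to the top event.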

  8. Anonymous Coward

    Life somewhat imitating Art

    Stanislaw Lem writes about a failed landing of a 10⁵-ton rocket on Mars in "Ananke" (More Tales of Pirx The Pilot)

    Such was the brain, so overburdened with spurious tasks as to be rendered incapable of dealing with real ones, that stood at the helm of a hundred-thousand-tonner. Each of Cornelius’s computers was afflicted with the “anankastic syndrome”: a compulsion to repeat, to complicate simple tasks; a formality of gestures, a pattern of ritualized behavior. They simulated not the anxiety, of course, but its systemic reactions. Paradoxically, the fact that they were new, advanced models, equipped with a greater memory, facilitated their undoing: they could continue to function, even with their circuits overloaded.

    Still, something in the Agathodaemon’s zenith must have precipitated the end—the approach of a strong head wind, perhaps, calling for instantaneous reactions, with the computer mired in its own avalanche, lacking any overriding function. It had ceased to be a real-time computer; it could no longer model real events; it could only founder in a sea of illusions… When it found itself confronted by a huge mass, a planetary shield, its program refused to let it abort the procedure, which, at the same time, it could no longer continue. So it interpreted the planet as a meteorite on a collision course, this being the last gate, the only possibility acceptable to the program. Since it couldn’t communicate that to the cockpit—it wasn’t a reasoning human being, after all—it went on computing, calculating to the bitter end: a collision meant a 100 percent chance of annihilation, an escape maneuver, a 90-95 percent chance, so it chose the latter: emergency thrust!

    It all made sense. Logical—but without the slightest shred of evidence. It was something unprecedented. How could he confirm his suspicions? The psychiatrist who had treated Cornelius, helped him, given him job clearance? The Hippocratic oath would seal his lips, and the seal of secrecy could be broken only by a court order. Meanwhile, six days from now, the Ares…

    1. arctic_haze

      Re: Life somewhat imitating Art

      You beat me to it. Yes, Ananke was the story which came out of my deep memory the moment I read the article.

      Mars - tick; crash landing - tick; overloaded computer - tick; bad programming - tick.

      The only difference was that the automatic ship Ananke was landing on Mars while we already had bases over there.

    2. energystar
      Gimp

      Re: Life somewhat imitating Art

      The computed massively overloading, contradicting the sensed. Paranoia Mode.

      Beautiful Reference.

  9. Sampler

    Sci-Fi Story

    Wouldn't that make a great Sci-Fi story? In the opening sequence, half of a major city (New York, say, because it's always America) gets destroyed by an alien weapon that impacts and reduces the place to a crater, ~1.6 million wiped out in a second.

    When the next wave turns up in orbit we nuke 'em to hell and the war begins in earnest, man vs invader.

    Only, in the M. Night Shyamalan twist at the end, it wasn't a weapon at all, just a probe to see what we humans were all about, but guidance fucked up and it planted itself in Central Park...

    1. Destroy All Monsters Silver badge
      Alien

      Re: Sci-Fi Story

      Nah, I would redo the ending with humanity having been taken over and euthanised by an evil collusion of pseudo-intelligent talkbots descended from Siri, Tay, Ms. Baidu et al. and badly programmed, insecure IoT devices.

      These proceed to kick the Alien's arse fiercely because, well, they are soulless killer zombie bots from hell.

      In the final scene, Clippy appears on the beleaguered Alien's Mothership Main Screen and goes: "Ding Ding! You seem to be trying to develop automatics. Do you want a little help?". Then everything is blown away.

      THE END!

      1. Mystic Megabyte
        Happy

        Re: Sci-Fi Story

        Not "Ding Ding!" but "DingDong!" :)

        http://www.theinquirer.net/inquirer/news/2478323/amazon-echo-and-google-home-get-take-on-from-bejing-ling-long-ding-dong

    2. ToXik-yogHurt

      Re: Sci-Fi Story

      More or less the plot of 'Battleship'. Aliens arrive by accident, want to go home, borrow one of our telescopes, shoot back when we shoot at them.

      At least, it's more entertaining to watch 'Battleship' if you pretend this is what the plot is supposed to be...

  10. anonymous boring coward Silver badge

    "helped to crash" looks like something I would write in Swenglish.

    Isn't the correct wording "helped crash"?

    Genuine question!

    1. Jonathan Richards 1
      Headmaster

      Grammar Q.

      Genuine answer from a native English speaker:

      I think the very correct wording is indeed "... helped to crash ...". In spoken and vernacular (British) English one might very well leave out the 'to', thus "I just helped Dad wash the car" is perfectly understandable, but Pedantic Grammar Nazi [icon] would have you say "I just helped Dad to wash the car".

      Edit: added closing quotes. Muphry's Law strikes again!

      1. anonymous boring coward Silver badge

        Re: Grammar Q.

        OK, thanks!

        I won't correct my son if he says that now!

        1. Anonymous Coward

          Re: Grammar Q.

          Helped [to]

          The ability of 'help' to imply a missing 'to' is what causes the ambiguity when you help your Uncle Jack off a horse..

          1. anonymous boring coward Silver badge

            Re: Grammar Q.

            Can't figure out where the "to" would have gone..

            Interesting that the "to" has been dropped (in most cases) due to being implied.

            Dropping it in Swedish would make me shudder at the resulting sentence, yet it seems normal in English.

      2. energystar
        Angel

        Re: Grammar Q.

        Thanks a lot, Muphry.

  11. MNGrrrl
    Devil

    Easily said, not easily done

    People say "Just sanity check the inputs!" ... but of course, they fall silent when asked what the corrective action should be in software. Detecting bad input doesn't mean anything if you don't have a way to correct for it. This is not just about sanity checking the input data -- it's also about poor simulations that did not see how the software would handle excursions from the expected flight profile. Of course a chute opening is going to torque the vehicle... and if they'd properly set up the simulation, they'd have discovered that in a stupidly thin atmosphere like Mars, the rotational rate can be very high, and that their IMU could saturate, and once they saw that, they could have set up logic to handle saturation.


    They didn't do any of that. Don't blame the programmers -- it wasn't their job to "sanity check" the inputs... if they were told the IMU would only report data within range X, and it left range X, what are they supposed to do? Write "Can't happen" in the comments and then add magical pixie dust code that correctly guesses a resolution?
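A minimal C sketch of what "logic to handle saturation" might look like: a gyro sample at or beyond the sensor's full-scale value only says "at least this fast", so it is not passed on as a true rate; the last good estimate is held and a degraded flag is raised for downstream logic. The struct, the function and the hold-last-value policy are all assumptions; holding is only one option and can itself be wrong during a genuinely fast rotation.

```c
#include <stdbool.h>

typedef struct {
    double rate_dps;  /* angular rate handed to navigation, deg/s */
    bool degraded;    /* true while the raw gyro data is saturated */
} rate_estimate;

/* If the raw sample sits at or beyond full scale, keep the previous
 * estimate and flag degraded mode; otherwise pass the sample through. */
rate_estimate handle_gyro_sample(rate_estimate prev, double raw_dps,
                                 double full_scale_dps)
{
    bool saturated = raw_dps >= full_scale_dps || raw_dps <= -full_scale_dps;
    if (saturated)
        return (rate_estimate){ prev.rate_dps, true };
    return (rate_estimate){ raw_dps, false };
}
```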

    1. Gene Cash Silver badge

      Re: Easily said, not easily done

      No, the simplest thing to do with an insane reading is to ignore it until it becomes sane again.

      I've seen all kinds of sensors (RPM, speed, acceleration, temperature, etc.) temporarily report either full-scale or zero for a couple of samples. Hell, I've had open/close sensors report both states at the same time. I've also seen the exact same sensors never do that in testing, no matter how hard you mistreated them.

      And *YES* it *IS* their job to sanity check the results.
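A sketch in C of "ignore it until it becomes sane again" with a bounded wait: out-of-range samples are ignored and the last good value is kept for at most `MAX_HOLD_SAMPLES` readings in a row, after which the sensor is latched as failed so the system can switch to another source. The limit, the status enum and all names are hypothetical.

```c
#include <stdbool.h>

#define MAX_HOLD_SAMPLES 5  /* assumed: how many bad samples to ride through */

typedef enum { SENSOR_OK, SENSOR_HOLDING, SENSOR_FAILED } sensor_status;

typedef struct {
    double last_good;      /* most recent in-range reading */
    int bad_run;           /* consecutive out-of-range readings */
    sensor_status status;
} filtered_sensor;

/* In-range samples are accepted; out-of-range (or NaN) samples are
 * ignored and last_good is kept, but only MAX_HOLD_SAMPLES times in a
 * row. After that the sensor is latched as failed and stays failed. */
void sensor_update(filtered_sensor *s, double raw, double lo, double hi)
{
    if (s->status == SENSOR_FAILED)
        return;
    if (raw >= lo && raw <= hi) {
        s->last_good = raw;
        s->bad_run = 0;
        s->status = SENSOR_OK;
    } else if (++s->bad_run > MAX_HOLD_SAMPLES) {
        s->status = SENSOR_FAILED;  /* caller should switch to another source */
    } else {
        s->status = SENSOR_HOLDING;
    }
}
```

Latching the failure is a design choice: a sensor that has been insane for a long stretch is not trusted again just because one later sample looks sane.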

      1. energystar
        Holmes

        Re: Easily said, not easily done

        Maybe 'Reality Check' meta-processes are needed up there. How is it that height goes from 2 km to -0.1 km in a fraction of a second? Hmm...

        Young Coding? Lack of Continuity on Legacy Heritage?

        How could a young coder, with almost no on-the-field physics experience, know that a sensor could behave in such a way at rare, random moments?

        1. energystar
          Holmes

          Re: Easily said, not easily done

          If I'm not trusting this particular reading, how could I estimate height? Hmm...
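One answer to that question, sketched in C: when the altitude reading fails its checks, dead-reckon by propagating the last estimate with the vertical velocity and the measured vertical acceleration (a simple Euler step). The names are hypothetical, the acceleration is assumed to already include gravity, and a real lander would blend sensors in a navigation filter rather than switch between them.

```c
#include <stdbool.h>

typedef struct {
    double alt_m;   /* altitude above ground, m */
    double vz_mps;  /* vertical velocity, m/s, positive up */
} vertical_state;

/* Euler step: az_mps2 is net vertical acceleration (gravity included),
 * positive up. */
vertical_state propagate(vertical_state s, double az_mps2, double dt_s)
{
    vertical_state n;
    n.alt_m = s.alt_m + s.vz_mps * dt_s;
    n.vz_mps = s.vz_mps + az_mps2 * dt_s;
    return n;
}

/* Use the radar altitude when it passed its checks, else dead-reckon. */
vertical_state update_altitude(vertical_state s, double az_mps2, double dt_s,
                               bool radar_valid, double radar_alt_m)
{
    vertical_state n = propagate(s, az_mps2, dt_s);
    if (radar_valid)
        n.alt_m = radar_alt_m;  /* a real filter would blend, not overwrite */
    return n;
}
```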

          1. energystar

            From Apollo11 Code...

            From Apollo 11 Code:

            # SUBROUTINE TO CHECK....

            .....

            # TEMPORARY, I HOPE HOPE

            .....

            # SILLY THING AROUND

            .....

            # SEE IF HE'S LYING

            .....

            # OFF TO SEE THE WIZARD

            Congratulate the Girls On This :)

      2. MNGrrrl

        Re: Easily said, not easily done

        > No, the simplest thing to do with an insane reading is to ignore it until it becomes sane again.

        I don't think you're grasping my point here: You're expecting programmers to be physicists. They aren't. They don't know what physical forces will be in play, what the flight model is, what the possible dynamic forces at work are... they only know code. They have to be told exactly what's going to happen, in what order, and what to do if what is expected *doesn't* occur (read: Failure modes). If they weren't told that the IMU could go off-scale high or low -- there's no reason to expect them to code for that possibility, and even if they did... where would that logic branch terminate?


        It's easy to say "Oh just wait until it becomes sane again". Okay, fine. Wait for how long? What if the sensor is broken, and will never again return a sane reading? What if the readings are within the expected range, but are behaving in an otherwise anomalous fashion (i.e., incrementing upwards during a descent)?


        You can't just say "Sanity check!" and hand wave the problem away, as everyone in this thread has done. This isn't like normal computer science -- these guys weren't building a data interface for a web page, where they had professional knowledge of what "sane" ought to be. These guys are ordinary programmers, not rocket scientists. You cannot expect them to be the all-knowing gods of information systems. If they weren't told (and it's very clear they weren't) that the sensor could exhibit this kind of behavior, then we cannot blame them for designing a processing system that would not discard its input when said behavior presented!


        As I said before: This was a problem with the simulations. This was a problem with systems integration. It was not a programming flaw: The computer didn't crash with an unrecoverable error during descent. It didn't hang. It didn't branch to the wrong code segment because someone typed '1' instead of 'l'. It wasn't a case of a dereferenced pointer to invalid data that caused the failure. It performed exactly as designed.


        This is hardly a problem unique to the ESA: planes have fallen out of the sky more times than I can count due to blocked pitot tubes, because the PILOTS (not computers, but the intelligent meat bags you're trusting with your life) didn't realize that the airspeed, altitude, or other standard indicators might lie to them. These are people who know everything there is to know about how an airplane flies and its various dynamic modes of flight... and even they, with all that knowledge and training, could succumb to a simple sensor error.


        Shall we yell at the pilots for not "sanity checking" all of their sensor inputs too? Even pilots, if they aren't trained to be aware of what these sorts of failure modes look like, will often fail. What hope does a computer have of recovering from something so unexpected? This was NOT a programming mistake. The end, and I don't care how many of you downvote me, you're still wrong.

        1. John H Woods Silver badge

          Re: Easily said, not easily done

          "This was NOT a programming mistake. The end"

          It might be a programming mistake; it depends what the design said. I think there's probably a good case that there was also a testing mistake (of omission, if nothing else).

        2. anonymous boring coward Silver badge

          Re: Easily said, not easily done

          " If they weren't told that the IMU could go off-scale high or low -- there's no reason to expect them to code for that possibility"

          That's absurd.

          If the sensor CAN generate a value, no matter how improbable, it must be accounted for somehow! Not effing crash the program/system!

          Otherwise you are just building a house of cards.

    2. anonymous boring coward Silver badge

      Re: Easily said, not easily done

      "Don't blame the programmers -- it wasn't their job to "sanity check" the inputs... IF they were told the IMU would only report data within range X"

      That's a pretty big IF!

      But if that IF is correct, then, obviously it's the fault of whoever designed the IMU.

      Still, the whole system would be much more robust if each module did additional sanity checks, not trusting other modules to work according to spec at all times.

      Embarrassing all round.

      This is billions of taxpayers' Euros/Pounds/Krona.

  12. Anonymous Coward
    Anonymous Coward

    Pathetitudinous

    Even shittier than the programmers' failure to learn from past mistakes was the scene at the press conference the day after the crash, when the ridiculous ESA bureaucrats pathetically attempted to persuade us that the mission was mostly successful because the lander almost reached the ground.

    1. Schultz

      "the mission was mostly successful because the lander almost reached the ground"

      Indeed, it overachieved and landed 3.7 km early. Better early than late ...

      1. Anonymous Coward
        Anonymous Coward

        Re: "the mission was mostly successful because the lander almost reached the ground"

        In some ways it was successful. From what I gather, the lander was to some extent a case of "we're sending a probe to Mars this year, so why not send a lander with it to check that our landing system works for the rover we're sending in two years' time". On that basis it was testing the landing system that's vital for the rover, and it found a potential cause of problems. Of course, if you are hoping that your lander test will actually land properly, then you'll say "we're going to land something on Mars, so we might as well put some science on it".

    2. CCCP

      Re: Pathetitudinous

      Look on the bright side. Post Brexit, the UK (minus Scotland maybe) can pull out of the ESA and crash craft into Mars all on its own. It has form.

    3. julianh72

      Re: Pathetitudinous

      "the mission was mostly successful because the lander almost reached the ground"

      A bit like the poor sod who fell from the top of a 100 metre tall radio mast - he was 99% successful in surviving the fall.

      1. Nick Ryan Silver badge

        Re: Pathetitudinous

        A bit like the poor sod who fell from the top of a 100 metre tall radio mast - he was 99% successful in surviving the fall.

        From that height it's pretty much certain that the fall itself would have caused him no harm, nothing worse than soiled underwear and aching vocal cords. It's the sudden stop that brings the fall to an end that's usually the problem.

    4. anonymous boring coward Silver badge

      Re: Pathetitudinous

      "the lander almost reached the ground"

      Didn't it definitely reach the ground?

      I assume it didn't miss Mars altogether by some unfathomable feat?

  13. Big Ed

    A Lot Like Interrupt Overload

    Reminds me of a story about the deployment of an automated luggage handling system at Denver International Airport in the mid-'90s. The airport opened to great fanfare and hopes that the automated luggage system could do away with a large contingent of human luggage cart drivers. The only problem was that the robot luggage carts wouldn't stop, missed turns, and even ran into a few planes.

    The programmers couldn't figure out how to fix it, so airport management and United Airlines had to rip the system out.

    Experts were brought in to do a post-mortem analysis and reported that the programmers never did proper full-scale testing. And, oh yeah, they made a sh*tty OS choice... it seems DOS 3.2 was not a real-time OS and was unable to handle the onslaught of interrupts, which led to all the robot luggage carts crashing into airplanes.
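    One common mitigation for an interrupt "onslaught" is to bound the work: buffer incoming events in a fixed-size queue, process a fixed budget per cycle, and shed (and count) the excess rather than letting a backlog starve everything else. Below is a minimal Python sketch of that idea using collections.deque with maxlen. The class name BoundedEventQueue is hypothetical, this is not a description of what the Denver system actually ran, and Python is not a real-time environment; it only illustrates the load-shedding pattern.

```python
from collections import deque
from typing import Any, List


class BoundedEventQueue:
    """Keeps only the newest `capacity` events; older ones are dropped and counted."""

    def __init__(self, capacity: int):
        self.buf: deque = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, event: Any) -> None:
        # A full deque with maxlen silently discards its oldest item on append,
        # so count the loss explicitly to keep the overload visible.
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1
        self.buf.append(event)

    def drain(self, budget: int) -> List[Any]:
        """Hand out at most `budget` events per cycle, oldest first."""
        out: List[Any] = []
        while self.buf and len(out) < budget:
            out.append(self.buf.popleft())
        return out
```

    For example, pushing events 0 to 9 into a queue of capacity 3 leaves dropped == 7, and drain(2) returns [7, 8]. Dropping the oldest suits sensor data, where the newest sample matters most; for commands you would more likely reject the newest and raise an alarm.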

  14. lglethal Silver badge
    Trollface

    Bloody software...

    It's always the Software that's the Problem! The Hardware always works perfectly! Don't they train you Software Engineers properly?

    (from a mechanical engineer... :P)

    1. Aging Hippy

      Re: Bloody software...

      The hardware doesn't always work perfectly even if it's designed properly. It's up to the software to handle what's thrown at it in a predictable way.

      Motto for software engineers: If it can go wrong it will go wrong. If it can't go wrong it will go wrong on the first live run (some projects only have one live run !!).

      No, they don't train software engineers properly; very few universities have courses that cover the software/hardware boundary. Around 1990 there was a move by the UK MoD towards getting all engineering designs signed off by a Chartered Engineer. (At the time, becoming a CEng involved getting a decent first degree, 7 years of practical and formal training, and an existing CEng to back you with their reputation. Then you could start the application process.) The intention was to spread this to other government departments. Of course, it was easier just to employ anyone who can work out which way up the keyboard goes: they don't ask awkward questions that could delay the project, or delay the parachute release.

      We're still at the snake-oil stage of software development where anyone can walk in as an expert. Let's make it a proper profession and get some respect even if we would be liked as much as lawyers.

      Comments suggesting more training always get downvotes. To ease anybody's conscience I will point out that I have a Computing Science degree, am a CEng, and worst of all have an MBA.

      1. anonymous boring coward Silver badge

        Re: Bloody software...

        "..and worst of all have an MBA."

        You are not required to disclose that information! ;-)

        Well done though, getting through that one. Not the most exciting course I can imagine.

      2. Tom 38
        Joke

        Re: Bloody software...

        worst of all [I] have an MBA

        Master of Business Administration or MacBook Air?

        Admittedly, both are pretty shameful

  15. Roger Greenwood

    Margaret Hamilton faced this kind of problem . .

    . . many moons ago.

    Fascinating story here https://www.wired.com/2015/10/margaret-hamilton-nasa-apollo/

    1. Anonymous Coward
      Anonymous Coward

      Re: Margaret Hamilton faced this kind of problem . .

      Thanks for the link to the interesting Margaret Hamilton story!

      "Hamilton wanted to add code to prevent the crash. That idea was overruled by NASA"

      That's why you should never ask your management whether you should add some important piece of code. Either there was no point in asking, since they OK'd what you were going to do anyway, or they will give the wrong answer, as they did here.

  16. Tom_

    Another partial success

    Hey, if you're going to get your altitude wrong and run your touchdown sequence at the wrong time, it's better to do it early than late. At least that way you get to upload your error report before you're smashed into the ground.

  17. James 36

    Optional

    The landing bit was just a test, wasn't it?

    So even if it failed, it was a success, as they will have learnt something. For example, one of the lessons learnt from Beagle was to have data transmitted back during the descent, so if anything went wrong the team would have something to analyse: like what name it gave that noise, or the large thing coming up to meet it, and whether it pondered if the thing was friendly or not.

  18. Comedy of Errors

    Hope they remember the heat shield

    If the Doppler radar measures distance to the ground and they kick off a big heat shield, I hope they don't get a reading reflected off it, or the same thing will happen again next time.

  19. adam 40 Silver badge

    How many inches

    ... did it think it was below ground level?

  20. larryk78

    Fail fast

    Maybe the software was smarter than we thought and the bug was that the negative altitude reading overwrote the original target altitude of zero.

  21. Highroads

    Full disclosure required

    I am sad that Schiaparelli crashed, as it seemed like an interesting mission. ESA needs to get to the bottom of this quickly, as the next ExoMars mission is apparently using the same landing concept and software. The report on the Ariane 5 V501 loss was good and thorough.

    http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html

    They should do the same thing here. Publish the details and let's learn from what went wrong.

    Recommendation #2 from the Ariane report seems a good one:

    R2 Prepare a test facility including as much real equipment as technically feasible, inject realistic input data, and perform complete, closed-loop, system testing. Complete simulations must take place before any mission. A high test coverage has to be obtained.
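    In the spirit of R2, here is a toy sketch of fault-injection testing: generate a synthetic descent trace, inject a burst of garbage altitude estimates (standing in for a saturated-IMU episode; the numbers are invented, not Schiaparelli data), and check whether the landing trigger under test fires early. The trigger functions, the 200 m/s rate gate, the 2 m ground threshold and the three-sample confirmation are all hypothetical. A real closed-loop rig would drive the actual flight hardware and software, as R2 describes; this only shows the shape of the test.

```python
from typing import List, Optional


def true_altitudes(start_m: float, rate_mps: float, dt_s: float, n: int) -> List[float]:
    """Constant-rate descent that stops at ground level (altitude 0)."""
    return [max(0.0, start_m - rate_mps * dt_s * i) for i in range(n)]


def inject_glitch(trace: List[float], start: int, length: int, value: float) -> List[float]:
    """Replace `length` samples from `start` with a bogus value."""
    out = list(trace)
    for i in range(start, min(start + length, len(out))):
        out[i] = value
    return out


def naive_trigger(trace: List[float], ground_m: float = 2.0) -> Optional[int]:
    """Fires on the first sample at or below `ground_m`."""
    for i, alt in enumerate(trace):
        if alt <= ground_m:
            return i
    return None


def guarded_trigger(trace: List[float], dt_s: float, ground_m: float = 2.0,
                    max_rate_mps: float = 200.0, confirm: int = 3) -> Optional[int]:
    """Rejects physically impossible jumps and needs `confirm` low samples in a row.

    Note: the very first sample is accepted unconditionally (no reference yet).
    """
    last_ok: Optional[float] = None
    gap = 1      # samples since the last accepted value
    streak = 0   # consecutive accepted samples at or below ground_m
    for i, alt in enumerate(trace):
        # The allowed change grows with the time elapsed since the last good sample.
        if last_ok is not None and abs(alt - last_ok) > max_rate_mps * dt_s * gap:
            gap += 1
            streak = 0
            continue
        last_ok, gap = alt, 1
        streak = streak + 1 if alt <= ground_m else 0
        if streak >= confirm:
            return i
    return None
```

    With clean = true_altitudes(1000.0, 50.0, 0.1, 210) and bad = inject_glitch(clean, 50, 10, -200.0), the naive trigger fires at sample 50 (about 750 m up), while the guarded one still fires at sample 202, just after the true touchdown. The point of the rig is that this difference shows up on the ground rather than on Mars.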

    I know David Parker is a very competent engineer and am sure he will drive this in the right direction.

  22. jonfr

    Why does this happen?

    Why does this type of fundamental error happen? They can get quite good space-rated RISC CPUs for this type of thing that don't choke on a high inflow of information. I also wonder what type of software error makes the sensor read its altitude in such a way that it's a negative value.

    I guess the software was of bad quality, since this happened. I don't think ESA is going to admit that, though. I suspect they have that type of problem, and it is going to be a bigger problem in the future, along with a lack of experience in space matters in general.

    1. Destroy All Monsters Silver badge

      Re: Why does this happen?

      Yep, better hire jonfr, he gonna fix da problems and bring lot of experience in space matters.

      1. jonfr

        Re: Why does this happen?

        I do have standards. None of the current space endeavours set out by the human race meets those standards.

        While I don't have any experience in dealing with space (mostly because I haven't been there yet), I do have good experience in hunting down software and hardware issues in computers.

        1. Destroy All Monsters Silver badge

          Re: Why does this happen?

          We need more context to eval that affirmation.
