Japan's Hitomi space 'scope bricked, declared lost after software bug

The Japan Aerospace Exploration Agency (JAXA) has declared that the ASTRO-H space telescope, renamed Hitomi after its launch, has been lost in space after a series of errors. JAXA lost contact with the $286m x-ray 'scope last month. On Thursday the agency admitted the instrument is dead, saying that it appears the solar panels had …

  1. Marketing Hack Silver badge
    Unhappy

    RIP Hitomi! We hardly knew ye.

    Well, that sucks. Hopefully this will be a learning experience for improving future satellite attitude control procedures.

    1. Tom 7 Silver badge

      Re: RIP Hitomi! We hardly knew ye.

      I think perhaps the accountant attitude control procedures should be improved too.

      I'm assuming the code wasn't tested for financial reasons and not 'this'll be good for a giggle' reasons.

      1. Anonymous Coward

        Re: RIP Hitomi! We hardly knew ye.

        I assume that the lack of testing was due to the fact that "The out-of-control wheels caused the telescope to tumble" and they really didn't have time to do a lot of testing.

      2. Yag
        Facepalm

        Re: RIP Hitomi! We hardly knew ye.

        Don't forget the usual...

        "As we took far longer than planned for the design and coding phase, as the current software is infested with bugs, and as we cannot change the delivery date, the validation team will have to work 3/8 shifts, probably on Saturdays and Sundays as well, for the next 3 months."

        And it's not even an exaggeration, but the transcript of a job interview I had a couple of months ago.

        For a major satellite manufacturer.

        1. Sam Liddicott

          Re: RIP Hitomi! We hardly knew ye.

          What are 3/8 shifts?

          Just less than half a shift?

          Three eight hour shifts?

          Three days on, five days off?

  2. Gene Cash Silver badge

    Er no.

    The IMU was confused about the body rate, at a time when the star trackers were offline.

    Since nothing corrected the false 22 deg/hr roll, it spun up the reaction wheels, so it really did start spinning at 22 deg/hr in the opposite direction.

    It switched to safe mode, started trying to acquire the sun, then started firing thrusters. However, the center-of-mass calc hadn't compensated for the optical bench extending, so it all went to shit after that. It spun fast enough that the optical bench & solar arrays were flung off.

    Nature "simplified" the story enough that it's wrong. Are they going to have spinning tape reels for their computers next?

    http://spaceflight101.com/h-iia-astro-h/hitomi-failure-chain/
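The centre-of-mass point above can be illustrated with a toy calculation. This is only a sketch under loud assumptions: the spacecraft is reduced to two point masses on one axis, and every mass, position and the names `PointMass`, `center_of_mass` and `lever_arm` are hypothetical, not Hitomi's real mass properties or flight software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointMass:
    mass_kg: float
    z_m: float  # position along the boom axis, in metres


def center_of_mass(parts):
    """Return the mass-weighted mean position of the parts along z."""
    total = sum(p.mass_kg for p in parts)
    if total <= 0:
        raise ValueError("total mass must be positive")
    return sum(p.mass_kg * p.z_m for p in parts) / total


def lever_arm(thruster_z_m, com_z_m):
    """Signed distance from the centre of mass to a thruster on the z axis."""
    return thruster_z_m - com_z_m


# Hypothetical numbers chosen for easy arithmetic, not real mission data.
BUS = PointMass(mass_kg=2400.0, z_m=0.0)
BENCH_STOWED = PointMass(mass_kg=300.0, z_m=1.0)
BENCH_EXTENDED = PointMass(mass_kg=300.0, z_m=7.0)

if __name__ == "__main__":
    for label, bench in (("stowed", BENCH_STOWED), ("extended", BENCH_EXTENDED)):
        com = center_of_mass([BUS, bench])
        print(label, round(com, 3), round(lever_arm(-1.0, com), 3))
```

With these made-up numbers the centre of mass moves from about 0.11 m to about 0.78 m when the bench extends, so a thruster table computed for the stowed configuration would assume the wrong lever arm for every firing.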

    1. TechnicalBen Silver badge
      Boffin

      Re: Er no.

      I never cease to be amazed at how things like KSP can really show the problems and simulate the real challenges that the space agencies face.

      Though the game can just re-adjust and recalculate things like COM on the fly, in real life communication limits and the lack of a "save game" feature can turn small mistakes into critical errors.

    2. Martin Summers Silver badge

      Re: Er no.

      Yeah what you said...

      I love it that I'm sharing reading this site with el Reg readers of massive intellect. I for one only got the gist of what you said.

  3. ma1010 Silver badge
    Coat

    They just didn't listen

    when I told them not to use "kill -9".

  4. energystar
    Linux

    They'll be back at Space.

    Condolences to the Japan Aerospace Exploration Agency, and to the whole global astronomy community.

    "It appears that the telescope was lost after a series of hardware and software errors." Total confidence in Japanese engineering.

  5. SteveCarr
    FAIL

    Someone's not getting a bonus this year....

    ....if they haven't already done the seppuku thing, that is!

    1. ecofeco Silver badge

      Re: Someone's not getting a bonus this year....

      There will definitely be resignations and much gomen.

    2. Voland's right hand Silver badge

      Re: Someone's not getting a bonus this year....

      The telescope did the seppuku, that's for sure. Dunno about anyone else.

      In any case, nowadays in Japan they move your desk next to the window or the door instead of handing you a wakizashi and calling in the assistant with a katana.

      1. Anonymous Coward

        Re: Someone's not getting a bonus this year....

        Is having a desk next to the window considered a bad thing in Japan? In my (UK) office they are sought-after. It's the desks next to the walkways, doors, waste-paper bins and people with chronic explosive sneezing syndrome that are avoided.

  6. JeffyPoooh Silver badge
    Pint

    'Paranoid Programming Practices'

    I don't know what JPL and NASA (perhaps others) call their software coding style, but there have been many examples of their paranoid approach, where the programmers don't even trust themselves, saving missions. The usual 'Safe Mode' is just the tip of the iceberg; there seem to be many layers of paranoia built into their typical code, based on what I've read about their occasional mission emergencies (and how the software often helps in surprising ways). The same approach extends into the hardware architecture and design of course, but that's not as impressive as the software coders' own rigorous hubris-elimination practices (opposite of *a* stereotype).

    Somebody ought to formalize this process in a text book, 'Paranoid Programming Practices'.

    1. ARaybould

      Re: 'Paranoid Programming Practices'

      There's 'Safeware: System Safety and Computers' by Nancy Leveson.

      1. Destroy All Monsters Silver badge

        Re: 'Paranoid Programming Practices'

        That's a classic. Therac 25 will meet you now.

        1. Destroy All Monsters Silver badge

          Re: 'Paranoid Programming Practices'

          There is also Better Embedded System Software by Philip Koopman. See also Philip Koopman's Home Page

        2. allthecoolshortnamesweretaken Silver badge

          Re: 'Paranoid Programming Practices' / Therac 25

          Had to look that one up. Scary. (Thought at first it was a reference to something fictional like Alpha60.)

  7. Brian Miller Silver badge

    Design, test the design, code, then test the code!

    There are just some things that really must be plotted fully out, and all of those little branches must be run down! Stuff like this is why writing software is called a "trade" and not an engineering discipline. (Yes, there was a court ruling about that.) I don't care what degree you have, web pages are not like writing this sort of code!

    You want paranoid? It starts with using methodologies with names on them. Then applying those methodologies rigorously! There is no such thing as, "Oh, we don't have time for that." That's just a bunch of lazy bull****, and total failure means total loss.

    "Immediately after the re-orientation, Hitomi’s Inertial Reference Unit (IRU) observed a non-existent roll rate around the spacecraft’s Z-axis."

    Then the system just goes with the known faulty data, instead of a reliable fall-back method. This should have been seen in the initial design, before coding took place. Hello, methodologies are the blueprint from which good code is derived!

    What's really awful is that "live and learn" just doesn't apply when the practitioners never apply hard-won lessons.

  8. Ian Emery Silver badge
    Paris Hilton

    Latest theory.

    The rumour mill suggests it was a forced upgrade to Windows 10 while in mid manoeuvre wot killed it.

    Paris, she can tumble my gyroscopes any time.

    1. Anonymous Coward

      Re: Latest theory.

      "DING! DING!"

      "?"

      "You seem to be trying to correct crazy gyros using thrusters. Would you like me to help you?"

  9. Version 1.0 Silver badge

    Son of Mars Climate Orbiter?

    I see this too often - people write code that does logical things but never stop to check that the inputs to the code are sensible or that the outputs are within reasonable ranges. In carpentry I learned to "measure twice - then cut" ... programmers write code in the spirit of "measure once then cut, cut, cut, cut".

    In this case, it's just a telescope - not an Airbus. I guess we got lucky.
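The "check the inputs and the outputs" habit described above might look something like the sketch below. It is not taken from any flight code: the function names, the proportional gain and both limits are hypothetical values picked for illustration.

```python
import math

# Hypothetical limits, chosen only for illustration.
MAX_BODY_RATE_DEG_HR = 10.0
MAX_WHEEL_TORQUE_NM = 0.2


def checked(name, value, low, high):
    """Return value unchanged, or raise if it is not a finite number in [low, high]."""
    if not math.isfinite(value) or not (low <= value <= high):
        raise ValueError(f"{name}={value!r} outside [{low}, {high}]")
    return value


def wheel_torque_command(rate_error_deg_hr, gain_nm_per_deg_hr=0.01):
    """Proportional torque command with range checks on both input and output."""
    rate = checked(
        "rate_error_deg_hr",
        rate_error_deg_hr,
        -MAX_BODY_RATE_DEG_HR,
        MAX_BODY_RATE_DEG_HR,
    )
    torque = -gain_nm_per_deg_hr * rate
    return checked("torque_nm", torque, -MAX_WHEEL_TORQUE_NM, MAX_WHEEL_TORQUE_NM)
```

Raising an exception is just the simplest way to make the violation visible in a sketch; a real controller would more likely switch to a degraded mode than stop. Range checks also only catch implausible values, not values that are wrong but plausible.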

    1. EveryTime Silver badge

      Re: Son of Mars Climate Orbiter?

      " people write code that does logical things but never stop to check that the inputs to the code are sensible or that the outputs are within reasonable ranges."

      This appears to be the case where the inputs to a function were reasonable and the outputs were within a reasonable range. The issue was that there was a disagreement between estimated state and one sensor-reported value.

      I initially wrote "measured value" instead of "sensor-reported value". But I realized that would be misleading. The terms used indicate your perspective. When you say "measured value" you imply that the measurement is basically correct. When you say "sensor-reported value" you start considering the sensor's inaccuracies, limitations and potential to be just plain broken.

      In this case the software calculated that one of the sensors had an inaccurate output, but then went ahead and used that wrong-but-plausible value in subsequent calculations anyway. Which wouldn't have been fatal, except that the recovery code used the wrong center-of-mass estimate (which seems to have been a ground controller mistake, albeit one that could have been checked for had the code been written or verified with a more paranoid perspective).
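One way to read the distinction between "estimated state" and "sensor-reported value" is sketched below. It is a deliberately simplified illustration, not a description of Hitomi's attitude control logic: `select_body_rate`, the `RateEstimate` type, the source labels and the 1 deg/hr threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateEstimate:
    rate_deg_hr: float
    source: str  # which input the estimate was taken from


def select_body_rate(
    gyro_rate_deg_hr: float,
    tracker_rate_deg_hr: Optional[float],
    predicted_rate_deg_hr: float,
    max_residual_deg_hr: float = 1.0,  # hypothetical threshold
) -> RateEstimate:
    """Pick a body rate, refusing to trust a gyro value already judged inconsistent."""
    # An independent reference, when available, takes priority.
    if tracker_rate_deg_hr is not None:
        return RateEstimate(tracker_rate_deg_hr, "star_tracker")
    # Without one, accept the gyro only if it agrees with the propagated estimate.
    if abs(gyro_rate_deg_hr - predicted_rate_deg_hr) <= max_residual_deg_hr:
        return RateEstimate(gyro_rate_deg_hr, "gyro")
    # Otherwise hold the prediction and let the caller see that a fallback was used.
    return RateEstimate(predicted_rate_deg_hr, "model_hold")
```

For example, `select_body_rate(22.0, None, 0.0)` returns the held prediction with source `"model_hold"` rather than acting on the 22 deg/hr reading. Whether holding the model is the right fallback depends on the system; the point is only that a flagged value is not fed into later calculations silently.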

    2. Anonymous Coward

      Re: Son of Mars Climate Orbiter?

      "Yes but the user will never do that. Not an issue...."

      How many times have I heard that?

  10. TJ1

    Good to see DevOps in Space!

    Now we know why El Reg has been pushing DevOps so hard... they reckon it's rocket science!

  11. Alan Brown Silver badge

    Japanese programming culture

    The issue isn't paranoia, it's the cultural reluctance of juniors to challenge senior staff even when it's obvious they're doing something wrong. (aka "Don't question the boss")

    The same problem has caused several airliner crashes.

    As Brian Miller points out, when it comes to JAXA, lessons simply aren't being learned.

    The rather infamous Japanese resistance to asking for outside assistance comes into play too. The same resistance is what turned Fukushima from something recoverable into a complete clusterfuck.

    1. Anonymous Coward

      Re: Japanese programming culture

      Well, I contend it's both.

      A friend managed a project with a Japanese company, which supplied a subsystem. As long as their senior managers were present in the joint status meetings, there would be no problem reports from their engineers and all suggestions were followed. The real engineering discussions took place after the managers left. But their s/w and h/w never checked inputs, as that would have implied that my friend's team sent incorrect output.

    2. FrankAlphaXII Silver badge

      Re: Japanese programming culture

      How was Fukushima recoverable?

      Preventable, yes, but it was far from recoverable after the generators were destroyed and the coolant stopped flowing. Even after they reconnected the coolant systems to the external power grid to get the pumps moving, it was too late. About the only thing they could have done was connect them faster. That may or may not have stopped the meltdowns, but as no one's entirely sure just how much damage there really is to the cores, no one's sure just how fast they would have had to move.

      Now if what you mean is that it was preventable, you are absolutely correct. They could have mitigated by not building reactors on a coastline prone to earthquakes and by making damned sure electrical power would always flow to the plant's pumps no matter what, but aside from that I don't think it was a recoverable situation, unless they had ripped the vessel heads off the reactors to pull the fuel manually (a death sentence for anyone involved after about 3 to 5 minutes inside the containment building at reactor 3) which isn't realistic at all.

      1. Version 1.0 Silver badge

        Re: How was Fukushima recoverable?

        To be fair to the designers, at the time that the Fukushima reactors were built, an earthquake of that magnitude was believed to be impossible on that section of coast. In the same way, the cities in the American North West (Seattle, Portland etc) were built before anyone had any idea that a similar megathrust rupture will occur at some point in the future in the Cascadia subduction zone.

      2. Alan Brown Silver badge

        Re: Japanese programming culture

        "About the only thing they could have done was connect them faster"

        Which is more-or-less what was on offer from USAF Okinawa in the form of mobile generators to restore the power. They were standing by for a request for help that never came.

        Even without that, the explosions needn't have happened if procedures had been followed and the hydrogen vented promptly. Yes, it was a bit radioactive, but the half-life was measured in minutes and hydrogen vented to atmosphere goes more-or-less straight up. If that had been done you'd have had a Three Mile Island scenario (contained meltdown and no caesium leaked).

        And yes it was utterly preventable.

        1: TEPCO had been warned about the positioning of the generators by US experts but chose to put them there anyway. Other warnings were ignored too.

        2: TEPCO and the Japanese regulators had too cozy a relationship, leading to insufficient safety paranoia.

        3: If the chief engineer onsite had told TEPCO manglement to go fuck themselves earlier than he did, then things could have been turned around. Chalk it up to the Japanese reluctance to question authority I mentioned earlier. He did manage to break through that reluctance, which is why it's not worse than it was, but it could have been a lot less bad. Other reactors of similar vintage up and down the coastline took the wave and survived fine.

        As for positioning: Water-moderated/cooled reactors need access to large bodies of water as a heatsink. If you use coastlines then you're vulnerable and if you use rivers you'll find there's a faultline not far away (especially if there's a salt lick in the vicinity too - which pretty much describes the location of every major settlement established before the 18th century).

        Needing this kind of heatsink is due to the low operating temperature of a water-moderated/cooled reactor. It's the only way to get worthwhile thermal efficiency in the generation turbines.

        The various issues can be solved by not using water in the radioactive side or for primary cooling (ie: MSRs, preferably LFTRs)

      3. cray74 Silver badge

        Re: Japanese programming culture

        How was Fukushima recoverable?

        Well, as you said, it wasn't really recoverable without changing past decisions. However, it was...reducible? manageable?...to a 3 Mile Island level of event rather than popping reactors like firecrackers.

        Specifically, the power plant operators could have vented the accumulating steam, hydrogen, and oxygen that caused the internal explosions. However, TEPCO delayed sending workers into the buildings to open valves because of the radiation risk. (Remotely controlled electrical valves weren't working for lack of electricity.) Venting the reactors would've released radiation, a sensitive issue in Japan, but not as much as the subsequent explosions.

        1. Marketing Hack Silver badge
          Mushroom

          Re: Japanese programming culture

          Fukushima was a regrettable case of the traditional Japanese aversion to asking for help or questioning higher-ups. The plant management didn't alert/contradict TEPCO corporate until it was too late, and then TEPCO corporate didn't notify the government what was really going on until it was really too late, and then the Japanese government didn't ask for help internationally until it was totally too late.

    3. RJChurchill

      Re: Japanese programming culture

      Testing your code never PROVES it is correct, only that the tests didn't find the bug. Years ago I worked at a small software shop and asked the owner how many lines of code he would use for a CalculateArea formula. He replied: one line of code, "Return (Length*Width)". I replied that my typical routine would be at least 20 lines of code, which in turn required hundreds of additional lines to create the UI for entering the operating parameters, the database schema to save the values and the code to apply them. You need to know at what granularity to log passed-in parameters. Depending on the granularity of file logging for debugging, I'm writing out the length and width values, the calculated return value and the fact that I entered and exited the routine "CalculateArea". I'm checking for negative and extreme values and emailing out alerts depending on settings.

      That is the problem with working for somebody who writes a few lines of code in MS access and thinks they are a software engineer. No sir, just because you can write a few sentences does not a Hemingway make!
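A rough idea of what the longer CalculateArea routine described above could contain, leaving out the UI, database and email alerts the commenter mentions. This is a sketch, not the commenter's code: the logger name, the `MAX_DIMENSION` limit and the choice to raise exceptions rather than send alerts are all assumptions.

```python
import logging

logger = logging.getLogger("geometry")  # hypothetical logger name

MAX_DIMENSION = 1.0e6  # hypothetical upper bound for an "extreme" value


def calculate_area(length, width):
    """Return length * width after validating and logging both inputs."""
    logger.debug("calculate_area enter: length=%r width=%r", length, width)
    for name, value in (("length", length), ("width", width)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number, got {type(value).__name__}")
        # A NaN fails both comparisons, so it is rejected here as well.
        if not (0 <= value <= MAX_DIMENSION):
            raise ValueError(f"{name}={value!r} outside [0, {MAX_DIMENSION}]")
    area = length * width
    logger.debug("calculate_area exit: area=%r", area)
    return area
```

The debug lines only appear if the application configures logging at `DEBUG` level, which matches the comment's point that the logging granularity should be a setting rather than hard-coded.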

      1. Pascal Monett Silver badge

        Re: Testing your code never PROVES it is correct

        Yup, but NOT testing it just lets Production find the bugs.

        And, in this case, Production found that the solar panel strut connections had a bug - but it's too late to fix that now.

        Goes to show that DevOps does not work in space.

  12. Leeroy Bronze badge
    Joke

    DIO

    I wouldn't use DOA as it arrived in orbit as planned. If it had not, you could maybe try a warranty return / insurance claim from the launch provider... to get a new one.

    Dead In Orbit would probably be better.

    1. energystar
      Facepalm

      Re: DIO

      The whole emergency was tightly managed, with a 'hands off' attitude toward all outsiders that has had serious consequences up to this day.

      With no firm assessment of what really happened, and none of the necessary lessons drawn from the disaster, this lack of accountability eventually led to the pariah-ization of the nuclear industry.

  13. Anonymous Coward

    Well,

    that was a bit of a cluster fuck..

  14. Hstubbe
    Headmaster

    "...so rocket thrusters were bought online to stabilize the telescope."

    That's cheap rocket thrusters from ebay for you....

  15. x 7 Silver badge

    " Unfortunately, someone at ground control had programmed the probe with the wrong commands, which hadn't been tested"

    was that "someone" a Nork spy?

  16. Crazy Operations Guy

    If only

    If only there were some kind of space craft with a large servicing bay, grabber arms and could carry 5 or more well-trained astronauts. Maybe a craft that has already been proven to be able to perform complex repairs on a massive telescope...

    In all seriousness, why did we have to abandon the Space Shuttle? Wouldn't it have been cheaper to just build a fresh one every 5 years? Build it so that it would share most parts with the previous iteration but also integrate improvements in propulsion technologies, automated systems, or just smoothing rough edges found in previous models. A development and build cycle like that would've been far cheaper than keeping the old dinosaurs around and much, much cheaper than scrapping it and relying on the Russians while working on a replacement that is no better than the old Saturn V / Apollo stack...

    Imagine how much cheaper it would be to launch a shuttle that was equipped with re-usable boosters, modern composites, an optimized aerodynamic profile, advanced computers, and cutting-edge engines. Sure as hell would be cheaper than paying the Russians, sinking billions in the Antares stack, or even paying private companies.

    1. Alan Brown Silver badge

      Re: If only

      "In all seriousness, why did we have to abandon the Space shuttle?"

      Because it was designed as a pickup truck to build space stations and ended up being used as a granny car to go to church on Sundays. It was only in the last few years of operation that it was actually doing what was intended. For the vast majority of its lifespan it was a solution in search of a problem.

      Satellite servicing missions were never economic. They did do a few but it cost more than simply bolting the flight spare on top of a new stack - and arguably it scared the bejesus out of the USSR when the USA demonstrated it could rendezvous with a LEO bird and bring it back down, so furthering that capability could well have resulted in itchy trigger fingers.

  17. jake Silver badge

    I am not a superstitious man ...

    ... however, I've noticed that renaming any boat/ship always leads to disaster. Probably because the renaming leads to a mission profile that is contrary to the build/launch profile.

  18. Alan Edwards

    A new record?

    A $286m orbiting telescope - that's got to be near the top of the list for things killed by a software bug. I'm guessing that doesn't include the cost of getting it into space either.


Biting the hand that feeds IT © 1998–2019