REVEALED: Titsup flight plan mainframe borks UK air traffic control

London's airspace was effectively shut down on Friday afternoon after a flight data server fell over, the National Air Traffic Services (NATS) has confirmed to The Register after multiple sources gave us specific details of the cockup. Hundreds of flights were cancelled or diverted after NATS was forced to restrict the …

  1. John Smith 19 Gold badge
    Unhappy

    There's nothing like state of the art hardware

    And this is nothing like state of the art hardware.

    That said they seem to have handled it well enough in the circumstances.

    I hope they will find a root cause to this and figure out what's up with their networking.

    1. Daggerchild Silver badge

      Re: There's nothing like state of the art hardware

      "And this is nothing like state of the art hardware"

      GOOD! Because I like my bedrock tried and tested, ta.

    2. Gordon 10

      Re: There's nothing like state of the art hardware

      @JS19

      OS/370 and its descendants were running mission critical worldwide apps when you were in nappies, boy. Give me known failure modes over unknown any day of the week.

      1. John Smith 19 Gold badge
        Happy

        Re: There's nothing like state of the art hardware

        "OS/370 and its descendants were running mission critical worldwide apps when you were in nappies, boy."

        Could you say with an "Emperor Palpatine" voice?

        For the more humor impaired in the audience I should say I absolutely agree. I wonder if this is the one they got off the FAA in the States, and does it still have valves in it, as their last one is reputed to have had.

        Reading the story and the comments 2 things intrigue me.

        1) It looks like it was a "bug" in the data that borked the primary, then it did the secondary, which tried to switch back to the primary. So what kind of data can't be sanity checked before it's passed into the system (and of course will checking be added to the code now)?

        2) I did not know a Jovial compiler for IBM mainframes even existed. Historically it's been for deep embedded systems like aircraft flight computers, ECM systems, radars etc.

        1. Anonymous Coward
          Anonymous Coward

          Re: There's nothing like state of the art hardware

          As I'm sure JS19 already appreciates, there's also more than just a recompile/rebuild involved in moving an 'interesting' application from one hardware and network architecture to a different one. Testing the end product once rebuilt can be quite a challenge too.

          Doesn't mean it can't be done, doesn't mean it isn't sometimes a good idea, does mean that 'the usual suspects' may be inappropriate delivery partners, or out of their depth, or both.

        2. John Smith 19 Gold badge
          Coat

          Jovial mainframe compilers.

          Looking a bit further I found IBM did supply a JOVIAL compiler under their "Type III" license.

          IOW It was freeware.

          No guarantee supplied. Use at owner's risk.

          Possibly not what you're looking for to crunch the code for your mission (and life) critical ATC app.

          The commercial ones were hosted on it as cross compilers for things like the 1750A and Zilog 8002 (apparently the F16 was a design win for this puppy. Defense con-tractors. Crazy).

          My recollection of JOVIAL was it was common on DEC boxes but as cross compilers to deep embedded kit.

          1. Anonymous Coward
            Anonymous Coward

            Re: Jovial mainframe compilers.

            "My recollection of JOVIAL was it was common on DEC boxes but as cross compilers to deep embedded kit."

            As someone who has been around DEC kit being used for safety related stuff in UK for three decades (either as development host, or target platform (eg European Air Traffic Control), or both) I can safely say that although I encountered several instances of Coral and a handful of RTL/2, I came across writeups of the language but never came across a real Jovial, host or target, and I'm not aware that any of my US colleagues did either. [Iirc, a few of the Coral applications started with 'BEGIN CORAL; BEGIN CODE;' followed by in-line assembler for the remainder of the application]

            Jovial seems to be a bit niche, like its hardware compatriot, MIL1750.

            As for Z8002: my employers chose Z8002 in preference to M68K. I left shortly after that, before the sh*t really started to hit the distribution mechanism. They are still suffering for that mistake, maybe three decades later: every update for the ones still in service seems to risk requiring the code and data to be re-laid-out, as already referred to with PDP11 overlays, except without the support infrastructure provided on a PDP11 OS.

            1. John Smith 19 Gold badge
              Coat

              Re: Jovial mainframe compilers.

              "As someone who has been around DEC kit being used for safety related stuff in UK for three decades (either as development host, or target platform (eg European Air Traffic Control), or both) I can safely say that although I encountered several instances of Coral and a handful of RTL/2, I came across writeups of the language but never came across a real Jovial, host or target, and I'm not aware that any of my US colleagues did either. [Iirc, a few of the Coral applications started with 'BEGIN CORAL; BEGIN CODE;' followed by in-line assembler for the remainder of the application]"

              A more complete list of JOVIAL apps (and development hosts) can be found here.

              http://progopedia.com/implementation/jovial/

              1. Anonymous Coward
                Anonymous Coward

                Re: Jovial mainframe compilers.

                Thanks for the pointer. Other than the aforementioned NATS software, the list is almost exclusively projects from/for the US military, which would account for why it wasn't real visible over this side.

        3. Byham

          Re: There's nothing like state of the art hardware

          The old IBM 9020D was a 6 machine cluster with 3 compute elements and 3 Input Output Elements. When one of the processors hit an out of range condition or error, it would stop the other processors, give them a start point in the program and all the environment variables, and all of the processors would run the same program to the same point. If only one of them failed, then that processor took itself off line as a hardware fault. If they all got the error, then it was a software fault. The entire core was dumped (as a box of hex printout!) and the system did a Startover where it dumped all the recent input messages. Controllers received a message saying STARTOVER - all messages after TIME should be reentered. In a well tested real time system most errors are caused by a timing fault or a bad input message. By throwing out the messages and restarting from a checkpoint, say 7 seconds before the crash, both of these problems go away. If Scroggins puts in the bad message again, the same result could occur, but this time the Data System Specialist will note that the same input message preceded the previous startover and would have a one way exchange with Scroggins about his message.

          The idea that you just pass the same broken input to an identical backup machine is bound to fail: the systems will cycle in failovers (been there, done that).
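
          A toy sketch of the contrast described above, in Python. Everything in it is hypothetical (the `Host` class, the "BAD" marker standing in for an out-of-range condition, the 7 second checkpoint interval); it only illustrates why replaying identical input on identical hosts fails every time, while rolling back to a checkpoint and discarding recent input does not.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy flight-data host. Every name here is hypothetical."""
    accepted: list = field(default_factory=list)    # (time, text) pairs processed so far
    checkpoint: list = field(default_factory=list)  # copy of `accepted` at the last checkpoint
    checkpoint_time: float = 0.0

    def take_checkpoint(self, now):
        self.checkpoint = list(self.accepted)
        self.checkpoint_time = now

    def process(self, time, text):
        if text == "BAD":  # stand-in for an out-of-range condition
            raise ValueError(text)
        self.accepted.append((time, text))

    def startover(self):
        """Roll back to the checkpoint and return the messages to re-enter."""
        lost = self.accepted[len(self.checkpoint):]
        self.accepted = list(self.checkpoint)
        return lost


def naive_failover(messages, hosts=2):
    """Replay the identical input on each identical host; count the crashes."""
    crashes = 0
    for _ in range(hosts):
        host = Host()
        try:
            for time, text in messages:
                host.process(time, text)
        except ValueError:
            crashes += 1
    return crashes


def run_with_startover(messages, checkpoint_every=7.0):
    """Process messages, rolling back and dropping recent input on a fault."""
    host, notices = Host(), []
    for time, text in messages:
        if time - host.checkpoint_time >= checkpoint_every:
            host.take_checkpoint(time)
        try:
            host.process(time, text)
        except ValueError:
            lost = host.startover()  # the bad message is dropped, not replayed
            notices.append((host.checkpoint_time, [t for _, t in lost]))
    return host, notices


feed = [(1, "A"), (8, "B"), (10, "C"), (12, "BAD"), (13, "D")]
```

          With this feed, `naive_failover(feed)` crashes on both hosts, whereas `run_with_startover(feed)` keeps running and reports that the messages accepted since the checkpoint ("B" and "C") need re-entering.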

      2. Byham

        Re: There's nothing like state of the art hardware

        OS370??

        The software is based on MVT and OS360 - if you look at the software from the current DSS position it is still in 80 column card format.

    3. Fe26Mg12

      Re: There's nothing like state of the art hardware

      The only value to your comment is to highlight how little you know about Mainframe class hardware.

    4. swschrad

      article: two root causes, and one's easily fixed

      the harder one: comms failure.

      the easy one: borked flight plan submissions. answer: spit it back in the pilot's face, like we do in the US. fix your problem, then resubmit. anybody who ever put a card deck in the pigeonhole and got a barfed printout back from JCL with no program on it should understand that. I am advised the US flyboy system says where the problem was when a flight plan is rejected. the stack of NOTAMs tacked on the wall (or that's how it worked back when I found these things out) needs to be read again to avoid hitting the next issue.

  2. fowler

    Is the Register sure this is an IBM S390

    I work with mainframes and run some pretty old applications, including one over 30 years old, but it all executes on in-support Z series boxes running z/OS. I would be surprised if you could actually get parts for an S390 now.

    1. Anonymous Coward
      Anonymous Coward

      Re: Is the Register sure this is an IBM S390

      One of my "prize possessions" just happens to be an IBM token ring card.

      1. harmjschoonhoven

        Re: Is the Register sure this is an IBM S390

        One of my "prize possessions" just happens to be an (1) IBM valve. Picked it up on a scrapyard as a kid, should have taken the bucket full of them.

    2. Voland's right hand Silver badge

      Re: Is the Register sure this is an IBM S390

      I heard the same from a couple of blokes on the plane yesterday - it is a 15-year-old IBM mainframe rebadged by Lockheed.

      I ended up on a plane brought in "manually" half-way from a divert to Charles De Gaulle. Funnily enough, the people guiding it took a considerably more optimal path - the 320 cut in across Croydon and leveled onto approach somewhere over south London instead of taking the usual lumbering scenic route over all of London.

      The pandemonium at LHR was complete - the few planes coming in to land on manual guidance could not be unloaded. The few planes being unloaded could not get their luggage off the plane because the luggage transport was full of bags for the planes scheduled to depart. You name it.

      In any case - as most mainframe based systems it looks like it has an over-reliance on the mainframe never failing and no true primary-to-backup fallback. Mainframes fail very very rarely, however once they fail, you pay for the fact that the system was designed without system level resilience. Just like in this case.

      1. Roland6 Silver badge

        Re: Is the Register sure this is an IBM S390

        as most mainframe based systems it looks like it has an over-reliance on the mainframe never failing and no true primary-to-backup fallback.

        My thought, from the very little that has been published, is that the problem seems to have been not so much in the primary-to-backup fallback as in two things failing at once (system and network link) and, as I've also come across with many business continuity solutions, insufficient attention being paid to the restoration of normal operations (fallback-to-primary).

  3. Puffin

    Hmmmm

    I wonder why they originally announced the problem as a terrorist threat...

    1. Doctor Syntax Silver badge

      Re: Hmmmm

      Standard operating procedure.

    2. Anonymous Coward
      Anonymous Coward

      Re: Hmmmm

      Let me venture a guess - because that allows using any means necessary to deal with complaining passengers.

  4. ScottME

    Properly engineered systems!

    IBM mainframes are boring and predictable. Exactly what you want for safety-critical infrastructure. Who cares how "old" it is - it gets the job done, with uptimes in years. Rather surprised a bad flight plan can cause problems though.

    1. Anonymous Coward
      Anonymous Coward

      Re: Properly engineered systems!

      "Rather surprised a bad flight plan can cause problems though."

      That does smell of totally inadequate software testing somewhere along the line, doesn't it? Which is sadly not unusual, given that software testing is boring, unglamorous and rarely given adequate resources.

      But this is one of NATS's primary systems, and NATS is a billion pound a year business; it seems inexcusable that user input can bork it.

      1. Anonymous Coward
        Anonymous Coward

        Re: Properly engineered systems!

        I'm thinking the data issue was because some eco-loon decided the carbon-footprint of each flight needs to be shown as it was bodged onto the data feed

      2. RW
        Boffin

        Re: Properly engineered systems!

        Can that system handle Unicode text? It's old enough that Unicode may be implemented either not at all or only in a rudimentary fashion. Have a pilot submit a flight plan with notes in any language written with characters outside the usual 256-character font set up, and kaboom? And there are a lot of such languages, among them Russian, Polish, Greek, Turkish, Georgian, Chinese, Japanese, Hindi, Thai, and a host of others.

        Hmmmm.

        Just to test El Reg's own system: ΣЩՊਊฒႪおナ两
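
        A quick way to try the question above, as a sketch only: Python ships a codec named `cp037`, one of the single-byte EBCDIC code pages. Whether the NATS system uses that code page (or EBCDIC at all) is an assumption here, and `fits_code_page` is a hypothetical helper name.

```python
def fits_code_page(text, codec="cp037"):
    """Return True if every character in `text` exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False
```

        Plain upper-case route text such as "EGLL DCT LAM" fits; the test string above does not.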

        1. Anonymous Coward
          Anonymous Coward

          Re: Properly engineered systems!

          "Can that system handle Unicode text?"

          IF THE SYSTEM REALLY DATES BACK TO THE 1960S IT IS ENTIRELY POSSIBLE IT CANNOT DO LOWERCASE LETTERS NEVER MIND THE JOYS OF UNICODE.

          1. Black Betty

            Extended Binary Coded Decimal Interchange Code

            All the way to the D before Google auto-complete offered up EBCDIC.

          2. Mike Pellatt

            Re: Properly engineered systems!

            And show me the IATA (??) airport codes containing Unicode....

    2. tirk
      Facepalm

      Re: Properly engineered systems!

      ...can still fail if implemented badly. I remember being at a client's new datacentre many years ago where they had half a dozen or so S370s. There was a power failure affecting the whole site, but the UPS, followed by the on-site generator, kept the mainframes running. Unfortunately, some genius had wired the mainframes' consoles into the normal ring main, rather than the protected circuit, so whilst the 370s continued running, not a lot could be done with them! Doh!

    3. werdsmith Silver badge

      Re: Properly engineered systems!

      Considering CAA document CAP 694 for Flight Planning is 120 pages, I'm amazed there's no validation of the input data.

      It borks the mainframe and causes it to failover to another mainframe which is borking on the same data?

      As usual on The Register there will be someone who can explain why that's OK... but FFS!

      Get rid of the S390s and put in a couple of Raspberry Pis.

      1. Byham

        Re: Properly engineered systems!

        Every input is validated by specific programs for each input type. Any fault, not only in syntax but also in logic, is returned either to the inputting person or to operators as a 'referred reject' to be sorted out. The system is extremely resilient to input errors.

        The original design of the system when it had a startover was that it dumped all the input messages in the message input queue for the previous minute and told controllers to re-enter them. This stopped the cycling of fail overs that are bound to happen by sending the same broken message to an identical machine with identical software. That was part of the move to Swanwick and rehosting the old NAS host software from the IBM360's into a simple mainframe as a virtual machine (we said that wouldn't work at the time).

        However, I challenge anyone in the commercial world to have the same availability as NATS is getting from the Jovial/BAL software which is in the 99.999% or better range.
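
        For scale, 99.999% availability allows roughly 5.26 minutes of downtime in a 365-day year. A minimal sketch of the arithmetic (the function name is made up for illustration):

```python
def downtime_minutes_per_year(availability_percent, days=365):
    """Minutes of permitted downtime per year at the given availability."""
    return (1 - availability_percent / 100) * days * 24 * 60
```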

  5. Borg.King

    User submissions need pre-check

    We take customer data into our systems, and also have suffered from poor data causing issues in the past. These issues are now mitigated by having submission systems to check and reject any data not conforming to the required format or standards.

    I would have thought it a fairly easy procedure to verify flight plans are good at the point of submission, since this should not be a time or mission critical point in the process.
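
    A minimal sketch of that kind of submission-time check, in Python. The field names and rules are simplified assumptions for illustration and are not the real flight plan format; the point is only that a reject comes back with the reason attached.

```python
import re

# Simplified, hypothetical field rules; a real flight plan has many more fields.
FIELD_RULES = {
    "callsign": re.compile(r"[A-Z0-9]{2,7}"),   # assumed: 2-7 letters or digits
    "departure": re.compile(r"[A-Z]{4}"),       # four-letter location indicator
    "destination": re.compile(r"[A-Z]{4}"),
    "departure_time": re.compile(r"([01][0-9]|2[0-3])[0-5][0-9]"),  # HHMM
}


def precheck(plan):
    """Return a list of human-readable rejections; an empty list means accept."""
    problems = []
    for name, rule in FIELD_RULES.items():
        value = plan.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not rule.fullmatch(value):
            problems.append(f"{name}: {value!r} is not in the expected format")
    return problems
```

    A plan with destination "PARIS" and departure time "2560" would come back with two rejections, one per field, rather than being passed downstream.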

    1. jong

      Re: User submissions need pre-check

      oh it's in the right format alright.

      It's just they're heading to somewhere different to where they said they would.

      How do you fix that in your submission systems ?

      1. Yet Another Anonymous coward Silver badge

        Re: User submissions need pre-check

        >It's just they're heading to somewhere different to where they said they would.

        >How do you fix that in your submission systems ?

        Lasers (flying sharks optional)

        1. Paul Crawford Silver badge

          Re: User submissions need pre-check

          I want a flying shark, even without the laser it would be a cool thing!

          Oh and while I am dreaming, a castle or island lair so I can have a moat for said flying sharks to frolic.

          1. Stoneshop
            Black Helicopters

            Re: User submissions need pre-check

            I want a flying shark,

            The guys who built Orville the CatCopter are working on one (sorry, I don't have pics available)

          2. Anonymous Coward
            Anonymous Coward

            Re: User submissions need pre-check

            Oh and while I am dreaming, a castle or island lair so I can have a moat for said flying sharks to frolic.

            You don't need a moat if they're flying..

      2. Anonymous Coward
        Anonymous Coward

        Re: User submissions need pre-check

        How do you fix that in your submission systems ?

        In this day and age the fix is often AAA or fighter jet scramble.

      3. John Smith 19 Gold badge
        Facepalm

        Re: User submissions need pre-check

        "It's just they're heading to somewhere different to where they said they would."

        Isn't that an alarm, not a systems crash type event?

      4. Matt Bryant Silver badge
        Facepalm

        Re: Jong Re: User submissions need pre-check

        ".....It's just they're heading to somewhere different to where they said they would.

        How do you fix that in your submission systems ?" It's called a data quality check - you check the submitted data against a historic record of activity and flag and reject anything out-of-scope. An example might be checking the flight number and the associated historic destination with the destination entered, or simply doing a check that the destination is within the safe flight range of the aircraft's fuel load. Such checks are common in financial systems and are used to detect fraud ("why is Mr X's credit card being used to buy a smartphone in Singapore when his purchase record shows he is in New York?").

        IMHO, someone cut some corners on the code (one bad data entry screwed the whole system?!?), but what's even more unacceptable was the cluster bouncing - no-one at IBM heard of non-automatic failback? TBH, re-write it for a distributed cluster and put the lot on a dozen Linux servers, then spend the savings on some real testing.
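
        The two example checks mentioned above (historic destination and fuel range) might look something like this in Python. It is a sketch under assumptions: the `Filing` fields, the `HISTORY` table and the flight number are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Filing:
    flight_number: str
    destination: str
    route_nm: float   # planned route length, nautical miles
    range_nm: float   # safe range on the declared fuel load


# Hypothetical historic record: destinations previously seen per flight number.
HISTORY = {"XY123": {"EGLL", "EGKK"}}


def quality_check(filing, history=HISTORY):
    """Return reasons to flag the filing; an empty list means nothing unusual."""
    flags = []
    usual = history.get(filing.flight_number)
    if usual is not None and filing.destination not in usual:
        flags.append("destination differs from historic record")
    if filing.route_nm > filing.range_nm:
        flags.append("destination beyond safe range for fuel load")
    return flags
```

        A flight number with no history is not flagged on destination alone, which is a design choice: new routes should not be rejected just for being new.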

    2. billse10

      Re: User submissions need pre-check

      Has that Bobby Tables guy been filing flight plans again?

  6. Gordon 10

    Software release issue.

    The airline industry is riddled with legacy apps running on OS/370 and its descendants. When I were a lad you weren't a man until you triggered at least a Ctrl-3 core dump on Prod.

    A friend of mine once took out ticketing for all of Italy for 8hrs with a particularly buggy piece of assembler.

    If it is OS/390 I wonder if it's an ALCS/TPF relative?

    1. Anonymous Coward
      Anonymous Coward

      There's legacy, and there's legacy

      That well known journal of record for the IT sector, the Daily Telegraph, reports that the software in question was written in Jovial, which if true might explain why long-standing bugs still haven't been corrected - there probably aren't many offshorers with Jovial experience, and people with Jovial experience are so old that they want to be paid a decent rate, either for fixing the code or for training someone to fix the code.

      1. Anonymous Coward
        Anonymous Coward

        Re: There's legacy, and there's legacy

        Did I get downvoted for not including a link to the Telegraph article, or for some other unstated reason?

        Anyway, here's the link, let's see if the El Reg revamp makes plaintext clickable if it's a URL:

        http://www.telegraph.co.uk/news/aviation/11291495/UK-flights-chaos-Air-traffic-control-computers-using-software-from-the-1960s.html [edit: apparently no autoclickability. Someone else's software is seriously outdated :(]

        Extract:

        "A consultant who has worked for Nats said it knew its software needed to be replaced a decade ago but will be relying on the 1960s programmes for another two years.

        Martyn Thomas, Visiting Professor of Software Engineering at the University of Oxford, said: “The National Airspace System that performs flight data processing was originally written for American airspace in the late 1960s.

        “It wasn’t designed to cope with the volume of air traffic we have today, or to interface with modern computer software.”

        Prof Thomas said the NAS system was written using a now defunct computer language called Jovial, meaning Nats has to train programmers in Jovial just to maintain the antiquated software."

        [continues]

        1. Roland6 Silver badge

          Re: There's legacy, and there's legacy

          Prof Thomas said the NAS system was written using a now defunct computer language called Jovial,

          According to Wikipedia ( http://en.wikipedia.org/wiki/JOVIAL ), to say JOVIAL is now defunct is overstating things. But I've not kept abreast of recent developments, so does anyone know what is now being used instead of JOVIAL? (I'm a little surprised the Wikipedia article doesn't mention this, so I presume the obvious candidate, Ada, isn't quite so obvious or universally used.)

          1. John Smith 19 Gold badge
            Coat

            Re: There's legacy, and there's legacy

            "But I've not kept abreast of recent developments, so does anyone know what is now being used instead of JOVIAL? (I'm a little surprised the Wikipedia article doesn't mention this, so I presume the obvious candidate, Ada, isn't quite so obvious or universally used.)"

            JOVIAL was big for real time control apps. IIRC it did the software for the B52, B1 and F15 at least (off the top of my head). The USN (being the USN) had something else (CSL?, something with a C in it)

            I guess the UK equivalent were things like CORAL66 and RTL2 (ICI's in house computer language. No that's not a typo).

            In theory Ada was meant to be the cure for this babel of DoD languages (including most of the assembler). But writing a full Ada compiler is a not trivial exercise and the DoD has a lot of odd hardware knocking about, and getting conversion tools to convert old-bonkers-software-originally-running-on-valve-processors has turned out to be a tad expensive.

            The big surprise (for me) was having a Jovial compiler for an S/390 (or rather an S/360 as it would have been then). AFAIK when it's an IBM mainframe and it's real time it was assembler (which is how NASA got theirs to deal with the Apollo programme).

            Yes that's an anorak.

            1. Anonymous Coward
              Anonymous Coward

              Re: There's legacy, and there's legacy

              "But writing a full Ada compiler is a not trivial exercise "

              These days gcc and gnat mean that in general 'only' the code generating bits need to be target specific, other kind and clever people have done most of the rest in a target-independent fashion, and you can have it (source included) for free.

              You'd still have to be 'a bit special' to want to do your own compiler, but at least one UK aerospace company apparently have done it:

              http://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=get&target=petergarbett1958.pdf

              1. Anonymous IV

                Re: There's legacy, and there's legacy

                I suppose most people know that JOVIAL is an acronym for "Jules' Own Version of the International Algorithmic Language", named in the heady days of the 1960s when such whimsy was quite acceptable.

                In these more enlightened tennies, who would ever dream of giving a version of an operating system a ridiculous name such as Flatulent Ferret or Mangy Mongoose? It just couldn't happen, could it...

                1. RW
                  Headmaster

                  Re: There's legacy, and there's legacy

                  A reminder that the "International Algorithmic Language" referred to is Algol, but whether Algol-60 or Algol-58 I do not know.

                  When I worked for Burroughs back in the day, I was once shown the two file drawers containing the punch card source of a Jovial compiler for Burroughs' "large systems". It was never finished. (Burroughs had significant aeronautic expertise.)

                  1. Roland6 Silver badge

                    Re: There's legacy, and there's legacy

                    A reminder that the "International Algorithmic Language" referred to is Algol

                    Algol-58 was effectively a rename of IAL. There is a good piece on the circumstances prevailing in the late 50s that led Jules Schwartz to define JOVIAL in the 1978 article: http://jovial.com/documents/p203-schwartz-jovial.pdf
