UK air traffic bods deny they 'skimped' on IT investment after server mega-fail

The chief executive of the National Air Traffic Services, Richard Deakin, has denied the body "skimped" on its IT investment after being hauled in front of MPs this week to account for its major computer outage. The cock-up last Friday resulted in 120 flights being cancelled and 500 flights being delayed for 45 minutes, affecting …

  1. Ben Tasker Silver badge

    To be fair, bugs do happen from time to time, and the latest kit isn't automatically the best kit for the job.

    The unanswered question, though, is whether they could perhaps have avoided (or shortened) the outage through investment. I'd be surprised if they didn't have something to fail over to, though it wouldn't be the first time an organisation has decided that redundant == unnecessary cost.

    1. Alister Silver badge

      As I understand it, the actual outage was very short (on the order of under ten minutes, IIRC). What caused the disruption was the very thorough and comprehensive safety protocols which kicked in after 5 minutes, designed to gain maximum separation for all aircraft in the controlled area.

    2. Graham 24

      Failover not necessarily the answer

      Failover works well for hardware failures. It's not so good for software failures (and this apparently was a software failure), since typically the software on the failover system is the same as on the primary system, so has the same bugs. In a worst-case scenario, it's an older version that has more bugs / expects or produces different data structures / has worse performance (or all of those).
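
      A minimal Python sketch of that point. The `parse_plan` function and its comma-separated message format are entirely hypothetical stand-ins, not anything NATS actually runs: when primary and standby execute the same code, an input that trips a latent bug in one trips it in the other.

```python
def parse_plan(message: str) -> dict:
    """Hypothetical parser expecting 'CALLSIGN,ORIGIN,DEST' (illustration only)."""
    # Latent bug: assumes exactly three fields and raises ValueError otherwise.
    callsign, origin, dest = message.split(",")
    return {"callsign": callsign, "origin": origin, "dest": dest}


def process_with_failover(message: str, primary, standby) -> dict:
    """Try the primary; on any failure hand the same message to the standby."""
    try:
        return primary(message)
    except Exception:
        # Same code plus same input gives the same exception on the standby.
        return standby(message)


def failover_survives(message: str) -> bool:
    """True if a primary/standby pair running identical code handles the message."""
    try:
        process_with_failover(message, parse_plan, parse_plan)
        return True
    except Exception:
        return False
```

      Here `failover_survives("BAW123,EGLL,KJFK")` succeeds, while a message with an unexpected fourth field defeats both copies, because the standby only protects against faults the primary does not share.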

      The theoretically correct way to do software failover is to have multiple teams develop, test and validate the software completely independently, so that when the primary system fails, you switch to another system that is supposed to do the same thing, but correctly this time.

      In practice this approach generally doesn't work - the secondary system is not subject to the same day-to-day running that exposes latent bugs, the cost of doing everything twice upsets the beancounters, preserving the necessary independence between the teams is really difficult, the primary system's failure may have corrupted the very data the failover system needs to work properly... the list goes on.

      1. william 10

        Re: Failover not necessarily the answer

        I agree, this is why they should have two independently developed systems running in parallel.

        The sleight of hand here is to confuse hardware spend with investment (just like RBS did recently with their new mainframe). Hardware spend should be seen as a cost of doing business. What one needs to look at is the software investment, as this is what's important to the business. I would expect to see a five-to-ten-year rolling replacement plan; due to the nature of the business, I would expect this to be developed in house by three competing teams.

        1. Anonymous Coward

          Re: Failover not necessarily the answer

          I would expect to see a five-ten year rolling replacement plan, due to the nature of the business I would expect this to be developed in house by three competing teams.

          So at about the time you've got a reliable working system, and hopefully the worst of the bugs have been ironed out, you're going to bin it and start again? As for three independent systems/development teams - dream on, nobody's going to pay for that when the alternative is to fall back to a safe-but-limited mode of operation and accept some disruption for a day - which is what they did in this case.

      2. HmmmYes Silver badge

        Re: Failover not necessarily the answer

        Hmm, there are other means of failover. Check out Erlang/OTP.

      3. earl grey Silver badge
        Trollface

        Re: Failover not necessarily the answer

        Oh, thought you said "fallover" and being that's what it did, I was slightly confused. I'll just go have a lie down now.

    3. yoganmahew

      Well, the indications are that the failover failed with the same error too and that the problem was 'unexpected data' related. This is a problem for any system processing external feeds, however well defined the standard is. Depending on the number of data elements in the feed, it is not even possible to brute force all the combinations, so testing tends to be based on what is likely and what has gone wrong before.

      Again by accounts, the initial problem was 'recovered' in 15 minutes, but then a subsequent network problem resulted in a longer duration outage. As it's OS/390, it makes me wonder if it is an SNA/NCP network. These can be a bit of a bear to restart after a timeout following a failure, but the lessons learned will probably stand them in good stead. In the place I work, any number of firewall, DNS, login/credential problems occasionally bedevil IP-networked servers following outages (i.e. all was fine while the server was up and connected, it is only when it tried to reconnect that the configuration loaded three weeks ago is found to be bad), so I don't believe it is a case of simply saying that the network needs to be upgraded.

      Overall, my experience of OS/390 is that it is a rock solid basis for a system. I remember a power outage that took out all the DASD. OS/390 sat there bleeping that it couldn't access disks, but otherwise stayed up and restarted itself once the DASD were IML'd back online. All the other systems we had at the time crashed and burned, some horribly so...

      1. Stoneshop Silver badge
        Go

        Overall, my experience of OS/390 is that it is a rock solid basis for a system. I remember a power outage that took out all the DASD. OS/390 sat there bleeping that it couldn't access disks, but otherwise stayed up and restarted itself once the DASD were IML'd back online.

        I'm reminded of a situation where some cable-laying monkey had removed an entire single row of floor tiles behind one of those washing machine-like disk drives. So, after a vigorous bit of seeking, the legs underneath the drive inevitably started diverging from the vertical, and the drive decided to take a plunge into the chasm.

        After undoing this mishap (I'm unaware if the cable monkey was made part of the solution) and spinning the drive back up, VMS merely reported "Drive improperly dismounted".

  2. Anonymous Coward

    From reading the story on here last week, wasn't part of the problem actually the failover itself?

  3. Pen-y-gors Silver badge

    Overloaded airspace?

    Part of the problem seems to have been the knock-on effects. One aircraft is delayed and so that delays the next and that delays the next and so on. The system is run so close to capacity that there's no breathing space for recovery for when the inevitable problems do occur. Same problems when motorways are busy and one person brakes suddenly - 2 miles back down the road the traffic grinds to a halt.

    Not sure what the answer is - leave more time between landings and departures so that there is some slack to allow for recovery? But of course that would cost money!

    1. Destroy All Monsters Silver badge

      Re: Overloaded airspace?

      But of course that would cost money!

      It will also BRING IN the money, and this is why people are looking at various solutions to either engineer the slack out of the system (bad idea) or put more slack into the system (e.g. laxer rules regarding aircraft paths, supported by automation).

    2. SkippyBing Silver badge

      Re: Overloaded airspace?

      "Not sure what the answer is - leave more time between landings and departures so that there is some slack to allow for recovery? "

      More runways. Heathrow is run at something like 97% of capacity because there's effectively only one runway to land on, so as soon as there's a delay the knock-on effect is almost instantaneous. Most airports working with a similar number of aircraft have 3 or 4 runways and hence have more flex as they have a lot of spare capacity.

  4. Keef

    Know it all politicians get a chance to gob off.

    Perhaps the politicians would like to have a go at running NATS?

    They seem to know so much about everything they must be able to do a better job than that lot currently in charge who have allowed it to go wrong twice in 5 years with no serious side effects.

    1. Cliff

      Re: Know it all politicians get a chance to gob off.

      And credit to the NATS guy saying he can't guarantee there will never be another problem

    2. Pete 2 Silver badge

      Re: Know it all politicians get a chance to gob off.

      > Perhaps the politicians would like to have a go at running NATS?

      TBF, early in its development I was asked if I would like to do some work at Swanwick - NERC (as it was known then). I spent a day with the management team and politely declined. Even then it seemed to be a shambolic mess and was regularly being slated in the computer press.

      Having said that, the basic problem is one of capacity and efficiency. The closer you get to running any system at 100% of its capacity, the less margin you have to deal with unexpected events as there is less "slack" that can be taken up to lessen their impact. It's the same reason that busy motorways jam up due to minor RTAs. If you want a resilient motorway / airspace / factory that can quickly recover from downtime, breakdowns or jams you really shouldn't run it near its limit. However, if you do hold back a margin for error then you get accused of "waste". It's a lose-lose situation and the only remarkable thing is that there are so few cockups.
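
      To put rough numbers on the "slack" argument, here is a small Python sketch using the textbook M/M/1 single-server queue. That model is my assumption for illustration only and is in no way a model of real airspace; it simply shows how the average time spent in a system grows as utilisation approaches 100%.

```python
def mean_time_in_system(utilisation: float, service_time: float = 1.0) -> float:
    """Mean time in an M/M/1 queue: W = service_time / (1 - utilisation).

    `utilisation` is the fraction of capacity in use (0 <= utilisation < 1).
    `service_time` is the average time to handle one item with no queueing.
    """
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time / (1.0 - utilisation)


def delay_table(levels=(0.50, 0.75, 0.90, 0.97, 0.99)) -> dict:
    """Time in system, in multiples of the bare service time, at each load level."""
    return {level: round(mean_time_in_system(level), 1) for level in levels}


if __name__ == "__main__":
    for level, multiple in delay_table().items():
        print(f"{level:.0%} utilised -> {multiple}x the unloaded time")
```

      Under this simple model, going from 75% to 97% utilisation multiplies the average time in the system by roughly eight, which is why a small disruption at high load takes so long to clear.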

      1. Destroy All Monsters Silver badge

        Re: Know it all politicians get a chance to gob off.

        It is also very difficult before the cockup to be certain whether "no problem up to now" means that your systems are sufficiently good or whether lady luck has just smiled a bit longer than usual. For additional info: The Normalization of Deviance.

        We are in one of those cases regarding the exchange of nuclear weapons. Regarding the history of the start of WWI, illusions as to which case it actually is are not recommended.

  5. Anonymous Coward

    Oh, dear old Vince, bless him. You think the system suffers from fall over now, wait until the project to replace it gets up, running and installed. It will be way over budget and frequently crash until 12 months of service has gotten the worst bugs out of the system.

    1. Roland6 Silver badge

      Another way to look at Vince's comments is that he has effectively given NATS a green light to ask the government for money to enable them to make the necessary large-scale investment needed to remove their reliance on ancient systems...

      Naturally, being a politician, Vince will then be able to come up with some excuse as to why the government is unable to provide funds and that NATS should continue to use its existing systems...

  6. Shasta McNasty

    MPs should keep their mouths shut about things they know nothing about

    So a system that is driven at, or close to, full capacity every single day had a few minutes delay when an abnormal and previously unheard of scenario occurred?

    This isn't the problem. The problem is that the service the system is supporting does not have any spare capacity which means every delay has a knock on effect in that flights are delayed or need to be cancelled.

    No-one was killed or injured because the system went bat-shit crazy and caused planes to collide, people were just inconvenienced by a relatively minor delay.

    Older systems are typically tried and tested and most faults have been found and resolved/worked around. Newer systems may be faster or have more functionality, but have a whole new legion of bugs yet to be discovered.

    I vote for the slight delay with legacy hardware instead of the shiny new hardware that shits itself because flash player has been updated for the 5000th time.

    1. Primus Secundus Tertius Silver badge

      Re: MPs should keep their mouths shut about things they know nothing about

      @McNasty

      The fact remains that ten thousand people were affected with delays of up to a day, and politicians as our representatives are fully entitled to complain about that.

      Yes, I agree with the comments that running a system at 99% capacity means that one small problem cascades into many bigger ones. There is a rule of thumb that systems should be run at 2/3 to 3/4 of capacity. But as others remark, try explaining that to beancounters or indeed to politicians who set our taxes.

      Computer systems have always demanded high reliability from their components. Originally we needed valves that did not pop every few hours: the MTBF of a system with a thousand valves would be a few seconds.

      Nowadays the problems are with the software: that is, with logic on a large scale. Not many people are very good at large scale logic, and at breaking down a large problem into smaller ones. You certainly don't find that ability by hiring the cheapest contractor.

      1. This post has been deleted by its author

  7. Alan Denman

    a sea of sharks

    Going internet based soon (WHT!), when the cloud's down you will be in that sea of sharks with the hacker's IM message on your smartphone as you fall, 'Fly me I'm Josephine'.

  8. Crisp Silver badge

    Now it's been officially denied...

    You know it's true.

    1. This post has been deleted by its author

  9. sandman

    "Old" works for me

    As long as I'm on a plane I sort of like the idea of a tried and tested system controlling the airspace I'm in. New is only good if it offers extra functionality that is absolutely necessary. I'm going to suggest that the actual requirements for an air traffic control system don't change much (assuming there is spare capacity within the system). I'm also going to suggest that there seem to have been very few (if any?) fatal accidents caused by air traffic control problems in the UK (I don't know how many buttock-clenching near-misses have happened).

    1. Primus Secundus Tertius Silver badge
      Joke

      Re: "Old" works for me

      There was a young man with a drone

      Who thought that the sky was his own

      Till it flew in the way

      Of a jumbo one day

      And to jail he was rapidly thrown.

    2. king of foo

      Re: "Old" works for me

      If the 20+ year old tech has been 'maintained' and 'refined' over those 20 years then arguably it is not "old" and I wholeheartedly agree with the above. However, if someone wrote the source code in the 90s and it hasn't been updated in 15+ years then someone needs to be beaten to death with their own 56k modem.

  10. Richard Jones 1

    It was said to be a line of code with a previously unfound error

    I am slightly bemused; the actual issue was relatively short-lived though the impact was rather bigger. If there is a system in the world that does not have a code error somewhere I will be mighty surprised.

    Any system that runs above about 85% capacity for much of the day has almost no capacity for recovery.

    I have been involved with systems that ran up into the high 90% range; recovery was a horrible process. Later the load was split across two load-sharing systems, and even though the load climbed back well over 80% no outages were ever critical. I understand that NATS runs in the mid to high 90s for too much time.

    Not the same but I remember one system with alternately worked 'sides', the Y switch at the heart of the change over failed and it took hours to get the system back on line. The item had never failed before and was not usually tested - prior to that event!

    It probably never failed again after that event.

    Failures do happen and systems especially critical systems need both processing space and time to come back and stabilise. It is one thing to run a few risky financial transactions through the system and have to unwind them after a full recovery if they fail.

    It does not work the same way with planes that were in the sky!

    1. Peter Gathercole Silver badge

      Re: It was said to be a line of code with a previously unfound error

      This is exactly why they have strict operating procedures that dictate that if they can't get the system back up within a set period of time, they invoke their contingency plans to keep passengers, aircraft and aircrew safe.

      I understand from an interview I heard on BBC Radio 4 on Friday or maybe Monday that this threshold is 7 minutes. The interviewee said that they had the system running again after 15 minutes, but that was 8 minutes too late.

      Once they've initiated the contingency plan, which basically involves preventing any more aircraft from entering the controlled air space and getting as many that were already there on the ground as quickly and safely as possible, the damage was done. It was inevitable that there would be issues that ran on into the following days (aircraft and aircrew being in the wrong place, aircraft missing their scheduled maintenance because they were not at their maintenance location etc.)

    2. Bob Wheeler

      Bad Data Input/High System Utilisation.

      It is my understanding that it's the airspace that is being used at high-90s utilisation, not the mainframe.

      Something that does not seem to have been picked up on is that they are saying the fault was caused by 'bad data' from a flight plan being input into the system. Given that this is for commercial aviation, all the flight plans would have been, I suspect, computer generated and fed into NATS from the likes of BA, Air France etc.; this is not hand typed by the pilots. If that is correct, has one of the airlines that fly into/over UK air space made any changes to their system that generates the flight plans?

      Always ready to be corrected if I've misunderstood things.

      1. Steve Davies 3 Silver badge

        Re: Bad Data Input/High System Utilisation.

        The problem is the quality of the Flight Plan messages.

        Having had to develop software that understands the format of Flight Plan messages (think IATA Type B messages on steroids), I can say there are huge potential problems. My system regularly rejects 40% of FPL messages.

        For example, the spec says that there MUST be a SPACE after the Aircraft Reg No. Many FPLs are submitted with this missing. Some messages use the IATA code for a destination when they must use the ICAO code.

        eg Heathrow is IATA LHR, ICAO EGLL.

        This is all despite most FPLs being generated by software in the first place.

        This is a disaster waiting to happen. It nearly did last week.

        These events do not surprise me at all.
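
        The IATA/ICAO mix-up described above can be caught with a simple shape check. This is only a rough Python sketch: `check_destination` is a hypothetical helper, not part of any real flight-plan system, and it relies on nothing more than the fact that ICAO location indicators have four letters and IATA airport codes have three. It does not confirm that the airport actually exists.

```python
import re

ICAO_RE = re.compile(r"[A-Z]{4}")  # e.g. EGLL
IATA_RE = re.compile(r"[A-Z]{3}")  # e.g. LHR


def check_destination(code: str) -> str:
    """Classify a destination field by shape only (no lookup against a real airport list)."""
    if ICAO_RE.fullmatch(code):
        return "ok"
    if IATA_RE.fullmatch(code):
        return "reject: looks like an IATA code, ICAO required"
    return "reject: not a valid location indicator"
```

        With this, `check_destination("EGLL")` passes while `check_destination("LHR")` is rejected with a reason, which at least tells the sender what went wrong instead of failing somewhere deeper in the system.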

        1. yoganmahew

          Re: Bad Data Input/High System Utilisation.

          "The problem is the quality of the Flight Plan messages."

          Indeed and working in IATA Type B (as I do), most IATA Type B messages fail to conform to standard too! Spaghetti doesn't even describe the code to support it.

          I doubt it is one of the major airlines that sent in a duff message. I think it more likely one of the upstart airlines with their cloudy, badly-written by code-monkeys who don't understand or follow the standards, two-bit systems. Cheap they may be, reliably wrong they are... at least from my experience of the Type B messages they send.

        2. Primus Secundus Tertius Silver badge

          Re: Bad Data Input/High System Utilisation.

          @SD3

          Some organisations use XML for data transfers internally or with suppliers and customers. The XML must be declared valid against a published schema, and that can be written into a contract. It should cope with the kind of input errors that you describe.

          No, people don't write XML, software does. Excel can be provided with a template, designed by an XML guru, so you type into a sensible form that will bellyache if you get it wrong, and then export the rigorous data.

          Or so I understand. I have experimented with XML but not used it for real.

          1. Destroy All Monsters Silver badge

            Re: Bad Data Input/High System Utilisation.

            I have experimented with XML but not used it for real.

            It's not very difficult but the Schema definitions must have been created by a committee of maniacs suffering from second system effect. Use it anyway! Or something similar.

            AFAI hear aviation likes semi-formatted telex-style messaging for some reason though. Bring out the statistical recognizers!

            1. SImon Hobson Silver badge

              Re: Bad Data Input/High System Utilisation.

              > aviation likes semi-formatted telex-style messaging for some reason though

              "some reason" is to do with the requirement for global accessibility ! There is also the issue of making things easily comprehensible.

              So messages are text only - while it may be true that few places (if any) in the world now cannot get access to modern comms like "the internet", that certainly wasn't the case when the standards were laid down. Back then, Telex was the norm - which itself imposes some of the restrictions (see below - no lower case !)

              And just like every other industry, you want the information to be easily and quickly understood - so a set of standard abbreviations which are compact and quick to read. Eg, from the weather reports/forecasts we get things like PROB30 instead of "there's a 30% probability" and loads of other things that make it quick and compact. For example, take a current (as I write) forecast :

              TAF EGCC 231700Z 2318/2424 22014KT 9999 SCT020 TEMPO 2318/2322 24016G26KT TEMPO 2318/2402 4000 RADZ BKN012 BECMG 2323/2402 28012KT TEMPO 2323/2402 BKN009 PROB30 TEMPO 2412/2418 29015G25KT=

              Longhand, the weather forecast at Manchester, produced at 5pm Zulu time (or British winter time if you prefer) on 23rd is : validity between 6pm on the 23rd and midnight on the 24th; wind will be from 220˚ at 14 knots, visibility over 10 kilometres, scattered cloud at 2000 feet. Temporarily between 6pm on the 23rd and 10pm on the 23rd expect wind from 240˚ at 16 knots and gusting to 26 knots. Temporarily between 6pm on the 23rd and 2am on the 24th, visibility will be 4000 metres with rain and drizzle, and broken cloud at 1200 feet. Between 11pm on 23rd and 2am on 24th wind will become 12 knots from 280˚. For periods between 11pm on 23rd and 2am on 24th cloud will be broken at 900 feet. And there's a 30% probability that between mid-day on 24th and 6pm on 24th the wind will be 15 knots from 290˚ with gusts to 25 knots. Ends.
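
              Because the code is so regular, the wind groups in the forecast above can be decoded mechanically. A rough Python sketch follows; `decode_wind` is a hypothetical helper that handles only the two forms appearing in this example (dddffKT and dddffGggKT), whereas real TAFs allow more variants, such as variable direction, which it ignores.

```python
import re
from typing import Optional

# Direction (3 digits), speed (2 digits), optional gust (G + 2 digits), knots.
WIND_RE = re.compile(r"(\d{3})(\d{2})(?:G(\d{2}))?KT")


def decode_wind(group: str) -> Optional[dict]:
    """Decode one wind group such as '24016G26KT'; return None for anything else."""
    match = WIND_RE.fullmatch(group)
    if match is None:
        return None
    direction, speed, gust = match.groups()
    return {
        "direction_deg": int(direction),
        "speed_kt": int(speed),
        "gust_kt": int(gust) if gust else None,
    }


TAF = (
    "TAF EGCC 231700Z 2318/2424 22014KT 9999 SCT020 "
    "TEMPO 2318/2322 24016G26KT TEMPO 2318/2402 4000 RADZ BKN012 "
    "BECMG 2323/2402 28012KT TEMPO 2323/2402 BKN009 "
    "PROB30 TEMPO 2412/2418 29015G25KT="
)

# Drop the trailing '=' terminator, split on spaces, keep the groups that decode.
winds = [w for w in map(decode_wind, TAF.rstrip("=").split()) if w is not None]
```

              Every other group (validity periods, visibility, cloud) simply fails the pattern and is skipped, so the four wind groups come out in the order they appear in the forecast.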

              So the "code" version is around 1/4 of the space of the longhand version - which matters when you have a page with many reports on it. Also, because it's a standard code, it's much less open to interpretation (and particularly, misinterpretation) than freeform text. If you don't allow freeform text (ie restrict to a standard set of terms) then it makes little difference in terms of what you need to learn whether the code is verbose or terse (as long as the terse code isn't also "opaque") - but it makes a big difference to speed of transmission, the space it takes on paperwork (and screens), and so on - in the past (I know some will find this hard to believe, but there was a time before "ubiquitous" mobile internet !) I've used text-back services to get weather forecasts and the terse code is a lot easier to read on a small mobile screen. The above would overflow onto two texts, but most forecasts would fit within the 160 character limit of a single text.

              To anyone for whom the above information matters, reading it takes seconds and leaves no room for misinterpretation.

              Yes, things could probably change - but this means the whole of the world changing. Getting global agreement (via ICAO) for a change is a very very slooooooooooow process ! No country/region/group of countries can go it alone and change without the others doing so. Sticking with this example, the above forecast would have come from the UK Met Office - but is distributed internationally. The weather at Manchester is not just of interest to people in the north of England - but to operators/pilots of all flights flying into Manchester from anywhere in the world.

          2. The Mole

            Re: Bad Data Input/High System Utilisation.

            It's relatively simple to validate each individual field in isolation, however it quickly gets to be very difficult (and impossible in many schema languages) to validate inter-relationships between fields such as X must be less than Y unless Z is set to A.
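
            In ordinary code, as opposed to a schema language, that kind of rule is easy enough to state. A minimal Python sketch, where the `Record` type, its field names and the permitted values for Z are all made up for illustration: per-field checks run first, then the inter-field rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    x: int
    y: int
    z: str


def field_errors(record: Record) -> List[str]:
    """Checks that look at one field in isolation (the part a schema can do)."""
    errors = []
    if record.x < 0:
        errors.append("x must not be negative")
    if record.y < 0:
        errors.append("y must not be negative")
    if record.z not in {"A", "B"}:  # hypothetical set of permitted values
        errors.append("z must be 'A' or 'B'")
    return errors


def cross_field_errors(record: Record) -> List[str]:
    """The inter-field rule: X must be less than Y unless Z is set to A."""
    if record.z != "A" and not record.x < record.y:
        return ["x must be less than y unless z is 'A'"]
    return []


def validate(record: Record) -> List[str]:
    """Run per-field checks first; apply cross-field rules only to clean fields."""
    errors = field_errors(record)
    return errors if errors else cross_field_errors(record)
```

            So `validate(Record(5, 2, "A"))` passes, while `validate(Record(5, 2, "B"))` reports the inter-field violation even though each field is individually fine.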

            1. Primus Secundus Tertius Silver badge

              Re: Bad Data Input/High System Utilisation.

              @The Mole

              You are right that a complete record must have that validity between fields that goes beyond checking for zero, not letter 'O'. (And date formats need to be enforced, essential fields must not be omitted...) But each field has to be validated first, which an XML schema can do. The next stage needs a program.

              The kind of Excel input I was discussing can do that, of course. So can whatever program is on the receiving end after it has used a library function to capture the XML into a record in C, COBOL, or whatever.

      2. Proud Father
        Facepalm

        Re: Bad Data Input/High System Utilisation.

        Sanitizing data? NATS ever heard of that?

  11. NE-bot
    Coat

    Surely

    Surely a classic case for computing in the cloud?

    1. Anonymous Coward

      Re: Surely

      Have a downvote, unless you were being sarcastic. In which case sorry, too subtle for a friday!

      1. Michael Wojcik Silver badge

        Re: Surely

        too subtle for a friday!

        I wouldn't have thought the icon terribly subtle.

  12. Anonymous Coward

    S/390

    Hmm.

    I was sure that NATS could not be using an S/390. But it appears that they are!

    My sources suggest that they have a S/390 Multiprise 2000 model 204 system. According to the page at http://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_sm/3/897/ENUS2003/index.html&lang=en&request_locale=en, this system was marketed between October 1997 and February 2000, and the normal service offering was withdrawn at the end of 2007.

    This does indeed make this an old system!

    I have also heard on the grapevine that the hardware support is no longer done by IBM, but by a third party. Not that this is in any way reported as a hardware problem.

    I wonder if there is an architectural issue here. I had read somewhere that there were some programs written for S/370XA and ESA/390 architecture that would not run on a zSeries system, which would require them to keep a S/390 architecture system going. Indeed, the Multiprise 2000 series systems, of which the model 204 was one of the last ones marketed, were the last systems to support S/370EA mode.

    1. Destroy All Monsters Silver badge

      Re: S/390

      I'm sure they have kept the source code and there is no assembler involved. So adapting this to a proper zSeries would involve only around 6 digits... or not?

      sweating_towel_guy.jpg

      1. Michael Wojcik Silver badge

        Re: S/390

        I'm sure they have kept the source code and there is no assembler involved

        It's supposedly written in JOVIAL, and one of the strengths of JOVIAL is inline assembly, so I wouldn't be at all surprised if there weren't quite a bit of assembly there.

        That said, of course, most S/370 assembly code runs just fine on z.

        As the OP suggested, if they really are restricted to pre-z 3x0 machines, it's probably because of something like 370 EA (Extended Real Addressing) mode. That JOVIAL code might well be riddled with EA-mode OS requests - probably written in assembly.

    2. Anonymous Coward

      Re: S/390

      Would someone care to explain why they down voted me?

      1. Slap

        Re: S/390

        Oh, get over it. You should see some of the downvotes I've had.

        It's a comments section, not a career appraisal.

        1. Anonymous Coward

          Re: S/390

          I accept down votes. It's a fact of commenting on El Reg. I've got lots of them against my comments in the past.

          But I actually wanted to know what it was they objected to.

          More often than not I don't post anon, but I have to protect my 'sources'.

          1. Richard Ball

            Re: S/390

            AC,

            People who go on about downvotes get downvotes.

            1. This post has been deleted by its author

    3. dlc.usa

      Re: S/390

      Well, I gave the AC an upvote (second I've ever granted, I think) because it's absolutely spot on. A Multiprise 2000 isn't even the end of that product line (3000). Almost a decade ago I was supporting both for a client. By 2009 the 2000 was out of support (IBM would provide a CE for $400/hour though). However, parts availability was becoming a big concern. I failed to convince management to buy one or two 2000s on eBay for parts cannibalization. I don't want to know what supporting a 2000 entails these days. While this failure does not seem to be in the hardware, the infrastructure is an accident waiting to happen. It's not the systems programmers (they have at least one, right?) that are at fault here, I'm certain.

    4. Roland6 Silver badge

      Re: S/390

      I think NATS are between a rock and a hard place...

      From what I can gather the system configuration is a little over 30 years old and was originally due for replacement in 2000, but has (in parts) been life extended, with current expectations on its replacement being fully operational in early 2016. Hence this may go some way to explaining why some of the platforms being used seem a little dated.

      Hence NATS have a problem: they need to maintain the current as-is system for another few years, warts and all, because making any significant investment in the existing system (like migrating from S/390 to zSeries) would largely be a waste of money when (hopefully) the plug will be pulled in 2016. Also, given the complexity of the system and the number of parties involved (the radar network upgrade took 10 years and was only completed in 2013), I doubt there is much room to bring the go-live date for the new system forward; the recent events will also almost certainly result in additional testing of the new system.

      So NATS will patch the current system, touch wood and keep their fingers crossed. But then as others have noted there will be a period where the new system will be 'unreliable' as it gets bedded-in...

  13. Anonymous Coward

    So did Vince's sound-bite come from one of his spads or someone at GDS ?

  14. Anonymous Coward

    Failover software ?

    It used to be the case that, certainly for aircraft allowed to take off and land (where the appropriate land equipment was in place), they needed triple redundancy in everything, along with software from different vendors running on different hardware to avoid any possibility of an innate bug.

    Expensive, but essential.

    Curious why the systems shepherding these beasts around the sky have a lower bar ?

    1. Phil O'Sophical Silver badge

      Re: Failover software ?

      Probably because failure tends to be less immediately fatal. If the ILS bluescreens when you're 100 yds off the runway in thick fog you don't have much chance to switch to pencil and paper. For the NATS event the backup approach clearly worked exactly as it should, at worst a few people got delayed, that's less disruption than a passing snowstorm would have caused.

  15. JaitcH
    WTF?

    Talk about conflict of interest!

    Doesn't the CAA own some of the NATS stock?

  16. Frankee Llonnygog

    Of course they have bloody well skimped on investment

    £175million? Imagine only spending that much on Universal Credit. It would have been a disaster!

  17. Roverhoofdman

    what to do?

    New complex systems are not good, old complex systems are not new.

    1. Stretch

      Re: what to do?

      I don't know... I've seen many new, good, complex systems.

      I'd say new complex systems developed via huge contracts with tiny numbers of bidders, handled mainly by sub-sub-sub contractors, with minimal specification and testing, and encumbered by politicised decision making, are not good.

  18. Steven Jones

    Ludicrous sub head-line

    If the Register's summary is correct, Richard Deakin didn't make a statement that "90s kit isn't 'ancient'". What he said was the system had its roots in the 90's. To put this in context, the World Wide Web has its roots in the late 80's. For that matter, the first draft definition of TCP/IP dates from the early 70's.

    When the technology originated is wholly irrelevant. What matters is how it has been developed. After all, we are still basing our day-to-day usage of geometry on what Euclid set out over 2,000 years ago. Roots matter. They stop trees falling over when the wind blows.

    In the meantime, please don't misrepresent what was said. The kit isn't from the 90's, and nobody seems to be seriously claiming this was a hardware failure.

  19. billse10

    " 99.7 per cent of aircraft were not delayed. Of the remaining 0.3 per cent, the average delay was 26 minutes."

    I wish Southeastern could do that. Or even manage to have only 0.3 per cent of trains delayed by a mere 26 minutes .....

    Oh, I forgot, they'll probably say only 0.3 per cent are delayed by a mere 26 minutes. The rest are mostly delayed by between two and twenty minutes, or just cancelled, or they don't like running normal trains on Sundays, or some other reason.

  20. itzman

    25 years is a good age..

    When I were doing some work on submarine cable back in the 80's, they had only JUST started using silicon..."we field tested germanium and we know it lasts 25 years, that data isn't available on silicon"....

    ...I asked a similar question of my oncologist 'what's the long term effect of this chemotherapy'? Answer 'when we get to long term, we will let you know, but 25 years on, it seems to be OK...'

    Tried and tested, if it is good enough, is good enough.

  21. sleepy

    ...which reminds me of an ancient joke, which might help the politicians understand

    Lady boarding an ancient DC3 of tin pot airlines to stewardess:

    "This plane seems awfully old. Is it safe?"

    reply:

    "Madam, how do you think it got so old?"

  22. moonrakin

    Election influenced squawking

    Simple shallow opportunism from Cable - an election is not far off. Shame he's not so voluble about the other IT catastrophes that litter the gubmint IT landscape - like wasting billions on duff/fraudulent software projects in the NHS hasn't ended a few lives prematurely? AIUI there is quite a high failure rate on the traffic controller courses - shame we can't test politicians likewise.

    The balance between features and maturity / stability / capacity must drive NATS and I know that the safety critical aspect of ATC is taken very seriously and that the sequencing of graceful control degradation is something that considerable effort is expended on. It's not Candy Crush.

  23. Anonymous Coward

    What would be interesting is to compare the delay from this NATS failure to the total delay that we get every single day because the road network has suffered from under-investment. What did the article say, 10,000 people with an average delay of 45 minutes? I'd bet one decent jam on the motorway easily beats that.

    As for it being an old system, good, I don't want to be banging around the sky directed by a system written in the last 6 months by the lowest bidder.

Biting the hand that feeds IT © 1998–2019