Power cut crashes Delta's worldwide flight update systems

A computer outage has caused worldwide delays for thousands of passengers using Delta Air Lines. The US carrier tweeted about the issues on Monday morning, blaming delayed and cancelled flights on a “computer outage.” Delta, based in Atlanta, Georgia, subsequently blamed the crash on a massive power cut at 2.38am ET (7.38am …

  1. wolfetone Silver badge
    Joke

    *Delta knocks on the door of another airline*

    "Hi, I was just going to check if there was a power cut, but I saw one of your planes fly overhead with it's lights on"

    1. Destroy All Monsters Silver badge
      Headmaster

      It's....

      its

  2. tony2heads
    Boffin

    Leap Seconds

    Next positive leap second on 31st December (see IERS bulletin)

    Will people be ready for that one?

    1. John Sager

      Re: Leap Seconds

      This is why there is so much pressure to kill off leap seconds. The ITU recently kicked that can down the road for another few years, but personally, I don't see why this is still such a problem. We've had leap seconds for decades and computer time protocols have been designed to signal future leap seconds for a *long* time. It does involve the strange concept that a specific minute at the end of June or December should have 61 secs. The specs also allow for 59 secs, but that is now unlikely ever to happen.

      You could write the software to deal with a step, but Google decided to just slow down the computer notion of the time in a controlled fashion for several hours so that after the leap second actually occurs the clocks are exactly back in sync. That is probably much more friendly to existing applications.
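A minimal sketch of the slow-down approach described above, assuming a linear smear over a fixed window that ends at the leap second. The 20-hour window, the linear shape, and the function names are assumptions for illustration; the comment only says the adjustment runs "for several hours".

```python
# Assumption: a 20-hour linear smear that ends exactly at the leap second.
SMEAR_WINDOW_S = 20 * 3600.0


def smear_lag(seconds_until_leap: float, window: float = SMEAR_WINDOW_S) -> float:
    """Seconds by which the smeared clock lags a clock that has not yet
    applied the leap second.

    Grows linearly from 0 at the start of the window to 1 at the leap
    second, so the smeared clock never has to step.
    """
    if seconds_until_leap >= window:
        return 0.0  # smear has not started yet
    if seconds_until_leap <= 0:
        return 1.0  # the whole leap second has been absorbed
    return 1.0 - seconds_until_leap / window


def smeared_rate(window: float = SMEAR_WINDOW_S) -> float:
    """Smeared seconds that elapse per real second inside the window."""
    return 1.0 - 1.0 / window
```

With these assumed values the clock runs roughly 14 parts per million slow during the window, which most applications would never notice.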

      1. Bill M

        Re: Leap Seconds

        Novell implemented this quite some decades ago, way before Google even existed. It is called Synthetic Time

        1. Paul Crawford Silver badge

          Re: @Novell time

          And long before that we had ephemeris time (1952), and then TDT (1976), and then GPS from 1980 using continuous time with a leap-second offset rather like a time-zone.
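The "leap-second offset rather like a time-zone" idea can be sketched as a lookup table plus an addition. The table below is only an excerpt covering 2012 onwards, and the function names are made up for this example; a real implementation would carry every leap second since 1980 and handle the inserted second itself.

```python
from datetime import datetime, timedelta

# (UTC instant from which the offset applies, GPS minus UTC in seconds).
# Excerpt only: the full table starts at 0 s on 1980-01-06.
LEAP_TABLE = [
    (datetime(2012, 7, 1), 16),
    (datetime(2015, 7, 1), 17),
    (datetime(2017, 1, 1), 18),
]


def gps_minus_utc(utc: datetime) -> int:
    """Look up the offset in force at the given UTC instant."""
    offset = None
    for effective, seconds in LEAP_TABLE:
        if utc >= effective:
            offset = seconds
    if offset is None:
        raise ValueError("date precedes this excerpt of the table")
    return offset


def utc_to_gps(utc: datetime) -> datetime:
    """Continuous GPS time is UTC plus the current offset, like a zone shift."""
    return utc + timedelta(seconds=gps_minus_utc(utc))
```

The continuous timescale never repeats or skips a second; only the table grows when a leap second is announced.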

          As I keep saying IT IS A KNOWN FEATURE and if your code can't handle it gracefully you are incompetent due to either:

          1) Not using tested system libraries to handle time, delays, etc.

          2) Writing or modifying said libraries without knowing what you are doing.

          And most of all NOT TESTING YOUR DAMN CODE! Really, just set up a fake NTP time server and have it generate leap seconds regularly backwards and forwards and see if your code works.
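A unit-test-sized analogue of the fake time server idea: feed the code under test a wall clock that repeats one second and see what breaks. All names here are hypothetical, and the "code under test" is a deliberately naive rate calculation, not anything from the systems discussed.

```python
from typing import List, Optional


def wall_clock_with_leap(start: int, count: int, leap_index: int) -> List[int]:
    """One reading per real second; the reading at leap_index is repeated,
    mimicking a clock that steps back one second for an inserted leap second."""
    readings = []
    t = start
    for i in range(count):
        readings.append(t)
        if i != leap_index:
            t += 1  # normal second; the leap second leaves t unchanged
    return readings


def naive_rate(events: int, t0: int, t1: int) -> float:
    """Events per second, trusting that wall-clock time always advances."""
    return events / (t1 - t0)  # divides by zero across the repeated second


def safe_rate(events: int, t0: int, t1: int) -> Optional[float]:
    """Same calculation, but refuses to report a rate for a non-positive span."""
    elapsed = t1 - t0
    return events / elapsed if elapsed > 0 else None
```

Running both functions over the readings on either side of the repeated second shows the naive version raising `ZeroDivisionError` while the guarded one returns `None`.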

      2. choleric

        Re: Leap Seconds

        Because programmers are so lazy they would prefer to get other people to change the whole calendar than fix their code...

        1. Brewster's Angle Grinder Silver badge

          @choleric

          No, I think we should create a kickstarter to adjust the rotation of the Earth so it is always exactly 86400 seconds +/- a small Gaussian error.

          1. John Sager

            Re: @choleric

            Bravo sir! However it'll take more than a few bob to do that. Perhaps if you could persuade the moneybags that it would cure Gerbil Worming then you'll have far more money than you know what to do with.

            1. Brewster's Angle Grinder Silver badge
              Boffin

              Re: @choleric

              Actually, atmospheric temperatures affect the earth's rotation rate: the warmer the atmosphere, the fewer leap seconds needed. And since the year 2000 there have been many fewer leap seconds than predicted in the 90s; the graph shows a dramatic corner. Nobody is sure why.

              1. Esme

                Re: @choleric

                Uh, wouldn't that be because a warmer atmosphere expands, moving mass away from the centre of the Earth, so conservation of angular momentum demands that the Earth's rate of spin slows to compensate. Same thing with the seas, which although they wouldn't move as much, are considerably denser. I'm surprised if that's sufficient to affect the rotation by as much as several seconds in just a few years, but I can't be bothered to do the maths.
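The maths being skipped above is short if the Earth is treated as a single rigid body with moment of inertia $I$ and no external torque. That is a strong simplifying assumption, and the result only shows the scale of change required, not which mechanism is responsible.

```latex
\[
L = I\omega = \text{const}
\;\Rightarrow\;
\frac{\Delta\omega}{\omega} = -\frac{\Delta I}{I},
\qquad
T = \frac{2\pi}{\omega}
\;\Rightarrow\;
\frac{\Delta T}{T} = \frac{\Delta I}{I}.
\]
\[
\Delta T = 1\,\text{ms},\; T = 86400\,\text{s}
\;\Rightarrow\;
\frac{\Delta I}{I} = \frac{10^{-3}}{86400} \approx 1.2 \times 10^{-8}.
\]
```

A day that is 1 ms too long accumulates a full second of offset in 1000 days, or about 2.7 years, so a fractional change in $I$ of roughly one part in $10^{8}$ is the order of magnitude at stake.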

                1. Brewster's Angle Grinder Silver badge

                  @Esme

                  IIRC it's not the expansion of the atmosphere, but a reduction in viscosity that allows atmospheric tides to counterbalance lunar torque. However I'm being disingenuous: that was when the Earth was coming out of an ice age and at a time when the moon was a lot closer. (600Myr ago, a 21 hour day.) And while the corner in the delta-T is striking, the earth's mass distribution is changing all the time and it's far more likely that's the cause.

          2. Stoneshop Silver badge
            Boffin

            Re: @choleric

            create a kickstarter to adjust the rotation of the Earth

            Just a swift kick in the right direction would remedy the problem already.

          3. David Harper 1

            Re: @choleric

            The length of the mean solar day *was* 86400 SI seconds around the middle of the 19th century. That rotation rate was embodied in the astronomical observations that were used to define Universal Time in the late 19th century. When UT was replaced as the best measure of time by Ephemeris Time and then by International Atomic Time in the 20th century, both ET and TAI were defined to have the same length of second as UT. And that's why we have a problem with leap seconds: the SI second reflects the rotation speed of the Earth almost 200 years ago, not today.

            1. Brewster's Angle Grinder Silver badge

              @David Harper

              To the nearest second, the length of day remains 86400 SI seconds and will likely remain so for many kiloyears.

    2. Paul Crawford Silver badge

      Re: Leap Seconds

      "Will people be ready for that one?"

      Well the one that followed the aircraft-bothering incident went with practically no issues at all. Simply because folk had woken up and tested things for the inevitable occurrence of another leap-second.

      In fact the Linux bug mentioned had been created by somebody modifying already-working time related code and not testing the damn thing for this situation. As others have already said, leap seconds and means to deal with them have been with us for decades already so it's not new stuff. But every new generation of code monkeys seems to be able to break things...

      1. Mark 85 Silver badge

        Re: Leap Seconds

        But.. but...but... the users are the testers.

      2. boltar Silver badge

        Re: Leap Seconds

        "In fact the Linux bug mentioned had been created by somebody modifying already-working time related code and not testing the damn thing for this situation."

        Was it actually a Linux bug? I find that hard to believe, given the tens of millions of installations of Linux in backend server systems, not to mention embedded systems around the world. I think an OS timing bug would have caused more problems than just an airline's reservation system going down. Far more likely it was an application bug which was conveniently blamed on the OS. Also, what application crashes just because of a 1 second difference, even if the OS was at fault??

    3. Codysydney

      Re: Leap Seconds

      We've had another leap second since the one that caused the grief, and this time Amadeus were ready for it - we had no problems.

    4. Locky

      Re: Leap Seconds

      Never mind leap seconds, in 2020 they are going to move Australia

      1. Roland6 Silver badge

        Re: Leap Seconds

        I found the whole idea of moving Australia quite funny - namely that GPS hadn't allowed for continental drift... From other investigations, GPS also has problems determining elevation, which also is subject to change...

        1. Alan Brown Silver badge

          Re: Leap Seconds

          "namely that GPS hadn't allowed for continental drift"

          GPS doesn't NEED to adjust for continental drift. It's a set of global coordinates.

          The problem is that mapmakers hadn't made allowances.

          Then again, 7cm/year is absolutely sprinting in geological terms.

  3. Dabooka Silver badge

    An hour and a half delay?

    That's not bad for Heathrow, what are they complaining about?

    Or is that just T5 and BA?

  4. GavinC

    "Quantas"

    Who? Oh, you mean Queensland and Northern Territory Aerial Services, a.k.a Qantas.

    1. yoganmahew

      While we're at it, it's Sabre, not "Sebre", and Virgin Australia are on the Sabre system, not the Altea system.

      Oh and Delta bought the code rights to the mainframe they are on (Deltamatic). It is still managed by Travelport (for infrastructure). http://www.travelmarketreport.com/articles/Delta-Reacquires-Res-Operations-Systems-From-Travelport

      It's not really clear what data centre was hit and whether it was the one that houses the mainframe. Delta reservations appeared to be okay.

      Grumpy old mainframer...

      1. Roland6 Silver badge
        Pint

        @Grumpy old mainframer...

        Suspect it wasn't the data centre hosting the mainframe, but the data centre hosting all the layers of screen scraping and Web 2.0 stuff :)

        1. yoganmahew

          @Roland6

          I suspect you're right!

  5. Jon 37

    Single point of failure

    Really? They have their main business-critical system at a single datacenter, without geographical redundancy, so a power cut at that datacenter can bring down the whole thing?

    I would have hoped such an essential system would be spread over 2 or 3 sites, so that losing one site has no impact on operations.

    1. Unicornpiss Silver badge
      Alert

      Re: Single point of failure

      UPS+backup generator=no problem? Unless of course the whole datacenter is wiped out by some unforeseen catastrophe. However a power failure isn't an unforeseen catastrophe, and something fairly easily prepared for. (with a little $$)

      1. RIBrsiq
        Coat

        Re: Single point of failure

        There should never be an unforeseen catastrophe. There should only ever be foreseen catastrophes that you decided not to protect against after crunching the numbers and judging the investment didn't make sense.

        1. Pascal Monett Silver badge

          Re: the investment didn't make sense

          Reasoning which holds up well until the catastrophe occurs and you see the bill for repairs. More often than not, you will then reevaluate your opinion of what "makes sense" as far as investments are concerned.

          True story: at an important government-level organization I will not name further, there was a kerfuffle when a senior engineer warned, in writing, all the way up the hierarchy, that the currently-at-the-time PC upgrade process was an open invitation to viruses and expensive downtime.

          He was hauled into his managers' office for a right chewing out, which, being a senior engineer in a function from which nobody could oust him, he took with a verbal barrage of his own (likely containing many words such as "idiotic", "moronic", "abysmally stupid" etc - don't know, wasn't there, but I damn well hope so). Still, he was told that the investment "wasn't worth it" and that he should "stop making waves".

          As fate would have it, the tsunami hit later that year. An outdated PC piloted by a nincompoop got infected, the infection spread to the servers, and everything was shut down for at least 3 days. That's over 500 people with no more PCs for 24 work hours. You do the math.

          He did the math, and presented the cleanup bill with a scathing "I told you so" that, curiously, all the managers took quite meekly.

          The PC upgrade schedule was changed after that. Unbelievable, ain't it ?

          1. A Non e-mouse Silver badge

            @Pascal Monett - Re: the investment didn't make sense

            I've been involved in several incidents over the years where I've said to the boss: "We need to spend X to replace an aging/failing system." I'd be asked: "Is it currently broken or about to fail?", and when I replied "No", was told to forget about it.

            Later on, the system in question would die and management would complain about people not being able to do their jobs. A blank cheque was usually swiftly provided to replace said faulty system.

          2. onefastskater

            Re: the investment didn't make sense

            Proving once again wisdom comes to us when it can no longer do any good!

        2. Marc 25

          Re: Single point of failure

          Rumsfeld:

          Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

          1. Khaptain Silver badge

            Re: Single point of failure

            This is what Nassim Nicholas Taleb described as the Black Swan Theory. He has his way of seeing things, which is usually quite rational.

            1. Destroy All Monsters Silver badge

              Re: Single point of failure

              "There are always bigger fish to fry!"

      2. Jon 37

        Re: Single point of failure

        > UPS+backup generator=no problem?

        Then your single-points-of-failure include the UPS, the generator, the power distribution units, and the emergency-power-cut-off buttons. UPSs, generators and PDUs all break. Idiots switch things off for maintenance without thinking of the consequences, or press emergency-power-cut-off buttons by mistake.

        You can have multiple generators or UPSs, although you still have the risk of a design flaw taking them all out when they're needed (e.g. http://www.zdnet.com/article/365-main-details-sf-outage-problems/ ).
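The point about a shared design flaw can be put in numbers. This is a rough sketch that assumes unit failures are independent except for one common-mode fault that takes every unit out together; the function name and the probabilities in the usage note are illustrative, not measured figures.

```python
def p_all_fail(p_unit: float, n_units: int, p_common: float = 0.0) -> float:
    """Probability that every redundant unit is down at the same time.

    p_unit   -- chance that one unit fails on demand, independently of the others
    n_units  -- number of redundant units (UPSs, generators, sites)
    p_common -- chance of a shared fault (design flaw, wrong button pressed)
                that defeats all units regardless of how many there are
    """
    independent = p_unit ** n_units
    # Either the common fault happens, or it does not and all units fail anyway.
    return p_common + (1.0 - p_common) * independent
```

With an assumed 1% per-unit failure chance, two units give `p_all_fail(0.01, 2)` of about 0.0001, but adding a 0.1% common-mode fault raises that to about 0.0011. Past a certain point, extra units stop helping and only removing the shared fault does.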

        There have been plenty of stories about datacenter power outages on The Register, despite the standard UPS+generator.

      3. tom dial Silver badge

        Re: Single point of failure

        I remember about 15 years back having my terminal screen wink out while working on a system a thousand or so miles distant at a US military data center. Not coincidentally, others nearby working on various other systems there had the same experience at the same time, and ensuing discussions with the SA revealed that all power to the main computer building had dropped because a contractor (WHO HAD BEEN TOLD) severed the cables from the outbuilding containing the substation, the redundant UPSs, and the backup generators. Power was restored around 6 hours later.

      4. Alan Brown Silver badge

        Re: Single point of failure

        It's actually possible to RAID your UPS systems and run everything via the UPS rather than introduce a switchbreak (large systems use a flywheel and this has the effect of supplying conditioned power to the site).

        It's also possible to RAID the generators that back the UPSes.

        As someone else has mentioned, the problem is managers looking to get a bonus for cutting costs who end up ripping resilience out of the systems. I wonder if they'd be so keen if they were made liable for the costs of system failures that trace back to their cost-cutting.

    2. alain williams Silver badge

      Re: Single point of failure

      Some twat manager was probably given a bonus for ''introducing efficiencies'' which he did by removing some ''underused'' systems and ''consolidating'' several sites to one place....

      1. Destroy All Monsters Silver badge

        Re: Single point of failure

        Or the Americans found somebody to liberate into democracy.

      2. Mark 85 Silver badge

        Re: Single point of failure

        Worked at a privately owned company where the owner decided that the backup generator hadn't been needed in years, so he had it moved and connected to his house. I'll leave it to the reader to visualize what happened 6 months later when the power failed at the plant/office. Two months later, we were getting new dual UPSs and two diesel generators.

      3. ecofeco Silver badge

        Re: Single point of failure

        That's the way I'd bet, Alain.

    3. Daedalus Silver badge
      Facepalm

      Re: Single point of failure

      They probably used up all the fuel doing weekly backup generator tests.

      1. Herby Silver badge

        Re: Single point of failure

        The running out of fuel for the backup generator might be because the manager's vehicle was a diesel, and somehow got depleted every time he went to check the generator setup.

    4. Lars Silver badge
      Happy

      Re: Single point of failure

      The problem tends to be that nobody ever has the rights and the guts to actually force a real-time test. I suppose it's as well that they don't do real-time nuclear rocket tests, 8 inch floppy disks or not. What I have seen more than once is companies who think their backups are fine until they find out that a restore has never been tried and the backups have been total rubbish for years (typically backing up stuff from the wrong place, that was the right place long ago).

      1. Dwarf Silver badge

        Re: Single point of failure

        @Lars

        Go and look up the Netflix Chaos Monkey and its parent Simian Army, which does just that: it generates different failures in various tiers of their applications within Amazon AWS to ensure that the applications and infrastructure can fail over in the correct manner when things fail.

        See Simian Army
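A toy illustration of the failure-injection idea described above. This is not Netflix's tooling or API; the `Service` class, the replica names, and `chaos_round` are all invented for the sketch, which only shows the shape of the test: break something at random, then check that requests still succeed.

```python
import random
from typing import Dict, List


class Service:
    """Toy service that answers from the first healthy replica."""

    def __init__(self, replicas: List[str]) -> None:
        self.healthy: Dict[str, bool] = {name: True for name in replicas}

    def kill(self, name: str) -> None:
        self.healthy[name] = False

    def handle(self, request: str) -> str:
        for name, alive in self.healthy.items():
            if alive:
                return f"{name} served {request}"
        raise RuntimeError("no healthy replica left")


def chaos_round(service: Service, rng: random.Random) -> str:
    """Kill one randomly chosen healthy replica and return its name."""
    candidates = [n for n, alive in service.healthy.items() if alive]
    victim = rng.choice(candidates)
    service.kill(victim)
    return victim
```

A test would call `chaos_round` repeatedly and assert that `handle` keeps returning until the last replica is gone, at which point the failure should be loud rather than silent.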

    5. wayne 8

      Re: Single point of failure

      Delta management is not the sharpest corporate tool.

      1. Anonymous Coward
        Anonymous Coward

        Re: Single point of failure

        But definitely Corporate Tools?

  6. Ol'Peculier
    FAIL

    Worst. Airline. I. Have. Ever. Flown. With.

    LHR to ATL about 20 years ago. Rude staff, shoddy aircraft, and the code-share they had with Virgin coming back was a joke: not knowing which check-in desk to use at JFK was bonkers. At least we flew back with Virgin, which was one of the best flights I've ever had (although that might be because we got bumped...)
