Total Inability To Support User Phones: O2 fries, burning data for 32 million Brits

Customers of O2, GiffGaff and virtual operators who use Telefonica's network in the UK have been hit by a spectacular outage across the country. Transport information services have also been affected. Ericsson, whose central user database caused O2's 2012 25-hour mega-outage, told The Reg: "We are aware of the issue and are …

    1. Commswonk Silver badge

      @ Jay 2: I now await the examples of rage from users on Twitter who have based their entire existence on O2 infrastructure... If it's that important, have some sort of backup you need help. Urgently.

    2. Soruk

      Personal line is on giffgaff, and since we have to sort out our own work mobile (we get a mobile allowance for this) I deliberately put it on a completely different network. Still, I'm fscked if both EE and O2 go down at the same time.

  1. Buzzword

    Ericsson software apparently to blame

    According to the FT and the Telegraph.

    1. Ian Emery Silver badge

      Re: Ericsson software apparently to blame

      Yeah, some idiot installed Win10 Autumn update.

    2. The Nazz Silver badge

      Re: Ericsson software apparently to blame

      Don't quote me on this, but isn't that what Ulrika Jonsson (sp?) said on the ending of their relationship?

  2. WonkoTheSane
    Holmes

    BBC are naming names

    They believe the "3rd party supplier" is Ericsson.

    1. Anonymous Coward

      Re: BBC are naming names

      Well, if Huawei are ruled out then it's pretty much a 50/50 between Ericsson and Nokia.

      1. Jellied Eel Silver badge

        Re: BBC are naming names

        I think they're an Ericsson shop, but curious how Telefonica's managed to engineer a single point of failure. But given the outage seems to have started at 4am, suspect a bit of planned maintenance didn't go quite as planned. And neither did any plan to rollback to undo it.

        1. thondwe

          Single Point of Failure

          Engineering out a single point of failure nearly always leads to a complex system which can fail in catastrophically complex ways

        2. Hans Neeson-Bumpsadese Silver badge

          Re: BBC are naming names

          "I think they're an Ericsson shop, but curious how Telefonica's managed to engineer a single point of failure."

          If you go back in history long enough, you can trace O2's heritage to BT. I suspect that the culture of being institutionally s**t is in their DNA.

        3. FuzzyWuzzys Silver badge
          Happy

          Re: BBC are naming names

          "suspect a bit of planned maintenance didn't go quite as planned. And neither did any plan to rollback to undo it."

          So Ops were given the Change Request at 9pm last night, hurriedly signed off by a manager ( now negotiating his golden handshake to leave I might add! ), they accidentally copied the dev creds to the Jenkins cluster and flooded the network with pre-UAT code pulled from the wrong Git repo branch! Ha ha!

          1. J.G.Harston Silver badge

            Re: BBC are naming names

            Yeah, my first thought was: just roll back to the previous install.

            1. Jellied Eel Silver badge

              Re: BBC are naming names

              Depending on what the update/change did, it's sometimes not that simple..

              So a loooong time ago, I experienced a Cascade(ing) failure. They were a trusty and popular frame-relay & general data switch used by a lot of ISPs and telcos. So there was a software update. We tested in the lab, and all was fine. We implemented it.. and it found a back-up control card, switched that to primary, but didn't connect properly to the management database, and corrupted that. Rolling back got the switches back under management, but (I think) left updated microcode so the backup database wasn't compatible.. So FUN!

              Luckily(?) we had a backup-backup from doing an SQL dump of each switch's config to trusty plain text files so we could manually rebuild the database. Which took about 48hrs.. Luckily it didn't interrupt services for that long, just delayed any MACDs until we'd got everything back in sync. Early versions of Cisco's config management created similar FUN! situations.

              I suspect Telefonica's issue is much the same, i.e. the update broke state, and given its size and number of devices/users, it's much more painful to get back under control.

        4. silks

          Re: BBC are naming names

          Yep, software upgrade maybe?!

  3. The Real Tony Smith

    Sorry

    My fault, switched from TalkTalk to O2 this week, must have been something I fiddled with...

    1. silks

      Re: Sorry

      Moving away from TalkTalk = smart move. Moving away from TalkTalk to O2 right now - not so much :(

  4. Wolfclaw Silver badge

    Latest update: it's a global software fault that doesn't just affect O2.

  5. ItsMeDammit

    Outages get blamed for everything.

    My favourite Twitter post (courtesy of Google News, funnily enough - I don't use Twitter) blamed the outage for the poster being late to a meeting this morning.

    So - did the outage locally stop time as well, or were you just too busy messing about on your phone to get up and go to work?

    Personally, my issue with the outage this morning was that my phone decided to generate a notification telling me that there was no data at my location at 05:38 which woke me up. So I for one got to my first meeting early...

    1. werdsmith Silver badge

      Re: Outages get blamed for everything.

      "I'm going to have to buy a satnav because I'm already late for my first meeting - because I can't find the place".

      -giffgaff user this morning.

      1. Martin-73 Silver badge

        Re: Outages get blamed for everything.

        Satnav function on my phone worked fine, the only bit that didn't was the traffic status which COULD explain the 'being late for a meeting'.

        As it was, my luck ran the other way. I took the route less travelled, because it looked lighter on traffic from the roundabout... and got to work 15m early

  6. Steve Cooper

    Not often being on Three means I have more data throughput than others around the City of London :smugface:

    1. Soruk

      For those who rely on a data signal and are on O2, can do a lot worse than to get a cheap MiFi and a Three PAYG SIM on their 3-2-1 tariff.

  7. TRT Silver badge

    People are suffocating...

    O2 deprivation.

    1. Korev Silver badge
      Coat

      Re: People are suffocating...

      "People are suffocating... O2 deprivation."

      Ahh the Oxygen of publicity

  8. Whitter
    Mushroom

    How to get businesses to care?

    No bonus for any board member on any year with a major outage or security fail.

    Of course, the top floor parasites would just spend most of the year arguing about what 'major' means.

  9. adam payne Silver badge

    O2 acknowledged the problem on Twitter at 07:00 GMT.

    Sadly all the people on O2 had no data so they couldn't get on twitter anyway.

  10. Mage Silver badge

    Quite expected.

    It will get worse and more critical. What is the worst? Having your business TOTALLY depend on internet connections (fibre, DSL, mobile) or the so-called Cloud.

    The biggest risk isn't a solar flare taking out satellites, or even global nuclear war, or cyberwarfare or criminal hackers. It's creeping monoculture, maybe eventually only 3 ecosystems. We have maybe only four major mobile infrastructure companies now: one is Chinese government-owned (ZTE) and one sort of private Chinese (Huawei). Nokia ate Lucent/Alcatel, Siemens and Motorola Networks, and there is Ericsson.

    Cloud providers and the OSes they use? Linux is fine, but maybe a patch pushed out by management late on Friday, maybe before a holiday. Might be for Servers, Edge Routers or both. Even MS uses Linux exclusively on some bits of their cloud.

    I wrote a post-apocalyptic story with lots of mayhem and death. I decided it was too dark, so wrote one with a fantasy setting set slightly in the future where all retail POS, cash machines, mobile and fixed-line billing and even a lot of SCADA rely on the Internet and a handful of cloud service providers (renting space on someone else's remote server, like the 1960s). "No Silver Lining", Ray McCarthy. You can download the 1st 20% free.

    All mobile, internet, cloud services etc WILL fail at the same time, sooner rather than later. How much retail, wholesale, SCADA (traffic lights, electricity distribution configuration, sewage & water pumps etc) now depends on it?

    1. Solarflare

      Re: Quite expected.

      I quite like satellites, I'll try not to take them out :)

      1. JoshOvki

        Re: Quite expected.

        > I quite like satellites, I'll try not to take them out :)

        Not even for a swift pint which inevitably results in them falling over?

  11. Dabooka Silver badge

    Do people really need reminding buses are still running?

    Are folk that dense to assume no buses are operating because an app doesn't update? Hang on, I think I've just answered that one myself.

    1. David Nash Silver badge

      Re: Do people really need reminding buses are still running?

      I don't think it's just the app, but the automatic ETA signs at the bus stops. The theory being, if the sign doesn't show any buses are on the way, perhaps no buses are on the way.

      However on main routes in London you can usually just open your eyes and look up from your phone and notice several buses in any direction you care to look.

      1. BinkyTheMagicPaperclip Silver badge

        Re: Do people really need reminding buses are still running?

        My heart bleeds. Chance would be a fine thing for decent bus prediction times Oop North. Especially late on Saturdays some buses just choose not to turn up and I have to revert to taking the slow way home. I would take a train, but the RMT are on strike every Saturday.

      2. FrogsAndChips Bronze badge

        Re: Do people really need reminding buses are still running?

        This morning the ETA signs showed a message along the lines of "traffic information unavailable, please check at tfl.gov.uk", so it was quite obvious that buses were still running.

        1. Anonymous Coward

          Re: Do people really need reminding buses are still running?

          Just think how much in data charges the likes of TfL are saving...!!!

          (Anon because I work for one of their ETA sign suppliers...)

        2. the hatter

          Re: Do people really need reminding buses are still running?

          Except the tfl website also didn't know where the buses were, because the buses themselves also used o2 to tell control where they are.

    2. SVV Silver badge

      Re: Do people really need reminding buses are still running?

      On the outskirts of the capital, when the board says 1 min, it means the bus will show up in about 5 minutes time, so you learn to ignore them and wait calmly til it eventually shows up, hoping there's room to get on.

      Upside of outage : bus not full today of people gormlessly flicking through Twitter and Facebook.

      Downside : bus even more full of people yapping into their phones, all complaining that Internet's not working. May the person who thought of unlimited call plans spend eternity on a crowded rush hour bus in my London suburb.

  12. Mike Shepherd
    Meh

    Surprising

    I use an O2 reseller so we can keep a remote eye on a vulnerable elderly relative who doesn't have wired internet. So that's not working today.

    Perhaps it's going too far to install a backup connection there, but it's surprising to read that large organisations depend on a single supplier (and that the supplier neither tests software enough before its publication nor has a mechanism quickly to revert to what worked).

  13. Herring`

    Hang on

    Isn't there still some super plan to put the emergency services onto 4G? Has this been thought through?

    1. silks

      Re: Hang on

      Yep, I imagine EE and the Government are holding a committee meeting right now :)

  14. Anonymous Coward

    On the other hand

    AQL's MVNO services were available since exactly this outage picked up pace, while I've had API problems for the last 2 days (they use Three who also have a spike on down detector but that's anecdotal I imagine)

  15. wolfetone Silver badge

    I'm on Tesco Mobile, my phone has no internet anyway so I'm not too bothered about that. But I haven't been able to send a single text message all morning, including now at 12:35pm.

    I'm outraged etc etc. Secretly I'm glad though as I don't have to talk to anyone and can actually do work.

    1. wolfetone Silver badge

      An update 20 hours on:

      O2 say they've fixed the problem and it's all back to normal, but I'm still sending texts and I'm still getting messages saying the texts weren't delivered or sent. So yeah, clusterfuckarooney.

  16. Alister Silver badge

    I reckon they were trying to remove all the Huawei kit from their backhaul on the quiet...

    1. Dr Who

      An upvote for that, but my theory is it's Chinese hackers because everyone is removing their Huawei kit. Hua Wei and the Art of (trade) War.

    2. Anonymous Coward

      Madness!

      Imagine thinking you can quickly whip up some meatballs when everyone just wants sweet and sour pork like last week.

  17. Gricehead

    Awkward

    https://jobs.ericsson.com/jobs/265037?lang=en-us

    1. Flywheel Silver badge

      Re: Awkward

      Would that be "didn't have one before but really ought to" or "X has left - replace them ASAP" ?

    2. cantankerous swineherd Silver badge

      Re: Awkward

      so the finger points at the n00b?

  18. anthonyhegedus Silver badge

    I love the way they immediately set the “blame path” to “third party suppliers” and then try to disperse the pain and further divert the blame by saying that other networks are affected throughout the world.

    Then they advise you to use WiFi.

    Why not show a bit of humility and have the outage posted front and centre on their main web page?

    Anyway, I’m surprised they haven’t blamed Brexit!

    1. TonyHoyle

      So either:

      The third-party supplier, large enough to supply a company the size of O2 with significant infrastructure, doesn't roll out new updates to a test network first and doesn't have a rollback procedure in case of emergency, in which case O2 picked an incompetent supplier.

      Or O2 doesn't have the above (and they should, even if the supplier already does it.. you never trust new builds until you've validated them internally), and they're incompetent.

Biting the hand that feeds IT © 1998–2019