but is it open source?
Customers of O2, GiffGaff and virtual operators who use Telefonica's network in the UK have been hit by a spectacular outage across the country. Transport information services have also been affected. Ericsson, whose central user database caused O2's 2012 25-hour mega-outage, told The Reg: "We are aware of the issue and are …
People seem to carry a lot of stuff these days. I rarely carry more than one Visa card, one backup credit card, one phone, plenty of cash and two keys. I use a slim wallet for the cash and my mobile case for the cards, so I have no loyalty cards. If it doesn't comfortably fit in my trouser pocket then I don't carry it.
According to the FT and the Telegraph.
Yeah, some idiot installed Win10 Autumn update.
Don't quote me on this, but isn't that what Ulrika Jonsson (sp?) said on the ending of their relationship?
They believe the "3rd party supplier" is Ericsson.
Well if Huawei are ruled out then it's pretty much a 50/50 between Ericsson and Nokia.
I think they're an Ericsson shop, but curious how Telefonica's managed to engineer a single point of failure. But given the outage seems to have started at 4am, suspect a bit of planned maintenance didn't go quite as planned. And neither did any plan to rollback to undo it.
Engineering out a single point of failure nearly always leads to a complex system which can fail in catastrophically complex ways
> I think they're an Ericsson shop, but curious how Telefonica's managed to engineer a single point of failure.
If you go back in history long enough, you can trace O2's heritage to BT. I suspect that the culture of being institutionally s**t is in their DNA.
"suspect a bit of planned maintenance didn't go quite as planned. And neither did any plan to rollback to undo it."
So Ops were given the Change Request at 9pm last night, hurriedly signed off by a manager ( now negotiating his golden handshake to leave I might add! ), they accidentally copied the dev creds to the Jenkins cluster and flooded the network with pre-UAT code pulled from the wrong Git repo branch! Ha ha!
Yep, software upgrade maybe?!
Yeah, my first thought was: just roll back to the previous install.
Depending on what the update/change did, it's sometimes not that simple..
So a loooong time ago, I experienced a Cascade(ing) failure. They were a trusty and popular frame-relay & general data switch used by a lot of ISPs and telcos. So there was a software update. We tested in the lab, and all was fine. We implemented it.. and it found a back-up control card, switched that to primary, but didn't connect properly to the management database, and corrupted that. Rolling back got the switches back under management, but (I think) left updated microcode so the backup database wasn't compatible.. So FUN!
Luckily(?) we had a backup-backup from doing an SQL dump of each switch's config to trusty plain text files so we could manually rebuild the database. Which took about 48hrs.. Luckily it didn't interrupt services for that long, just delayed any MACDs until we'd got everything back in sync. Early versions of Cisco's config management created similar FUN! situations.
I suspect Telefonica's issue is much the same, i.e. an update broke state, and given its size and number of devices/users, it's much more painful to get back under control.
My fault, switched from TalkTalk to O2 this week, must have been something I fiddled with...
Moving away from TalkTalk = smart move. Moving away from TalkTalk to o2 right now - not so much :(
Latest update: it's a global software fault that isn't limited to O2.
My favourite Twitter post (courtesy of Google News, funnily enough - I don't use Twitter) blamed the outage for making them late for a meeting this morning.
So - did the outage locally stop time as well, or were you just too busy messing about on your phone to get up and go to work ?
Personally, my issue with the outage this morning was that my phone decided to generate a notification telling me that there was no data at my location at 05:38 which woke me up. So I for one got to my first meeting early...
"I'm going to have to buy a satnav because I'm already late for my first meeting - because I can't find the place".
-giffgaff user this morning.
Satnav function on my phone worked fine, the only bit that didn't was the traffic status which COULD explain the 'being late for a meeting'.
As it was, my luck ran the other way. I took the route less travelled, because it looked lighter on traffic from the roundabout... and got to work 15m early
Not often being on Three means I have more data throughput than others around the City of London :smugface:
Those who rely on a data signal and are on O2 could do a lot worse than get a cheap MiFi and a Three PAYG SIM on their 3-2-1 tariff.
People are suffocating...
Ahh the Oxygen of publicity
No bonus for any board member on any year with a major outage or security fail.
Of course, the top floor parasites would just spend most of the year arguing about what 'major' means.
O2 acknowledged the problem on Twitter at 07:00 GMT.
Sadly all the people on O2 had no data so they couldn't get on twitter anyway.
It will get worse and more critical. What is the worst? Having your business TOTALLY depend on internet connections (fibre, DSL, Mobile) or the so called Cloud.
The biggest risk isn't a solar flare taking out satellites, or even global Nuclear War. Or cyberwarfare or criminal hackers. It's creeping monoculture, maybe eventually only 3 ecosystems. We have maybe only four major mobile infrastructure companies now, one is Chinese Gov owned (ZTE) and one sort of private Chinese (Huawei). Nokia ate Lucent/Alcatel, Siemens and Motorola Networks and there is Ericsson.
Cloud providers and the OSes they use? Linux is fine, but maybe a patch pushed out by management late on Friday, maybe before a holiday. Might be for Servers, Edge Routers or both. Even MS uses Linux exclusively on some bits of their cloud.
I wrote a post apocalyptic story with lots of mayhem and death. I decided it was too dark so wrote one with a fantasy setting set slightly in the future where all retail POS, cash machines, Mobile and fixed line billing and even a lot of SCADA relies on the Internet and a handful of Cloud Service providers (Renting space on someone else's remote server like 1960s). "No Silver Lining" Ray McCarthy. You can download 1st 20% free.
All mobile, internet, cloud services etc WILL fail at the same time, sooner rather than later. How much retail, wholesale, SCADA (Traffic lights, Electricity distribution configuration, sewage & water pumps etc) now depends on it?
I quite like satellites, I'll try not to take them out :)
> I quite like satellites, I'll try not to take them out :)
Not even for a swift pint which inevitably results in them falling over?
Are folk that dense to assume no buses are operating because an app doesn't update? Hang on, I think I've just answered that one myself.
I don't think it's just the app, but the automatic ETA signs at the bus stops. The theory being, if the sign doesn't show any buses are on the way, perhaps no buses are on the way.
However on main routes in London you can usually just open your eyes and look up from your phone and notice several buses in any direction you care to look.
My heart bleeds. Chance would be a fine thing for decent bus prediction times Oop North. Especially late on Saturdays some buses just choose not to turn up and I have to revert to taking the slow way home. I would take a train, but the RMT are on strike every Saturday.
This morning the ETA signs showed a message along the lines of "traffic information unavailable, please check at tfl.gov.uk", so it was quite obvious that buses were still running.
Just think how much in data charges the likes of TfL are saving...!!!
(Anon because I work for one of their ETA sign suppliers...)
On the outskirts of the capital, when the board says 1 min, it means the bus will show up in about 5 minutes time, so you learn to ignore them and wait calmly til it eventually shows up, hoping there's room to get on.
Upside of outage : bus not full today of people gormlessly flicking through Twitter and Facebook.
Downside : bus even more full of people yapping into their phones, all complaining that Internet's not working. May the person who thought of unlimited call plans spend eternity on a crowded rush hour bus in my London suburb.
Except the tfl website also didn't know where the buses were, because the buses themselves also used o2 to tell control where they are.
I use an O2 reseller so we can keep a remote eye on a vulnerable elderly relative who doesn't have wired internet. So that's not working today.
Perhaps it's going too far to install a backup connection there, but it's surprising to read that large organisations depend on a single supplier (and that the supplier neither tests software enough before its publication nor has a mechanism quickly to revert to what worked).
Isn't there still some super plan to put the emergency services onto 4G? Has this been thought through?
Yep, I imagine EE and the Government are holding a committee meeting right now :)
AQL's MVNO services were available since exactly this outage picked up pace, while I've had API problems for the last 2 days (they use Three who also have a spike on down detector but that's anecdotal I imagine)
I'm on Tesco Mobile, my phone has no internet anyway so I'm not too bothered about that. But I haven't been able to send a single text message all morning, including now at 12:35pm.
I'm outraged etc etc. Secretly I'm glad though as I don't have to talk to anyone and can actually do work.
An update 20 hours on:
O2 say they've fixed the problem and it's all back to norman, but I'm still sending texts and I'm still getting messages saying the texts weren't delivered or sent. So yeah, clusterfuckarooney.
I reckon they were trying to remove all the Huawei kit from their backhaul on the quiet...
An upvote for that but my theory is it's Chinese hackers because everyone is removing their Huawei kit. Hua Wei and the Art of (trade) War.
Imagine thinking you can quickly whip up some meatballs when everyone just wants sweet and sour pork like last week.
Would that be "didn't have one before but really ought to" or "X has left - replace them ASAP" ?
so the finger points at the n00b?
I love the way they immediately set the “blame path” to “third party suppliers” and then try to disperse the pain and further divert the blame by saying that other networks are affected throughout the world.
Then they advise you to use WiFi.
Why not show a bit of humility and have the outage posted front and centre on their main web page?
Anyway I’m surprised they haven’t blamed brexit!
Either the third-party supplier, large enough to supply a company the size of O2 with significant infrastructure, doesn't roll out new updates to a test network first and doesn't have a rollback procedure for emergencies, in which case O2 picked an incompetent supplier.
Or O2 doesn't have the above (and they should, even if the supplier already does it.. you never trust new builds until you've validated them internally), and they're incompetent.
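For anyone wondering what "test network first, with a rollback procedure" might look like in its simplest form, here's a rough sketch. Everything in it (`Network`, `staged_rollout`, the `healthy` check) is a made-up name for illustration, not anything O2 or Ericsson actually run. It also assumes a rollback cleanly restores the old version, which, as the Cascade story further up shows, isn't always true once an update has changed state.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Network:
    """Hypothetical stand-in for a group of nodes running one software version."""
    name: str
    version: str
    history: List[str] = field(default_factory=list)

    def install(self, version: str) -> None:
        # Remember the version we are leaving so it can be restored later.
        self.history.append(self.version)
        self.version = version

    def rollback(self) -> None:
        # Restore the most recent previous version, if there is one.
        if self.history:
            self.version = self.history.pop()


def staged_rollout(
    version: str,
    stages: List[Network],
    healthy: Callable[[Network], bool],
) -> bool:
    """Install `version` one stage at a time (test network first).

    On the first failed health check, roll back every stage touched so far
    and stop. Returns True if all stages passed, False if rolled back.
    """
    touched: List[Network] = []
    for net in stages:
        net.install(version)
        touched.append(net)
        if not healthy(net):
            for n in reversed(touched):
                n.rollback()
            return False
    return True


if __name__ == "__main__":
    test_net = Network("test", "1.0")
    live_net = Network("live", "1.0")
    # Assumed health check: pretend version 1.1 is broken everywhere.
    ok = staged_rollout("1.1", [test_net, live_net], lambda n: n.version != "1.1")
    print(ok, test_net.version, live_net.version)
```

The point of ordering the stages is that a bad build fails on the test network and gets undone there, so the live network is never touched.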
Biting the hand that feeds IT © 1998–2018