Expired cert... Really? #O2down meltdown shows we should fear bungles and bugs more than hackers

It's a bit of a cliche that "everything's connected", but O2's stunning outage yesterday – chalked up by Swedish kitmaker Ericsson to an expired software certificate – is a reminder of how true that is. Payment terminals croaked, bus displays went blank. Strangers blinked at each other in the street, like Robinson Crusoe …

Re: Painter's 2nd Law of IT - Fixed it for you...

That's if you're using MS-Windows. cron doesn't need to be 'updated' when the root password changes.

Oh, you're manually updating certificates?

2
3
Silver badge

Re: Painter's 2nd Law of IT - Fixed it for you...

"the system is down, nothing is working...."

"what time did it stop working? "

"midnight...."

Ahh, I see.

4
0

Re: Painter's 2nd Law of IT - Fixed it for you...

Are you auto-renewing certs without checking with the app owner first?

1
0
Silver badge
Facepalm

The difference is that buses failed safe - the network connections failed, but the buses still ran.

As I heard it, the Ericsson software was just used for billing usage. But because O2 couldn't track customers' usage, they denied them access completely.

I think O2 should have to credit every account with 20p, even if the customer didn't complain (or get through to complain). Costly enough to impact execs' bonuses, but cheaper to implement than handling 32m complaints, so even then they get off lightly. And if I have to waste £10-worth of my time to get a 20p credit, that just adds insult to injury - I'd be asking for the £10 rather than 20p.

20
0

the buses still ran...

Yes, this time they still ran but in the future maybe not.

As things are headed now everything will stop working when a network meltdown occurs.

14
0
TRT
Silver badge

The buses ran... but customers on O2 networks couldn't pay their bus fares (in London at least), nor did their shiny work on the Tube, and if you delay more than 100ms at the gateline in London, then you are to be crucified against the TfL roundel by your former fellow commuters that you have held up. But that's OK because all the little iPads they gave to Tube staff when they eliminated ticket offices a few years ago (the ticket offices where the machines had ludicrously outdated bits of electric string) run off O2 (except they fall back to the WiFi which was mostly Virgin, so the effect was minimal).

2
0
Silver badge

"The difference is that buses failed safe - the network connections failed, but the buses still ran."

There was a comms failure on our local metro system the other week. Complete shutdown of the system resulted despite the fact there are far fewer vehicles involved, no vehicles other than authorised ones with trained operators, very few junctions, but, no, to be safe, it all has to stop. Could you imagine the reaction to roads being closed because traffic lights failed?

Admittedly, there are stretches of single-line operation and even sections where the light rail shares track with main line trains, so I suppose those sections might be more dangerous to operate without comms or signalling.

0
1
Silver badge

Admittedly, there are stretches of single-line operation and even sections where the light rail shares track with main line trains, so I suppose those sections might be more dangerous to operate without comms or signalling.

Suggest you read up on the early railways and why signalling systems were developed...

0
0

The wrong day

I picked the wrong day to quit heroin

24
0
Silver badge

Re: The wrong day

Is there ever a right day?

5
0
Silver badge

Re: The wrong day

The day you overdose?

7
0
Silver badge

Re: The wrong day

"The day you overdose?"

Technically that's quitting too. After all, you won't be doing any more, will you?

16
0
Anonymous Coward

Re: The wrong day

@mort

I can see where you got your name from.

13
0
Silver badge

Re: The wrong day

Technically that's quitting too. After all, you won't be doing any more, will you?

I'm not sure I'd consider a hit that lasts the entire rest of your life quitting. That's the holy grail of getting high.

13
0

Maybe the network needs a friend

Travel on the trains and when one carrier is having issues its tickets are valid on the others for a short duration. Maybe if one carrier is having a 'Really bad day'[tm] the others could let its customers onto theirs.

You never know, it might even be good for everyone... so it will never happen.

2
7

Re: Maybe the network needs a friend

The article explains why this is a bad idea - sudden influx of customers onto another network might bring that network down too, causing a cascade effect.

18
0

Re: Maybe the network needs a friend

"Travel on the trains and when one carrier is having issues its tickets are valid on the others for a short duration."

Not valid on other carriers from Paddington station, and I suspect that's true of most commuter terminals.

2
1
Silver badge

V2X

I assume this is something to do with controlling autonomous vehicles.

If it is, then it's worrying. An autonomous vehicle must be able to work without a network connection! For emergencies and for areas without 5G. All it needs is to know what is around it - it doesn't need the latest news on traffic problems 300 miles away. It should be able to rely on its own sensors, and, possibly, short-range comms to chat to nearby vehicles. That's it. Updates can wait until it's next connected, like phones.

20
0

Re: V2X

You mean it should actually be, like _Autonomous_ ??

17
0
Silver badge

Re: V2X

"An autonomous vehicle must be able to work without a network connection!"

If it needs a network connection it isn't autonomous.

10
0
Silver badge

Re: V2X

We've been told that the low-latency modes of 5G are required for V2X (vehicle-to-everything)

V2X is going to be necessary for smooth traffic flow -- negotiating permission with oncoming traffic to make a left turn (for those of us who drive on the right)/right turn (for those who drive on the wrong) for example. And it's probably how the folks that are repairing yonder bridge are going to tell your car that that area that looks like a hole in the pavement is in fact a hole in the pavement. It's not clear that it needs a lot of bandwidth or especially high speeds. But it probably does need latencies never more than a few hundred ms. And of course it needs standards that are unambiguous and are actually adhered to.

2
5
Silver badge

Re: V2X

Vehicles will need (or at least want) to communicate with one another, yes. But there's absolutely no reason they need to communicate via a cell tower. They will be in close proximity to one another and can communicate directly, there's no need to go to/from a cell tower which will often be further away than the cars that need to talk to each other.

As long as autonomous cars have to share the road with human driven vehicles they will need to be able to operate without any V2V communication though. They can't trust humans to always signal a turn etc. so they will still need to drive defensively and not fully trust the info they get from other vehicles.

The exception to that trust would be for things like drafting bumper to bumper in the left lane, obviously you'd need to trust that the cars ahead will act appropriately and the lead car will alert the rest of a hazard that will require braking or steering. So sorry, no user modifiable software allowed!

5
0
Bronze badge
Trollface

Could've been worse.

O2 could've been managed by either IBM or Capita.

20
0
Silver badge

You never see, in those future dystopia movies, that the real cause of societal breakdown was due to shite service companies.

Except for Douglas Adams, who hit the nail on the head with Sirius Cybernetics.

43
0
Silver badge

"Except for Douglas Adams, who hit the nail on the head with Sirius Cybernetics."

Douglas Adams proved that an English degree from Cambridge can make you a better futurologist than someone with a STEM degree. I'm not sure what that proved.

21
0
Coat

Shurely you can't be Sirius?

5
0
Anonymous Coward

Douglas Adams proved that an English degree from Cambridge can make you a better futurologist than someone with a STEM degree.

Please don't. You're only encouraging that Fry chap.

18
1

I'm sure in Blade Runner 2049, there's a huge IBM logo. A dystopia indeed!

5
0

Must...not...

OK, I am Sirius. And don't call me Shirley.

6
0
Silver badge

-->Please don't. You're only encouraging that Fry chap.

Note in my post I wrote "can make", not "does make".

DNA was a genius. (Possibly why he died young, quos amo deum morietur puer.) Fry's father is/was a bit of a genius. Fry is just very clever.

3
0

Re: -->Please don't. You're only encouraging that Fry chap.

Fry just thinks he's very clever.

There, FTFY.

1
0
Silver badge

Seems to be many problems are down to large organisations not being able to use Outlook calendars or a big calendar on the wall with garish-coloured post-it notes.

If the beancounters can get something done by a certain date, why can't the IT monkeys?

4
2
Unhappy

Because certificates typically expire after 2-3 years - beancounters and bosses cannot see that far ahead (except when pulling "strategies" out of various orifices).

Even the IT monkeys doing the renewals have moved to new offices at least 3 times, so that two-year-old calendar with the post-it notes? No one remembers what it was for, so it goes in the bin.
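For anyone who would rather not rely on the wall calendar, here is a minimal sketch of an automated expiry check in Python. It assumes you already have the certificate's notAfter string in the usual OpenSSL text form; the function names, the 30-day warning window and the dates in the usage note are all made up for illustration. The only library call it leans on is `ssl.cert_time_to_seconds` from the standard library.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and the cert's notAfter time.

    `not_after` is expected in the OpenSSL text form, for example
    "Dec  1 00:00:00 2020 GMT". A negative result means already expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    return (expires - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True once the cert is inside the (hypothetical) warning window."""
    return days_until_expiry(not_after, now) <= warn_days
```

Run something like `needs_renewal("Dec  1 00:00:00 2020 GMT", datetime.now(timezone.utc))` from a daily job and have it nag a shared mailbox rather than one person's calendar. It won't help if the cert is hard coded somewhere nobody can see it.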

27
1
Silver badge

Only tangentially related, but it reminded me of a policy at my last place of work that I managed to change.

If you needed to run a one-off job on a machine overnight (yeah... no "at" batch command) then the recommendation was to put the job in cron, scheduled to run the next day, on that day-of-month, in that month... so that your job wouldn't run again the following day if you didn't remove the cron entry in time.

Yes, you've got it - there were a number of times where some system would "randomly" cock up, and the cause would be traced to some date-specific cron job that no one remembered anything about, and which was presumably at least a year old.
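To show why this bites, here is a rough Python sketch of how a standard five-field crontab entry is matched. There is no year field, so an entry pinned to a day-of-month and month comes round again every year. The matcher is a simplification that only understands `*` and plain integers, and the entry in the usage note is a hypothetical one, not anything from that site.

```python
from datetime import datetime


def cron_matches(entry: str, when: datetime) -> bool:
    """Check whether a five-field crontab time spec matches `when`.

    Simplified sketch: each field must be '*' or a plain integer, and the
    day-of-week field must be '*'. Ranges, lists and steps are not handled.
    """
    minute, hour, dom, month, dow = entry.split()[:5]
    if dow != "*":
        raise ValueError("sketch only handles '*' in the day-of-week field")
    fields = [
        (minute, when.minute),
        (hour, when.hour),
        (dom, when.day),
        (month, when.month),
    ]
    # Note what is missing: nothing here compares against when.year.
    return all(spec == "*" or int(spec) == value for spec, value in fields)
```

With the made-up entry `"30 2 7 12 *"`, `cron_matches` is true for 02:30 on 7 December 2018 and true again for 02:30 on 7 December 2019, which is exactly the forgotten job coming back a year later.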

8
0
Anonymous Coward

Generally beancounters delay payment as long as possible without actually getting sued. The idea that you cannot convince a cert that the cheque’s in the post is literally beyond their tiny minds.

25
0
Anonymous Coward

Errr.

Because the bean counters were the people responsible for outsourcing the IT department to a provider incapable of managing (or unaware of) things such as this.

I can almost guarantee you it's the bean counters' prior actions in chasing the cheapest IT solution in order to line the pockets of those at the top that have led to this mess.

18
1
Anonymous Coward

I gave up on electronic calendars years ago and have reverted to a wall planner in the office. I haven't missed an SSL certificate expiry date since. Bloody technology.

13
0
Silver badge

"If the beancounters can get something done by a certain date, why can't the IT monkeys?"

One of the things that the beancounters get done by a certain date is to outsource the IT monkeys who had their calendars sorted. And when the IT monkeys get outsourced are they really going to tell the beancounters "by the way, you need to keep an eye on this."? At some point beancounters get to discover that the IT people they outsourced weren't monkeys but there's a distinct possibility the outsourcers were - or maybe they were snake-oil salesmen.

12
0
Anonymous Coward

Except, and I have it on good authority, that this cert was hard coded with no access to it. The only option was to update the software.

0
0
Anonymous Coward

Beancounters can't

At least in my case they didn't.

Retired earlier this year but they still kept paying me for three months.

Very nice but I had to give it back in the end.

0
0

due to said pesky bean-counters taking away ALL the beans.

1
0

I wonder why networks don't have a roaming agreement in place for such catastrophic events. The events yesterday will have cost O2 £millions in bad publicity, yet if they had been able to let customers temporarily roam to, say, EE or Vodafone, this could all have been avoided.

2
12

The article explains why this is a bad idea - sudden influx of customers onto another network might bring that network down too, causing a cascade effect.

10
3

These sorts of decisions are often made by marketing-type teams, where the brand identity is worth 10x more than the damage from downtime. The decision to allow customers onto a competitor's network because theirs is broken? No way! Get ours fixed!!

20 or so years ago as a tech-support rep at Orange, a frequent issue at the time was SMS jamming. An easy fix was browsing for another network in the phone settings, attempting to join it (which would fail), then just joining the Orange network again. Within a minute or so, the "stuck" SMS would start coming in. Marketing or some similar dept caught wind of the advice being given out - and said it was to stop.

No matter it worked 99% of the time, no matter there was no other fix available, no matter the customer was inconvenienced by it not working... The sheer fright that another network's name would come up on the customer's screen? Unthinkable!

14
0
Silver badge

The fix is not "failing to join the other network", it is more correctly "disconnecting and rejoining."

Similar connectivity faults exist even today and can often be cured by temporarily going into airplane/flight mode then back again to normal mode. Or even by switching off and on again, but that's not recommended as boot times are getting ever longer because of all the crap with which we fill up our phones.

13
1
Silver badge

"The article explains why this is a bad idea"

I wonder how many times this statement is going to have to be repeated.

16
0

Graceful reconnect...

...was a concept that went out of fashion over a decade ago.

The mere idea of software actually checking the status of its connection and then retrying, rechecking, disconnecting (cleanly!) and reconnecting before trying again has been deemed ancient cruft - programmers have become too used to reliable always-on connections and never experienced firewall timeouts or line noise causing a modem to hang up.

$Deity, I feel old.
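For the youngsters, a minimal Python sketch of the retry half of that idea: try to connect, and on failure wait a bit longer each time before trying again. `connect` is a stand-in for whatever actually opens your connection, and the attempt count and delays are arbitrary assumptions. Status checking and the clean disconnect are left out to keep it short.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, backing off exponentially.

    `connect` is a hypothetical callable that returns a connection object or
    raises OSError. Waits base_delay, 2*base_delay, 4*base_delay... between
    attempts, and raises ConnectionError once all attempts are used up.
    """
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:  # no point sleeping after the last try
                sleep(base_delay * 2 ** attempt)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The `sleep` parameter is only there so the waits can be swapped out when testing. In real code you would also want a cap on the delay and some jitter, so that thousands of clients don't all reconnect in the same second.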

16
0
Silver badge

Or even by switching off and on again, but that's not recommended as boot times are getting ever longer because of all the crap with which we fill up our phones.

One of the things I've noticed about my current smartphone is it boots quicker than the one it replaced (both were mid-high end compact models), and probably about as fast as the feature phone I had before that. Brands omitted in case anyone thinks the data point is just shilling...

...although the 3310 was obviously quicker than any of them ;)

7
0

Also, but not mentioned in the article....

If the user was "roaming", the traffic goes back to the home network, as do the checks to allow roaming, voice access, SMS, etc. So as O2 was not suffering from "no signal", roaming would not have helped at all.

3
0

"The fix is not "failing to join the other network", it is more correctly "disconnecting and rejoining.""

Might be being a little over fussy there, Joe. The post said accurately the steps given to customers.. I think it's realised by all what those steps achieved (the disconnect/reconnect)!

1
0


Biting the hand that feeds IT © 1998–2018