Private road operator Transurban says a faulty network switch was to blame for a computer fault that crippled Melbourne’s traffic network early in October. The 12-hour shutdown of two key tunnels in Melbourne, the Burnley and Domain tunnels, caused 20 km worth of gridlock. At the time, the company blamed a computer glitch. The …
Planning for failure
Where is the box next to the tunnels which allows a policeman to set the signals with a big "override" button?
Re: Planning for failure
Fail yourself. If you read the article, you'll see: "The hardware failure at 4.30 am shut down the company’s safety systems, and at 5.30 am, the company decided to close the tunnels. The affected safety systems include emergency response, radio transmission systems, signage, smoke exhaust and water deluge."
Running a tunnel without safety systems like these would be gross negligence. Even if running traffic through in escorted convoy, there could be a nose-to-tail causing a fire. Without smoke exhaust and water deluge, let alone emergency response, that could be catastrophic. Need I explain it more?
Lots of single points of failure
There's a step-change between the old way of 'something breaks which passively prevents something' and 'something breaks which actively blocks everything'.
In the olden days if you ran out of petrol it was grief to you. Now lanes on the motorway are shut... congestion (and possibly accidents) occurs... then there are other knock-on effects.
Incompetence of the most crass variety.
At the very worst, they should have had provision to take vehicles through in convoy, led and tailed by emergency vehicles.
Just how regularly was the operation of their backup systems tested?
Just how old was the equipment which failed?
Paris, because even she's not quite that dumb
Actually sounds like a layer 2 design
They talk about a "broadcast storm", but it sounds more like it was a faulty switch sending continuous spanning-tree updates.
Ten years ago, I had the same thing happen on the last major layer 2 campus I did. Same situation, down to one faulty switch that needed to be disconnected to stabilise the network, then replaced. Don't ever believe the error detection and update suppression mechanisms work. They don't.
Put me right off, and I only design layer 3 environments now. They're far more stable and reliable.
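For anyone wondering what spotting that kind of misbehaving switch could look like, here is a minimal sketch, not anything Transurban or any switch vendor is known to use. It assumes you already have a feed of (switch, timestamp) topology-change events from somewhere; the class name, the 10-second window and the threshold of 5 events are all made up for illustration.

```python
from collections import defaultdict, deque


class TopologyChangeMonitor:
    """Hypothetical sliding-window counter of topology-change events per switch."""

    def __init__(self, window_seconds=10.0, max_events=5):
        self.window_seconds = window_seconds  # assumed window length
        self.max_events = max_events          # assumed tolerated events per window
        self._events = defaultdict(deque)     # switch id -> recent timestamps

    def record(self, switch_id, timestamp):
        """Record one event; return True if the switch now exceeds the threshold."""
        events = self._events[switch_id]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_events

    def suspects(self):
        """Switches whose retained event count currently exceeds the threshold."""
        return sorted(s for s, e in self._events.items() if len(e) > self.max_events)
```

A switch that reports a change once a minute never trips this, while one reporting every second is flagged after its sixth event. It only tells you which box to unplug first; it does nothing to stop the storm itself.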