BT was hit by a big power failure this morning at one of its major exchanges in the Midlands. Business broadband and Infinity customers as well as Easynet and other telcos have been downed by the outage over the past few hours. "We can confirm that, as a result of a power failure at one of our major exchanges, some customers …
On and Off
They must be outsourcing customer support from Virgin Media
After power outages at exchanges it is a general rule to reboot modems to ensure they sign back in with their authentication server, as quite often the modem will have timed out trying to authenticate, or will think it's still authenticated, when its data route is back.
Got that right...
Virgin similarly had a very major outage in East London from midnight last Tuesday right up until Sunday morning. I'm guessing though this might be getting quite normal for Virgin since it hardly broke news. BT on the other hand if it's down for an hour it makes the national headlines. Why is that?
Unlike BT, Virgin were unable to find any suitable excuse for their outage, except that hundreds of customers found that the Superhub firmware had mysteriously been upgraded when everything came back on Sunday morning (it now has a modem-only mode). BTW when Virgin tech support were asked about possible firmware issues last week they completely denied there was any such issue.
Even today Virgin still refuse to say why broadband had been switched off for so many customers for so long, all the while their status page said nothing was wrong most of the time, and when they did admit something was wrong they kept revising the fix time by 24 hours each time.
My challenge to BT then: see if you can do better, and good luck. And do remember that customers like to be told the truth so that they in turn can manage their own customers' expectations.
And in related news
BT confirmed that Business Continuity is a nice to have and will be looking at getting some soon.
Seriously, how can a "power failure" take out half of the BT Broadband network? Single point of failure in a major fashion.
Adrian (O2 Broadband and loving it)
BT - "Bollocks Talking"
Despite warnings from a series of spooks and security experts about the stupidity of amalgamating all of the network-switching infrastructure (all 22 disparate systems) into a single system, BT's '21CN', which is IP, the "network has gone down", still.
The entire reason for 21CN was to stop a single point of failure from having any significant impact - a la the great Manchester fire of '05... And still, they are out.
"BT: Bollock-Talking" continually... Still, Jobs for the Boys anyone?
How can this be allowed to happen, in what is supposedly a high tier facility?? What were they playing at & have they been playing at (in terms of power redundancy)??
Proves BT are muppets really...on the day we asked them for a Leased Line quote & it was triple another provider for less service - oh dear BT!
You'd be surprised.......
No matter how much you plan the fates can conspire against you.
I work for a huge international telco and last month we had a planned maintenance on one of our main sites where the power company took out the "A" feed (for an upgrade), which left the site on a separate feed from a different supplier. Yep, you guessed it, the second supplier had an unplanned outage. No problem, the generator kicked in (as planned), but for some reason (mainly because the network monitoring centre was already expecting a power outage) the second power alarm wasn't picked up. Now the generator is good for 24 hours, 2 x 12 hour fuel tanks, and it duly ran for 12 hours. That's when a small component failed in the fuel pump switch unit, leaving the whole site on battery backup. Now then, you'd think the guys working there would have realised, but this was over a weekend, with only 1 guy "monitoring" the power provider, and he wasn't even aware the generator had kicked in! 4 hours later the battery died and took out a large portion of our French network (not to mention all the transit traffic!). So you see, you can only plan so far; when the brown smelly stuff hits the rotating wind shifter you are still in the lap of the gods.
Triple Redundancy Anyone?
Maybe three feeds from different suppliers would have helped, and running up the generator on-load from time to time helps to test out any issues, as would maybe at least a second backup generator? I might be being a bit harsh but these things do happen.
That would have helped.....
To make the site economically unviable, do you have any idea how much a feed of that capacity from an electricity supplier actually costs? How much a generator of that size costs?
Oh, we spin the generators up on all our sites every week, check the batteries every month (yes by turning off the mains supplies). My original point still stands, you can only plan so much, there is no planning for every contingency!
What if the 3rd supply failed? What if the second generator failed at the same time? You going to ask why we didn't have a 4th supply? Third generator? I suggest you work for a LARGE telco and see exactly how much it costs to build and run a tier 1 site before asking these questions
A competent tech in the "alerts" group for the UPS would be cheaper.
What no generator !?
Thank god for Virgin
As a domestic Virgin customer, I was intensely pissed off when my company decided that BT should be my ISP, as a homeworker. Sure enough they managed to mess up everything from install dates, broadband activation dates, and billing.
Anyway, to be fair, once active, BB has been OK for a year ... but thank god for my Virgin connection. One click, and my laptop connected to that, and I was up within seconds.
To be fair, they would have considered Virgin, if Virgin's "business" department had actually bothered replying to an email.
'Up within seconds'
Thank god for online pr0n, eh?
Not entirely relevant but...
I was on a plane once, going on holiday, but there were some technical problems and they had to switch off the plane and restart it - re-boot obviously. As a techno geek, I immediately thought; is the plane running Windows? And then of course to labour the point, you have to close down the window blinds before take-off and landing. Shutting down Windows.
You actually have to have the window blinds open for take off and landing, rather than close them.
That'll learn 'em to use such utter shite as BT
Some of us don't have any option. Our local exchange is still not LLU so if we want business broadband, we have to go with BT. The alternative would be a 2Mb leased line at horrible expense.
Service restored now, for me anyway.
I guess redundant paths are, well, redundant these days. How did we let these arseholes fuck up our internet like this?
Fuck me, our static IP address is now suddenly dynamic. Way to go, BT fuck-wits.
Note that the message talking generically about broadband connection problems only appeared AFTER the problem was fixed <grrrr>
No UPS ? No generator..
Single point of failure?
Odd how the failure was in the Midlands and I lost my broadband here on the south coast. How small a hole are they trying to funnel this all through?
At least the current global warming meant I could go and lie on the beach while they fixed it.
Single point of Failure?
Isn't the internet supposed to be able to survive the odd nuclear blast?
Is this why Infinity kept going down today?
Not much detail there I am afraid
And as usual, the BT Business service status page showed absolutely nothing of relevance (4 local issues, nothing to indicate a national issue). The support phone-line just rang, not even the fun of getting into a queue. How frigging hard is it to put something on the status page to let people know there is a national issue and it's being dealt with? I didn't know for sure if the problem was at our end or at BT's until the story made it to the BBC news website.
Glad I'm not the only one. I spent most of the late morning / early afternoon trying to raise a ticket with BT, and couldn't get through on any of the publicised phone numbers, they either rang until they timed out, or were engaged.
My PHB kept glaring at me 'cos I hadn't fixed it yet, so it was a relief when the news finally broke and I could show him it wasn't my fault!
I asked for credit back on an "Issue that only affects 25% of our customers" (I'm in N London). Denied. The way they handled this was very, very bad - infrastructure fucked, no answer on the phone line, long queue when I finally got through and no mention of a serious outage, deceptive message on status page *after* the event, dynamic IP address instead of a static one (no, this service is not "Business").
I left VMB (who also don't have a clue) because they broke their contract with me by changing the traffic management policies mid-contract without notifying me. They're still sending me overdue reminders, FFS, some five months after I sent my cancellation notice. Now it seems the only remaining "Super-fast" ISP is also clueless, but this time, I have no excuse to leave. Ah well, if I can just get my MTU issues sorted out with a Cisco IOS-powered router (they don't do PMTUD properly either so I have to hack TCP MSS for IPv4 and 6) and get my O2 dongle unlocked so I can get cheap 3G cover on demand (because, obviously, business services continuity really isn't that critical) then I should be fine for the next two years.
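For anyone wondering what "hacking TCP MSS" amounts to: the clamp value is just the path MTU minus the fixed IP and TCP headers. A rough sketch follows, assuming minimum-size headers with no options; the 1492-byte MTU is only an illustrative assumption (a common PPPoE figure), not a value given in the comment above, and `tcp_mss` is a made-up helper name.

```python
# Minimum header sizes in bytes (no IP options, no TCP options).
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def tcp_mss(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


# Assumed MTU of 1492 bytes, purely for illustration.
print(tcp_mss(1492))             # IPv4 clamp value
print(tcp_mss(1492, ipv6=True))  # IPv6 clamp value
```

The IPv6 value comes out 20 bytes smaller than the IPv4 one because the fixed IPv6 header is 40 bytes rather than 20, which is why the two protocols need separate clamp settings.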
Just been reconnected after almost 5 hours offline. Got to love BT..
When the service comes back on it is with a dynamic IP. Restart the router to revert to your static IP.
Redundancy doesn't have to be negative!
Tried to ring the helpdesk which is/was in India, but couldn't get through as the phones were down.
Posting as AC because I used to work for BT (sorry everyone, but it isn't my fault)
Sky & Virgin
Over the weekend I had two FAIL experiences with telcos.
I tried to call a friend who's on a Virgin land line from my O2 mobi: as is frequent, the connection failed to complete as a two-way conversation when they answer & their line appears to remain "off-hook" for some time afterwards.
Sky's voice & LLU broadband service failed across various areas in Northern Ireland. No dial-tone on outgoing call attempts, straight to voice mail on inbound and no RADIUS login on broadband for about 18hrs.
Not sticking up for BT, but the Sky experience certainly deserves a mention.
Not just BT...
The power outage affected 9,999 other customers in Birmingham too...
Power failure in Birmingham
The power failure in Birmingham 'only' affected about 10k customers*. It's unfortunate that one of those 10k was BT. What puzzles me is why a power failure at about 11.40am would cause a fault that seemed to manifest itself about 1pm.
* Being the centre of Brum some of the ten thousand customers represented quite a few people (e.g. Aston University is presumably one customer but rather more people).
"What puzzles me is why a power failure at about 11.40am would cause a fault that seemed to manifest itself about 1pm."
About an hour and 20 minutes of battery backup?
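A quick check of that figure, taking the two approximate times quoted above at face value (both were given as "about", so the result is only as good as they are). `minutes_between` is a hypothetical helper for illustration.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


gap = minutes_between("11:40", "13:00")
hours, minutes = divmod(gap, 60)
print(f"{hours}h {minutes}m")  # 1h 20m
```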
Severe problems at 08:15
My FTTC was down then; I was ready to call BT but had to go to work.
Where's the power back-up? In my day, as a BT man (or Post Office Telephones, as it was then), main exchanges ran on 50 volts DC from a mains power unit and also had two of the biggest sets of wet batteries you've ever seen in your life, each cell alone being approx. a 2 ft. cube with an awful lot of ampere hours available (forget the figures now). Even rural exchanges had a smaller version, which was meant to run them for 24 hours in the event of a mains failure. The big stuff also had a whacking great diesel generator cutting in automatically if the mains volts vanished, so how does a modern system manage to go off the air completely?
Even backup power systems fail, you know. We had an outage a while ago in our datacentre. Unfortunately a previously undiagnosed battery fault prevented the UPS from kicking in. It's a 500kVA UPS, so that's a lot of batteries, and one battery caused it to fail, ho hum. So even though the genny kicked in, there were a few seconds without power. Obviously most things power cycled and, as is usually the case, a small percentage of boxes failed to come straight back up.
Now obviously some people lost work - and some is a few hundred in a large organisation. A few systems took a while to get back online. One system lost a couple of hours' data.
Questions were asked by senior management, but as the infrastructure guys pointed out you can't plan for every failure. Sure the power systems could have been even more resilient, but guess what? When the datacentre was being planned the PM looked at a more resilient system and those on high said it was too costly.
Another site went out in the early hours of Sunday morning. UPS and genny coped admirably. Power came back on after a couple of minutes and then surged big time, taking out quite a few breakers. No biggy, the genny stayed up. However there was a problem. The management system sent out alerts for the power failure, and then stand-downs when the power came back on. For some reason it failed to send alerts for the failure that immediately followed. When the staff got in on Monday the genny was down to the last few litres of fuel and starting to send low fuel warnings. Breakers reset, crisis averted.
Procedures were changed after that so that after any power failure at any time systems are checked even if the all clear is set. But it just goes to show there's always something and it's hard to test for every eventuality.
Where were the batteries?
Every exchange used to have a cellar full of 2 volt lead-acid cells in banks of 24 to provide a 48v backup supply until the standby generator kicked in. Nowadays the least you should expect is a UPS and a redundant power supply.
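The sums behind a bank like that are easy to sketch. The 24 cells at a nominal 2 V each come from the comment above; the 1000 Ah capacity and 250 A load are made-up figures purely for illustration, and the runtime estimate is an idealised one that ignores discharge-rate effects, temperature and cell ageing.

```python
def bank_voltage(cells: int, volts_per_cell: float = 2.0) -> float:
    """Nominal voltage of lead-acid cells wired in series."""
    return cells * volts_per_cell


def ideal_runtime_hours(capacity_ah: float, load_amps: float) -> float:
    """Idealised runtime: rated ampere-hours divided by a constant load."""
    if load_amps <= 0:
        raise ValueError("load_amps must be positive")
    return capacity_ah / load_amps


print(bank_voltage(24))                # 48.0
# Hypothetical capacity and load, not figures from any real exchange.
print(ideal_runtime_hours(1000, 250))  # 4.0
```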
In answer to all those asking questions about back-up power. Each exchange has back-up generators and batteries. In my area they are preparing for NGA (Infinity) roll out and these exchanges have an additional external generator installed as a temporary measure (NGA requires more power) whilst waiting for permanent upgrades to the main generators. I have no idea why this area failed but it must have been a pretty serious problem.
There's a lot of the usual crap getting posted here. All the usual what about redundancy? etc. etc.
You want redundancy? You want business continuity? You want backup power? You pay for it. Simple as that. Don't expect your ISP to provide it FOC.
I mean did it say anywhere in your contract anything at all about business continuity? It didn't? Well mercy me.
"We are working to restore service to remaining customers as soon as possible this afternoon. Should any customers continue to experience difficulty in accessing their broadband service, they are advised to turn their hub or modem off and on again."
Tell them to f**k off! I work for a major international telecoms provider and we're spitting blue teeth at what has happened; our clients use BT as an ISP and it's causing them and us massive problems. If someone tells me to turn my modem on and off, I'll throttle them!
Tell that to joe public, but don't you dare, ever say to me and my CISSP colleagues turn it off and on.
So much for battery back-up and generators.
So in all your CISSP experience you've never come across the need to re-authenticate, or tear down and rebuild connections, following a power outage?
Obviously still a relevant exam then
@AC Power questions
I'll tell you why it failed, disaster scenarios which people don't bother to test.
See it all the time. I could tell you some horror stories regarding national infrastructure which everyone thinks will work smoothly when a disaster recovery scenario has to be implemented. No f***ng chance. Guarantee that's when the shit really hits the fan. Just pray it doesn't happen.
How to survive a power cut.
"BT was hit by a big power failure this morning at one of its major exchanges in the Midlands".
Might I suggest connecting each rack to a UPS and having at least two back-up generators nearby in case of a power cut. Design the power grid such that you can swap equipment in and out while maintaining power to the computers. Do remember to test such a system at least once a week.
Large outage on French Network
"I work for a huge international telco and last month .. a small component failed in the fuel pump switch .. and took out a large portion of our french network", AC
Funny, nothing in the news ...
Probably nothing in the French news about a BT outage in Birmingham either.
It's the result of a well known process called "not giving a toss".....
I have a friend who is still affected by this. They have been informed that a person had connected an "illegal device" to the network and had caused "several devices to blow".
Still not working
Still not working this morning...
Incredible how many "IT professionals" in this thread have never heard of any of the legitimate reasons to restart an ADSL modem after a network outage
As I'd lost the connection on my work laptop, I went to check it on the wife's PC. When I failed to find Google and closed the browser, the BT Hub help app popped up offering to fix the problem. So I thought, what the hell, and told it to try. So it ran some diagnostics, decided Broadband was ok and the problem was in the router. It said, follow the instructions that come up in the browser, and tried to connect to the BT website.
Who writes this stuff?
Still not working Wednesday morning
Spent a couple of hours on the phone to India^B^B^B^B^B BT Help line last night. They promised to call back Thursday evening... (really!)