A hardware failure downed thousands of punters connected to BT's network via some of its smaller Openreach providers on Wednesday evening. Reports via Plusnet, AAISP.net and others suggested that the outage affected many of their customers for around half a day. Initially a major fibre cut near Slough was blamed, but it was …
Surely the network should be resilient
IDNet went out as well...
What I fail to understand is how one failure caused a complete loss of service for some customers. Surely any well-designed network would be resilient for at least a single point of failure.
A well designed network..
..is not what we have. What we have is a cut-price minimum resiliency network. That's all the market is prepared to pay for. With Ofcom now looking at cutting BTw's charges even further it's unlikely we'll see any resiliency investment in their wholesale products.
I guess it all comes down to cost - if you want resilience it will cost you more. I'd be surprised if many consumer ISPs pay for anything more than bog standard - end customers want the lowest price possible and the ISPs generally have no penalties in their contracts with those customers for any failure.
A single point of fail.
The same question gets asked every time someone has an outage, and I've never noticed a definitive answer other than a faulty router config being sent out to lots of routers.
The same mistake time and again
"What I fail to understand is how one failure caused a complete loss of service for some customers"
For *some* customers, i.e. the entire network didn't fail, which is how a good network is designed to behave. That's not an indication of a single point of failure.
The same thing comes up time and time again whenever there is any sort of outage - there are levels of resilience. The rest of BT's network carried on as normal, the Internet carried on as normal. If you want a five or six 9s level of service, you need to be paying more than £9.99 a month for it.
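To put numbers on "five or six 9s": a rough sketch of the arithmetic, assuming availability is measured over a 365-day year and that this one outage was the only downtime in it. The function names are just for illustration, and the 10-hour figure is the outage length reported elsewhere in these comments.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines.

    Five nines means 99.999% available, i.e. unavailable 10**-5 of the time.
    """
    return (10 ** -nines) * MINUTES_PER_YEAR


def availability_percent(outage_minutes: float) -> float:
    """Availability over one year if `outage_minutes` was the only downtime."""
    return 100.0 * (1 - outage_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
    # A single 10-hour outage, as some commenters reported
    print(f"10h outage: {availability_percent(10 * 60):.3f}% availability")
```

Five nines allows a little over five minutes of downtime a year, so a single evening-long outage on its own drops a line below three nines.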
Not limited to Slough/Reading Area
I lost my main internet connection (via A&A) yesterday just after 5.30pm. Got online via backup (phone broadband tethering), and checked AAISP status page, which had the Slough/Reading outage on it.
Came into work first thing today to find all was back to normal.
The interesting thing was that when I looked at the ADSL router last night, it showed as connected, but was being given a 172.16.x.x private IP instead of its usual A&A public IP.
Eclipse ISP Were affected
At 17.30 on Wed Evening, router reported no connection
Switched to backup BT Connection & all was fine
Same exchange 200m from our office
Didn't re-connect until 03.30 roughly
If it weren't for our redundant backup ADSL then I'd have had to spend the evening talking to the wife. PHEW
Strange that BT weren't affected by their own outage!
my teenage daughters had to endure FIVE HOURS without Facebook
There was much wailing and gnashing of teeth. My physical connection to the interweb was OK but connection to Plusnet failed due to 'CHAP authentication failure' between 5 and 10 p.m.-ish. I'm an RF guy so that's all Greek to me.
I'm the kiss of death to ISPs; I moved to Pipex and they got bought by Tiscali, I move to Plusnet and they're bought by BT.
Reading based Wizards and ICUK customers also affected
I lost my internet connection at about 17:35, and it didn't return till after 03:00, some 10 hours later. The WAN light on my broadband router did come on a number of times during this period, but all email and web access failed despite appearing to have connectivity.
A friend in the Kew area, also with the same set of providers, had a similar length of outage. When I did return, and had IRC contact with A&A I was directed to their status page detailing what may, or may not, have happened. On a different IRC channel this evening I was directed to this TheReg page.
Like Steve (above), I looked at my router via its web interface and think I also had a 172.x.x.x address. I didn't make a note of it as I didn't / hadn't realised it wasn't the usual address.
I know it's complicated but
I know it's complicated but it would be helpful if Kelly's article showed some evidence of clue about the different roles played in this picture by the allegedly independent parts of BT.
BT Openreach: provider of connection between punter and BTwholesale kit in the exchange (or indeed other providers' kit in the exchange, but this picture relates to BTwholesale).
BT Wholesale: historically incompetent provider of connectivity between exchange kit and ISP kit, traditionally using badly designed and/or undersized networks and charging outrageous wholesale prices to "smaller" ISPs, whilst offering a different product family to parent BT Retail.
BT Retail: overpriced and underperforming broadband provider, default choice of the ill-informed.
BT Sheffield: Plusnet, a diversionary "smaller ISP" which allows BT to undercut other BTwholesale-based ISPs whilst still raking in the money from the ill-informed on BT-branded tariffs and while Ofcon stand idly by.
Certainly the quality of service is as good as it's ever been, at least as BT's
Remember this is BT
"Surely any well-designed network would be resilient for at least a single point of failure."
A well designed network would, yeah...
what a night for my first night shift!
The outage went as far as Bristol on some networks and down to Brighton; we had customers all over the south whose private links went down with this outage.
Supposedly a faulty 21CN router in Docklands caused all this mess.
BT reported it fixed at 1.15am but it was flaky at best until 3.30.
Around 8 hours of downtime after all is said and done.
I'm starting to realise A&A were pretty much on the money with their previous comments about 21CN. They need to get SOME resilience in the network if one faulty router can cause problems as far and wide as Brighton to Slough and Bournemouth to Bristol. That's surely one hell of a design flaw?
Not just Slough
Our MPLS Point of Presence is located in Slough. This affected 30 sites across the UK all unable to connect to each other (despite some sites only being a few miles apart in the North-West).
BT's answer? "It's clearly a problem your end." LIARS!!
Plusnet customers in Trowbridge, Wiltshire affected
Our Plusnet ADSL connection was down from 5.30pm until at least midnight in Trowbridge, Wiltshire. We had an email from Plusnet a few months ago saying they were moving us onto the 21CN infrastructure. It sounds like this affected a lot of people.
I thought "BT" was synonymous with "downtime"...
If anyone wanted a reliable internet connection, I would have thought that BT would be *part* of the solution, not *the* solution. I'm a natural pessimist when it comes to estimating ISPs' ability to actually do what they say they're capable of.
Snake-oil salesmen, the lot of 'em...
AAISP had out of hours support?
Never did when I paid them the vast quantity of money they demand for an ADSL line
Neither does Timico..
..unless you pay for it. Shocked me when I found that out a couple of weeks ago. We pay a couple of hundred quid a month for a business connection but their support is only open during office hours and part of Saturday.
At home I have Be broadband and it has 24/7/365 support. Not only phone and email but IRC and a web based ticket system.
Some ISPs really do take the piss.
This is a title
Shouldn't that be 24/7/52 !??!!!!1111!!
If we're being really pedantic...
surely it should be 24x7 and not 24/7?
Plusnet keeping people in the loop
Like everyone else affected, my connection went down at about 5.30pm and came up somewhere around 3am (I don't know exactly when). Fortunately, there's a Plusnet app so I was able to get the updates during the evening about what was going on. iPhones are good for something after all.
I strongly suspect that the failing router hadn't actually popped its clogs and turned its little lights off. I think it had probably just lost its mind: getting the odd 172.16.0.0/16 private IP address from a public network is a sign of insanity rather than death :-)
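For anyone wanting to check what their router was handed: a minimal sketch, assuming Python 3 and its standard `ipaddress` module, that tests whether an IPv4 address sits in one of the RFC 1918 private blocks. The helper name `is_rfc1918` and the sample addresses are made up for illustration. Note that the 172.16.x.x addresses people saw fall inside the wider private block 172.16.0.0/12.

```python
import ipaddress

# The three IPv4 private-use blocks defined in RFC 1918
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if the IPv4 address `addr` is in an RFC 1918 private block."""
    ip = ipaddress.IPv4Address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


if __name__ == "__main__":
    # Sample addresses only; substitute whatever your router's WAN page shows
    for sample in ("172.16.5.9", "172.32.0.1", "8.8.8.8"):
        print(sample, "private" if is_rfc1918(sample) else "public")
```

If the WAN address comes back as private when you normally get a public one, the PPP session has most likely been terminated somewhere other than your ISP, which fits the "lost its mind" theory better than a dead box.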