BT customers hit by broadband outage ... again

BT customers in the UK are once again banging their heads against their keyboards this morning: a power outage has thrown them offline for the second day running. Today the issue is a power outage at Telehouse North in London. An email message from BT Wholesale, with the subject line 'Major Service Interruption' – seen by The …

So what happens to the £18 standing charge I pay BT / plusnet?

0
1
Silver badge

You keep paying it.

5
0

My fault apparently

Had issues all morning with our IP Phones, so inevitably it's my fault that they don't work. As soon as I mention what's happened it's "call BT now and find out when it'll be fixed, we can't work without phones". Except it's not actually a BT issue and we aren't a BT customer? Now if only I could have changed jobs 2 weeks earlier... (still here for another 2 weeks)

6
0
Anonymous Coward

Re: My fault apparently

Yes, another morning of a dozen service managers asking inane questions

have you checked the firewalls?

what applications are affected?

have you reported it to BT?

is this why Henry's PC isn't working?

why does the Internet affect VPN users?

when will it be fixed?

are we there yet?

Two bloody mornings in a row!

8
0

Re: My fault apparently

Are you my replacement?

7
0

Re: My fault apparently

You wouldn't believe* how often I have to explain to people in my office that HSBC, Lloyds, etc. don't like giving me access to fix their online banking.

* Actually, I imagine you can...

3
0

186K

Anyone else with 186K?

All our customers using 186K went offline this morning, as did their website (www.186k.co.uk), and partner portal (http://www.dolphinmp.co.uk) and their partner support number (08701 222186), main number (08701 222 186), and support number (0872 232 1999) are all off.

Looks like they're knocked 100% offline. At this point, I don't know if it's ISP issues, or if they've folded.

0
0

Re: 186K

Not folded: They've just got onto twitter for the first time since 2010 to update us: Looks like the issue at Telehouse North will take "Several Hours" to resolve.

https://twitter.com/186k

0
0

Re: 186K

It's the Telehouse issue. Complete power outage in several suites is my understanding of it. I have several servers hosted with 186k, all currently off-line, it's also taken down their VoIP phone system. Their own infrastructure is resilient (two entirely independent routes out with failover) but core network is dependent on Telehouse, which of course is supposed to be resilient itself. When Telehouse get the power back on, everything should come back. Problem must be major for it to be taking this long to fix, evidently more than just some tripped breakers.

1
0

Whatever happened to the mantra, "NO Single Point Of Failure".

3
0
Coat

@carrynot

But with the cloud, it's a highly distributed... single point of failure.

5
0
Bronze badge
Facepalm

Indeed, isn't critical infrastructure supposed to have fail-over to another site so that one site failing is annoying, maybe with some speed reduction, rather than stopping service completely...

0
0
Silver badge

No single point of failure

This is one of the most exquisitely painful, sensitive or (if you incline to BOFH humour) uproariously funny things in IT. Human beings think they can do these things, but they can't.

0
0
Silver badge

" isn't critical infrastructure supposed to have fail-over to another site so that one site failing is annoying, maybe with some speed reduction, rather than stopping service completely..."

Depends on the definition of 'critical' being used, which is different from industry to industry and application to application. Remember 5 9's doesn't mean the infrastructure will never fail, just that it is not expected to fail very often. That is why when designing really critical infrastructure, i.e. where people could die if it fails, you actually have to design in failure actions, so that it can "fail safe".
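To put a number on "not expected to fail very often", here is a rough sketch of the downtime budget that each count of nines implies. It assumes a 365-day year and treats availability as a simple yearly average; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines.

    For example, nines=5 means 99.999% availability.
    """
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

On those assumptions five nines still leaves a little over five minutes of permitted outage per year, so a design that meets the target can still fail.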

0
0
Anonymous Coward

"DNS Issues"

Anyone else get sick of hearing people with half a tech brain claiming these things are DNS issues every time? It's often the same bunch who'll say a defrag of the hard drive will resolve an Outlook-not-loading issue. Unfortunately as DNS is usually the first link in the chain to getting out anywhere, these people always think it's DNS causing the problem.

Call from a customer this morning who was suffering with this tells me it was definitely DNS, as he was trying to ping 8.8.8.8 and it didn't work. Fell on deaf ears that DNS won't work if there's some higher connectivity issue, groan :-(
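For anyone who wants to separate the two cases, a minimal sketch of the triage order: test raw connectivity to a literal IP address first, and only judge DNS if that works. It is an illustration, not a proper diagnostic tool. It uses a TCP connect to port 53 on 8.8.8.8 rather than an ICMP ping (raw ICMP sockets usually need elevated privileges), and `example.com` and the function names are assumptions made for the example.

```python
import socket
from typing import Callable


def can_reach_ip(ip: str = "8.8.8.8", port: int = 53, timeout: float = 3.0) -> bool:
    """Try a TCP connection to a literal IP address; no name lookup is involved."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(hostname: str = "example.com") -> bool:
    """Ask the system resolver to look up a hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def diagnose(
    reach_ip: Callable[[], bool] = can_reach_ip,
    resolve: Callable[[], bool] = can_resolve,
) -> str:
    """Check connectivity before DNS, because DNS depends on connectivity."""
    if not reach_ip():
        return "connectivity problem (DNS cannot be judged until the link is back)"
    if not resolve():
        return "DNS problem (the network path works, name lookups do not)"
    return "connectivity and DNS both look fine"


if __name__ == "__main__":
    print(diagnose())
```

The probes are passed in as parameters so the ordering logic can be exercised without touching the network.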

4
0

Re: "DNS Issues"

I Dunnow man - I normally find "If In Doubt, blame DNS" A pretty good first troubleshooting step in Internet/Active Directory users.

0
0
Headmaster

Re: "DNS Issues"

The defrag is to mitigate one particular problem, which is to keep the user occupied while we figure out WTF is actually wrong.

It's not always defrag, depending on the user (or official answers card) but there's any number of diversions available that have at least some marginal benefit so if anyone asks we can talk about it being a sensible precautionary thing. And for the most part the relatively few people who do spot it for what it is will generally understand and even appreciate the semi-conspiratorial admission that they are right. Occasionally works as good PR but depends on how well you handle deviations from the official flow chart, if you are unlucky enough to have one.

Teech cos I is lecturin a bit, sorry...

0
0
Silver badge

Re: "DNS Issues"

Can't you refer them back to Murphy's second law of electronics ?

- It works better when it's turned on.

The first law is that it works better when it's plugged in

0
0
Silver badge

Re: "DNS Issues"

You're preaching to the choir here, AC...

Router down - "DNS is broken"

Remote site is down - "DNS is broken"

FOTT (Firewall On Too Tight) - "DNS is broken"

Chain a bunch of CNAMEs together and remove one in the middle - "DNS is broken"

Mail server is blacklisted - "DNS is broken"

Users' favourite cat picture site breaks their database - "DNS is broken"

...and so on. Sadly, routinely arming DNS engineers is not an option. It's not so much a health and safety issue, the cost of ammunition would cripple the business.

0
0

something doesn't make sense

power to kit in a DC should be dual fed, backup generators etc, so a power outage in such a DC should not cause any disruption. that said it clearly does, because it seems that still, a DC operator saying they have full power resilience does not in fact mean what it says on the tin. which then leads to the next oddity....

complete loss of power to a major core node in a tier1 network shouldn't cause any disruption to anyone out on the edges. traffic should simply reroute around it to alternative nodes. pretty much the definition of a tier1.

if they've lost power to devices at the very edge then those are not necessarily backed up by alternates (further out from the core you go, the harder resilience gets) but the flip side is the further towards the edge the device, the fewer people will go through any one device so the fewer people are affected. If the losses are to somewhere between outer edge and core, one of various layers of aggregation, then again, there will be resilience in the design.

BT Adastral@martlesham has some very smart people who have written a lot of the books on carrier grade network design, so I fully expect them to have done their resilience design properly... which is why something doesn't add up...

5
1
Anonymous Coward

Re: something doesn't make sense

> ... a DC operator saying they have full power resilience does not in fact mean what it says on the tin

I can imagine they've fallen into a common trap - but this is pure speculation.

Say you have two independent supplies, UPSs, gennys, etc. And all your customers take two supplies, and the load is nicely balanced between them, and you've got "dual resilient" supplies.

Great.

Time goes on, and loads keep going up. In particular, manglement doesn't see any issue when loads reach 50% of capacity. Then there's a problem - one supply trips, all the load dumps itself onto the other, and what was running at (say) 60% is now trying to run at 120%. Ah, so not quite the fault tolerance they thought they had.
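The arithmetic as a rough sketch, assuming every feed has the same capacity (so percentages simply add) and that a failed feed's load is shared evenly among the survivors. The function names are made up for the example.

```python
from typing import List, Sequence


def load_after_failover(feed_loads_pct: Sequence[float], failed: int) -> List[float]:
    """Spread the failed feed's load evenly over the surviving feeds.

    Loads are percentages of each feed's own capacity; all feeds are
    assumed to have equal capacity.
    """
    survivors = [load for i, load in enumerate(feed_loads_pct) if i != failed]
    if not survivors:
        raise ValueError("no surviving feed to take the load")
    extra = feed_loads_pct[failed] / len(survivors)
    return [load + extra for load in survivors]


def survives_single_failure(feed_loads_pct: Sequence[float]) -> bool:
    """True if losing any one feed leaves every survivor at or below 100%."""
    return all(
        max(load_after_failover(feed_loads_pct, i)) <= 100.0
        for i in range(len(feed_loads_pct))
    )


if __name__ == "__main__":
    print(load_after_failover([60.0, 60.0], failed=0))
    print(survives_single_failure([60.0, 60.0]))
    print(survives_single_failure([45.0, 45.0]))
```

With two feeds, the pair only stays resilient while each one runs at 50% or less.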

It's much the same with data links. Resilience, redundant paths, automatic re-routing, yada yada yada. Then said power problem takes out a major node, traffic gets re-routed, and links that were running at (say) 75% are now running at ... well I guess well over 100% seeing as I could see 80+% packet loss at one point past a specific node in BT's network en-route to one of their (and our) customers.

It wasn't long before I saw a change in route, and that particular customer was back online. But we've others with 186k and they are only slowly crawling back onto the net. One "got a connection" but it was one of the 172.something addresses BT give out when an ADSL login fails. Forced the link down and it kept on trying until about 1/2 hour ago when it got connected. Actual connectivity over it is still "intermittent" though.

So that's a second day wasted on dealing with the fallout from this sort of crap. Of course, our customers assume that it's a problem with our services and I have to keep explaining that we are fine - it's just like having an accident close a busy motorway, even those not using that motorway will be affected as all the other roads clog up with traffic.

3
0

Re: something doesn't make sense

There are two issues here. Firstly, there are very few facilities in the Docklands kitted out to a 2n (i.e. having two sets of everything) spec; most are just n+1 (so e.g. if you need 2 UPS units to cover the load, you'll have 3 so can handle one failing). Now n+1 is fine, until either something downstream of your redundancy (e.g. a circuit breaker) fails, or something fails in a way your redundancy doesn't expect (e.g. your failed UPS shorting the common bus). With 2n you are in general able to avoid this, as each rack has two supplies fed independently from the grid onwards (the really good ones even have separate substations), but it costs more, and most of the older facilities, where the majority of carriers you want to connect to are present, don't have the space etc. to actually become 2n.
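A small sketch of the unit counts, using the definitions given in this comment (n+1 is one spare on top of what the load needs, 2n is two complete sets). It only counts units, so it says nothing about common-mode failures such as a shorted shared bus, which is the real weakness described above. The function names are illustrative.

```python
def units_installed(n_required: int, scheme: str) -> int:
    """Number of units installed for a load that needs `n_required` units."""
    if scheme == "N+1":
        return n_required + 1
    if scheme == "2N":
        return 2 * n_required
    raise ValueError(f"unknown redundancy scheme: {scheme!r}")


def tolerated_unit_failures(n_required: int, scheme: str) -> int:
    """How many independent unit failures still leave enough capacity."""
    return units_installed(n_required, scheme) - n_required


if __name__ == "__main__":
    for scheme in ("N+1", "2N"):
        print(
            scheme,
            units_installed(2, scheme),
            tolerated_unit_failures(2, scheme),
        )
```

For the 2-UPS example in the comment, n+1 means 3 units and one tolerated failure, while 2n means 4 units and two.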

The second issue is that all the redundancy in the world doesn't help in some situations - e.g. if you have a fire that somehow your extinguishing system can't manage to deal with, the first thing the fire brigade are going to say when they turn up on site is "OK, turn the power off". To a lesser extent you've also got the issue that a faulty bit of kit could trip both supplies, though good design of the breakers and distribution should be able to limit that e.g. to a single rack being affected.

3
0
Anonymous Coward

Re: something doesn't make sense

>BT Adastral@martlesham has some very smart people who have written a lot of the books on carrier grade network design, so I fully expect them to have done their resilience design properly... which is why something doesn't add up...

What doesn't "add up" is your understanding of BT.

Adastral is a retirement home for smart people. Smart people who are too smart to resign themselves to being full-time academics, and not cool enough for the likes of Facebook or Google.

As BT have proved to the outside world, time after time, BT is a highly commercial organisation, every single one of its decisions has a strong commercial undertone.

"strong commercial undertone" being a polite way of saying how can we implement this most "cost-effectively" in order to achieve maximum value for our shareholders.

Just look at the way they've implemented 21CN, just look at the recent Parliament report, the writing's on the wall, you just need to take off your rose-tinted spectacles.

1
1
Silver badge

Re: something doesn't make sense

And when you carefully specify (and pay through the nose for) redundant power or network cables, you have to stand over the contractors and make sure they don't put both the "redundant" cables into the same conduit to save time and money.

Just one of the hundreds of nasty little things that can break redundant systems.

1
0

Re: something doesn't make sense

2N is not two power supplies. All tier 3 data centres have dual power supplies, each one is usually N+1.

2N is when you have 2 of everything on EACH power supply - twice the UPS, twice the number of generators. What I don't understand about this BT outage is if both their power feeds to the rack failed, why didn't another node pick up the load in a standard failover mode.

0
0

This post has been deleted by its author

Silver badge

Re: Non specific point

I'm going back to TalkTalk when my year is up.

Nurse... nurse... this man needs help.

I'm sorry sir there's nothing we can do; he's too far gone. Try exorcism.

4
0
Silver badge

Re: Non specific point

The words

Frying Pan

and

Fire

come to mind.

If you do go back, just don't come back here and complain when it goes badly.

1
0

Zen down as well

I can confirm that Zen Internet are affected by this. My router connects and authenticates but seems to get the IP address of a BT Openreach test network; every URL I open shows the same 'Diagnostic network' page. Trying to work via 1 bar of 4G signal is not much fun (I work from home in a rural village), so I think I will write today off.

0
0

Re: Zen down as well

My Zen-in-the-middle-of-nowhere is UP.....

0
0

Re: Zen down as well

Be glad that you are OK. I am on their Unlimited Fibre service with a static IP address, so I guess that connects via Openreach through London somewhere.

0
0

It's just like Bungie's Destiny servers being down when you're playing a serious session.

Very slow connection today

I'm more pissed off with BT's crappy Spam filters that are not worth SH1T

0
0
Anonymous Coward

Great Britain not so great anymore when it comes to Internet

I have just spent 2 weeks in Croatia in a Villa in the hills 15 min's away from shops and we had internet speed of 20mb & 2.8MB!

At home one mile from exchange we're getting 17 MB & 1.5?

I know it's easier to put new equipment rather than using the old OpenReach network but they really do need to start looking at the big picture.

Tony

1
1

BT joined up systems

I phoned BT yesterday morning as my phone which has a lot of interference normally had become worse, almost unusable, and I assumed this was causing my broadband problems. After approx. 30mins with the BT call centre from 10:00 to 10:30 ish they agreed to monitor my line. No mention of the actual problem with the service just the usual nonsense master socket check etc.

Do the engineers inform the call centres etc.? Apparently not!

1
0
Silver badge

Re: BT joined up systems

This is related to Brownridge's Law:

"The quicker a phone's answered in sales, the slower it's answered in customer services".

This is by no means accidental. Huge corporations like BT can afford to treat customer service in much the same way that the opposing sides treated defence during the First World War. Multiple lines of trenches, machine guns, barbed wire, massive artillery bombardments, and if felt necessary poison gas. Anything to slow the bastards down!

The harder it is to get service, and the slower and more unwillingly it is carried out, the less money the corporation has to spend on it year by year. If (and perhaps only if) you have something close to a monopoly, you can make a very great deal of money that way.

0
0
Anonymous Coward

In Telehouse's Defense

Telehouse North is getting on a bit and there was talk long ago of gutting it at 25 years (2015).

BT is a major stakeholder and a major offender for power overuse. Generally clients routinely flout the rules on consumption, filling racks to capacity with zero spacing, overflowing equipment onto empty footprints or connecting power cables to outlets several positions from their own. Rule of thumb is the bigger the client, the worse they behave and the more likely they'll be allowed to continue.

There are some pretty frightening sights under the floor tiles, you might even find the odd bit of contraband kit quietly running under there.

Account management generally tut tut tut and do nothing past wagging a finger whilst facilities have to keep the place in service. Cooling the building has always been a major challenge because of the above.

My bet would be that recent high temperatures have caused a major piece of kit to fail and it's taken out other areas nearby.

2
0
(Written by Reg staff) Gold badge

Re: In Telehouse's Defense

FYI

0
0

Telephone exchanges

Question.

Once upon a time, we all used landline phones.

No, I'm not thinking of going back to them, but did the phone exchanges of old suffer these sorts of power outages?

I could have rose-tinted glasses, but I can't remember not having a ringing tone when I needed it, during a power cut, flood or whatever.

What's changed in our ability to do redundancy, or is it my memory?

0
0
Silver badge

Re: Telephone exchanges

Exchanges didn't suffer power outages for the following reasons

1) Everything was DC powered

2) There were sodding great Batteries in the Exchange that kept everything working when the power went down.

Back in the 1990's we supplied lots of kit to go in exchanges. It was all 48V DC powered.

Then apparently OFCOM ruled that this wasn't needed any longer. Probably some arm twisting by the BT Bean Counters.

Datacentre rules then applied.

Batteries were thrown away and the rest is history.

3
0
Silver badge

Google wouldn't let me log into Google Groups today as it thought somebody had stolen my details and was trying to log on from Brum while I was still logged in via somewhere near Scarborough.

0
0
Silver badge

Google and Point of presence

is a right royal pain in the butt.

Haven't the wizards at the chocolate factory heard of mobile data? You know, you move around and use the services people like Google supply.

One day I could be in London; later that same day I could be in Los Angeles or in Kirkwall.

Any service that enforces this sort of geo-locking is really crap (just my humble opinion and not worth the effort it has taken to write this post). Their systems should be designed to cater for this.
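For what it's worth, a sketch of how a location check could cater for travel: compare the speed implied by two logins against something a plane can manage, instead of rejecting any change of location. This is an assumption about how such a check might work, not a description of what Google actually does; the 1000 km/h threshold and the city coordinates are approximate values chosen for the example.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumption: roughly airliner speed

LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_is_plausible(a: LatLon, b: LatLon, hours_apart: float) -> bool:
    """True if getting from a to b in the given time needs no more than MAX_PLAUSIBLE_KMH."""
    distance = haversine_km(a[0], a[1], b[0], b[1])
    if hours_apart <= 0:
        return distance == 0.0
    return distance / hours_apart <= MAX_PLAUSIBLE_KMH


# Approximate coordinates, for illustration only.
LONDON = (51.5074, -0.1278)
LOS_ANGELES = (34.0522, -118.2437)
BIRMINGHAM = (52.4862, -1.8904)
SCARBOROUGH = (54.2831, -0.3998)

if __name__ == "__main__":
    print(travel_is_plausible(LONDON, LOS_ANGELES, hours_apart=12))
    print(travel_is_plausible(BIRMINGHAM, SCARBOROUGH, hours_apart=1 / 60))
```

On these numbers London to Los Angeles in the same day passes, while two sessions a minute apart in Birmingham and Scarborough do not. IP geolocation can be badly wrong when traffic gets re-routed, so that second case may be a false alarm rather than a stolen login.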

0
0
Silver badge

What's the betting the circuit has been overloaded by the increased consumption of the air conditioning systems (office & rack)?

1
0

The ACs are not powered by the UPS in a datacenter, only by the generator.

DC UPSes are meant for computer/network gear only and will become very unhappy if you try to power ACs from them - chances are the ACs would get pissed off as well. No need to try in the first place since you can do perfectly fine without cooling during the few minutes it takes for the generators to start.

0
0
Megaphone

Your MP is an idiot...

Your MP is an idiot so won't be able to understand the nuances of this double failure and how it is significant in terms of national infrastructure (and national security) and economic prosperity. The only way he will care enough to ask questions in the house is if you use your considerable specialist knowledge to write to him or her and explain how this sort of thing really shouldn't be able to happen. If two key sites can go offline so easily imagine if sites were taken offline as part of a coordinated attack to coincide with some activity that caused major economic harm to the UK - it would make Brexit look like child's play.

The only way to get your representatives in Government to investigate this is to show them you care and why they should care too.

0
0

Re: Your MP is an idiot...

Last night the announcement on BT phone line 0800 169 0199 was "we are improving the broadband network". What on earth does it mean?

0
0
Anonymous Coward

Re: Your MP is an idiot...

"if you use your considerable specialist knowledge to write to him or her and explain how this sort of thing really shouldn't be able to happen."

I wrote to my MP using my "considerable specialist knowledge" to express concerns in relation to a technology subject. He didn't even deign to send me a reply of any description, not even a boilerplate acknowledgement signed pp by an assistant ... nothing!

MPs are a useless waste of space, particularly the career politicians who have absolutely no clue what the real world looks like as they've only ever known Westminster.

0
0

Virgin Media

Is the current extended downtime for VM at all related? It's been up and down for a few days but since BT went TITSUP it's just been down.

1
0

Re: Virgin Media

well if you tell me your area, I could check... VM is still very good in TW postcode.. :)

0
0

this was the notice

Fault troubleshooter maintenance - 20th July 17:30 - 21st July 10:00

This was the notice, faults troubleshooter err faults well yes, not related, who knows, maybe somebody dropped a spanner, just maybe.

0
0


Biting the hand that feeds IT © 1998–2018