BT customers hit by broadband outage ... again

BT customers in the UK are once again banging their heads against their keyboards this morning: a power outage has thrown them offline for the second day running. Today the issue is a power outage at Telehouse North in London. An email message from BT Wholesale, with the subject line 'Major Service Interruption' – seen by The …

  1. David Knapman

    "It is ideal for telecom providers, financial trading exchanges, media and gaming companies that require speed, reliability and reach."

    Hmm. Note how they're careful not to include "power" in that list.

    1. lglethal Silver badge

      I guess they can take reliability off their list now as well....

  2. Anonymous Coward

    Hello from Sweden

    Only I'm not actually in Sweden, I'm having to use a VPN connection because BT's acronym today is Buggered Telecom.

    1. Ragarath

      Re: Hello from Sweden

      So BT's network got you to a VPN via Sweden. Seems like it's working for you ;)

      1. Anonymous Coward

        Re: Hello from Sweden

        I wouldn't say completely working, I couldn't connect to the server in France.

  3. Pen-y-gors Silver badge

    Some sympathy -but not a lot

    Having recently had a 5-day outage at the bods who do some of my hosting (how often do both drives in a RAID setup fail at the same time? And with a dodgy backup?) and spent time placating customers for a cock-up by a third party, I feel a teensy-weensy bit of sympathy for BT when they are affected by a supplier's kit going titsup. (We're currently migrating all our customers to someone more reliable anyway - they really should have been more apologetic!)

    BUT - BT is not the same as a small one-man-and-a-cat rural Welsh IT company specialising in bilingual web development. They're several times bigger, and should surely be able to afford truly redundant kit that just keeps working, although I will allow them a hiccup or two in the event of meteorite strikes or Brexit.

    1. Alister Silver badge

      Re: Some sympathy -but not a lot

      @pen-y-gors

      I think you've got the wrong end of the stick here. It's not BT's own infrastructure in Telehouse North (despite the confusing name); it's part of the London Internet Exchange, which provides routing and connectivity to all the Tier 2 providers, so BT are more in your position at the moment, sitting waiting for their suppliers to sort their shit out.

      1. TrevorH

        Re: Some sympathy -but not a lot

        I don't believe this problem has anything to do with LINX. THN is a massive building and LINX have space there but the room currently affected by the power problems is not the LINX suite.

        1. Alister Silver badge

          Re: Some sympathy -but not a lot

          > I don't believe this problem has anything to do with LINX. THN is a massive building and LINX have space there but the room currently affected by the power problems is not the LINX suite.

          Well we're having big problems with latency and packet loss from a Plus-Net fibre link to some of our servers at Firehosts/Armor, and most of the issues seem to be the Telia nodes in London:

          ldn-b3-link.telia.net

          ldn-bb2-link.telia.net

          ldn-b5-link.telia.net

          By comparison, we aren't seeing any issues with these hosts on the same routing:

          peer2-et-9-1-0.redbus.ukcore.bt.net

          peer5-te0-0-0-31.telehouse.ukcore.bt.net

          So maybe this is a routing issue from BT onwards?

      2. Anonymous Coward

        Re: Some sympathy -but not a lot

        I'm sorry, you're incorrect. Today's outage directly affected BT's suites, so their infrastructure lost power. It did not affect the LINX suites which are located elsewhere in the building.

        However, they are waiting for their supplier - Telehouse needs to restore power to the affected areas before BT can bring their kit back up.

      3. patrickstar

        Re: Some sympathy -but not a lot

        LINX going down would at most lead to reduced capacity, not an outage (well, not for long, at least).

    2. dajames Silver badge

      Re: Some sympathy -but not a lot

      (how often do both drives in a RAID setup fail at the same time? And with a dodgy backup?)

      If two drives are from the same batch, and have spent their entire lives in the same RAID setup where they have experienced very similar duty cycles, thermal conditions, vibration, etc., then it's not entirely unlikely that they will have about the same lifetime.

      If one drive in a RAID array fails, the other(s) will experience atypically high use while the array is rebuilt, and the chances of a second drive failing during the rebuild are not insignificant. It does happen.

      The dodgy backup is another matter, though. There should always be more than one backup.
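dajames's point about rebuild risk can be put into rough numbers. The sketch below is a minimal illustration, assuming independent drives with exponentially distributed lifetimes (a constant failure rate of 1/MTBF); the function name, the 1,000,000-hour MTBF and the `stress_factor` multiplier for rebuild load are all hypothetical. Same-batch drives break the independence assumption, which is exactly why real-world odds tend to be worse than this model suggests.

```python
import math


def p_failure_during_rebuild(surviving_drives: int,
                             mtbf_hours: float,
                             rebuild_hours: float,
                             stress_factor: float = 1.0) -> float:
    """Probability that at least one surviving drive fails before the rebuild finishes.

    Assumes independent drives with a constant failure rate (exponential lifetimes).
    stress_factor is a hypothetical multiplier for the extra load of a rebuild.
    """
    rate = stress_factor / mtbf_hours
    # Probability that a single drive survives the whole rebuild window.
    p_one_survives = math.exp(-rate * rebuild_hours)
    # The array is lost if any one of the surviving drives fails.
    return 1.0 - p_one_survives ** surviving_drives


# Two-disk mirror: one survivor, illustrative 1,000,000-hour MTBF, 24-hour rebuild.
print(p_failure_during_rebuild(1, 1_000_000, 24))
# Same rebuild with a made-up 10x stress multiplier.
print(p_failure_during_rebuild(1, 1_000_000, 24, stress_factor=10))
```

The first figure comes out at roughly 0.0024%; the point is less the number than how quickly it grows with more surviving drives, longer rebuilds, or correlated wear.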

      1. Anonymous Coward

        Re: Some sympathy -but not a lot

        I've always wondered why it isn't standard business practice to replace half the drives in a RAID 10 array and move them over to a fresh server with more brand-new drives. Then half your drives (make sure each is one half of a mirrored pair) have a different remaining lifetime from their partners, which reduces the chances of a dual failure.

        Doing this could allow you to use cheap, low-endurance SSDs rather than high-end drives (HP Drives High Endurance 800GB SSD = £2700 ex VAT each!) with less overall risk, even if it does require more swapping over the years. If you have spare room in the chassis to keep a hot spare, then so much the better.

        1. Captain Scarlet Silver badge

          Re: Some sympathy -but not a lot

          @AC Replacing drives can have some effect on the systems running on them, and managers hate people whinging that the system's always slow.

          1. Anonymous Coward

            Re: Some sympathy -but not a lot

            Replacing a mirrored SSD doesn't affect performance very much in testing. Losing a RAID system and having to restore from backup can annoy users a lot more. If you have a redundant system then you can switch your VMs over to your other systems while you run the disk replacement.

            In any case you are only looking to switch the drives out every 18~30 months (depending on MTBF/2). If it lets you buy SSDs where previously you would have bought hard drives, then the speed increase will be major anyway, and even during a RAID mirror rebuild the volume would be faster than non-SSDs in normal operation in a VM environment.
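The staggered-replacement idea in this sub-thread can be sketched as a schedule for one mirrored pair. This is a hypothetical illustration, not an established procedure: both drives start new at month 0, drive B is first swapped at `interval`, and from then on the two halves are replaced alternately, so one half of every pair is swapped each interval and no drive lives longer than 2 × interval.

```python
def mirror_pair_ages(months_elapsed: int, interval: int) -> tuple[int, int]:
    """Ages (in months) of the two halves of one mirrored pair under a staggered swap.

    Hypothetical schedule: drive A is replaced at 0, 2*interval, 4*interval, ...;
    drive B starts new at 0 and is then replaced at interval, 3*interval, ...
    """
    age_a = months_elapsed % (2 * interval)
    if months_elapsed < interval:
        age_b = months_elapsed  # before the first swap both drives are the same age
    else:
        age_b = (months_elapsed - interval) % (2 * interval)
    return age_a, age_b


for month in (6, 24, 30, 48, 60):
    print(month, mirror_pair_ages(month, interval=24))
```

Once the first swap has happened, the two halves of each pair are always exactly one interval apart in age, which is the property the scheme relies on to make a simultaneous wear-out failure less likely.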

      2. AndrueC Silver badge

        Re: Some sympathy -but not a lot

        Back when I was a data recovery engineer we saw drives from the same batch failing quite a few times. We also saw failed controllers since most RAID boxes only have one controller. But the most common problem was user error due to one of the following:

        * Ignoring first drive failure and continuing to run the system until another went. A RAID is not a form of backup. It's a 'get me home' solution.

        * Poorly written software that didn't make it clear how to rebuild an array and didn't have adequate safeguards to protect the array integrity when adding a new drive.

        * Incompetence.

        1. Anonymous Coward

          Re: Some sympathy -but not a lot

          Upvote, and allow me to repeat that Very Important Statement once more because it cannot be said often enough:

          A RAID is not a form of backup. It's a 'get me home' solution

          Thank you.

        2. PNGuinn Silver badge

          "A RAID is not a form of backup. It's a 'get me home' solution."

          And how many LARTs do we need to get that simple fact through to pointy-headed management and even pointier-headed bean counters??

          GRRRR ...

    3. Anonymous Coward

      Re: Some sympathy -but not a lot

      Unfortunately the majority are only happy if they're paying a fiver a month or something silly for Internet. You'll hear them at the water cooler saying how they only pay peanuts, and if one of the big providers doesn't do it cheap enough they'll move elsewhere.

      Unfortunately, all these fivers a month don't add up to a resilient system. It's not enough for every link in the chain to be as reliable as it could be - yes the companies involved make some profit, and yes, probably a lot of it - but that's what businesses exist for.

      They can't make profit AND provide 100% resilience on a shoestring. Something somewhere will suffer. So next time you hear someone boasting they pay a fiver for Internet/calls/Sky, and the next day whinging it was off for a few hours, remind them that fibre across oceans and satellites in space doesn't come cheap. It's surprising it works at all.

      1. Anonymous Coward

        Re: Some sympathy -but not a lot

        "Unfortunately, all these fivers a month don't add up to a resilient system. It's not enough for every link in the chain to be as reliable as it could be..."

        And yet we pay £30,000 per year to BT for a single** BTNet line and are having the same issues. So how much should we be paying to ensure that we have some resilience in the network?

        **Before you comment, there is no option for second line without £100,000+ in groundworks which would still have also been affected by this problem.

        1. Anonymous Coward

          Re: Some sympathy -but not a lot

          Ahh, because you're paying £30k, you expect that'll cover resilience everywhere. Nope - you're just subsidising everyone paying their fivers a month.

          Unfortunately even your £30k/y doesn't buy resilience in a national network - it would barely cover the wage of one "engineer" who climbs poles all day! It wouldn't come close to running you your own separate little bit of pipe all the way back to a data centre and beyond, so you have to partially share infrastructure with the masses.

          Like it or not - the market has spoken - and it decided it wanted cheap internet that sometimes goes off now and again. And if TalkTalk don't supply it, they'll go to Sky. And if they put their prices up 50p, they'll move on again. Businesses having no option on what they'll pay have to delve deep in their pockets to keep the whole affair ticking over.

          You might not like that, but that's the way it is - the proof of the pudding is your Internet being off two days on the trot. If you don't like that for the money you're paying, don't pay it... and encourage all the cheapskates who're the root cause of the problem to pay a bit more for their Internet!

          1. Archtech Silver badge

            Re: Some sympathy -but not a lot

            "Like it or not - the market has spoken - and it decided it wanted cheap internet that sometimes goes off now and again".

            In very much the same way as, 30 years ago, the market spoke and decided that it wanted cheap software that sometimes has to be rebooted and that has little integrity and no security. That's what made Microsoft and Bill Gates the world leaders they are now.

        2. Blackfriar

          Re: Some sympathy -but not a lot

          Sounds like the person paying £30k/annum has a big internet pipe or is based somewhere remote and expensive. Do you host servers at the site or just need it for internet and possibly voice comms?

          Have you considered a mobile (4G if available) or bonded multiple DSL service as a backup? These don't have to go via BT's core even if delivered locally by BT copper lines for the DSL.

      2. Anonymous Coward

        Re: Some sympathy -but not a lot

        Where can I get this broadband at £5 a month?

      3. Soundman

        Re: Some sympathy -but not a lot

        Telehouse is expensive - plus it has the best infrastructure of any. I worked there for three-plus years.

        I assume a single trip means just one of their UPS power cabinets dropped off. We can only guess the cause at this time. A spot of load redistribution and a reset is a very quick fix. The knock on of restarting an entire network is rather more lengthy.

        1. Anonymous Coward

          Re: Some sympathy -but not a lot

          You can't have looked hard - One of the suppliers was mentioned in the post - TalkTalk.

          Their website currently advertises FREE broadband for 18 months (after £17.70 line rental). Pretty sure my line rental with KCOM is about the £14 mark, so even if TT's line rental is slightly above average, it's still about a fiver for the broadband.

          Their retention offers are better again - my grandfather pays peanuts for all his TalkTalk goodies.

          The FREE broadband is interesting though, as to my knowledge BT/Telehouse etc are unlikely to offer all they do for free, so something somewhere is being skimped on. (It's kind of how eBay encourage you to post your items for free, but I've not yet found a Post Office that'll do that for me). There are these offers all over the place, then we're sat here asking where the redundancy is, LOL!

    4. SImon Hobson Silver badge

      Re: Some sympathy -but not a lot

      > BT is not the same as a small one-man-and-a-cat rural Welsh IT company

      Indeed.

      Us small fish just have to accept whatever conditions are on offer. BT are big enough that they should be able to dictate to suppliers how things are done. It may be a supplier's problem, but BT can't hide behind that, because either they've audited the setup and were happy with it (oops), or they didn't audit it, in which case they can't be said to have done due diligence (oops).

      Either way, from a PR PoV it's BT's name in the headlines.

    5. John Sanders

      Re: Some sympathy -but not a lot

      I would not go with a very large company unless they have something I strictly require.

      There are plenty of competent small-medium sized ISPs in the UK.

      Just my 2 pence.

      1. Archtech Silver badge

        Re: Some sympathy -but not a lot

        Which is why I went with fast.co.uk (Dark Group) back in 2007. I have found them eminently satisfactory ever since. But I was among those without any Internet connection this morning, because unfortunately fast.co.uk were dependent on the kit up in Telehouse North. Even their phones weren't working!

    6. Vince

      @Pen-y-gors Re: Some sympathy -but not a lot

      "Having recently had a 5-day outage at the bods who do some of my hosting (how often do both drives in a RAID setup fail at the same time?"

      When they're from the same batch, same make, same model, same usage pattern (RAID-1 would suggest exactly that by design)...

      ...quite often actually. One of the (many) reasons we mix drive models/batches/makes in RAID arrays

  4. GreggS

    Nothing to do

    with the recent Government report telling BT to get their house in order???

    1. Doctor_Wibble
      Black Helicopters

      Re: Nothing to do

      > ...the recent Government report ...

      No, it's because the Snoopers Charter is back in the works and the black boxes are being installed. Every major outage happens around the same time as the charter reappears, and always in a different place because obviously you don't install the things in the same place twice.

      Check the records, you'll see I'm right!

      Black helicopter as there's no point being AC any more because They know, They always know, as though They were right in here with me in my secret location under the stairs.

      Oh no, They are back! How do They keep finding me?!?!?

      1. David Gosnell

        Re: Nothing to do

        Curiously hot on the heels of a recent mysteriously kept-under-wraps issue at Plusnet, where despite their insistence they don't intercept or anything of the ilk, connections to third-party SMTP servers were timing out – but only when messages had an attachment, no matter how tiny. Very, very odd, indeed pretty much inexplicable without foul play involved, especially with supposedly encrypted connections. They tried to blame it on some other ongoing DNS issues, but DNS doesn't care whether email has attachments or not...

        1. Alister Silver badge

          Re: Nothing to do

          @ David Gosnell

          I've come across a similar SMTP issue where the MTU is set to the maximum (1500) and the don't-fragment bit is set on SMTP packets; it causes multiple retries and timeouts.

          1. David Gosnell

            Re: Nothing to do

            Hmm, certainly possible. Although the issue occurred with even trivially small attachments, I'm not sure I tested it with anything so small as to be under the MTU.
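Alister's explanation would also fit the "only with attachments" symptom. Short SMTP commands travel in small packets, but attachment data (base64-encoded and wrapped in MIME headers, so larger than the file itself) tends to fill full-size TCP segments. If some hop has a smaller MTU, PPPoE's 1492 bytes being the classic case, and the ICMP "fragmentation needed" messages that Path MTU Discovery depends on are being filtered, full-size packets with the don't-fragment bit set are silently dropped. The sketch below only shows the size arithmetic, assuming IPv4 and TCP with no options; the names are illustrative, and it says nothing about what actually happened at Plusnet.

```python
IPV4_HEADER = 20    # bytes, assuming no IP options
TCP_HEADER = 20     # bytes, assuming no TCP options
PPPOE_OVERHEAD = 8  # PPPoE (6) + PPP (2) header bytes, giving the familiar 1492-byte MTU


def mss_for(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def dropped_with_df(payload: int, path_mtu: int) -> bool:
    """True if a packet with the DF bit set carrying `payload` bytes is too big for the path."""
    return payload + IPV4_HEADER + TCP_HEADER > path_mtu


sender_mss = mss_for(1500)        # sender assumes a 1500-byte MTU, so MSS = 1460
path_mtu = 1500 - PPPOE_OVERHEAD  # a PPPoE hop somewhere along the way: 1492
print(dropped_with_df(100, path_mtu))         # False: a short SMTP command fits
print(dropped_with_df(sender_mss, path_mtu))  # True: a full-size segment of attachment data does not
```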

      2. Anonymous Coward
        Anonymous Coward

        Re: Nothing to do

        There have been 'anonymous' boxes at Telehouse for at least 10 years. Why would installing them affect the power?

  5. chris 17

    @ Pen-y-gors

    "how often do both drives in a RAID setup fail at the same time? And with a dodgy backup?"

    If it's that important then you need to pay more for a more resilient solution, perhaps an active/standby cluster across geographically separated data centres provided by different vendors on different ISPs and supplied by different power stations. What's the cost to you of the outage vs the cost of a resilient solution?

    Regarding BT, yes, this latest outage should not have had the impact on its network that it has. Services should have instantly rerouted with little to no impact on end users.
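chris 17's question, outage cost versus resilience cost, is at heart an expected-value comparison. A minimal sketch follows; the function names and every figure in it are made up for illustration, and it deliberately ignores harder-to-price losses such as reputation or customers leaving.

```python
def expected_annual_outage_cost(outages_per_year: float,
                                hours_per_outage: float,
                                cost_per_hour: float) -> float:
    """Expected yearly cost of downtime (simple expected-value estimate)."""
    return outages_per_year * hours_per_outage * cost_per_hour


def resilience_pays_off(outage_cost_per_year: float,
                        resilience_cost_per_year: float,
                        fraction_of_downtime_avoided: float) -> bool:
    """True if the downtime a resilient setup avoids is worth more than it costs."""
    return outage_cost_per_year * fraction_of_downtime_avoided > resilience_cost_per_year


# Illustrative figures only: two 6-hour outages a year at £500 an hour.
loss = expected_annual_outage_cost(2, 6, 500)   # £6,000 a year
print(resilience_pays_off(loss, 10_000, 0.95))  # a £10k/yr standby site doesn't pay
print(resilience_pays_off(loss, 3_000, 0.95))   # a £3k/yr backup line might
```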

  6. Ellis Birt 1

    It's about money!

    Both 'outages' have been the loss of part of their diverse network infrastructure.

    They have highlighted that while their network is diverse, there is not enough bandwidth when one major location goes down to handle all their peering traffic.

    That is down to investment in redundancy. They took the gamble and lost - twice in two days!

    Even if they were to compensate customers, it would probably cost less than having the redundant bandwidth so their FD and shareholders will still be happy!

    1. Commswonk Silver badge

      Re: It's about money!

      That is down to investment in redundancy. They took the gamble and lost - twice in two days!

      Quite so. However the alternative is another gamble; spend more on additional infrastructure and increase the prices the users have to pay so that there is actually a ROI rather than a bit of a black hole in the accounts. And then the customers would be griping about higher charges and perhaps looking elsewhere for their service.

    2. Pascal Monett Silver badge

      Nope, it's about electricity here

      Specifically, the lack of it. Bandwidth is not the solution here, BT is dark in that region because the lights are out as well.

    3. Peter 26

      Re: It's about money!

      Yes, this is the truth. Why doesn't it just route round the issue, even if it takes a bit longer? Certain sites just don't work at all.

  7. Gert Leboski

    GCHQ?

    Hang on, let's simulate a power outage so we can slip these devices into the route.....

    Tin foil is my friend.

    1. Dan 55 Silver badge

      Re: GCHQ?

      One more time and people are going to start to wonder why UPSs everywhere have suddenly gone on strike.

      But two power outages in two successive days in different providers is nothing to bother about. Perfectly normal, that.

      1. Version 1.0 Silver badge

        UPS failure

        You think that they bought all the UPSs at the same time and now the batteries are all reaching replacement time?

        1. Dan 55 Silver badge

          Re: UPS failure

          Listen, don't mention the UPSs. I mentioned it once, but I think I got away with it.

      2. probonic

        Re: GCHQ?

        The UPSs and generators wouldn't have helped here. There wasn't a power outage; a breaker tripped. Breakers are designed to cut power to prevent damage if something *inside* the facility causes a fault that trips them.

  8. Dan 55 Silver badge

    "Major Service Interruption"

    Ah, that will be something which inconveniences a small number of customers then.

    1. Roland6 Silver badge

      Re: "Major Service Interruption"

      Well it's all relative. Given how many internet service customers BT has, a few million is "a small number", in fact in Internet terms, the loss of service by 60m users in one country would also be an outage that inconvenienced "a small number of customers"...

  9. GloomyTrousers

    Not just BT/Plusnet

    Apparently this is the root cause of an outage for some Zen customers too. Tech support say they think this is affecting other ISPs as well.

    1. Anonymous Coward

      Re: Not just BT/Plusnet

      BT's statement said it was affecting other ISPs.

  10. zaax

    So what happens to the £18 standing charge I pay BT/Plusnet?

    1. wolfetone Silver badge

      You keep paying it.

  11. Alien8n Silver badge

    My fault apparently

    Had issues all morning with our IP Phones, so inevitably it's my fault that they don't work. As soon as I mention what's happened it's "call BT now and find out when it'll be fixed, we can't work without phones". Except it's not actually a BT issue and we aren't a BT customer? Now if only I could have changed jobs 2 weeks earlier... (still here for another 2 weeks)

    1. Anonymous Coward

      Re: My fault apparently

      Yes, another morning of a dozen service managers asking inane questions

      have you checked the firewalls?

      what applications are affected?

      have you reported it to BT?

      is this why Henry's PC isn't working?

      why does the Internet affect VPN users?

      when will it be fixed?

      are we there yet?

      Two bloody mornings in a row!

      1. Alien8n Silver badge

        Re: My fault apparently

        Are you my replacement?

    2. My Alter Ego

      Re: My fault apparently

      You wouldn't believe* how often I have to explain to people in my office that HSBC, Lloyds, etc. don't like giving me access to fix their online banking.

      * Actually, I imagine you can...

  12. David Austin

    186K

    Anyone else with 186K?

    All our customers using 186K went offline this morning, as did their website (www.186k.co.uk), and partner portal (http://www.dolphinmp.co.uk) and their partner support number (08701 222186), main number (08701 222 186), and support number (0872 232 1999) are all off.

    Looks like they’re knocked 100% offline. At this point, we don’t know whether it’s an ISP issue or whether they’ve folded.

    1. David Austin

      Re: 186K

      Not folded: they've just got onto Twitter for the first time since 2010 to update us. Looks like the issue at Telehouse North will take "Several Hours" to resolve.

      https://twitter.com/186k

    2. NotMyServersNotMySegfault

      Re: 186K

      It's the Telehouse issue. Complete power outage in several suites is my understanding of it. I have several servers hosted with 186k, all currently off-line, it's also taken down their VoIP phone system. Their own infrastructure is resilient (two entirely independent routes out with failover) but core network is dependent on Telehouse, which of course is supposed to be resilient itself. When Telehouse get the power back on, everything should come back. Problem must be major for it to be taking this long to fix, evidently more than just some tripped breakers.

  13. carrynot

    Whatever happened to the mantra, "NO Single Point Of Failure".

    1. tirk

      @carrynot

      But with the cloud, it's a highly distributed... single point of failure.

    2. Infernoz Bronze badge

      Indeed, isn't critical infrastructure supposed to have fail-over to another site so that one site failing is annoying, maybe with some speed reduction, rather than stopping service completely...

      1. Roland6 Silver badge

        " isn't critical infrastructure supposed to have fail-over to another site so that one site failing is annoying, maybe with some speed reduction, rather than stopping service completely..."

        Depends on the definition of 'critical' being used, which differs from industry to industry and application to application. Remember, five nines doesn't mean the infrastructure will never fail, just that it is not expected to fail very often. That is why, when designing really critical infrastructure, i.e. where people could die if it fails, you actually have to design in failure actions, so that it can "fail safe".
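To put Roland6's point in numbers: an availability figure only caps total downtime, it doesn't say when or how failures happen. A minimal sketch, assuming a 365-day year and treating "n nines" as an unavailability of 10^-n (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime permitted by an availability of `nines` nines (e.g. 5 -> 99.999%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Five nines works out to about 5.26 minutes a year, which a single badly timed failure can use up at once; hence the need to design in fail-safe behaviour rather than rely on the availability figure alone.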

    3. Archtech Silver badge

      No single point of failure

      This is one of the most exquisitely painful, sensitive or (if you incline to BOFH humour) uproariously funny things in IT. Human beings think they can do these things, but they can't.

  14. Anonymous Coward

    "DNS Issues"

    Anyone else get sick of hearing people with half a tech brain claiming these things are DNS issues every time? It's often the same bunch who'll say a defrag of the hard drive will resolve an Outlook-not-loading issue. Unfortunately, as DNS is usually the first link in the chain to getting out anywhere, these people always think it's DNS causing the problem.

    A customer who was suffering with this called this morning and told me it was definitely DNS, as he was trying to ping 8.8.8.8 and it didn't work. It fell on deaf ears that pinging an IP address doesn't involve DNS at all, and that DNS won't work anyway if there's a more fundamental connectivity issue, groan :-(
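The AC's reasoning suggests a simple triage order: check reachability of a raw IP address first, since that involves no name lookup, and only then check name resolution. A minimal sketch using Python's standard `socket` module; because ICMP ping needs raw sockets and usually elevated privileges, it assumes a TCP connection to port 53 on 8.8.8.8 as a stand-in. The function names and the `example.com` test name are illustrative choices, not any standard tool.

```python
import socket


def ip_reachable(ip: str = "8.8.8.8", port: int = 53, timeout: float = 3.0) -> bool:
    """TCP connect to an IP literal: no DNS lookup is involved at all."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def name_resolves(name: str = "example.com") -> bool:
    """Ask the system resolver for an address for `name`."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def diagnose(ip_ok: bool, dns_ok: bool) -> str:
    """Classify the failure; a dead raw-IP path rules DNS out as the root cause."""
    if not ip_ok:
        return "connectivity"  # can't even reach a raw IP address
    if not dns_ok:
        return "dns"           # raw IPs work but names don't
    return "ok"
```

Run it as `diagnose(ip_reachable(), name_resolves())`. The customer's failed ping to 8.8.8.8 corresponds to `ip_ok=False`, which lands in the "connectivity" branch whatever DNS is doing.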

    1. David Austin

      Re: "DNS Issues"

      I dunno, man - I normally find "If in doubt, blame DNS" a pretty good first troubleshooting step with Internet/Active Directory users.

    2. Doctor_Wibble
      Headmaster

      Re: "DNS Issues"

      The defrag is to mitigate one particular problem, which is to keep the user occupied while we figure out WTF is actually wrong.

      It's not always defrag, depending on the user (or official answers card) but there's any number of diversions available that have at least some marginal benefit so if anyone asks we can talk about it being a sensible precautionary thing. And for the most part the relatively few people who do spot it for what it is will generally understand and even appreciate the semi-conspiratorial admission that they are right. Occasionally works as good PR but depends on how well you handle deviations from the official flow chart, if you are unlucky enough to have one.

      Teech cos I is lecturin a bit, sorry...

    3. Dwarf Silver badge

      Re: "DNS Issues"

      Can't you refer them back to Murphy's second law of electronics?

      - It works better when it's turned on.

      The first law is that it works better when it's plugged in.

    4. Chris King Silver badge

      Re: "DNS Issues"

      You're preaching to the choir here, AC...

      Router down - "DNS is broken"

      Remote site is down - "DNS is broken"

      FOTT (Firewall On Too Tight) - "DNS is broken"

      Chain a bunch of CNAMEs together and remove one in the middle - "DNS is broken"

      Mail server is blacklisted - "DNS is broken"

      Users' favourite cat picture site breaks their database - "DNS is broken"

      ...and so on. Sadly, routinely arming DNS engineers is not an option. It's not so much a health and safety issue, the cost of ammunition would cripple the business.
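Of the items on that list, the CNAME one is the case where DNS really is what breaks: a resolver follows aliases hop by hop, so deleting a record from the middle of a chain leaves every name before it unresolvable. The sketch below is a toy in-memory resolver written for illustration, not a real DNS library; the zone contents use reserved example domains and a documentation address (192.0.2.0/24).

```python
from typing import Dict, Optional, Tuple

Zone = Dict[str, Tuple[str, str]]  # name -> (record type, value)


def resolve(zone: Zone, name: str, max_hops: int = 8) -> Optional[str]:
    """Follow CNAMEs until an A record is found; None if the chain is broken or loops."""
    for _ in range(max_hops):
        record = zone.get(name)
        if record is None:
            return None   # dangling CNAME: the chain points at a name that no longer exists
        rtype, value = record
        if rtype == "A":
            return value
        if rtype != "CNAME":
            return None   # other record types are out of scope for this toy
        name = value      # follow the alias to the next name
    return None           # too many hops, probably a loop


zone = {
    "www.example.com": ("CNAME", "web.example.net"),
    "web.example.net": ("CNAME", "lb.example.org"),
    "lb.example.org": ("A", "192.0.2.10"),
}
print(resolve(zone, "www.example.com"))  # 192.0.2.10
del zone["web.example.net"]              # remove the middle link of the chain
print(resolve(zone, "www.example.com"))  # None
```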

  15. IanCa

    something doesn't make sense

    Power to kit in a DC should be dual-fed, with backup generators etc., so a power outage in such a DC should not cause any disruption. That said, it clearly does, because it seems that, still, a DC operator saying they have full power resilience does not in fact mean what it says on the tin. Which then leads to the next oddity....

    Complete loss of power to a major core node in a Tier 1 network shouldn't cause any disruption to anyone out on the edges. Traffic should simply reroute around it to alternative nodes. That's pretty much the definition of a Tier 1.

    If they've lost power to devices at the very edge, then those are not necessarily backed up by alternates (the further out from the core you go, the harder resilience gets), but the flip side is that the further towards the edge the device, the fewer people go through any one device, so the fewer people are affected. If the losses are somewhere between the outer edge and the core, in one of the various layers of aggregation, then again there will be resilience in the design.

    BT Adastral@martlesham has some very smart people who have written a lot of the books on carrier grade network design, so I fully expect them to have done their resilience design properly... which is why something doesn't add up...
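IanCa's argument is really about graph connectivity: losing a core node that has a redundant partner should only change the path, while losing a single-homed node near the edge cuts off everything behind it. A minimal breadth-first-search sketch follows; the topology and node names are hypothetical, and the check only asks whether *a* path exists, not whether the surviving path has enough capacity to carry the rerouted traffic.

```python
from collections import deque
from typing import Dict, Iterable, List


def reachable(graph: Dict[str, List[str]], src: str, dst: str,
              failed: Iterable[str] = ()) -> bool:
    """Breadth-first search from src to dst, skipping any failed nodes."""
    down = set(failed)
    if src in down or dst in down:
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical topology: a single-homed exchange, one aggregation node, two core nodes.
TOPOLOGY = {
    "exchange": ["agg"],
    "agg": ["exchange", "core_a", "core_b"],
    "core_a": ["agg", "peering"],
    "core_b": ["agg", "peering"],
    "peering": ["core_a", "core_b"],
}
print(reachable(TOPOLOGY, "exchange", "peering", failed=["core_a"]))  # True: rerouted via core_b
print(reachable(TOPOLOGY, "exchange", "peering", failed=["agg"]))     # False: no alternative path
```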

    1. Anonymous Coward

      Re: something doesn't make sense

      > ... a DC operator saying they have full power resilience does not in fact mean what it says on the tin

      I can imagine they've fallen into a common trap - but this is pure speculation.

      Say you have two independent supplies, UPSs, gennys, etc. And all your customers take two supplies, and the load is nicely balanced between them, and you've got "dual resilient" supplies.

      Great.

      Time goes on, and loads keep going up. In particular, manglement doesn't see any issue when loads go past 50% of capacity. Then there's a problem: one supply trips, all the load dumps itself onto the other, and what was running at (say) 60% is now trying to run at 120%. Ah, so not quite the fault tolerance they thought they had.

      It's much the same with data links. Resilience, redundant paths, automatic re-routing, yada yada yada. Then said power problem takes out a major node, traffic gets re-routed, and links that were running at (say) 75% are now running at ... well I guess well over 100% seeing as I could see 80+% packet loss at one point past a specific node in BT's network en-route to one of their (and our) customers.

      It wasn't long before I saw a change in route, and that particular customer was back online. But we've others with 186k and they are only slowly crawling back onto the net. One "got a connection" but it was one of the 172.something addresses BT give out when an ADSL login fails. Forced the link down and it kept on trying until about 1/2 hour ago when it got connected. Actual connectivity over it is still "intermittent" though.

      So that's a second day wasted on dealing with the fallout from this sort of crap. Of course, our customers assume that it's a problem with our services and I have to keep explaining that we are fine - it's just like having an accident close a busy motorway: even those not using that motorway will be affected as all the other roads clog up with traffic.
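The AC's 60%-to-120% arithmetic generalises to both power feeds and data links: if n equally loaded units share a load and one drops out, each survivor carries n/(n-1) times its previous load, so the highest safe per-unit load is (n-1)/n. A minimal sketch, assuming the displaced load spreads evenly over the survivors (real power distribution and routing rarely rebalance that neatly); the function names and the four-link example are hypothetical.

```python
def utilisation_after_failure(utilisation: float, units: int, failed: int = 1) -> float:
    """Per-unit load once `failed` of `units` equally loaded supplies/links drop out.

    Assumes the surviving units share the displaced load evenly.
    """
    if failed >= units:
        raise ValueError("no surviving units to carry the load")
    total_load = utilisation * units
    return total_load / (units - failed)


def max_safe_utilisation(units: int, failed: int = 1) -> float:
    """Highest per-unit load that can still survive losing `failed` units."""
    return (units - failed) / units


print(utilisation_after_failure(0.60, 2))  # two feeds at 60%: the survivor is asked for 120%
print(max_safe_utilisation(2))             # 0.5: the 50% line that gets ignored
print(utilisation_after_failure(0.75, 4))  # four links at 75%, lose one: 100%
```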

    2. Alex Brett

      Re: something doesn't make sense

      There are two issues here - firstly, there are very few facilities in the Docklands kitted out to a 2n (i.e. having two sets of everything) spec; most are just n+1 (so e.g. if you need 2 UPS units to cover the load, you'll have 3 so you can handle one failing). Now n+1 is fine, until something downstream of your redundancy (e.g. a circuit breaker) fails, or something fails in a way your redundancy doesn't expect (e.g. your failed UPS shorting the common bus). With 2n you are in general able to avoid this, as each rack has two supplies fed independently from the grid onwards (the really good ones even have separate substations), but it costs more, and most of the older facilities, where the majority of the carriers you want to connect to are present, don't have the space etc. to actually become 2n.

      The second issue is that all the redundancy in the world doesn't help in some situations - e.g. if you have a fire that somehow your extinguishing system can't manage to deal with, the first thing the fire brigade are going to say when they turn up on site is "OK, turn the power off". To a lesser extent you've also got the issue that a faulty bit of kit could trip both supplies, though good design of the breakers and distribution should be able to limit that e.g. to a single rack being affected.

      1. Blackfriar

        Re: something doesn't make sense

        2N is not two power supplies. All Tier 3 data centres have dual power supplies, and each one is usually N+1.

        2N is when you have two of everything on EACH power supply: twice the UPS, twice the number of generators. What I don't understand about this BT outage is why, if both power feeds to the rack failed, another node didn't pick up the load in a standard failover mode.

    3. Anonymous Coward

      Re: something doesn't make sense

      > BT Adastral@martlesham has some very smart people who have written a lot of the books on carrier-grade network design, so I fully expect them to have done their resilience design properly... which is why something doesn't add up...

      What doesn't "add up" is your understanding of BT.

      Adastral is a retirement home for smart people: smart people who are too smart to resign themselves to being full-time academics, and not cool enough for the likes of Facebook or Google.

      As BT have proved to the outside world time after time, BT is a highly commercial organisation; every single one of its decisions has a strong commercial undertone.

      "Strong commercial undertone" being a polite way of saying: how can we implement this most "cost-effectively" in order to achieve maximum value for our shareholders?

      Just look at the way they've implemented 21CN; just look at the recent Parliament report. The writing's on the wall, you just need to take off your rose-tinted spectacles.

    4. Archtech Silver badge

      Re: something doesn't make sense

      And when you carefully specify (and pay through the nose for) redundant power or network cables, you have to stand over the contractors and make sure they don't put both the "redundant" cables into the same conduit to save time and money.

      Just one of the hundreds of nasty little things that can break redundant systems.

  16. This post has been deleted by its author

    1. Commswonk Silver badge

      Re: Non specific point

      I'm going back to TalkTalk when my year is up.

      Nurse... nurse... this man needs help.

      I'm sorry sir there's nothing we can do; he's too far gone. Try exorcism.

    2. Steve Davies 3 Silver badge

      Re: Non specific point

      The words

      Frying Pan

      and

      Fire

      come to mind.

      If you do go back, just don't come back here and complain when it goes badly.

  17. Jmears

    Zen down as well

    I can confirm that Zen Internet are affected by this. My router connects and authenticates but seems to get the IP address of a BT Openreach test network; every URL I open shows the same 'Diagnostic network' page. Trying to work via one bar of 4G signal is not much fun (I work from home in a rural village), so I think I will write today off.

    1. orangehand

      Re: Zen down as well

      My Zen-in-the-middle-of-nowhere is UP.....

      1. Jmears

        Re: Zen down as well

        Be glad that you are OK. I am on their Unlimited Fibre service with a static IP address, so I guess that connects via Openreach through London somewhere.

  18. ThorWarhammer

    Very slow connection today

    It's just like Bungie's Destiny servers being down when you're playing a serious session.

    I'm more pissed off with BT's crappy spam filters that are not worth SH1T

  19. Anonymous Coward

    Great Britain not so great anymore when it comes to the Internet

    I have just spent two weeks in Croatia in a villa in the hills, 15 minutes away from the shops, and we had internet speeds of 20Mb/s & 2.8Mb/s!

    At home, one mile from the exchange, we're getting 17Mb/s & 1.5Mb/s?

    I know it's easier to put in new equipment than to use the old Openreach network, but they really do need to start looking at the big picture.

    Tony

  20. mickm

    BT joined up systems

    I phoned BT yesterday morning as my phone line, which normally has a lot of interference, had become worse, almost unusable, and I assumed this was causing my broadband problems. After roughly 30 minutes with the BT call centre (from about 10:00 to 10:30) they agreed to monitor my line. There was no mention of the actual problem with the service, just the usual nonsense: master socket check etc.

    Do the engineers inform the call centres? Apparently not!

    1. Archtech Silver badge

      Re: BT joined up systems

      This is related to Brownridge's Law:

      "The quicker a phone's answered in sales, the slower it's answered in customer services".

      This is by no means accidental. Huge corporations like BT can afford to treat customer service in much the same way that the opposing sides treated defence during the First World War. Multiple lines of trenches, machine guns, barbed wire, massive artillery bombardments, and if felt necessary poison gas. Anything to slow the bastards down!

      The harder it is to get service, and the slower and more unwillingly it is carried out, the less money the corporation has to spend on it year by year. If (and perhaps only if) you have something close to a monopoly, you can make a very great deal of money that way.

  21. Anonymous Coward

    In Telehouse's Defense

    Telehouse North is getting on a bit and there was talk long ago of gutting it at 25 years (2015).

    BT is a major stakeholder and a major offender for power overuse. Clients routinely flout the rules on consumption: filling racks to capacity with zero spacing, overflowing equipment onto empty footprints, or connecting power cables to outlets several positions from their own. The rule of thumb is that the bigger the client, the worse they behave and the more likely they'll be allowed to continue.

    There are some pretty frightening sights under the floor tiles, you might even find the odd bit of contraband kit quietly running under there.

    Account management generally tut tut tut and do nothing past wagging a finger whilst facilities have to keep the place in service. Cooling the building has always been a major challenge because of the above.

    My bet would be that recent high temperatures have caused a major piece of kit to fail and it's taken out other areas nearby.

    1. Drewc (Written by Reg staff) Gold badge

      Re: In Telehouse's Defense

      FYI

  22. a_mu

    Telephone exchanges

    Question.

    Once upon a time, we all used landline phones. No, I'm not thinking of going back to them, but did the phone exchanges of old suffer these sorts of power outages?

    I could be wearing rose-tinted glasses, but I can't remember not having a dialling tone when I needed it, during a power cut, flood or whatever.

    What's changed in our ability to do redundancy, or is it my memory?

    1. Steve Davies 3 Silver badge

      Re: Telephone exchanges

      Exchanges didn't suffer power outages for the following reasons:

      1) Everything was DC powered.

      2) There were sodding great batteries in the exchange that kept everything working when the power went down.

      Back in the 1990s we supplied lots of kit to go in exchanges. It was all 48V DC powered.

      Then apparently Ofcom ruled that this wasn't needed any longer. Probably some arm-twisting by the BT bean counters.

      Datacentre rules then applied.

      Batteries were thrown away and the rest is history.

  23. J.G.Harston Silver badge

    Google wouldn't let me log into Google Groups today as it thought somebody had stolen my details and was trying to log on from Brum while I was still logged in via somewhere near Scarborough.

    1. Steve Davies 3 Silver badge

      Google and Point of presence

      is a right royal pain in the butt.

      Haven't the wizards at the chocolate factory heard of mobile data? You know, you move around and use the services people like Google supply.

      One day I could be in London. Later that same day I could be in Los Angeles or in Kirkwall.

      Any service that enforces this sort of geo-locking is really crap (just my humble opinion and not worth the effort it has taken to write this post). Their systems should be designed to cater for this.

  24. Cynic_999 Silver badge

    What's the betting the circuit has been overloaded by the increased consumption of the air conditioning systems (office & rack)?

    1. patrickstar

      The ACs are not powered by the UPS in a datacenter, only by the generator.

      DC UPSes are meant for computer/network gear only and will become very unhappy if you try to power ACs from them - chances are the ACs would get pissed off as well. No need to try in the first place since you can do perfectly fine without cooling during the few minutes it takes for the generators to start.

  26. Anonymous Coward

    Your MP is an idiot...

    Your MP is an idiot so won't be able to understand the nuances of this double failure and how it is significant in terms of national infrastructure (and national security) and economic prosperity. The only way he will care enough to ask questions in the House is if you use your considerable specialist knowledge to write to him or her and explain how this sort of thing really shouldn't be able to happen. If two key sites can go offline so easily, imagine if sites were taken offline as part of a coordinated attack to coincide with some activity that caused major economic harm to the UK: it would make Brexit look like child's play.

    The only way to get your representatives in Government to investigate this is to show them you care and why they should care too.

    1. ricardian

      Re: Your MP is an idiot...

      Last night the announcement on BT phone line 0800 169 0199 was "we are improving the broadband network". What on earth does it mean?

    2. Anonymous Coward

      Re: Your MP is an idiot...

      "if you use your considerable specialist knowledge to write to him or her and explain how this sort of thing really shouldn't be able to happen."

      I wrote to my MP using my "considerable specialist knowledge" to express concerns in relation to a technology subject. He didn't even deign to reply in any form, not even with a boilerplate acknowledgement signed pp by an assistant... nothing!

      MPs are a useless waste of space, particularly the career politicians who have absolutely no clue what the real world looks like, as they've only ever known Westminster.

  27. Alien8n Silver badge

    Virgin Media

    Is the current extended downtime for VM at all related? It's been up and down for a few days but since BT went TITSUP it's just been down.

    1. illiad

      Re: Virgin Media

      Well, if you tell me your area, I could check... VM is still very good in the TW postcode area. :)

  28. beecee

    not excusable

    Thought I'd see what the comments were re last week's 'outages'. We are a one-woman-and-her-husband-Ed SME. We experienced 5 hours offline on Tuesday from 1800ish whilst Dell was attempting to discover the fault on an XPS. When I contacted PlusNet I was informed that Openreach were conducting 'planned maintenance'. Yeah, and I'm the Queen of Sheba; there was nothing up on their site that I could find. I don't remember ISDN in the 90s being like this; I suppose that's progress (demand etc.). I'd be interested to know if countries like South Korea experience these issues; I expect it wouldn't be tolerated. Whatever happened and whoever's fault it was, it's just not excusable in 2016!

  29. beecee

    this was the notice

    Fault troubleshooter maintenance - 20th July 17:30 - 21st July 10:00

    That was the notice: faults troubleshooter, err, faults, well yes. Not related? Who knows. Maybe somebody dropped a spanner, just maybe.

Biting the hand that feeds IT © 1998–2019