BT customers hit by broadband outage ... again

BT customers in the UK are once again banging their heads against their keyboards this morning: a power outage has thrown them offline for the second day running. Today the issue is a power outage at Telehouse North in London. An email message from BT Wholesale, with the subject line 'Major Service Interruption' – seen by The …


  1. David Knapman

    "It is ideal for telecom providers, financial trading exchanges, media and gaming companies that require speed, reliability and reach."

    Hmm. Note how they're careful not to include "power" in that list.

    1. lglethal Silver badge
      Joke

      I guess they can take reliability off their list now as well....

  2. Anonymous Coward

    Hello from Sweden

    Only I'm not actually in Sweden, I'm having to use a VPN connection because BT's acronym today is Buggered Telecom.

    1. Ragarath

      Re: Hello from Sweden

      So BT's network got you to a VPN via Sweden. Seems like it's working for you ;)

      1. Anonymous Coward

        Re: Hello from Sweden

        I wouldn't say completely working, I couldn't connect to the server in France.

  3. Pen-y-gors Silver badge

    Some sympathy -but not a lot

    Having recently had a 5-day outage at the bods who do some of my hosting (how often do both drives in a RAID setup fail at the same time? And with a dodgy backup?) and spent time placating customers for cock-up by third party, I feel a teensy-weensy bit of sympathy for BT when they are affected by a supplier's kit going titsup. (We're currently migrating all our customers to someone more reliable anyway - they really should have been more apologetic!)

    BUT - BT is not the same as a small one-man-and-a-cat rural Welsh IT company specialising in bilingual web development. They're several times bigger, and should surely be able to afford truly redundant kit that just keeps working, although I will allow them a hiccup or two in the event of meteorite strikes or Brexit.

    1. Alister Silver badge

      Re: Some sympathy -but not a lot

      @pen-y-gors

      I think you've got the wrong end of the stick here. It's not BT's own infrastructure in Telehouse North (despite the confusing name); it's part of the London Internet Exchange, which provides routing and connectivity to all the Tier 2 providers. So BT are more in your position at the moment, sitting and waiting for their suppliers to sort their shit out.

      1. TrevorH

        Re: Some sympathy -but not a lot

        I don't believe this problem has anything to do with LINX. THN is a massive building and LINX have space there but the room currently affected by the power problems is not the LINX suite.

        1. Alister Silver badge

          Re: Some sympathy -but not a lot

          > I don't believe this problem has anything to do with LINX. THN is a massive building and LINX have space there but the room currently affected by the power problems is not the LINX suite.

          Well, we're having big problems with latency and packet loss from a Plusnet fibre link to some of our servers at Firehosts/Armor, and most of the issues seem to be at the Telia nodes in London:

          ldn-b3-link.telia.net

          ldn-bb2-link.telia.net

          ldn-b5-link.telia.net

          By comparison, we aren't seeing any issues with these hosts on the same routing:

          peer2-et-9-1-0.redbus.ukcore.bt.net

          peer5-te0-0-0-31.telehouse.ukcore.bt.net

          So maybe this is a routing issue from BT onwards?

      2. Anonymous Coward

        Re: Some sympathy -but not a lot

        I'm sorry, you're incorrect. Today's outage directly affected BT's suites, so their infrastructure lost power. It did not affect the LINX suites which are located elsewhere in the building.

        However they are waiting for their supplier - Telehouse needs to restore power to the affected areas before BT can bring their kit back up.

      3. patrickstar

        Re: Some sympathy -but not a lot

        LINX going down would at most lead to reduced capacity, not an outage (well, not for long, at least).

    2. dajames Silver badge

      Re: Some sympathy -but not a lot

      > (how often do both drives in a RAID setup fail at the same time? And with a dodgy backup?)

      If two drives are from the same batch, and have spent their entire lives in the same RAID setup where they have experienced very similar duty cycles, thermal conditions, vibration, etc., then it's not entirely unlikely that they will have about the same lifetime.

      If one drive in a RAID array fails, the other(s) will experience atypically high use while the array is rebuilt, and the chances of a second drive failing during the rebuild are not insignificant. It does happen.

      The dodgy backup is another matter, though. There should always be more than one backup.
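
As a rough illustration of the rebuild-window risk described above, here is a minimal sketch. It assumes a constant (exponential) failure rate, which is a simplification: it does not model the correlated wear-out of same-batch drives, so the hypothetical `stress_factor` parameter is only a crude stand-in for the extra load during a rebuild. The function name and all figures are illustrative, not taken from any vendor data.

```python
import math


def second_failure_probability(mtbf_hours, rebuild_hours,
                               surviving_drives=1, stress_factor=1.0):
    """Chance that at least one surviving drive fails during the rebuild.

    Assumes independent drives with a constant failure rate of
    stress_factor / mtbf_hours (an assumption, not a measured model).
    """
    rate_per_drive = stress_factor / mtbf_hours
    exposure = rate_per_drive * rebuild_hours * surviving_drives
    return 1.0 - math.exp(-exposure)


# Hypothetical figures: 100,000 h MTBF, 24 h rebuild, one surviving mirror.
baseline = second_failure_probability(100_000, 24)
# Same rebuild, but assume the survivor is worked ten times harder than usual.
stressed = second_failure_probability(100_000, 24, stress_factor=10.0)
print(f"baseline: {baseline:.6f}, stressed: {stressed:.6f}")
```

Even under these optimistic assumptions the risk scales with rebuild time and load, and real same-batch drives near end of life are likely to do worse than an independence model suggests.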

      1. Anonymous Coward

        Re: Some sympathy -but not a lot

        I've always wondered why it isn't standard business practice to replace half the drives in a RAID 10 array and move them over to a fresh server alongside brand new drives. Then half your drives (make sure each is one half of a mirrored pair) have a different remaining MTBF from the others, which reduces the chance of a dual failure.

        Doing this could allow you to use cheap low endurance SSDs rather than high end drives (HP Drives High Endurance 800GB SSD = £2700 ex VAT each!) with less overall risk even if it does require more swapping over the years. If you have spare room in the chassis to keep hot spare then so much the better.

        1. Captain Scarlet Silver badge

          Re: Some sympathy -but not a lot

          @AC Replacing drives can have some effect on the systems running on them, and managers hate people whinging that the system's always slow.

          1. Anonymous Coward

            Re: Some sympathy -but not a lot

            Replacing a mirrored SSD doesn't affect performance very much in testing. Losing a RAID system and having to restore from backup can annoy users a lot more. If you have a redundant system then you can switch your VMs over to your other systems while you run the disk replacement.

            In any case you are only looking to switch the drives out every 18~30 months (depending on MTBF/2). If it allowed you to buy SSDs whereas previously you would have had Hard Drives instead then the speed increase will be major anyway and the volume during a RAID Mirror rebuild would be faster than the normal operation of non-SSDs in a VM environment anyway.

      2. AndrueC Silver badge
        Facepalm

        Re: Some sympathy -but not a lot

        Back when I was a data recovery engineer we saw drives from the same batch failing quite a few times. We also saw failed controllers since most RAID boxes only have one controller. But the most common problem was user error due to one of the following:

        * Ignoring first drive failure and continuing to run the system until another went. A RAID is not a form of backup. It's a 'get me home' solution.

        * Poorly written software that didn't make it clear how to rebuild an array and didn't have adequate safeguards to protect the array integrity when adding a new drive.

        * Incompetence.

        1. Anonymous Coward

          Re: Some sympathy -but not a lot

          Upvote, and allow me to repeat that Very Important Statement once more because it cannot be said often enough:

          A RAID is not a form of backup. It's a 'get me home' solution

          Thank you.

        2. PNGuinn
          Mushroom

          "A RAID is not a form of backup. It's a 'get me home' solution."

          And how many LARTs do we need to get that simple fact through to pointy-headed management and even pointier-headed bean counters??

          GRRRR ...

    3. Anonymous Coward

      Re: Some sympathy -but not a lot

      Unfortunately the majority are only happy if they're paying a fiver a month or something silly for Internet. You'll hear them at the water cooler saying how they only pay peanuts, and if one of the big providers doesn't do it cheap enough they'll move elsewhere.

      Unfortunately, all these fivers a month don't add up to a resilient system. It's not enough for every link in the chain to be as reliable as it could be - yes the companies involved make some profit, and yes, probably a lot of it - but that's what businesses exist for.

      They can't make profit AND provide 100% resilience on a shoestring. Something somewhere will suffer. So next time you hear someone boasting they pay a fiver for Internet/calls/Sky, and the next day whinging it was off for a few hours, remind them that fibre across oceans and satellites in space don't come cheap. It's surprising it works at all.

      1. Anonymous Coward

        Re: Some sympathy -but not a lot

        "Unfortunately, all these fivers a month don't add up to a resilient system. It's not enough for every link in the chain to be as reliable as it could be..."

        And yet we pay £30,000 per year to BT for a single** BTNet line and are having the same issues. So how much should we be paying to ensure that we have some resilience in the network?

        **Before you comment, there is no option for second line without £100,000+ in groundworks which would still have also been affected by this problem.

        1. Anonymous Coward

          Re: Some sympathy -but not a lot

          Ahh, because you're paying £30k, you expect that'll cover resilience everywhere. Nope - you're just subsidising everyone paying their fivers a month.

          Unfortunately even your £30k/y doesn't buy resilience in a national network - it would barely cover the wage of one "engineer" who climbs poles all day! It wouldn't come close to running you your own separate little bit of pipe all the way back to a data centre and beyond, so you have to partially share infrastructure with the masses.

          Like it or not - the market has spoken - and it decided it wanted cheap internet that sometimes goes off now and again. And if TalkTalk don't supply it, they'll go to Sky. And if they put their prices up 50p, they'll move on again. Businesses having no option on what they'll pay have to delve deep in their pockets to keep the whole affair ticking over.

          You might not like that, but that's the way it is - the proof in the pudding is your Internet being off two days on the trot. If you don't like that for the money you're paying, don't pay it.. and encourage all the cheapskates who're the root cause of the problem to pay a bit more for their Internet!

          1. Archtech Silver badge

            Re: Some sympathy -but not a lot

            "Like it or not - the market has spoken - and it decided it wanted cheap internet that sometimes goes off now and again".

            In very much the same way as, 30 years ago, the market spoke and decided that it wanted cheap software that sometimes has to be rebooted and that has little integrity and no security. That's what made Microsoft and Bill Gates the world leaders they are now.

        2. Blackfriar

          Re: Some sympathy -but not a lot

          Sounds like the person paying £30k/annum has a big internet pipe or is based somewhere remote and expensive. Do you host servers at the site or just need it for internet and possibly voice comms?

          Have you considered a mobile (4G if available) or bonded multiple DSL service as a backup? These don't have to go via BT's core even if delivered locally by BT copper lines for the DSL.

      2. Anonymous Coward

        Re: Some sympathy -but not a lot

        Where can I get this broadband at £5 a month?

      3. Soundman

        Re: Some sympathy -but not a lot

        Telehouse is expensive - plus it has the best infrastructure of any. I worked there for three years +

        I assume a single trip means just one of their UPS power cabinets dropped off. We can only guess the cause at this time. A spot of load redistribution and a reset is a very quick fix. The knock on of restarting an entire network is rather more lengthy.

        1. Anonymous Coward

          Re: Some sympathy -but not a lot

          You can't have looked hard - One of the suppliers was mentioned in the post - TalkTalk.

          Their website currently advertises FREE broadband for 18 months (after £17.70 line rental). Pretty sure my line rental with KCOM is about the £14 mark, so even if TT's line rental is slightly above average, it's still about a fiver for the broadband.

          Their retention offers are better again - my grandfather pays peanuts for all his TalkTalk goodies.

          The FREE broadband is interesting though, as to my knowledge BT/Telehouse etc are unlikely to offer all they do for free, so something somewhere is being skimped on. (It's kind of how eBay encourage you to post your items for free, but I've not yet found a Post Office that'll do that for me). There are these offers all over the place, then we're sat here asking where the redundancy is, LOL!

    4. SImon Hobson Silver badge

      Re: Some sympathy -but not a lot

      > BT is not the same as a small One-man-and-a-cat rural welsh IT company

      Indeed.

      Us small fish just have to accept what conditions are on offer. BT are big enough that they should be able to dictate to suppliers how things are done. It may be a supplier's problem, but BT can't hide behind that because either they've audited the setup and were happy with it (oops), or they didn't audit it, in which case they can't be said to have done due diligence (oops).

      Either way, from a PR PoV it's BT's name in the headlines.

    5. John Sanders
      Windows

      Re: Some sympathy -but not a lot

      I would not go with a very large company unless they have something I strictly require.

      There are plenty of competent small-medium sized ISPs in the UK.

      Just my 2 pence.

      1. Archtech Silver badge

        Re: Some sympathy -but not a lot

        Which is why I went with fast.co.uk (Dark Group) back in 2007. I have found them eminently satisfactory ever since. But I was among those without any Internet connection this morning, because unfortunately fast.co.uk were dependent on the kit up in Telehouse North. Even their phones weren't working!

    6. Vince

      @Pen-y-gors Re: Some sympathy -but not a lot

      "Having recently had a 5-day outage at the bods who do some of my hosting (how often do both drives in a RAID setup fail at the same time?"

      When they're from the same batch, same make, same model, same usage pattern (RAID-1 would suggest exactly that by design)...

      ...quite often actually. One of the (many) reasons we mix drive models/batches/makes in RAID arrays

  4. GreggS

    Nothing to do

    with the recent Government report telling BT to get their house in order???

    1. Doctor_Wibble
      Black Helicopters

      Re: Nothing to do

      > ...the recent Government report ...

      No, it's because the Snoopers Charter is back in the works and the black boxes are being installed. Every major outage happens around the same time as the charter reappears, and always in a different place because obviously you don't install the things in the same place twice.

      Check the records, you'll see I'm right!

      Black helicopter as there's no point being AC any more because They know, They always know, as though They were right in here with me in my secret location under the stairs.

      Oh no, They are back! How do They keep finding me?!?!?

      1. David Gosnell

        Re: Nothing to do

        Curiously hot on the heels of a recent mysteriously kept-under-wraps issue at Plusnet, where despite their insistence they don't intercept or anything of the ilk, connections to third-party SMTP servers were timing out – but only when messages had an attachment, no matter how tiny. Very, very odd, indeed pretty much inexplicable without foul play involved, especially with supposedly encrypted connections. They tried to blame it on some other ongoing DNS issues, but DNS doesn't care whether email has attachments or not...

        1. Alister Silver badge

          Re: Nothing to do

          @ David Gosnell

          I've come across a similar SMTP issue where the MTU is set to max (1500) and the do-not-fragment bit is set on SMTP packets; it causes multiple retries and timeouts.
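
To make the MTU point concrete, here is a minimal sketch of when a do-not-fragment packet would be too big for the path. It assumes IPv4 and TCP headers of 20 bytes each with no options, and a hypothetical path MTU of 1492 (a common PPPoE value); it also ignores MIME headers and base64 line breaks. The function names are illustrative only.

```python
import math

IPV4_HEADER = 20  # bytes, no IP options assumed
TCP_HEADER = 20   # bytes, no TCP options assumed


def base64_size(raw_bytes):
    """Size of raw_bytes after base64 encoding (ignoring line breaks)."""
    return 4 * math.ceil(raw_bytes / 3)


def largest_packet(payload_bytes, sender_mtu=1500):
    """Largest IP packet the sender emits for a payload of this size."""
    mss = sender_mtu - IPV4_HEADER - TCP_HEADER
    return min(payload_bytes, mss) + IPV4_HEADER + TCP_HEADER


def would_blackhole(payload_bytes, sender_mtu=1500, path_mtu=1492):
    """True if a DF-marked packet exceeds the path MTU and cannot pass."""
    return largest_packet(payload_bytes, sender_mtu) > path_mtu


# A short text-only message fits in one small packet.
print(would_blackhole(500))
# A 2000-byte attachment, once base64-encoded, fills a full-size segment.
print(would_blackhole(base64_size(2000)))
```

Under these assumptions only messages large enough to fill a full-size segment are affected, which would look like "mail with attachments times out" from the user's side.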

          1. David Gosnell

            Re: Nothing to do

            Hmm, certainly possible. Although the issue occurred with even trivially small attachments, I'm not sure I tested it with anything so small as to be under the MTU.

      2. Anonymous Coward

        Re: Nothing to do

        There have been 'anonymous' boxes at Telehouse for at least 10 years. Why would installing them affect the power?

  5. chris 17 Bronze badge

    @ Pen-y-gors

    "how often do both drives in a RAID setup fail at the same time? And with a dodgy backup?"

    If it's that important then you need to pay more for a more resilient solution, perhaps an active/standby cluster across geographically separated data centres provided by different vendors, on different ISPs, and supplied by different power stations. What's the cost to you of the outage vs the cost of a resilient solution?

    Regarding BT, yes, this latest outage should not have had the impact on its network that it did. Services should have instantly rerouted with little to no impact on end users.

  6. Ellis Birt 1

    It's about money!

    Both 'outages' have been the loss of part of their diverse network infrastructure.

    They have highlighted that while their network is diverse, there is not enough bandwidth when one major location goes down to handle all their peering traffic.

    That is down to investment in redundancy. They took the gamble and lost - twice in two days!

    Even if they were to compensate customers, it would probably cost less than having the redundant bandwidth so their FD and shareholders will still be happy!
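
The capacity argument above can be sketched as a simple "lose any one site" check. Everything here is hypothetical: the site names, the capacities and the peak traffic figure are made up for illustration and say nothing about BT's real network.

```python
def survives_single_site_loss(site_capacity_gbps, peak_traffic_gbps):
    """For each site, report whether the remaining sites could carry peak traffic.

    site_capacity_gbps maps a site name to its peering capacity.
    Returns a dict of site name -> True if losing that site is survivable.
    """
    total = sum(site_capacity_gbps.values())
    return {
        site: (total - capacity) >= peak_traffic_gbps
        for site, capacity in site_capacity_gbps.items()
    }


# Hypothetical three-site network with a 650 Gbps peak.
sites = {"site_a": 400, "site_b": 300, "site_c": 300}
for site, ok in survives_single_site_loss(sites, 650).items():
    print(f"lose {site}: {'ok' if ok else 'not enough capacity'}")
```

In this made-up case the network is diverse but still cannot absorb the loss of its largest site, which is the kind of gap the comment is describing.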

    1. Commswonk Silver badge

      Re: It's about money!

      > That is down to investment in redundancy. They took the gamble and lost - twice in two days!

      Quite so. However the alternative is another gamble; spend more on additional infrastructure and increase the prices the users have to pay so that there is actually a ROI rather than a bit of a black hole in the accounts. And then the customers would be griping about higher charges and perhaps looking elsewhere for their service.

    2. Pascal Monett Silver badge

      Nope, it's about electricity here

      Specifically, the lack of it. Bandwidth is not the solution here, BT is dark in that region because the lights are out as well.

    3. Peter 26

      Re: It's about money!

      Yes, this is the truth. Why doesn't it just route round the issue and take a bit longer? Certain sites just don't work at all.

  7. Gert Leboski
    Trollface

    GCHQ?

    Hang on, let's simulate a power outage so we can slip these devices into the route.....

    Tin foil is my friend.

    1. Dan 55 Silver badge
      Black Helicopters

      Re: GCHQ?

      One more time and people are going to start to wonder why UPSs everywhere have suddenly gone on strike.

      But two power outages in two successive days in different providers is nothing to bother about. Perfectly normal, that.

      1. Version 1.0 Silver badge

        UPS failure

        You think that they bought all the UPSs at the same time and know the batteries are all reaching replacement time?

        1. Dan 55 Silver badge
          Black Helicopters

          Re: UPS failure

          Listen, don't mention the UPSs. I mentioned it once, but I think I got away with it.

      2. probonic

        Re: GCHQ?

        The UPSs and generators wouldn't have helped here. There wasn't a power outage; a breaker tripped. Breakers are designed to cut power to prevent damage when something *inside* the facility causes a fault.

  8. Dan 55 Silver badge

    "Major Service Interruption"

    Ah, that will be something which inconveniences a small number of customers then.

    1. Roland6 Silver badge

      Re: "Major Service Interruption"

      Well it's all relative. Given how many internet service customers BT has, a few million is "a small number", in fact in Internet terms, the loss of service by 60m users in one country would also be an outage that inconvenienced "a small number of customers"...

  9. GloomyTrousers
    FAIL

    Not just BT/Plusnet

    Apparently this is the root cause of an outage for some Zen customers too. Tech support say they think this is affecting other ISPs as well.

    1. Anonymous Coward

      Re: Not just BT/Plusnet

      BT's statement said it was affecting other ISPs.


Biting the hand that feeds IT © 1998–2019