BT internet outage was our fault, says Equinix

Telecity data center owner Equinix has 'fessed up to a "brief outage" that knocked 10 per cent of BT internet subscribers offline in the UK as well as a number of other providers on the morning of 20 July. A spokesman from the group, which slurped up Telecity for £2.3bn in 2015, confirmed that the outage occurred at Equinix's …

  1. Alan Edwards

    That explains it

    I was down to the backup, backup connection for a while - PlusNet dead, the 3G card in the laptop had no signal, so I was on the MiFi router.

    Anyone seeing flaky DNS for a while this morning too? It looked like external DNS servers were being blocked: I could ping 8.8.8.8 but got no name resolution. I was about to ring them and have a rant when it started working again.
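    Being able to ping 8.8.8.8 while names fail to resolve is the classic tell that the resolver path, not IP routing, is broken. A rough sketch of that decision logic - the two probe results here are assumed inputs, not real measurements:

```python
def classify_outage(can_reach_ip: bool, can_resolve_names: bool) -> str:
    """Rough diagnosis from two probes: ping a known IP (e.g. 8.8.8.8)
    and attempt to resolve a hostname."""
    if can_reach_ip and not can_resolve_names:
        return "dns"           # IP routing fine, resolver dead or blocked
    if not can_reach_ip:
        return "connectivity"  # can't even reach a bare IP address
    return "ok"

# The symptom described above: 8.8.8.8 pingable, no name resolution.
print(classify_outage(True, False))  # dns
```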

    1. AMBxx Silver badge

      Re: That explains it

      I don't think it was DNS. It seemed the route to certain servers wasn't available. My DNS is via OpenDNS; that wasn't accessible at all, so I had to change to Google's DNS. Then DNS worked, but some sites still weren't available.

      It even appeared to be just some protocols - from a remote session, I could connect to webmail, but not receive email.

      1. Steve Evans

        Re: That explains it

        From my experience this morning, things that worked, worked. Things that failed, failed. Constantly.

        So it certainly appeared to be restricted to certain routes... Google DNS, Google itself, G+ and Hangouts were all working perfectly.

        Facebook partially worked, it obviously has some bit squirrelled away on servers which take a different IP route.

        Speedtest wouldn't even appear, and linx.net failed to connect.

        I wonder whatever happened to the original raison d'etre of the intartubes, namely that the message will always get through?

      2. psychonaut

        Re: That explains it

        Yeah, it wasn't just DNS. I had the same idea to use OpenDNS instead of BT's. Things seemed to load quicker, but I still couldn't get on to most of the web. It also didn't change the fact that we still couldn't ping the hosted mail server. Then I used 4G through a phone (Vodafone) - that worked for about five minutes to the email server and then it went off too.

    2. TimR

      Re: That explains it

      Alan - yes, 8.8.8.8 was not very responsive. Swapped over to 8.8.4.4, which helped to a degree.

  2. Tom 7 Silver badge

    Just this morning?

    I've been getting a few problems for a day or two that have just this afternoon seemed to clear up.

    1. A Non e-mouse Silver badge
      Joke

      Re: Just this morning?

      Looks like the universal IT cure-all of turning it off and back on again has once again fixed the fault.

  3. A Non e-mouse Silver badge

    A BT spokeswoman said the power issue affected around 10 per cent of internet usage - meaning one-in-ten attempts to connect to the website they want to go to may fail.

    Well, that's one way to read it. Another is that 10 per cent of Internet sites were unavailable (from BT's network) during the outage.

    1. Tim 11

      I was getting 10% packet loss pinging Facebook. Not sure how that works or if it was related, but that's what I saw.

    2. This post has been deleted by its author

  4. Mad Mike

    Brief outage?

    According to the BBC, it lasted from 7:55 to 8:17. That's not a brief outage in my book.

    Given this company's lamentable history of keeping power on to their datacentres, you have to question why anyone uses them. The whole point of using multiple, independent (supposed to be, but obviously not) power feeds is to prevent just this sort of thing. That's at least three total power failures in different datacentres of theirs, which suggests a far bigger root cause than just bad luck. It suggests fundamental design flaws and/or failures in operational and maintenance procedures.
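    To put rough numbers on why independent feeds matter: if feeds fail independently, the chance of losing all of them at once is the product of the individual failure probabilities. A back-of-envelope sketch - the 99.9% per-feed figure is an illustrative assumption, not anything Equinix has published:

```python
HOURS_PER_YEAR = 24 * 365

def downtime_hours(unavailability: float) -> float:
    """Expected downtime per year for a given fractional unavailability."""
    return unavailability * HOURS_PER_YEAR

single = 0.001          # one feed assumed down 0.1% of the time
dual = single * single  # both *independent* feeds down simultaneously

print(downtime_hours(single))  # ~8.76 hours/year on one feed
print(downtime_hours(dual))    # ~0.00876 hours (~32 seconds) on two
```

If the feeds share a root cause (the scenario suspected above), the multiplication no longer applies and the dual figure collapses back towards the single one.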

    1. AMBxx Silver badge
      WTF?

      Re: Brief outage?

      I was out from 9am until 11.30am. Not brief and not within the times stated.

      1. Lodgie

        Re: Brief outage?

        Wasn't back until gone midday having failed at 9am. Took the backups out too.

    2. Cynical Observer

      Re: Brief outage?

      One of the Bs in BBC could be used for another word - not complimentary.

      Things were still amiss as late as 11am - even sites as major as Google.co.uk were proving hit and miss.

      What was interesting was that anything already successfully accessed earlier in the morning seemed markedly more reliable at loading additional pages, e.g. El Reg.

      So... did they have a DNS wobble as a result of the outage?

    3. Anonymous Coward
      Anonymous Coward

      Re: Brief outage?

      The power outage lasted from 07:55 to 08:17.

      The carnage that resulted from it lasted a few hours more.

      1. Mad Mike

        Re: Brief outage?

        Mmmm. I wouldn't call a power outage that lasts for 22 minutes a brief one...

    4. DonL

      Re: Brief outage?

      "It suggests fundamental design flaws and/or failures in operational and maintenance procedures."

      Indeed, I am also wondering how that works. In my small datacenter most things have dual power supplies and are connected to two different UPS units etc. Everything with only one power supply is connected to an automatic transfer switch for redundancy. I've seen so many UPS failures that I only want to rely on them when there is an actual power failure.

      So I wonder how one UPS failure can cause an outage unless there is an obvious design flaw.

      1. Richard Wharram

        Re: Brief outage?

        Agreed, it doesn't make sense. The Telecity data centres I've visited looked very well designed, and I can't imagine how one UPS failure would affect them at all. Indeed, turning off half the UPSes is a standard and regular DR test.

        1. Anonymous Coward
          Anonymous Coward

          Re: Brief outage?

          They were well designed originally, but then Topsy took over. In the parts only the engineers see, they are a dog's breakfast...

    5. Mike Tubby
      FAIL

      Re: Brief outage?

      My company builds mission critical systems for the emergency services ... we use dual but independent UPSes powered from different phases arriving on different power connections from the RCSs, dual but independent switches with dual power supplies (fed from separate UPSes), servers with dual power supplies, network connections with teaming/bonding, etc., and we can keep systems up for hundreds or thousands of days. So what the heck is a single UPS doing bringing down a big chunk of t'internet?

      By now you would have thought that t'internet was considered part of Blighty's Critical National Infrastructure (CNI) and treated to the appropriate levels of 'protection'...

      While many DSL users were off altogether our Bleedin' Terrible (BTnet) circuit appeared to route flap back and forth between Ealing and Ilford at hop #3 with interesting latencies:

      root@gate:~# traceroute -n www.microsoft.com

      traceroute to www.microsoft.com (104.82.195.110), 30 hops max, 60 byte packets

      1 195.171.43.1 0.432 ms 0.471 ms 0.501 ms

      2 62.7.207.104 5.446 ms 6.740 ms 6.737 ms

      3 109.159.248.2 763.008 ms * 762.993 ms

      4 213.121.193.61 9.740 ms 213.121.193.27 12.045 ms 213.121.193.45 10.009 ms

      5 62.6.201.169 9.398 ms 9.828 ms 62.6.201.167 9.457 ms

      6 195.99.126.19 10.034 ms 10.183 ms 13.343 ms

      7 104.82.195.110 9.075 ms 9.732 ms 9.310 ms

      The BTnet service page showed a red alarm for 13% packet loss in to Europe and over 20% in to Asia at the same time.
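      That 763ms spike at hop 3 is the giveaway. A rough sketch of pulling such outliers out of traceroute output like the above - it assumes the plain "hop ip rtt ms ..." format shown, and the 10x-median threshold is an arbitrary choice:

```python
import re
from statistics import median

def flag_slow_hops(traceroute_text: str, factor: float = 10.0):
    """Return hop numbers whose best RTT exceeds factor x the median best RTT."""
    hops = {}
    for line in traceroute_text.strip().splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # skip headers and blank lines
        rtts = [float(m) for m in re.findall(r"([\d.]+) ms", line)]
        if rtts:
            hops[int(parts[0])] = min(rtts)  # best RTT filters queueing noise
    med = median(hops.values())
    return [hop for hop, rtt in sorted(hops.items()) if rtt > factor * med]

# First four hops of the trace quoted above (hop 4 abbreviated to one probe set).
sample = """
1 195.171.43.1 0.432 ms 0.471 ms 0.501 ms
2 62.7.207.104 5.446 ms 6.740 ms 6.737 ms
3 109.159.248.2 763.008 ms * 762.993 ms
4 213.121.193.61 9.740 ms 12.045 ms 10.009 ms
"""
print(flag_slow_hops(sample))  # [3]
```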

      Come on BT - exactly what backup circuits do you have??? Was that a bit of 100Mbps fibre you were trying to use as a backup for a bundle of 10Gb DWDMs?

      Mike

      1. This post has been deleted by its author

      2. Anonymous Coward
        Anonymous Coward

        @Mike Tubby

        "The BTnet service page showed a red alarm for 13% packet loss in to Europe and over 20% in to Asia at the same time"

        Where is that service page? I can't find it.

  5. boltar Silver badge

    The internet routes around damage!

    Errr, not.

    1. Alister Silver badge

      Re: The internet routes around damage!

      Yes, it does.

      There was damage in London, affecting the UK, so the rest of the Internet carried on without us.

      1. boltar Silver badge

        Re: The internet routes around damage!

        "Yes, it does"

        Not in the sense of it still being available to people when a particular access node/router goes down, which was the original point of the saying back in the day.

        1. Pascal Monett Silver badge
          Trollface

          Well, yes, in the sense that the rest of the world routed its packets around the dark area.

          As for those in the dark area, they had nothing to route, so the saying stays true.

      2. psychonaut

        Re: The internet routes around damage!

        "Yes, it does.

        There was damage in London, affecting the UK, so the rest of the Internet carried on without us."

        our own little trial internet brexit?

      3. Nonny Mouse

        Re: The internet routes around damage!

        Brexit all over again...

  6. Anonymous Coward
    Anonymous Coward

    But it is still BT's fault

    isn't it.....

    BT ==== Blame Target

    1. rhydian

      Re: But it is still BT's fault

      If I'm paying BT/Plusnet, then I'll blame them if their service isn't working. How am I to know it's the fault of Telecity/Equinix/Teh Lizzerds?

  7. Doctor Syntax Silver badge

    Are they called uninterruptible because you shouldn't interrupt them and they go wrong if you do?

  8. MJI Silver badge

    Out from PC on

    Until around midday.

    NOT the short period they claim

  9. Version 1.0 Silver badge

    Malgorithms?

    The Sponsored El Reg story with this report is "The Nuts and Bolts of Ransomware in 2016" - I assume that the association with BT is purely coincidence?

  10. DuncanL

    So much for fixed...

    Still knackered here and relying on a VPN to work round it.

    1. DuncanL

      Re: So much for fixed...

      And still semi-broken today - most sites seem to work but the odd one or two still fail (request timed out in tracert)

  11. CanadianMacFan

    Don't know how to configure a Data Centre

    Back when I was a system administrator for a government department, looking after the computers for the web sites, losing a UPS in the data centre would have been no big deal. Each chassis holding our blade servers held four power supplies: two were connected to one power distribution unit (PDU) and the other two to a different PDU. The PDUs were cabinet models, and each was connected to a separate UPS. The whole data centre had a diesel generator for backup too. So losing a UPS, or a PDU, would have had no impact on my servers, since they would still have got their electricity through the other path.

    I also had redundant network and SAN switches, and I made sure that every server had at least two instances running so that one could be brought down for service without impacting the users. The only issue I ever had with those IBM blades was the SCSI hard drives that started to fail after about three years (not bad in a server environment), which were replaced with the SAN. All running Linux, of course.
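    That power-path arrangement can be sanity-checked mechanically: a chassis survives any single UPS failure if its power supplies trace back to at least two distinct UPSes. A toy sketch - all the names here are hypothetical:

```python
# Map each PDU to the UPS it draws from. Names are invented for illustration.
PDU_TO_UPS = {"pdu-a": "ups-1", "pdu-b": "ups-2"}

CHASSIS = {
    "blade-chassis-1": ["pdu-a", "pdu-a", "pdu-b", "pdu-b"],  # 4 PSUs, split
    "badly-cabled":    ["pdu-a", "pdu-a"],                    # single UPS path
}

def survives_ups_failure(psu_pdus) -> bool:
    """True if the chassis draws from at least two distinct UPSes, so
    losing any single UPS still leaves a live power path."""
    return len({PDU_TO_UPS[p] for p in psu_pdus}) >= 2

for name, pdus in CHASSIS.items():
    print(name, survives_ups_failure(pdus))
```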

    As an aside, we got stuck with a rack of the first generation of the HP blades and they were horrible. They ran so hot that special cooling had to be installed in the data centre for them, and we kept on having RAM failures. I was thankful that I had as little to do with them as possible.

    1. Ian Emery Silver badge

      Re: Don't know how to configure a Data Centre

      Only one diesel generator?? Not much failure proofing there!!!

      Then there is maintaining the equipment, of which I have an example - although not IT related.

      My local pumping station had THREE diesel back up generators, so you would think in the event of a power outage, the water would keep flowing city-wide; alas, the company involved are notoriously bad at both record keeping and maintenance.

      On the day the power went out, it turned out No.1 wouldn't start, No.2 hadn't had its fuel replaced or been serviced since the mid-70s, and No.3...

      Well, it seems No.3 had last been checked in 1943; all that remained was a large pile of rust.

  12. TWB

    Ah, that was it

    I had just transferred to Plusnet today and thought it was odd that I could not reach some websites.

    1. Ian Emery Silver badge

      Re: Ah, that was it

      No, that's just life as a PN customer.

      (Why on earth would you join PN??!!)

      1. TWB

        Re: Ah, that was it

        'Why on earth would you join PN??!!'

        Because I had been with Talktalk.

  13. Anonymous Coward
    Anonymous Coward

    Still having DNS errors lasting minutes, and sites inaccessible even now.

  14. Dwarf Silver badge

    Feeling the Heat

    I'm putting my money on heat: yesterday was blisteringly hot, and it's well known that major metropolises are hotter than the coast, so I'd guess the air conditioning in the data centre was working flat out. That in turn pulls a bit more power, as the AC has to be on the UPS - a computer room with power and no cooling doesn't work for very long at all!

    So this comes down to stress testing: we (as in the users of the Internet) stress tested the data centre, and something went beyond its limits and decided to fail this morning. (Obviously Magic White Smoke was released somewhere.)

    What I saw was some banking sites and job boards disappear.

    Even though I'm not a great BT supporter, I think it's unfair that TalkTalk are trying to blame them for this - they were just affected because they were in the same bit barn. The Internet continued to work, but some people accessing services hosted there hit problems. End users never have any resilience, but hosted sites like NatWest and the BBC should be using some form of global load balancing to work around such issues.
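    The global load balancing point can be sketched simply: health-check each site and only hand out addresses for sites that pass. A toy version - the endpoints and health states are invented for illustration:

```python
def healthy_endpoints(endpoints, health):
    """Return endpoints in priority order, skipping any site whose health
    check failed - roughly what a global traffic manager does per query."""
    return [e for e in endpoints if health.get(e, False)]

# Hypothetical two-site setup, with the primary behind the failed UPS.
endpoints = ["london.example.net", "manchester.example.net"]
health = {"london.example.net": False, "manchester.example.net": True}

print(healthy_endpoints(endpoints, health))
# ['manchester.example.net'] - traffic fails over to the surviving site
```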

    I wonder how a TalkTalk service with its single connection to an upstream hub would work in the same situation. The answer is obviously that it would fail in the exact same manner; it's just that the helldesk wouldn't have a clue about the problem and would blame anything except their service.

    1. Doctor Syntax Silver badge

      Re: Feeling the Heat

      "I think its unfair that TalkTalk are trying to blame them for this"

      Issuing a statement blaming someone else needs to be a core competence there.

  15. OliP

    Whilst this might well be down to a badly configured data centre (highly possible), we should also bear in mind the crazy amount of construction going on in the local area.

    The whole situation is a mess, however - we get a few events a year that show how fragile it all is, most hidden from public view, but when you're monitoring over 100,000 PCs on various networks all over the UK, you get a good idea when "one bad router" ends up affecting networks in a (geographic) quarter of the country.

    Where I work we saw no outage; only those working from home (on crappy residential broadband) were intermittently unable to join Lync meetings. Redundancy IS possible, as long as you're willing to pay for it.

    AWS availability zones appear to have kept the site up as well, no dip in traffic that we can see.

  16. MrMur
    Joke

    One of our devs...

    One of our devs said he couldn't get to StackExchange so he was off for a nap.

  17. Anonymous Coward
    Anonymous Coward

    the "U" in UPS

    stands for, what exactly???

    1. Dwarf Silver badge

      Re: the "U" in UPS

      Even during ALL failure scenarios?

      - Like taking all the components outside their maximum power or thermal limits.

      Useful devices they are; impervious to all failure modes and abuse, they are not.

      The reason the outage was about 20 minutes is because that's how long it takes to get the right "engineer" to go and flip the bypass switch. After that, they will then go and shout at the vendor and get them to fix the broken bit.

      We'll never hear when they flip the switch a second time to bring it back online.

      Lots of things are probably running without UPS support tonight.

      BTW - there is a reason that people use global traffic managers and geo-diversity when designing fully resilient services.

  18. dotty

    Short break ?

    We are still off. Last night BTWholesale was reading 0.00 download speed but the connection was working, slowly. Now it's worse; no doubt it will improve today.

  19. David Gosnell

    Again?

    Looks like something not entirely dissimilar this morning. Same vague "we're looking into it" type announcements from Plusnet, as multiple key sites respond slowly or not at all.

    Service: Broadband

    Posted: Thu, Jul 21 2016 at 09:04:19

    Subject: Broadband issues - NEW

    Sorry if you're unable to access some websites this morning, we're investigating the cause and will post an update shortly.

    Kind Regards,

    Customer Support

    1. localzuk

      Re: Again?

      Yup. Can't connect to any OVH servers myself. Which is annoying to say the least.

  20. Chris Redpath

    Seems to be completely borked again

    Very irritating, I am supposed to be working from home today.

    [edit] it seems to be mostly down and occasionally up but very slow. As an example

    traceroute to www.anandtech.com (192.65.241.100), 64 hops max

    1 192.168.1.254 (192.168.1.254) 0.872ms 0.869ms 0.804ms

    2 * * *

    3 * * *

    4 31.55.186.176 (31.55.186.176) 9.935ms 10.524ms 9.932ms

    5 195.99.127.206 (core4-hu0-16-0-3.faraday.ukcore.bt.net) 10.964ms 10.271ms 10.797ms

    6 62.6.201.146 (62.6.201.146) 9.870ms 11.850ms 11.707ms

    7 213.137.183.38 (213.137.183.38) 11.852ms 10.300ms 10.409ms

    8 166.49.208.91 (t2c3-xe-2-1-0-0.nl-ams2.eu.bt.net) 16.396ms 15.946ms 15.877ms

    9 166.49.237.11 (t2c4-xe-3-0-0-1.nl-ams2.eu.bt.net) 16.412ms 16.202ms 16.369ms

    10 130.117.14.141 (be5400.ccr21.ams04.atlas.cogentco.com) 114.400ms 105.857ms 100.322ms

    11 154.54.74.89 (be2311.ccr41.ams03.atlas.cogentco.com) 96.980ms 98.559ms 103.053ms

    12 * * *

    13 * * *

    14 * * 154.54.30.205 (be2090.ccr21.yyz02.atlas.cogentco.com) 242.593ms

    15 * * 154.54.31.225 (be2993.ccr21.cle04.atlas.cogentco.com) 242.326ms

    16 * * *

    17 * * *

    18 154.54.3.133 (be2432.ccr21.dfw01.atlas.cogentco.com) 225.384ms 214.409ms 217.014ms

    19 66.28.4.18 (be2938.rcr21.dfw04.atlas.cogentco.com) 225.262ms 232.295ms 221.849ms

    20 154.24.9.206 (te0-0-2-0.nr11.b028597-0.dfw04.atlas.cogentco.com) 212.502ms 222.810ms 231.794ms

    21 38.140.236.58 (38.140.236.58) 227.336ms 227.402ms 227.259ms

    22 199.231.224.61 (199.231.224.61) 220.000ms 215.584ms 220.215ms

    23 192.65.241.100 (www.anandtech.com) 225.495ms 233.592ms 233.821ms

    Look at those ping times..

  21. Old Man Tom

    Oh look...

    It's broken again

  22. Anonymous Coward
    Anonymous Coward

    Down again

    Looks like BT/Equinix cocked up again. Multiple locations are not accessible from BT again.

  23. Anonymous Coward
    Anonymous Coward

    Another morning of routing/DNS issues and outages...

    So much fun trying to run a business and busy office using backup 4g for VOIP and all internet traffic.

    Sigh.

  24. Anonymous Coward
    Anonymous Coward

    More BT flail today :-(

    Seems yesterday's network wobble is still causing BT issues this morning as well.

    What's today's excuse for broken internets?

  25. Scaffa

    "Hello,

    Due to a power incident at Telehouse North (London) IP Exchange Direct Access has been severely impacted since 07.45 this morning. Engineers are currently onsite attempting to resolve the issue.

    IP Exchange COE, BT Wholesale"

  26. markgobell

    ISP outages continued this morning

    My ISP is PlusNet, part of BT.

    The first outage occurred Tuesday evening - intermittent.

    This continued through to Wednesday early PM

    Today Thursday AM certain websites were still unavailable.

    PlusNet has a service status page indicating the fault, and in online support chat today (which I couldn't access yesterday) they said it was a power supply at the Equinix centre in London.

    http://www.plus.net/supportpages.html?a=2&support_action=messages&ispservice_id=adsldial

    This morning they claimed the same fault has returned!

    I could not even access emails from a local host!

    Surely this was a DNS issue - some sites available and others not?

    Very frustrating.

    MG

  27. Sir Sham Cad

    Impact = 10%

    Absolute bollocks.

    Any national infrastructure that relied, at any point, on that peer was banjaxed. I walked into a shitstorm at work until it was confirmed as a nationwide issue, since the NHS WAN sits over BT infrastructure. It was still flaky at 1pm, and I have now had confirmation that there were ongoing effects that knackered broadband connections, including my own Virgin Media FTTC, by the time I gave up on it at 10:30pm.

    My VM connection is stable again today and, touch wood, so far this outage hasn't affected us at work.

  28. toffer99

    One more once?

    Hey BT, this is the second day. Is it going to be a regular morning event?

  29. Archtech Silver badge

    "This impacted a limited number of customers..."

    A nice use of the weasel word "limited". Of course all numbers are limited, except for those that are infinite. I rather doubt whether BT has ever had even Aleph-Null customers, let alone any larger transfinite number.
