back to article Pickaxe chops cable, KOs UKFast data centre

Manchester workmen were blamed for knocking out hosting business UKFast's data centre today, after it seems some hapless bod cut a cable with a pickaxe. According to the Brit firm's status page, the problem arose this morning at 11am, affecting its MANOC 5 data centre. The incoming mains supply was lost to the site and …

  1. Hans Neeson-Bumpsadese Silver badge

    When a server loses power, even for a split second, it can damage the hard drive.

    True, very true...that's why you fit a UPS to bridge the gap between mains failure and backup generation.

    1. Chris Neale

      All just waffle and distraction from "we have single points of failure and either no DR plan, or one which we never test and just hope it works!"

      1. Doctor Syntax Silver badge

        either no DR plan, or one which we ~~never test~~ just tested

    2. TheVogon Silver badge

      Just install Bloom Energy servers directly fed from the gas main. No need for a UPS or generators.

      1. Phil O'Sophical Silver badge

        directly fed from the gas main.

        Gas mains are no more impervious to pickaxes than power cables are.

        1. Doctor Syntax Silver badge

          "Gas mains are no more impervious to pickaxes than power cables are."

          In both cases the results are - err - illuminating but gas illuminates for longer.

        2. TheVogon Silver badge

          "Gas mains are no more impervious to pickaxes than power cables are."

          But you are very, very unlikely to lose both at the same time. They are entirely diverse services. The gas only needs to be there when the electricity isn't. As I would have thought was obvious.

          Which is why companies like eBay, Google, FedEx, Bank of America, Walmart, Coca Cola, etc. etc. use Bloom Energy servers instead of wasting power via online UPS conversion and keeping batteries charged.

          1. AndyD 8-)₹

            @Bloom Energy

            Jonathan Fahey in Forbes wrote: "Are we really falling for this again? Every clean tech company on the planet says it can produce clean energy cheaply, yet not a single one can. Government subsidies or mandates keep the entire worldwide industry afloat...."

            1. Anonymous Coward
              Anonymous Coward

              Re: @Bloom Energy

              Every clean tech company on the planet says it can produce clean energy cheaply,

              There's nothing especially "clean" about Bloom, it's just a fuel cell that runs on gas, conceptually the same as a standard gas or diesel genset. Produces 735-849 lbs of CO2 output per MWh.

              1. Alan Brown Silver badge

                Re: @Bloom Energy

                These would have a nice application as domestic heating cogen units.... :)

              2. This post has been deleted by its author

              3. Anonymous Coward
                Anonymous Coward

                Re: @Bloom Energy

                >> There's nothing especially "clean" about Bloom, it's just a fuel cell that runs on gas, conceptually the same as a standard gas or diesel genset.

                Well there is - fuel cells emit far less CO2 than a typical gas generation plant and near zero of the other common related pollutants such as NOx, SOx and VOCs.

                >> Produces 735-849 lbs of CO2 output per MWh.

                As opposed to typical values of 2,117 lbs/MWh for current coal generation and 1,314 lbs/MWh for existing gas power plants - so a much lower CO2 output for fuel cells.
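                Taking the figures quoted in this thread at face value (a back-of-envelope sketch only, not a verified lifecycle analysis), the ratios work out as follows:

```python
# Rough CO2-intensity comparison from the lbs/MWh figures quoted above.
bloom_low, bloom_high = 735, 849   # Bloom fuel cell range, lbs CO2 per MWh
coal = 2117                        # typical current coal generation
gas_plant = 1314                   # typical existing gas power plant

# Bloom's output as a fraction of each conventional source
print(f"vs gas plant: {bloom_low / gas_plant:.0%} to {bloom_high / gas_plant:.0%}")
print(f"vs coal:      {bloom_low / coal:.0%} to {bloom_high / coal:.0%}")
```

                That gives roughly 56-65% of a gas plant and 35-40% of coal - so "much lower" holds on these numbers, though "half" and "a quarter" are a little generous.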

                1. Anonymous Coward
                  Anonymous Coward

                  Re: @Bloom Energy

                  As opposed to typical values of 2,117 lbs/MWh for current coal generation and 1,314 lbs/MWh for existing gas power plants

                  And around 600 for modern cogeneration (CHP) sets which are popular for local off-grid supply.

                  Bloom isn't dirty by any means, but it's not especially clean either.

          2. Anonymous Coward
            Anonymous Coward

            Bloom Energy servers instead of wasting power via online UPS conversion and keeping batteries charged.

            Bloom Energy units are just specialised fuel cells, effectively solid-state generators. It's just local generation with the grid as backup, nothing especially new in that, and as environmentally unfriendly as a standard gas generator. You don't need a UPS because the fuel cell is the primary source and the gas supply is assumed to be "always on", so there is no generator start time to bridge should your primary source fail. If a backhoe should take out both gas and electricity conduits you're screwed.

            1. mistersaxon

              ...and so is the backhoe driver I imagine.

              (spectacularly so)

            2. This post has been deleted by its author

              1. Anonymous Coward
                Anonymous Coward

                Standard gas generators emit lots of CO2 and are submit to grid and transformer losses. Fuel cells emit hardly any CO2 and the primary output is water vapour (and electricity!)

                That's simply not possible. You're consuming a hydrocarbon in the presence of oxygen, it doesn't matter if it's a hot or cold combustion, the chemical products will be the same. See the data sheets on the Bloom website for the CO2 figures. Only a hydrogen-fed fuel cell will produce water alone.

                Also there'll be no grid or transformer losses from a local gas generator, which will not be connected to the grid.

            3. TheVogon Silver badge

              "and as environmentally unfriendly as a standard gas generator"

              No, standard gas generator plants emit about double the CO2, plus various other pollutants, and are subject to grid and transformer losses. Bloom Energy fuel cells emit about half the typical CO2 output of a gas generator plant and a quarter that of a typical coal plant.

              "If a backhoe should take out both gas and electricity conduits you're screwed."

              You could say the same if someone took out all the fibre connectivity to your data centre. Which is why diverse services are fed to your data centre some distance apart via concrete lined trenches. Usually with dual feeds too. If you look at the rather impressive customer list above, clearly it works.

              Microsoft - generally acknowledged as one of the world leaders in datacentres - are putting these directly into racks:

              http://www.datacenterknowledge.com/design/microsoft-launches-pilot-natural-gas-powered-data-center-seattle

          3. Wayland Bronze badge

            I want a Bloom Energy Server at home now :D

        3. trapper

          Not Really

          Having worked on gas mains, leaking and otherwise, in a previous incarnation, I can assure you that only a very ancient, rusted-out one is vulnerable to a pickaxe. But a trenching machine... or a backhoe... that's different. BTW, I once helped drive a 1 1/2" gas pipe squarely through an underground telephone cable, knocking out service to portions of two adjacent states. It was fascinating to see service trucks and company carfuls of engineers screech in from all points of the compass, form a seething knot and bicker about who was entitled to scream at whom.

          1. TheVogon Silver badge

            Re: Not Really

            "I can assure you that only a very ancient, rusted-out one is vulnerable to a pickaxe"

            These days they are extremely heavy-duty plastic. And yes, I think you would be unlikely to get a pickaxe through one accidentally.

            1. Anonymous Coward
              Anonymous Coward

              Re: Not Really

              I have to admit ... I smell a rat here.

              Pick axe, spike ... through a main ENW cable ... very unlikely. I'd like to see a 3rd party corroborate this story bearing in mind the BS us customers had to put up with on the day :/

        4. CrazyOldCatMan Silver badge

          Gas mains are no more impervious to pickaxes than power cables are.

          And tend to have somewhat more... 'interesting' failure modes..

    3. Lee D Silver badge

      No UPS can be guaranteed to function through a short-circuit or other dangerous situation (e.g. phase crossing).

      However, a datacentre uses UPS only as a brief stopgap, and the slightest delay in starting up the generators will mean dead batteries and a power blip inside.

      But a "UPS" doesn't provide "uninterruptible" power. It just provides a backup, like any other. When a dangerous situation exists, even a high-end UPS will cut out for safety. Yes, I've seen them do it. In one case, a phase-crossing accident would literally hard-power-off the UPS instantly, without warning or beeping or anything - just a single red light. Just bang, down, wait for power to return to normal. The UPS was doing its job, before, during and after.

      A pickaxe through a cable is exactly the kind of thing that can bridge the live and earth, or multiple phases, and a UPS can't completely isolate the inside from the outside.

      1. Phil O'Sophical Silver badge

        When a dangerous situation exists, even a high-end UPS will cut out for safety

        Which is why there's a distinction between High Availability (the UPS + generator) and Disaster Recovery (the second site with the hot standby on completely different power circuits). Each protects against a different type of fault.

      2. J. Cook Silver badge
        Mushroom

        UPS and power shenanigans...

        I can tell you for a solid fact that APC UPS units get mighty peeved when they see 220 on the _ground line_. In that incident, the UPS went into full isolation and shut down hard, which protected the server that was on it from getting its hardware blown up (unlike two of the brand new workstations at that site, which decided to set their power supplies on fire as the electrolytic caps blew out). Thankfully, the server passed its fsck and carried on after everything was brought back to normal.

        (mushroom cloud icon, because it's not every day one sees flames jetting out the back of a brand new computer's power brick.)

        1. Roland6 Silver badge

          Re: UPS and power shenanigans...

          >I can tell you for a solid fact that APC UPS units get mighty peeved when they see 220 on the _ground line_.

          What, it's 2017 and this is still a problem! :)

          Back in the late 1970s I worked for a company that installed IT systems next to railway lines, i.e. signalling systems, where fluctuating voltages on the ground line were regular events (every time a train went by). So the company had developed some rather fancy switchgear that sorted the problem. The other problem (for delicate IT systems) we saw in the early 80s was the power spikes caused by the then newly introduced thyristor-controlled systems; these were particularly troublesome as they were invisible to the then-new digital scopes but not to the analogue scopes.

          1. J. Cook Silver badge

            Re: UPS and power shenanigans...

            @Roland6: It's usually only a problem when it's intentionally done by an electrician who got the bill for the outage they caused to the businesses in that office complex, *and* our bill for the replacement of hardware, technician labor, call out, etc. :)

            If I recall correctly, said electrician shorted the 220V line into the something- now that I'm thinking, it might have been the neutral line and not the ground line. (US uses hot, neutral, and earth ground for most things.) In any case, two of the workstations did not like whatever was done and the capacitors in their power supplies blew up rather messily. The UPS did exactly what it was supposed to do: isolated the load entirely, then shut down.

            It was fun walking into the shop in the mornings and smelling freshly cooked power supplies.

        2. Alan Brown Silver badge

          Re: UPS and power shenanigans...

          "when they see 220 on the _ground line_"

          If you have decent input and output protection this "should" never happen.

          You can get away with assuming it's all fine 99% of the time, but it's that 1% that gets you - and in a lot of cases the UPS power supply systems are overcautious about shutting down under conditions that the mains keeps going under. Lots of arguments between power people and data centre people revolve around what's acceptable.

      3. CheesyTheClown Silver badge

        Not entirely true

        I worked as an engineer developing the circuitry and switching components for UPS systems running the safety systems at two nuclear facilities in the U.S. These systems delivered 360 V at 375 A, uninterrupted.

        Rule #1 : Four independent UPS systems

        Rule #2 : Two UPS off grid powering the safety systems. One at 100% drain, one at 50% drain

        Rule #3 : One UPS being discharged in a controlled manner to level battery life and identify cell defects

        Rule #4 : Recharge the drained battery

        Rule #5 : Fourth UPS drain and recharge separately

        Rule #6 : Two diesel generators off grid

        This system may not guarantee 100%, but it is far better than five-9s. There can be absolute catastrophic failure on the supplying grid and it does not impact the systems one bit, because the systems are never actually connected to the grid. And before you come back with issues or waste related to transference, the cost benefits far outweigh the losses, because the life span of 90% of the cells is extended from four years by an additional 3-5 years through properly managing them in this fashion. And the power lost at this level is far less expensive than replacing the cells twice as often.
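        Purely as an illustration of the role rotation the rules above imply (hypothetical Python; real plant does this in switchgear and firmware, and the unit names here are made up):

```python
# Four UPS roles from the rules above: two on load (100% and 50% drain),
# one in controlled discharge, one recharging. Rotating the roles means
# every unit gets exercised and recharged in turn.
ROLES = ["load_100pct", "load_50pct", "controlled_discharge", "recharge"]

def rotate(assignment: dict[str, str]) -> dict[str, str]:
    """Advance each UPS to the next role in the cycle."""
    return {ups: ROLES[(ROLES.index(role) + 1) % len(ROLES)]
            for ups, role in assignment.items()}

# Hypothetical unit names, one unit per role at any instant.
assignment = dict(zip(["UPS-A", "UPS-B", "UPS-C", "UPS-D"], ROLES))
for cycle in range(4):
    print(cycle, assignment)
    assignment = rotate(assignment)
```

        The invariant is that at every point in the cycle exactly one unit holds each role, so the load is always covered while one unit is being exercised and one recovered.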

        P.S. before you call bullshit, there was extensive (corroborated) research at the University of South Florida over a period of 15 years on this one topic.

        1. Wayland Bronze badge

          Re: Not entirely true

          It would be nice if this was scaled down into APC UPS. Those things cook batteries until after 3 years they are no good. Compare that to the life of a car battery, those last 10 years and do a lot more work.

          1. Anonymous Coward
            Anonymous Coward

            Re: Not entirely true

            "that to the life of a car battery, those last 10 years "

            5 years would be more normal for a regularly used vehicle imo!

          2. Alan Brown Silver badge

            Re: Not entirely true

            "Those things cook batteries until after 3 years they are no good. "

            A lot of that has to do with the discharge cycles they see. Lead Acid batteries DO NOT like being discharged and the deeper the discharge the fewer cycles they'll endure.

            AGM batteries are compact and don't gas off, but that's their only advantage. If you want cells that last for decades, then use traction batteries or a string of flooded deep-discharge telco cells like these Exide ones: https://www.exide.com/en/product/flooded-classic-ncn (those are the nuke type, read the PDF to see the selection choices you have)

            You'll need 24 of them.

            Even at end of life they're impressive. A certain exchange engineer strapped 12 old satellite exchange ones into the back of an original Fiat Bambino, replaced the engine with an electric motor and would commute the 5 miles from home to work for a week on a single charge - he did this for 20 years (until he retired) and there was no noticeable degradation in range.

            1. J. Cook Silver badge

              Re: Not entirely true

              AGMs also don't generally like being charged all the time, either - that's what usually kills the battery packs on the APC units. Generally what happens is one or more of the batteries in the pack gets tired of dealing with the overcharge and goes dead open, at which point the pack stops charging and you lose protection entirely. Given the prices that APC charges for replacement battery packs, I'm more fond of buying a set of replacement batteries and rebuilding the packs - the downside is that you void the connected-equipment coverage by doing so. I'm not quite certain what the big 3-phase Emerson-Liebert beasts we use at RedactedCo use, but I don't have to worry about it because we have them on a maintenance contract, and the company we are using is quite reliable.

        2. Roland6 Silver badge

          Re: Not entirely true

          P.S. before you call bullshit, there was extensive (corroborated) research at the University of South Florida over a period of 15 years on this one topic.

          I'd be interested in a reference/pointer to the research; I suspect that because the end result is fewer battery sales, this isn't something many in the vendor community would want to be widely known.

          1. Alan Brown Silver badge

            Re: Not entirely true

            If you want to be kind to your batteries, then a flywheel kinetic system does a lot to reduce their starting stress and in many cases is large enough to eliminate them entirely. It depends on how much you want to trade off engine run times.

        3. Wrekt__IT

          Re: Not entirely true

          But then you have to factor in ... building to a budget and scrimping on costs to create something that will maximise profits ....

    4. Anonymous Coward
      Anonymous Coward

      UPSs don't last forever. "The incoming mains supply was lost to the site and generators failed to take over the service" sounds more like the generators failed to pick up from the UPS.

      1. Microchip

        According to the emails they've been sending out, the incoming supply was unstable, and the gennies failed to sync up properly, and the UPSs ran out of power before the generators could kick in safely.

    5. Alan Brown Silver badge

      UPS

      At these scales you fit the UPS to the BUILDING.

      1000kVA Flywheel systems don't fit in server racks. Nor would you want them there.

    6. CrazyOldCatMan Silver badge
      Joke

      why you fit a UPS to bridge the gap

      I suspect in their case they were using Hermes to send the power rather than UPS..

    7. Chris Walsh

      Just had our UKFast account manager call us. Apparently the resulting power surge caused the UPS to fail, which is why the outage occurred. They are looking into hardening this area.

    8. Anonymous Coward
      Anonymous Coward

      "True, very true..."

      So how does the drive suffer damage then? I assume you are talking about physical damage here. I've never seen a hard drive damaged by a power failure; data corruption, yes, but actual physical damage, no. Aren't they designed to auto-park the heads when the power trips? Maybe it's different in large data centres with thousands of servers. Please enlighten me :)

    9. Neiljohnuk

      And hope the UPS is properly installed and maintained so it works when needed and doesn't fail to deliver - or worse, cause a multi-million pound fire, like one I know of: a £175,000,000 claim, and more not covered...

  2. Chris Neale

    DR Testing Failure

    "generators failed to take over the service"

    You can blame all the diggers, pickaxes or rats gnawing fibre.

    There's no point having backups if they don't work.

    RCA Action #1 Do DR tests regularly

    1. Ledswinger Silver badge

      Re: DR Testing Failure

      I'll wager the components of a recovery plan were all documented and tested, including physical test runs of the gensets. I doubt they did a full "turn off the mains power" test, but if they had, they'd have been in the same position (sitting in a dark data centre, thinking "shit!").

      The other possibility is that they have turned off the mains in tests, and everything went perfectly. That's a known problem with standby power - it only works most of the time. And on that subset of times when it doesn't work, you usually need it and everybody notices.

      A question for the DR professionals: What is the ACTUAL failure rate of a completely successful, fully automatic handover from interrupted mains to on-site generators? My guess is nobody does it often enough to know.

      1. Alan Brown Silver badge

        Re: DR Testing Failure

        "What is the ACTUAL failure rate of a completely successful, fully automatic handover from interrupted mains to on-site generators? "

        We get around 400-600 power breaks per year (rotten power feeds in the Surrey countryside). We've had about 5 unplanned outages in the last decade. That's with a flywheel kinetic system backed by diesel generators, and at least one of those was due to the generator starter motor battery being dead. Most of the time the flywheel rides out the break and the gensets only start at the 10-second mark.

    2. Alister Silver badge

      Re: DR Testing Failure

      Exactly.

      The article very carefully skims over that bit, doesn't it.

      The incoming mains supply was lost to the site and generators failed to take over the service.

      Every datacenter I've ever dealt with does weekly on-load generator tests, and UPS failover tests.

      Now we all know shit happens, no matter how much we try to prepare, but this does feel like they haven't been taking enough time on planning or testing.

      Why didn't their UPS have enough capacity to keep things up, even if the generators failed to start cleanly?

      1. Lysenko

        Re: DR Testing Failure

        Every datacenter I've ever dealt with does weekly on-load generator tests, and UPS failover tests.

        None of which tells you that you're safe against a breaker cascade as the whole A load switches to B and idling PSUs in blade chassis reactivate etc. There is no substitute for randomly[1] flicking breakers, PSUs and HVACs on a routine basis to verify TIII resiliency and DR will work as required. Unfortunately, that requires a degree of testicular fortitude entirely absent in facilities staff (other than perhaps those actually angling for a P45) and so this sort of thing keeps happening.

        [1] It has to be random, otherwise Ops will shift loads to other infrastructure to protect their uptime metrics, thus invalidating the results. Idling equipment draws less power.

        1. Chris Miller

          Re: DR Testing Failure

          Such testing is what I (as a conscientious consultant) recommend to my clients. But I also tell them: "If you ever have a real disaster (fire, flood etc) and 80% of services carry on working, you'll be a hero. If you do a disaster test and 98% of services carry on working, start looking for a new job."

        2. Wayland Bronze badge

          Re: DR Testing Failure

          "There is no substitute for randomly[1] flicking breakers, PSUs and HVACs on a routine basis"

          Break the thing on purpose so you're better at fixing it. That could annoy a lot of people, but they too will get used to coping when it breaks, so will be less affected when it breaks for real. It's only reliable things which cause a massive problem when one day they break.

      2. jmch Silver badge

        Re: DR Testing Failure

        "Why didn't their UPS have enough capacity to keep things up, even if the generators failed to start cleanly?"

        Doesn't matter how big your batteries are, if the generators don't work the batteries will eventually run out.

        1. Alister Silver badge

          Re: DR Testing Failure

          @jmch

          Doesn't matter how big your batteries are, if the generators don't work the batteries will eventually run out.

          No, that's obviously true, but my reading of this situation is that the mains power went out and everything stopped immediately. They should have had sufficient battery power to at least give them time to manually start the backup generators, but that doesn't appear to have happened.

        2. Anonymous Coward
          Anonymous Coward

          Re: DR Testing Failure

          @jmch

          That's true, but the UPS is connected to the server, so when battery level = x it shuts down safely. I remember years ago setting up the scripts on the APC ones in Linux. I'm sure these days it's all automatic with a nice flashy web console.
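          The sort of script being described might look like this today - a sketch that assumes apcupsd's `apcaccess status` output format, with an arbitrary 20% threshold; the shutdown call is deliberately left uninvoked:

```python
import subprocess

SHUTDOWN_THRESHOLD = 20.0  # percent battery remaining; arbitrary choice

def parse_bcharge(status_text: str) -> float:
    """Pull the battery-charge percentage out of `apcaccess status` output.

    apcaccess prints lines like 'BCHARGE  : 42.0 Percent'.
    """
    for line in status_text.splitlines():
        if line.startswith("BCHARGE"):
            return float(line.split(":")[1].split()[0])
    raise ValueError("no BCHARGE line found")

def should_shut_down(status_text: str) -> bool:
    return parse_bcharge(status_text) <= SHUTDOWN_THRESHOLD

def main() -> None:
    # Would poll the real UPS daemon and halt the box; left uncalled here.
    status = subprocess.run(["apcaccess", "status"],
                            capture_output=True, text=True).stdout
    if should_shut_down(status):
        subprocess.run(["shutdown", "-h", "now"])
```

          In practice apcupsd handles this itself via its config thresholds; a cron-driven script like the above is the old-school DIY version being reminisced about.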

          1. SImon Hobson Silver badge

            Re: DR Testing Failure

            but the UPS is connected to the server so when battery level = x it shuts down safely

            I bet it isn't in any large datacentre - with tens of thousands of servers it's just going to be a big hassle and create problems of its own (false alarms causing shutdowns). Instead, they work on the basis of having UPSs sized to cover the gap till the gennys start up - and gennys to take over before the batteries run out. In principle, there should never be a need for a low UPS battery to shut down the servers. Apart from these loss-of-mains events, most other faults won't give you any warning before the server loses power.
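            That sizing logic amounts to simple arithmetic - a sketch with made-up numbers; real sizing also accounts for inverter efficiency, battery ageing and derating:

```python
def ups_runtime_s(battery_kwh: float, load_kw: float) -> float:
    """Seconds of ride-through from usable battery energy at a constant load."""
    return battery_kwh / load_kw * 3600

# Hypothetical numbers: 100 kWh of usable battery behind a 500 kW IT load,
# compared against a 30-second generator start-and-sync window.
runtime = ups_runtime_s(100, 500)
print(f"{runtime:.0f} s of battery vs a 30 s genset window")
```

            With those numbers you get 720 seconds of battery, a 24x margin over the genset window - which is exactly why the batteries only become the story when the generators fail to start at all.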

            1. Anonymous Coward
              Anonymous Coward

              Re: DR Testing Failure

              @Simon Hobson

              That's a good point, though. If I had my own server in a datacentre I'd put my own UPS on it if it was feasible.

        3. lancashirelad

          Re: DR Testing Failure

          We saw a small temperature wobble for the 20 mins prior to the failure. I wonder if they were on UPS ?

          1. Lysenko

            Re: DR Testing Failure

            We saw a small temperature wobble for the 20 mins prior to the failure. I wonder if they were on UPS ?

            That isn't uncommon. The servers are on UPS but the CRAC units aren't, so temperature trends upwards until the generators kick in. That's the theory, anyway. I've never known anyone actually perform a controlled test to see what happens if the generators fail - does the UPS power run out before thermal overrun cuts in, or not?

            I have seen it happen for real though. The "cold aisle" input temperature hit about 80 degrees before the monitoring equipment comms failed. God knows what the peak was (dark site - no humans present).

            1. Stoneshop Silver badge

              Re: DR Testing Failure

              That isn't uncommon. The servers are on UPS but the CRAC units aren't so temperature trends upwards until the generators kick in.

              Well, I guess that would be restricted to a brief temperature bump if the generators took over in a minute, maybe two, but I've seen the thermograph register a 10-degree rise in as many minutes in a computer room when the AC conked out, in a room way less loaded than a modern datacentre.

              If the UPS is sized for the few minutes during which the generators are supposed to get their act together you might have maybe 30 degrees rise before the power fails, so not yet tripping overtemp safeties.
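              A back-of-envelope version of that estimate, treating the room as a sealed air volume (all numbers hypothetical; real rooms have thermal mass in the racks and walls that slows this down considerably):

```python
# Adiabatic temperature rise of air heated by the IT load.
AIR_DENSITY = 1.2   # kg/m^3
AIR_CP = 1005.0     # J/(kg*K), specific heat of air

def temp_rise_c(load_kw: float, volume_m3: float, seconds: float) -> float:
    """Kelvin rise if all the IT load goes into the air and none escapes."""
    energy_j = load_kw * 1000 * seconds
    air_mass = AIR_DENSITY * volume_m3
    return energy_j / (air_mass * AIR_CP)

# e.g. a hypothetical 200 kW load into a 500 m^3 room for 2 minutes:
print(f"{temp_rise_c(200, 500, 120):.0f} C rise")
```

              That worked example gives roughly a 40-degree rise in two minutes, which is the right order of magnitude for the "30 degrees before the power fails" scenario above.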

              1. Lysenko

                Re: DR Testing Failure

                If the UPS is sized for the few minutes during which the generators are supposed to get their act together you might have maybe 30 degrees rise before the power fails, so not yet tripping overtemp safeties.

                The problem with that is that the UPS sizing is always based on full DC capacity whereas a cold aisle containment unit is a semi-sealed micro-climate. That means that in a part filled DC it is possible for a given CAC unit to keep on trucking for several times longer than the design calculations suggest and ambient cooling doesn't help because the CAC is semi-sealed. In the case I referred to above, the equipment was still operating on UPS at least half an hour after HVAC went down.

              2. ricardian

                Re: DR Testing Failure

                Why do I keep reading "the UPS is sized" as "the UPS is seized"?

    3. Anonymous Coward
      Anonymous Coward

      Re: DR Testing Failure

      It does remind me of a time, long ago, when I worked for a large, international, ummm, let's just say MoD supplier.

      One weekend they decided to test the generator failover, they tripped the trigger and the two generators sprang into life.

      For some reason, which I never understood, they decided to give the generators a load, namely the site mainframe... Which they did, and things were ok for a while.

      Then generator #2 died, so they all ran off to investigate - nobody thought about putting the mainframe back to the grid feed at this point - which was a pity... as while they were all distracted by genny #2, #1 ran out of fuel.

      When we all came in on Monday, they were *still* trying to get the mainframe to boot.

    4. beddo

      Re: DR Testing Failure

      UK Fast provides Colo services.

      You can bet that most of the affected Colo customers haven't invested in the infrastructure to have their gear distributed across two data centres. It is not something anyone expects to get from Colo unless they pay for it - double costs plus load balancers etc.

      You can bet your arse that anyone with that setup stayed online, but attacking the DC on behalf of those customers that didn't is daft.

      The only thing I'd want to know is why the generators didn't kick in. Our DC runs full load tests, but sod's law states that something will go wrong right when you need it.

      1. Anonymous Coward
        Anonymous Coward

        Re: DR Testing Failure

        As most of the Colo customers are ISO 9001 I guess they'll be doing 3rd party audits to see evidence of maintenance, regular testing etc ....

  3. Anonymous Coward
    Anonymous Coward

    And how is the pickaxe-wielding person?

    Fried?

    1. Voland's right hand Silver badge

      Re: And how is the pickaxe-wielding person?

      My exact thought.

      Going through a cable sufficient to feed a datacenter with a pick-axe is the equivalent of pulling the ring on a hand grenade while sitting on it.

      1. Phil O'Sophical Silver badge

        Re: And how is the pickaxe-wielding person?

        Any mains cable of that capacity should have had a warning mesh a good 50cm or more above it. I can see a JCB ploughing right on through, but surely a pickaxe-wielder would have noticed it on the way down to his brown-trouser moment?

  4. Anonymous Coward
    Anonymous Coward

    Good old metrolink building the tram to the Trafford centre, if anyone has a visit between now and Christmas (why would you though?) please marvel at the temporary "roundabout" at event city. It's a sheer work of art, no doubt the result of weeks of architects drinking special brew and trying to draw just one round circle then giving up and throwing the lane rulebook for roundabouts out of the window while shouting incoherent drunk instructions to the workers. This "accident" does not surprise me.

    1. TRT Silver badge

      I'm firmly of the opinion...

      that most of these road planning disasters are the result of a decision taken, must have been around 1991, to remove the ban on bringing coffee mugs into the planning room where all the blue-prints are laid out. It's the only explanation I can think of for the bizarre appearance, over the last three decades, of tens of thousands of roundabouts with no additional roads connecting into them, offset from the natural flow of the traffic by about two lanes width, overlapping with each other or just plain badly built - planners putting their coffee mugs down on top of the road planning documents.

      1. Anonymous Coward
        Anonymous Coward

        Re: I'm firmly of the opinion...

        The one I'm talking about must have been a novelty mug.

      2. Anonymous Coward
        Anonymous Coward

        Re: I'm firmly of the opinion...

        Thousands of Roundabouts?

        How about the sudden rise of traffic lights operating 24/7 when there might only be a decent amount of traffic for 2-3 hours a day.

        They are obviously just used to slow ALL traffic down and increase pollution due to more idling.

        Traffic lights V2019 will all come with red-light cameras. All you habitual red-light jumpers (and that includes buses and HGVs) will need to beware.

        We, the motorists, are a big fat juicy target who can't easily fight back.

      3. Mark 85 Silver badge

        Re: I'm firmly of the opinion...

        ....must have been around 1991, to remove the ban on bringing coffee mugs into the planning room

        One has to wonder if that was actually coffee or tea in those mugs then, or something with a bit of alcohol content. I've worked with a few like that. Never, ever invited them to a meeting when their coffee/tea mug was less than half full.

  5. Anonymous Coward
    Anonymous Coward

    Still issues

    We got alerted that our servers had gone down at around 1040 this morning - they came back online again at some time around 1130.

    First issue is that the backup generators did not kick in at all - no idea why not but there should never be downtime due to a power issue whether that is because the generators take over or because you have more than one input feed for power.

    Now, 3 hours later, whilst the servers are all online they are as laggy as hell - we suspect the network is to blame, but anything that generates traffic to our database servers is really slow whilst the servers themselves are fine. Will see how the next few hours pan out.

    1. hopkinse

      Re: Still issues

      Lucky you - ours are still down/incommunicado 3+ hours later!!!

      1. hopkinse

        Re: Still issues

        5 hours and counting....

    2. Snuggle

      Re: Still issues

      At least your server is up.... nearly 5 hours and counting and we still cannot SSH to the damn thing, let alone load any of the sites which should be handling hundreds of orders for Christmas :|

      1. hopkinse

        Re: Still issues

        It was past 22:00 before ours came back to life. Time for some contract re-negotiation!

        1. Anonymous Coward
          Anonymous Coward

          Re: Still issues

          I take it you are one of the 99% of customers who thanked them?!

          https://www.channelweb.co.uk/crn-uk/news/3023021/ninety-nine-per-cent-of-customers-have-thanked-us-ukfast-ceo-jones

          What a joke .... Oh thank you for 10 hours downtime!

  6. Lysenko

    How bizarre

    MANOC 5 is supposed to be a Tier III facility which means (and I quote):

    "N+1 fault tolerant providing at least 72 hour power outage protection"

    If a single HVAC failure took out the DC that means they're either lying or else the secondary and tertiary power supplies plus the UPS and generators were all simultaneously both defective and not known to be defective. Even with negligence and incompetence bordering on deliberate sabotage, I don't find the latter option credible.

    1. Anonymous Coward
      Anonymous Coward

      Re: How bizarre

      I think the only fault tolerance was a second extension cable but it did have surge protection.

  7. vivien_vcloud

    UPS

    I know this is pretty far-fetched, but maybe they could implement some kind of system where there is a battery between the mains power and the datacentre infrastructure. If the special battery system becomes active, they could have some sort of power generator that takes over in a seamless manner.

    1. whitepines Bronze badge

      Re: UPS

      This is already a thing. Our datacenter has this using an off the shelf facility-wide unit, and we're not even Tier III. This was a display of sheer incompetence, nothing more.

      Also, wonder what happened to the worker. If the line was 13.8kV all that's left is a plasma burn, but something lower voltage might have been survivable(ish).....

      EDIT: On second reading I missed the sarcasm :) That being said, a lot of folks don't know the difference between true double conversion and a "standard" UPS, I'd hope these operators weren't stupid enough to use single conversion units at the racks. Then again, since they're mentioning hard drives in the individual nodes, who knows....

  8. EastFinchleyite

    Would you?

    I really hope that the people who supplied and fitted the UPS do not get involved in commercial air travel because I, for one, would not be willing to get on any airliner they had anything to do with.

    1. Doctor Syntax Silver badge

      Re: Would you?

      "I really hope that the people who supplied and fitted the UPS do not get involved in commercial air travel"

      You mean something like, say, a BA data centre?

      1. EastFinchleyite

        Re: Would you?

        I think that if the BA data centre had been run by managers schooled in the standards of reliability applied to airliners then their problems would not have happened. If my experience is anything to go by, it is most likely that they would have liked to do a better job but the bean counters had other ideas.

    2. Alan Brown Silver badge

      Re: Would you?

      It's possible to make a robust, durable UPS setup for datacentres.

      Until management decide it costs too much and shitcan it.

      Bean counters have more sense than to cut corners in these areas when you explain the consequences of fucking up. Management will let you talk and do it anyway.

  9. Anonymous Coward
    Anonymous Coward

    Point of order

    The line *in the song* is actually "Heigh-ho, Heigh-ho, it's home from work we go", and in the official Disney lyrics for it the line "off to work we go" isn't even mentioned.

    There's a bit more to it than that, but it's always a great argument starter down the pub.

  10. IGnatius T Foobar

    N+1 redundancy

    Apparently this data center didn't have enough redundancy. A proper carrier-class data center is going to be fed from more than one mains supply, entering the building in different locations. Those mains supplies will then lead to separate UPS plants, separate PDUs, and finally to each critical piece of IT gear via A/B power supplies in each rack.

    Any data center that has a single point of failure *anywhere* is not a data center in which one should run mission-critical workloads.

    1. Ledswinger Silver badge

      Re: N+1 redundancy

      A proper carrier-class data center is going to be fed from more than one mains supply, entering the building in different locations. Those mains supplies will then lead to separate UPS plants, separate PDUs, and finally to each critical piece of IT gear via A/B power supplies in each rack.

      Which is probably about N+3.

      N+1 simply means the site has one independent backup, in the form of the UPS and gensets. It didn't work, but up until this morning everybody hoped that it did, and that "hope" element is common to most disaster recovery and resilience plans, no matter how many Ns they claim to have.

  11. Anonymous Coward Silver badge
    Mushroom

    Normal really

    I've heard from a digger driver that they don't worry about things like digging through power cables or water mains - it slows them down too much and any fallout from damage they do is picked up by the insurance company, so why bother trying to avoid it?

    It's no wonder it's such a common occurrence.

    1. MOV r0,r0

      Re: Normal really

      I've heard from a digger driver that they don't worry about things like digging through power cables or water mains

      I heard that too, from a BT staffer. If you wrote to them weeks in advance they'd send a map saying where not to dig but if you put a JCB bucket right through they'd actually come out on site, same day too!

  12. Mutato

    What a joke

    A pickaxe magically stops their UPS and generators from working. Yeah ok...

    They are currently still working on "residual" issues. Which means their big clients are being ignored so that they can get the bulk of the smaller clients working. I think many people will be doing the same as us and changing provider.

    1. Doctor Syntax Silver badge

      Re: What a joke

      "Which means their big clients are being ignored so that they can get the bulk of the smaller clients working."

      It might be the other way around and you're not as big in their terms as you thought. It might also be that they're working alphabetically or even randomly.

  13. JaitcH
    FAIL

    Tesla Has a Battery to Prevent This!

    Tesla has a battery system to prevent this.

    Back in the day I worked on a navigation system transmitter site.

    We had duplicate mains feeds from different segments of the grid, a massive bank of batteries and a ONE-CYLINDER ENGINED GENERATOR. We technicians religiously maintained the back-up system, carefully checking each individual 2 volt glass-walled cell, recording the battery electrolyte levels, internal resistance (annually), etc.

    Once a week the station engineer would, without prior warning, disconnect the grid power to test the reserve power system. It never failed.

    I would put money on this incident that regular maintenance was not performed.

    1. TRT Silver badge

      Re: Tesla Has a Battery to Prevent This!

      Wow! I remember those glass cell battery backups from when I was a nipper - my mum's best mate worked on the hospital telephone exchange, and the room behind her station was packed full of them.

  14. lancashirelad

    The inlet temperatures on our server wobbled a few degrees for 20 mins before the outage

    You would almost think they must have known something was going on ?

    1. Anonymous Coward
      Anonymous Coward

      Re: The inlet temperatures on our server wobbled a few degrees for 20 mins before the outage

      The digital equivalent of cows panicking before an earthquake.

  15. Anonymous Coward
    Anonymous Coward

    Take a look at yourself

    Outages happen, simple as that, regardless of who you are, what tier you are, how big or small you are. Surely we've learnt that by now? There always appears to be shock and amazement when a DC or major cloud operator suffers an outage.

    So where is your DR plan? All eggs in one basket? You should be able to implement a pretty decent automated failover that kicks in within 5 minutes or at least a manual one within 30 mins.

    Perhaps you should look at your own DR plan whilst twiddling your fingers waiting for your server to come back up.

    In-cab UPS (dual-fed everything) is worth a thought if you are colocated; it at least means your servers can shut themselves down cleanly in the event of a power failure.

    Which gives me a thought: why don't DCs offer the facility for servers to know when they are on UPS and the UPS is running down? My kit at home (Windows and Linux) knows this and can shut down automatically to prevent damage, so why can't kit at DCs?
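    [Ed: for the curious, the home-kit behaviour described above is typically provided by tools like Network UPS Tools or apcupsd. A minimal sketch of the idea, with a made-up status reader standing in for a real UPS query such as NUT's `upsc` (the `OL`/`OB` status symbols follow NUT's convention; everything else here is illustrative):]

    ```python
    import time

    def should_shut_down(status: str, charge_pct: int, threshold: int = 20) -> bool:
        """True once we're on battery ("OB") and charge has fallen to the threshold.

        "OL" (on line) and "OB" (on battery) follow the Network UPS Tools
        status convention; the threshold is an arbitrary example value.
        """
        return "OB" in status.split() and charge_pct <= threshold

    def monitor(read_status, shutdown, poll_seconds=10):
        """Poll the UPS and trigger a clean shutdown once the battery runs low.

        read_status is any callable returning (status, charge_pct) - in a real
        deployment it might parse the output of `upsc myups@localhost`.
        shutdown might run `shutdown -h now`; here it's injected for testing.
        """
        while True:
            status, charge = read_status()
            if should_shut_down(status, charge):
                shutdown()
                return
            time.sleep(poll_seconds)
    ```

    [The point being: the mechanism is injectable and trivial - nothing stops a colo provider exposing the same on-battery signal to tenant machines.]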

    1. Mark 85 Silver badge

      Re: Take a look at yourself

      Well said but once this goes to manglement and someone looks at that cost.. well those eat into the profit. The plan then comes flying back with a "reduce cost" mandate and things really go to hell from there.

    2. TRT Silver badge

      Re: Take a look at yourself

      The DR plan is not so much "all eggs in one basket" as, we have a battery farm on standby which can bring the egg quota back to nominal levels within an acceptable timescale as detailed in the SLA. We just whip out the corks...

  16. Just the Tonic

    I don't care about all this techno babble.... I just think that after 12 hours my companies website and emails should be working. it is Xmas and it is totally and utterly out of order

    I will be asking my web hosters not to use these people again

    1. Anonymous Coward
      Anonymous Coward

      I'd be asking myself why my web hosters don't have a DR plan that can be activated within an hour.... it will happen again, regardless of who they use, if they have no DR plan. Outages happen, period; it's what you do about it that matters.

    2. Anonymous Coward
      Anonymous Coward

      Project manager by any chance?

      1. Anonymous Coward
        Anonymous Coward

        No, a self-employed techie of many years' experience, who has a business with an automated and a manual DR plan and triple if not quadruple everything. Was doing lights-out computing, DR and automation before the internet was a thing.

  17. Russtavo

    Just got this email from ukfast

    Please find below an interim report following the power issue affecting our MaNOC5 facility data centres on Tuesday 12th December 2017.

    Please be aware that this is an interim report from the information we currently have available; we are waiting on further information from our generator suppliers, which we will add to the final report when it's available.

    At 10:28 GMT on Tuesday 12th December 2017, our MANOC 5, 6 and 7 facilities were impacted by an instability on the incoming mains power as a result of a civil contractor passing a spike through the main feed. This was not work being carried out on behalf of UKFast or within our site but at another location 0.75km away on the path to our onsite transformer.

    The UPS system supported the load for its designed time and the generators started; however, due to the physical damage to the power cable, service to the site was unstable and intermittent. As a result, the generator set failed to synchronise and take over service.

    UKFast engineers on site were alerted to the generators being unable to take over the load, and that manual intervention was required. During this time the UPS batteries depleted past the designed runtime, resulting in total power loss to equipment on site at 10:40 GMT.

    The manual synchronisation was completed at 10:48 GMT and the onsite generator set took over power, enabling us to start bringing power back on for client services.

    Our engineers have worked throughout the day to restore individual services and continue to do so for those clients who remain affected by the power issue. Bringing back all services in a facility requires more than just powering on equipment; coupled with the resulting failures in physical devices that have required replacement, this makes it a lengthy process.

    Once we have resumed full service, we will be investigating what we can do to prevent this from happening again for both the incoming power issue and also the time taken to restore service for some of our clients.

    We will update this report as we get more information from our generator supplier and also from our teams to discuss the delays in resolving this issue.

    Kind regards,

    Charlotte

    Charlotte Bentley-Crane

    Account Management Director

    UKFast

    1. Captain DaFt

      Re: Just got this email from ukfast

      The UPS system supported the load for its designed time and the generators started; however, due to the physical damage to the power cable, service to the site was unstable and intermittent. As a result, the generator set failed to synchronise and take over service.

      Holy crap! You mean that their transfer switch didn't automatically cut the mains feed when the power went wobbly and back up kicked in?

      And not restore mains power until it's stable? (For at least 15 minutes, if I recall correctly)

      Or, just as bad, they had more than one generator operating, and no way to synchronise them?

      1. Anonymous Coward
        Anonymous Coward

        Re: Just got this email from ukfast

        ukfast =cowboys

        also the 40,000 servers they claim to have makes them a very small player in the market. No consolation for those caught out by this, of course... but other, more worthy providers are available

      2. Richard 12 Silver badge

        Re: Just got this email from ukfast

        Wait, what?

        If that statement is true, then they are about to get a very nasty visit from National Grid.

        The only way a break or short upstream of the local transformer/substation could affect generator sync is if they backfeed their generators into the incoming mains supply.

        Or possibly their PE arrangement is dangerously wrong.

        Both of which are illegal due to being literally deadly.

      3. Anonymous Coward
        Anonymous Coward

        Re: Just got this email from ukfast

        I mean, how much must they have scrimped on this build to have not got this right!! So the incoming feed is wobbly, the batteries take over while the gensets kick in ... they have massive fuel tanks so why would they have worried .. I thought they were able to hold power for 3 days? I'm guessing someone from Sudlows is going to get a kicking for this!

        1. Anonymous Coward
          Anonymous Coward

          Re: Just got this email from ukfast

          It's UKFast... those extremely expensive service fees mostly get spent on a themed office, ski trips, Friday beers and other gimmicks, so there isn't any money left to actually spend on building decent infrastructure.

  18. PaulusTheGrey
    Facepalm

    What's a Socomec? Can we sell it as a service?

    Bad maintenance, or most likely insufficient maintenance due to lack of investment. This shouldn't be a thing these days, but it still happens where someone in the organisation can't see what a piece of switchgear does so can't see the point of having a PMA. OK, you save the company maybe a grand a year, and then get hit by service credits (or worse) and lose a six- or seven-digit sum as a fine or compensation claim.

  19. Pat Harkin
    Coat

    "When a server loses power, even for a split second, it can damage the hard drive."

    That's where UPS comes in. They can ship you a replacement drive for next day delivery.

  20. TrumpSlurp the Troll Silver badge
    Paris Hilton

    UPS?

    Don't they deliver parcels?

    1. Anonymous Coward
      Anonymous Coward

      Re: UPS?

      Yeah, that's the joke hence the "they can ship you a replacement".

  21. Anonymous Coward
    Anonymous Coward

    Puzzled

    Wait ... so the MD claims that it was a pickaxe but there's no evidence. I mean ... ENW have nothing about it on their website, no outages. If someone did hit a power cable with a pickaxe ... I'm reckoning that'll be a toasty person - I wonder how many MVA they have going through that cable that this alleged workman hit! With his super-sharp pickaxe that got through the armour surrounding the cable and into the main cluster, conveniently bridging live and earth. Which all seems mighty convenient, or unlucky.

    Even if it was a power feed issue ... I thought they were using dual power feeds, which begs the question as to what happened to the other power feed.

    Finally we then have the issue of no power failover. If you host with them, I would be asking to see evidence of the failover tests from previously to see if they have had issues previously.

    Between this and the crazy peepshow dancing girls on stage - bad few days for UKFast. No doubt some inspirational blog will follow.

    1. Anonymous Coward
      Anonymous Coward

      Re: Puzzled

      Yup. Lies for sure.

      Like you said, if this was a pick axe, someone would be in a serious way / dead now. And that would make the news... even if it's just local.

      I'm local and I've heard nothing. ENW status site has shown nothing. UKFast screwed up!

  22. Cas_Paton

    UKFast took us offline for 11.5 hours!

    We were offline 11.5 hours because of this.

    We summarised our events here: https://www.linkedin.com/pulse/onbuy-offline-115-hours-due-ukfast-data-centre-failure-cas-paton/

    The funny part of this is that we called them at the height of the incident, as a mystery shopper - check it out !!

  23. Russtavo

    Update from UKFast

    Please find below an update on yesterday's issue from our Critical Power Director, explaining our return to mains power:

    We have been running successfully on our back up generators since 10.48am 12th December 2017.

    We have 12,250 litres of fuel which equates to a run time of 49 hours, with tankers on standby who are able to deliver with immediate effect.

    At this time we await confirmation of a time to re-energise the power network to MaNOC 5,6 & 7. The cable has been fixed by Electricity North West. We understand this will more than likely be later today.

    We will return to mains power in a controlled manner. The UKFast data centre is currently locked onto generator power, so when ENW switch the power back on we will prove it's working perfectly before starting the process to switch back over to mains. This mitigates any risk of the power coming on intermittently or incorrectly.

    Once ENW confirm to UKFast the electric supply is energised we will then check the electricity supply is present at our transformer and check the electricity for phase rotation to ensure the supply is electrically correct. Even though ENW should have proved this themselves we will double check it before proceeding to the next step.

    Once we are happy that the supply is stable and correct we will activate our automatic power change-over system. The automatic change-over system will also monitor the power for 5 minutes and then initiate the automated return to mains power. This is called a proving period.

    At this point the generator's bus-coupler main breaker will open, removing generator power from the system. The UPS which is constantly in operation will automatically support the technical load for around 10 seconds while the mains electricity power circuit breaker is closed and reconnected to the power systems.

    The UPS battery system also supports the mains change-over for a further 2 minutes as the UPS slowly transfers from battery power to mains power. This is called a "walk in" procedure and removes the risk of the UPS seeing huge power demands and creates a smooth transition. There is no break in power to the technical load during the walk in period.

    During the walk in period the air-conditioning will stop for a moment and restart and also perform a walk in procedure to ensure no stress is placed on the power network. This takes around 1-2 minutes as the CRAC units stop and restart.

    During this period the generators continue to run should they be required. Once the transfer is complete the generators run on for a further 3 minutes then shut down and go back into standby mode.

    Return to mains power is then complete. Should there be any issue during the transfer period, we can automatically switch back to generator power.

    On site managing the process we have the UKFast Electrical Team and Ingram Generator Service Partners.

    It's not uncommon to lose power in the UK. In the last 12 months we have performed this exercise twice after losing power to the area. Both times the systems have switched over perfectly. This is one of the reasons we run an N+1 environment.

    We prove the start signal on a weekly basis which fires up the generators and the UPS tests itself every day at 8am. We are the only data centre that we are aware of to hold the NICEIC accreditation meaning we are a fully licensed electrical contractor and can manage and maintain our data centres without the need for external contractors.

    Unfortunately we do not have an exact time when the supply will be reconnected, however we are on standby and working closely with ENW who indicate it will probably be late afternoon or early evening.

    Yours sincerely

    Miles Allen

    Critical Power Director
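    [Ed: the walk-in sequence described in the update reads like a textbook automatic changeover procedure. Purely as an illustration - the step names and hold times come from the email; the function and everything else is invented - the control flow could be sketched as:]

    ```python
    # Hold times taken from the UKFast update above (in seconds).
    PROVING_PERIOD = 5 * 60     # monitor restored mains before committing
    UPS_BRIDGE = 10             # UPS carries the load over the breaker swap
    WALK_IN = 2 * 60            # UPS ramps slowly from battery to mains
    GENERATOR_RUN_ON = 3 * 60   # generators keep running in case of a fault

    def return_to_mains(mains_stable):
        """Walk through the changeover, reverting to generator on instability.

        mains_stable() -> bool is a stand-in for the real supply checks
        (presence at the transformer, phase rotation, voltage, etc.).
        Returns (completed_steps, outcome).
        """
        steps = [
            ("proving", PROVING_PERIOD),
            ("ups_bridge", UPS_BRIDGE),
            ("walk_in", WALK_IN),
            ("generator_run_on", GENERATOR_RUN_ON),
        ]
        completed = []
        for name, hold_seconds in steps:
            if not mains_stable():
                # Any wobble aborts the transfer; generators retake the load.
                return completed, "reverted_to_generator"
            completed.append((name, hold_seconds))
        return completed, "on_mains"
    ```

    [The design point the email is making: every step is guarded, and the generators stay available until the final run-on completes, so a mid-transfer fault never leaves the load unpowered.]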

    1. AndyD 8-)₹

      Re: Update from UKFast

      New to the job but a fast study?

    2. Anonymous Coward
      Anonymous Coward

      Re: Update from UKFast

      "It's not uncommon to lose power in the UK"

      Okay, first of all.. I'm not the most experienced guy posting here that's for sure... I've only been enduring a career in IT for 13 years. But during those 13 years, working in a number of data centres, I've never actually come across the loss/failure of external power.

      It's different at home. Quite regular blips there... but I would like to think the quality of the connection to a data centre would be far superior to the wet string they use for overpopulated 1970s housing estates.

      As usual all I hear is excuses from UKFast. They don't have a clue!

      Maybe if they did depend on external contractors they would actually have a better power connection... I've always thought you should stick to what you are good at. If you're good at running a DC, then do that, and contract in some experts for dealing with the power side.

      And why only a single power connection?

      "we run an N+1" - errrr, except for power feed....

      1. Anonymous Coward
        Anonymous Coward

        Re: Update from UKFast

        I was on call for a healthcare provider when the grid failed on 1st January 2011. The campus style site had a link to a 33kv substation and an 11kv substation. The 33kv distribution went down and the 11kv only supplied a corner of the site.

        The DC had a UPS and a genny (the site had 4 gennies in total) , UPS lasted 30 minutes less than the grid outage, the grid was down for a little over an hour. The genny had fluid level sensors in its bund - you know, to stop it blowing itself up if there was a leak. The bund was full of snow so the genny wouldn't start. An E&F manager bypassed the bund level switches but found the genny would start and idle perfectly until the revs were needed to actually make some power at which point it gasped and died.

        There was a tiny leak in the non return valve in the diesel filter which was sucking air in and interfering with the fuel pressure.

        4 problems in a row:

        Grid failure

        UPS batteries exhausted

        Generator safety cut out

        Generator fuel supply issues

        In the aftermath the UPS was upgraded to hold for at least 60 minutes and the gennies were reconfigured with a changeover mechanism to allow a second genny to be manually switched over to supply the DC.

        Shit happens. Good thing it was a Sunday and a bank holiday; there was plenty of time to stand it all up again afterwards. Nobody rehearses starting a cold data centre - when your DHCP is down and your phones have rebooted (because the switches went down) but not picked up their IP addresses, TFTP configs etc., it quickly affects absolutely everything. And what do you start first?

        On a side note, everybody that was needed came in and stayed until we stood it back up - but a combination of dire management that treated everyone as a consumable and a desperate arse covering blame culture means that NONE of them are working there anymore. But the experience they've gained is invaluable.

        AC for obvs!

  24. Anonymous Coward
    Anonymous Coward

    Is it ironic that in April UKFast won the government contract to support effective response to power outages?

    "British hosting firm UKFast has secured a contract to supply the UK Cabinet Office with a cloud platform for its emergency ResilienceDirect service, which supports effective response to incidents like natural disasters, terror attacks and power outages."

    https://www.ukfast.co.uk/press-releases/ukfast-secures-six-figure-government-cloud-deal.html

  25. Anonymous Coward
    Anonymous Coward

    Was it the Russians?

    Going by the hype about the Russian threat to undersea cables at the recent RUSI meeting, and given the septics have been tapping optical fibre cables for years, was it a Russian pickaxe?

  26. Anonymous Coward
    Anonymous Coward

    Comms redundancy

    More seriously, Trafford Park is also susceptible to the odd local scrote (or disgruntled business owner) torching BT manholes by throwing petrol down them, or burning cabinets. It last happened to the comms lines supplied by the Salford Quays exchange in the late 1990s/early 2000s.
