Generators and UPS fail in London datacentre outage

Tata's datacentre in the east end of London went titsup for two hours on Thursday evening, following a power cut. Backup power systems also failed, downing servers belonging to hosting providers throughout three floors of the Stratford facility at about 5.20pm. Firms including C4L, ServerCity and Coreix were hit by the outage …

COMMENTS

This topic is closed for new posts.
  1. Chris Miller
    FAIL

    Testing?

    We've heard of it!

  2. N2

    Errant builders?

    Really? I'd never have guessed!

  3. Maliciously Crafted Packet
    FAIL

    Backup power systems

    This seems to be a recurring theme with power outages these days. Can anyone recall an occasion when the backup power systems have actually worked?

    1. Anonymous Coward
      Thumb Up

      Backup Power Systems

      My home UPS did during a local 30-minute outage. OTOH, I'm only powering my computer, monitor, cable modem, Vonage wireless router and phones from it.

    2. Alan Esworthy
      IT Angle

      Backup power worked?

      It's going on 40 years I've been working in the IT industry and I've been directly involved one way or another with dozens of installations with power backups of many sorts, and seen them activated many times.

      They nearly always work as expected.

      But when they do, hardly anyone notices and the event is not considered newsworthy.

      [IT? What's that? Don't we call it Data Processing anymore? What happened to my plug boards? Ah - there they are - behind my card boxes.]

    3. Raddish
      FAIL

      Of course...

      You wouldn't hear about times when the backup power worked, because that's not newsworthy.

      "Power outage at tata datacentre! Backup power worked perfectly, no issues at all" isn't really that interesting as a story now, is it?

    4. Bassey

      No

      For a very good reason. "Everything's fine, folks" doesn't make the news.

    5. Alexander 3
      Thumb Up

      Actually, our UPSs are doing a grand job..

      I know what you're getting at, but no, there are plenty of systems that have no problems at all because they have effective (and tested..) systems in place. However, "Blackout Causes Failover System To Work As Expected!!" doesn't make quite as good a headline as "Blackout Causes Failover to Fail Epically! Zombie incursions expected!!"

    6. Pandy06269
      FAIL

      Re: Backup power systems

      Oh yeah, we suffered a massive spike in our data-centre's main power supply at about 11am one morning - our business's busiest trading period.

      Of course our system worked - for all of the 9 seconds it took for the same spike to hit the secondary power supply and destroy the 10-year-old UPS.

      Lucky we had a separate disaster-recovery site 50 miles away - oh wait, the replication link hadn't been working for a few days and no-one had noticed.

      IT companies never learn.

    7. Steve X
      Pint

      backup power

      Mine worked fine today. Power cut upset the clocks and the satellite receivers, but the UPS kicked in and kept my server and DSL running happily.

      Another reason to keep IT under control, and not outsource it to clouds. They evaporate when you least expect it.

      Cheers!

    8. The Cube

      Backup power systems

      They do work; the thing is you never see the press crowing about "data centre in north London suffers short mains disruption, UPS and generator systems function as intended, nothing happened" because that wouldn't be news.

      Now, if you don't build your data centre properly, don't maintain it properly (like checking that the UPS is actually charging the batteries) and don't run tests (such as black building) you are in for a nasty surprise.

      These things don't look after themselves, and the two best ways to shoot yourself in the foot are:

      1) Let some bean counter skimp on the maintenance budget and hope it won't be customer-affecting when it happens

      2) Build a banking-grade, redundant-everything monster more complex than the LHC, too complex to operate or maintain, and watch it all go wrong because nobody can run it properly.

      Obviously one of these is cheaper than the other...

    9. andy 55

      fasthosts

      Not a big Fasthosts fan myself, but when Gloucester got flooded a few years ago and lost electricity the datacentre backup systems did kick in and work whilst under water.

      1. Anonymous Coward
        Anonymous Coward

        Impressive

        Now if only the water processing plant had too, we wouldn't have had to use the bowsers for those few weeks! I guess that is as good a definition of 'irony' as any you'll find.

    10. Nexox Enigma

      I've seen one work!

      I used to work in a building with a tiny 'data center' (constructed with cubicle walls...) and we had quite a few power outages, through which our UPS and genny power managed to work each time.

      That's not to say that failed UPSs didn't cause us some extra downtime for no reason...

    11. Ciaran Flanagan
      Thumb Up

      redundant power

      @ Maliciously Crafted Packet

      When they work you never hear about them ....... it's just not that interesting ...

    12. Steve Evans

      Testing sometimes backfires...

      I remember back in the 80s when I worked for GEC/Marconi, they decided to do a generator test one weekend.

      So they kicked them off, and they both started... Within a minute one stopped so they went to have a look at the problem.

      Whilst it was being ignored, the second one felt lonely and decided to catch fire. Luckily there was an onsite fire brigade who came round in their little red Land Rover and tackled it.

      Unfortunately the fire, or the extinguishing of it, caused the mains electrical feed to trip, and the entire data centre was plunged into darkness.

      We had a great 5 hours of doing sod all in the office on Monday morning as the operators were still trying to get the mainframes to start up!

    13. Anonymous Coward
      Anonymous Coward

      re: backup power system

      Strangely enough, I can't recall one that made the news!

      Power outages happen all the time; however, when everything is working, you don't know about it. It's only when something goes wrong that it makes the news.

      We have backup UPS, generators and an alternate site that can handle the processing load if necessary. Supposedly, you shouldn't notice it even if a bomb hit the site. Unfortunately, recently, it all went tits up. The generators didn't kick in and processing didn't switch to the alternate site for some reason. The UPS is only there to keep everything going during the time it takes to fire up the generators, so that wouldn't keep it going for long.

      But when it's tested every month or so, you don't even know it's happening!

  4. Anonymous Coward
    Anonymous Coward

    tata?

    no tata, thank you very much

  5. this

    Generators

    Probably had electric starters.

  6. Anonymous Coward
    Grenade

    Perhaps you would have gotten a response..

    TATA are the Indian conglomerate that own everything from Tetley Tea to Land Rover. Perhaps you would have gotten a response if they used an Indian call centre instead!

  7. Dustin 1
    Joke

    I wonder

    If they moved their power control systems to their new 27" iMac ;p

  8. JMB

    Backup power systems

    One problem is that people are often reluctant to test them fully in case they don't work!

    One way to do it properly is to pull the breaker on the incoming mains supplies.

  9. Beelzeebub
    Flame

    Cheerio

    I originally said tata to your English jobs; now it seems it's tata to your English datacentres as well. Oh well, saved some cash, didn't we?

  10. Peter2 Silver badge

    I suspect they didn't test their DR plan.

    Because I've never seen anybody that has truly tested their DR plan to the point that they weren't surprised when they had some downtime. I have seen plenty of people doing nice safe little tests to satisfy the management, but nobody that's done a real test.

    Which probably explains why I've seen things like a data centre go down because, while the servers did have power, the air con didn't. You'd be surprised how quickly a server room can reach 40 degrees without aircon.

  11. Colin Miller
    WTF?

    @Maliciously Crafted Packet

    If the backup-generator started correctly, then it would not be newsworthy.

    Shock horror. Company's disaster plan worked correctly!

  12. Anonymous Coward
    FAIL

    I know what happened

    Yes, as an Electrical Engineer for more years than you can shake a stick at, I can say the scenario is all too common of late.

    Standby genny - check, full fuel tank - check, battery OK? Er, wot?

    The company I work for maintains standby gennies for the local police/fire/ambulance infrastructure.

    Gennies are tested on load for two hours every six months, and the heavy-duty batteries changed for new every year. If there has been no mains outage then the genny will have run for about four hours in a year, but you change the batteries anyway cos comms are vital.

    Now a big-name bank we look after does not want to pay for new batteries every year: "Why, there is hardly any wear on them and they cost over a hundred quid each!! Just leave the original ones for another year."

    repeat till failure

    Anonymous for obvious reasons.

    1. Anonymous Coward
      Anonymous Coward

      Merchant bankers?

      Would that be one of those banks that have paid 'bonuses' in the millions?

    2. Anonymous Coward
      Anonymous Coward

      I'm not sure you do.....

      At a certain place that I worked, the backup genny was tested weekly for startup, run and power output. Every month a full load cutover of the entire site to generator for a couple of hours was performed.

      When we actually had a power cut, do you think we could get the sodding thing to start? Still, watching the faces of those trying to make it work as the UPS battery deathclock ticked down was quite funny.

      BTW, any idea what happens some months later when, after some spending of the arse-covering budget, two gennies start up while connected to a power distribution system originally specified to support only one?

      I took this little lot as conclusive proof of Sod's Law.

  13. Pandy06269
    FAIL

    Cowboys

    I used to have a server hosted in this data-centre with ServerCity.

    Tata are total cowboys. The data-centre is poorly laid out, everything's just been shoved in wherever it fits, and you have to tread carefully otherwise you'll break your neck by tripping over the cables sprawled across the floor.

    The day after I visited the data-centre to do an OS upgrade on my server, I switched suppliers to Telecity - you could see the difference as soon as you walked through the door.

    No tata for me either.

  14. Anonymous Coward
    Unhappy

    Oh :(

    Joy, that's why I'm planning on doing my offsite backups to...

  15. Michael Fremlins

    @Maliciously Crafted Packet

    A very good question!

    I have seen successful backup power working at Level 3's building in Goswell Road several times. I have also seen a 7 hour outage there.

    Telehouse, seen a power outage there. Telehouse reps said it didn't happen, but couldn't explain why every piece of our equipment, in several racks, suddenly decided to turn itself off and on at the same time.

    Harbour Exchange Redbus (as it was then), seen a loss of power during "testing of the backup power system". Not successful, I would say. This was after another power outage when the backup systems didn't work. I recall several power outages at this site.

    Power outage at BT's Ilford POP. We were the first to notice and call in about it. Not sure if they have backup power, but I would be surprised if they did not.

    Basically at every site where we have equipment, where backup UPS and generators are supplied, I have seen outages. I think it's fair to say backup power works sometimes.

  16. Will Godfrey Silver badge

    Errant Builders -hmmm

    I wonder how much the BOFH paid them when the beancounters cut his 'maintenance' budget.

  17. Anonymous Coward
    Boffin

    UPS batteries dead and 3 failed generators

    .. sounds like grounds to ditch ANY backup contract with those people then!

    Also, VoIP in the same data centre to cover your telecoms - inspired, truly inspired!

    Icon? For what was required not provided!

  18. Paul Hovnanian Silver badge

    No redundant datacenter?

    Tatas are always better in pairs.

  19. Mr Young
    Pint

    Battery not do it?

    I'm pretty sure a test/maintenance/replace schedule (whatever it's called) for their batteries would help? Anyway - a diesel generator should have sorted the problem just like that! FAIL

  20. frank ly
    Coat

    @Chris Miller and MCP

    On two occasions in the past, when working as a systems test engineer in large (nameless) companies, I have suggested cutting the main power feed (at a non-busy time and with advance warning to all staff) to test the backup power facilities and procedures, recovery process, etc.

    (Note: this was at a stage before a facility had become fully active and part of everyday company activity.)

    Each time, I was told that there was a danger of disruption to services and possible damage to equipment, so that would not be allowed by 'management'.

    By the time my brain had finished doing mental backflips to try to understand their point of view, the meeting had ended.

    Coat, because at the end of the day, I got paid whether it worked or not.

  21. Anonymous Coward
    FAIL

    lucky...

    Migrated out of there a couple of months back, lucky us, bad luck for anyone left there. It's a rubbish facility in a rubbish location that's rather expensive too...

  22. Anonymous Coward
    FAIL

    I wonder...

    I wonder if on the last test they saw the generators' fuel was low and went "oh, let's put in a fiver, that will pass the test for now and we can fill it up later" :)

    Does make you wonder about N+1 redundancy - it seems in this case it was -N-1.

    Apparently Tata has the following:

    "The facility offers complete redundancy in protected power, HVAC, fire suppression ..............

    The facility takes power feeds from multiple power grids, distributed via N+1 MGE UPS battery backup power and three 2.5 MVA Caterpillar diesel generators to backup primary power source with 48-hour on-site fuel storage supported by continuous refuelling contracts"

    Which is all well and good, but only if people know how to use it. I guess they thought they didn't need to, as 99.9999% of the time it would be done automatically - but if it isn't, the people on site need to know how to start it manually.

    RE: Maliciously Crafted Packet

    Does it ever make the news if a D/C loses mains power and the generators kick in and work?

    I would be willing to bet that more datacentres have a mains failure and stay online - never making the news for working correctly - than lose power and fail to get the backup systems working.

    Tata, Tata, and your epic fail.

  23. Anonymous Coward
    Anonymous Coward

    Good question, I'll come to you....

    So a 'spokeswoman said Tata was still looking into what caused the outage and the subsequent failure of backup power'.

    More like they need some time to scour their Ts & Cs and cook up a lame excuse that attempts to absolve them of responsibility.

    Go to the back of the Data Centres for Dummies class, stay after school and write out 1000 times:

    "Data Centres must be continuously supplied with AC power, all redundant systems must be tested, tested and tested again"

  24. Robert E A Harvey
    FAIL

    testing

    When I was a telephone engineer we had Strowger exchanges run off 80V batteries. A big diesel engine could run the batteries when the mains failed.

    We had two gennies at Grantham. We tested one on Tuesday and the other one on Friday.

  25. Anonymous Coward
    Joke

    Reputation in tata's

    Odds are this company thought they didn't need a Power & Plant engineer.

    I've seen it all too often - these companies try to save "so much money", which works short term but only ends up like this, with reputation in tata's....

  26. Anonymous Coward
    Happy

    Olympics question no. 110000

    At one time there was considerable concern that the electricity distribution infrastructure in that part of London would be inadequate for 2012.

    Related to that, a big new substation in preparation for the Olympics opened a few weeks ago.

    Could these two be related to Thursday's failure?

    http://www.london2012.com/press/media-releases/2009/10/london-2012-powers-ahead-as-first-olympic-park-building-is-complete.php

  27. Anonymous Coward
    Happy

    But they are cheaper

    And they can offer an even cheaper center in India. It won't stay up either ;)

  28. This post has been deleted by its author

  29. Kevin Reader
    FAIL

    The diesel had probably congealed in the tanks

    IIRC, something similarly embarrassing happened to a Reuters data centre in the 1980s when I worked for them. Builders = power outage, then it turns out the generators had been fuelled months (or years) earlier. Apparently diesel goes off if you do that - turns into a kind of nauseous treacle, or an acid.

    I think the only other people who usually have to look out for this problem are the MOD, with mothballed kit, and farmers; so it's little known. Some sources on the web say diesel will keep for 18-24 months without additives, while others say 2 years max with additives. Apparently you have to keep it cool and avoid water condensing out in the tank to get those times. The data centre tanks are probably sited outside in the sun....

    In fairness, the UPS batteries may have been empty because they'd kept things going long enough for the generators to (not) start. Those generators on the site are advertised as being 7.5MW in total, so I guess the battery life is pretty short!

  30. Elmer Phud
    Thumb Up

    re: Backup power systems

    Sort of - two fat generators caught fire after ten minutes - oh, how we laughed.

  31. Franklin
    Thumb Up

    Fortunately...

    ...not all of Tata's customers were affected. Spammers seemed to have uninterrupted service; got two spam emails this morning and one last night advertising Tata-hosted "make money fast" sites. Good to know that not all their customers had problems!

  32. Chris Hills
    Thumb Down

    Misleading?

    Coreix is still claiming on their website: "The Coreix Premium Network has obtained 100% Network uptime over the last 4+ years." That does not tally with their status page at http://status.coreix.net/ - which was pretty useless when they were down. If you have a status page you should make sure it is run on an entirely separate infrastructure and domain.

  33. Anonymous Coward
    Anonymous Coward

    Testing!

    Testing often doesn't work, as it's either not done on load or it's done in a controlled manner. Large spikes can knock out the control systems, and even then you could have tested it 5 minutes before and it could still not work for some reason.

    Much like backups, the test is only relevant at that moment in time; it does not guarantee the situation going forwards.

    Even with all that, an emergency power-off will defeat you, and you won't be allowed back in until the firemen say so. Over the years I've seen outages caused by overload, component failure, huge spikes and a failed fan causing the fire alarms to go off.

    Nothing is perfect however much testing you do.

    In my experience it's generally the control systems that fail, and it requires an engineer to install a bypass and then a second outage to take it out once everything is fixed!

    Still annoying though!

    Details received from Coreix

    At approximately 16:48 GMT the Stratford, London facility lost mains power from the power grid. The timeline of events is as follows:

    16:48 - Power to site lost - running on UPS.

    17:15 - UPS systems depleted, generators failed to start.

    17:30 - Generator failovers failed to function despite multiple attempts - the power engineers were dispatched.

    18:32 - The power engineers arrived on site.

    18:54 - The power engineer estimates 30-45 minutes to return power to site.

    19:16 - The power was returned to site and the process of booting up each rack commenced.

    19:35 - All racks powered up and brought on-line.

    The last twenty-four hours have seen numerous power grid blips, but the system as designed has taken the load, with the UPS battery backup and generators keeping the facility live and ensuring an uninterrupted service.

    Tonight, however, at 16:48 a failure in the control boards prevented the generators from powering the facility once the power from the grid failed; a manual bypass was installed to get the facility on-line.

    The facility power engineers are continuing to monitor the facility and extra staff were drafted in to help.

  34. Jerren
    Joke

    Wouldn't a more proper title be...

    "Tata's Datacenter goes tits up?"

  35. Alain Moran

    BANKING GRADE?

    Pah .. I have an account with the Halifax - well, soon to be I DID have an account with the Halifax - since after their 'power outage due to the weather' the other weekend their UPSes failed to work, their generators never kicked in until someone physically went to the site two hours after the power dropped, and they spent all day rebuilding their servers.

    Apparently Halifax don't believe in offsite failover, and from what I gather they don't have any UPS either ... for me HBOS have earned a new moniker:

    Halifax, Bank of FAIL!

    HBOS, your summons will be in the post!

  36. Neal 5

    Oh yeah Tata

    I can't see the problem here. Everyone's talking like they've had a major life experience (or not) here.

    For those of you actually affected, thank your lucky stars you don't depend on TalkTalk.

    For those of you not affected, Sainsbury's are still selling salted peanuts at a very reasonable price, and if you don't like salt you can always wash it off. Wet peanuts solve both dehydration and nutrition issues, enabling those ana+++ retentive amongst you to have a fucking good dump in the AM.

  37. Anonymous Coward
    Anonymous Coward

    I have been working in the IT Field

    for nigh on 190 years.

    I remember it quite well. I said to Charles, 'Mr Babbage, your machine is quite spectacular, but what about the power backup!? I know that young fancy of yours, Miss Ada, will lead you a merry dance, with her cog whirling and fancy smancy mathematicals, but dear God man, don't forget the power backup,'

    I said, and with much foresight and deliberation: 'It should be a power that cannot be interrupted, a supply of uninterruptible power.'

    I swear the old codger must have misheard me, what with his dickey ear and all that, and thought I had said 'Hun'. He picked up his fire stoker and chased me out the door. I was a spritely lad at the time, so I led him a merry dance through the streets of London, Babbage huffing and puffing behind, quite out of steam. 'Yes', I thought, 'those who don't learn from history are destined to repeat it' - well, at least that is what old Eddy used to say. I did think he was a bit of a Burke though.

  38. Anonymous Coward
    FAIL

    Tata'd

    Having worked in a company that Tata came in and covered a major contract for, because the PHBs deemed we didn't have the expertise in-house, I am not surprised in the least.

    They, like an awful lot of other huge Indian outsourcing companies who are trying to branch out, think they can just cut costs to the bone, throw in a load of inexperienced staffers to learn on the fly, and baffle everyone in upper management with managementspeak.

    However, when it comes time to walk the walk as well as talk the talk, the outcome is woefully inadequate, and the response tends to descend into a flurry of email chains, each fwd or cc slowly descending the internal caste structure until it falls on the desk of someone so junior they can't pass the buck any further along. They will have to go and sort it themselves, but being junior they won't have the experience, and they won't have the political clout to take the needed decisive action when a serious situation arrives. They'll then send an email to someone who has the political standing in the structure to sort something, but who conveniently will be too busy (hiding behind the couch to avoid any flak) to deal with it (and take some responsibility). That continues until the client screams for blood and points to SLAs, whereupon some shakeup will take place, and for a few weeks people will try to field the issues.

    Usually we found it easier to just quietly do the work ourselves while the Tata people sent huge email chains around trying to generate as much noise as possible to cover the fact they weren't actually contributing anything.

    Host in their DC? I'd rather have a Linux box on a BT DSL line!

This topic is closed for new posts.