Data centre dangers: Killing a tree and exploding a UPS

It's the weekend, so it must be time for another instalment of “On-call”, in which we share readers' tales of the odd goings-on that take place when they're required to tackle a job outside of normal hours. This time around, reader Graeme shares five incidents in just one data centre. The first started when someone on the …

  1. Anonymous Coward
    Anonymous Coward

    “THERE'S WATER COMING THROUGH THE DATA CENTER ROOF!”

    Well just erect a tent inside the data centre to protect the kit!

    1. Anonymous Coward
      Anonymous Coward

      Though I think they needed a bigger shed: http://www.uis.cam.ac.uk/reports/central-network-hub-incident-30-january-2015

    2. Anonymous Coward
      Anonymous Coward

      They were lucky it wasn't the 100mm waste pipe from the toilets that came apart at a joint in the comms room's false ceiling. Happened to a colleague who was called out in the middle of the night for a nasty mess slowly dripping down the modem racks.

      A computer room was to receive a new computer. It was on the third floor of the building, which had been built with an external doorway and a projecting hoist for handling large equipment. When the delivery arrived it was discovered that someone had since built a bicycle shed directly below the hoist.

      Another computer room was receiving its first 600mbyte hard disk***. The dolly was rolled down the room's central aisle, whose foundation was solid concrete. As soon as they started to move it to the side, the false floor tiles' corner supports started to collapse. Just in time they managed to get the wheels back onto the concrete while a plan B was worked out. It was reckoned that otherwise they would have had to take the roof off to get access for a crane.

      *** In 1970 the 600mbyte hard disk units were extremely heavy objects. The enormous physical size is now almost unimaginable - in my mind's eye about 2.5m high x 2.5m x 1.5m.

      1. Yet Another Anonymous coward Silver badge

        Heard of one like that in the mid-west USA.

        An unused elevator shaft ran down the middle of the building, so they crane the computer in from the roof down into the basement. Computer runs happily all summer and all winter.

        Come the spring, and the six floors of snow that has fallen and built up in the elevator shaft melts ....

      2. Doctor Syntax Silver badge

        "In 1970 the 600mbyte hard disk units were extremely heavy objects."

        I remember being told by an ICL engineer that the spindle had to be aligned with the earth's axis, otherwise the bearings wouldn't withstand the precession.

        1. Anonymous Coward
          Anonymous Coward

          "[,..] otherwise the bearings wouldn't withstand the precession."

          Ours was aligned on the normal floor grid - not sure if there was any other consideration apart from aesthetics. It had head crashes regularly - with inch-wide swathes of bare metal on platters that felt like they were at least 2 feet (60cm) in diameter. Eventually the unit was set up by the expert with the magic touch.

          Backing up my PC mail folder of nearly a gigabyte now takes a few minutes - I remember that the 600mbyte unit took 8 hours to back up to tape. Even that remarkably high speed was only achieved by clever use of the disk controller's Command Chaining instructions, which could be modified on the fly.

          The bearings were also water cooled. If the mains pressure dropped there would be a loud groaning sound. The disk unit would start shutting down - then recover immediately as the mains pressure was momentarily restored due to the lack of demand. The oscillating noise was impressive and would go on for several minutes.

          1. Stoneshop

            Ancient hardware

            platters that felt like they were at least 2 feet (60cm) in diameter.

            At TNMOC I noticed a single disk platter standing on edge against a disk drive cabinet. It was slightly higher than the approximately 3 foot cab.

        2. Dave 32
          WTF?

          Magnetic Drums and Destroyers

          There is a story of one of the early magnetic drum storage units being installed on a naval destroyer. The rapidly rotating drum produced an extreme gyroscopic effect, so much so that when the captain ordered a course change, the ship refused to respond. I'm not sure if it's real or not, but it might be fun to research.

          Dave

          1. Anonymous Coward
            Anonymous Coward

            Re: Magnetic Drums and Destroyers

            That might be like the story about the large single platter disk that was mounted vertically. It was belt driven and apparently the belt had to be changed every time it was powered down. Bringing it up to its high rotation speed was carefully managed to prevent damage to the belt.

            The possibly apocryphal story is that one of these monsters running at full speed jumped out of its bearings. It scythed across the room and embedded itself in the wall.

      3. Cpt Blue Bear

        "They were lucky it wasn't the 100mm waste pipe from the toilets that came apart at a joint in the comms room's false ceiling. Happened to a colleague who was called out in the middle of the night for a nasty mess slowly dripping down the modem racks."

        Dear God! That brings back horrible memories.

        Shared office space with a suspended ceiling. Very convenient when it comes to running cabling, but not so much when the "waste" pipe from the upstairs toilets decides to split at a joint. To make matters more fun, the bloke on earlies was anosmic* and didn't notice anything. Cue the mid-morning rush, and we had a small but noisome lake on the ceiling of the toilets on our floor. The IT cave shared a wall with said dunnies and we literally had to build a dam to prevent the lake spreading until the plumber arrived.

        A tip for anyone who finds themselves in a similar predicament: if you tip the outside edges of the suspended tiles up you only have to seal the gaps rather than build a complete dam.

        Now strictly speaking, this happened mid morning...

        * The second time I've encountered anosmia. The first involved working on a motorbike in a closed shed. "Do you smell petrol?" says I. "No, but I am a bit light-headed," replies my mate, who's been in there for two hours with petrol running from the open fuel tap on the other side of the bike...

      4. Developer Dude

        At my first job out of college, my employer had a 300 MB drive for the in-house minis (HPs, IIRC) that was about the size of a dishwasher, but much heavier. It had removable disc platters. You could lean against it and feel when someone was accessing data.

    3. Dave 32
      Pint

      Tarp

      It's always a good idea to keep a tarpaulin or two around sensitive electronic equipment.

      My experience goes the other way, though. Back in the days when computers weighed 30 tons, and were water cooled, one of ours popped one of the elbows in the water cooling supply. The people from below called up, wanting to know why water was coming through their ceiling. It seems that the large and VERY expen$ive computer had lost its water cooling supply, and it was feared that the CPU may have melted down. Fortunately, due to the panicked call from below, the machine was saved. There was still one h*ll of a mess to clean up in the raised floor area, though.

      Dave

  2. Richard Jones 1
    Unhappy

    Things that Go phut

    A generator with two tanks: a main storage tank and a smaller 'local' tank, supplied by a send-and-return setup that pumped fuel up from the main tank to the local tank via a switch-and-valve system, returning the excess to the main tank. The engine ran perfectly on every test run - and then the mains failed. Oops: the lift pump valves were set the wrong way round, so the 'local' tank didn't fill and the pump overheated pumping nothing, because the real outage went on longer than any test. The generator quit with no fuel and the lift pump was now sick too - well, you can guess what happened!

    In another building the large bus bars were insulated on the left, the right and the bottom. Where was the painter? Yes, up above, when he put his paint pot down. The entire room was spray-painted with hot, burning paint.

    In another location a big A/C plant lost power for a short time, so the motors started to spool down. The power came back suddenly, the reverse torque spun the stators, and 'look, no working pump motors' until they were completely rebuilt. It happened twice before a minimum-duration break was built into the power system to allow the pumps to spool down completely.

  3. JimC

    Wasn't a callout

    But I once came in to find the 9 inch deep false floor in the server room 6 inches deep in water - and everything still running with all the mains sockets immersed...

    Unfortunately I can't find the photo of the Unisys (or it might have been Sperry) mini that was delivered in a 19in rack: the tail lift on the delivery lorry went pear-shaped and the whole thing toppled over and dropped 3 feet onto the concrete. Enough force to make for a rack full of rhomboidal equipment...

    1. Anonymous Coward
      Anonymous Coward

      Re: Wasn't a callout

      "Enough force to make for a rack full of rhomboidal equipment.."

      That happened to a customer's memory enhancement cabinet - which was the standard 6 feet high size. Must have been dropped somewhere in the airport freight depot. A slight mark on the cabinet where it had hit something. Opening the door showed a similar rhomboidal skewing of the contents.

    2. Anonymous Coward
      Anonymous Coward

      Re: Wasn't a callout

      "But I once came in to find the 9 inch deep false floor in the server room 6 inches deep in water "

      The computer cabinets in a Bristol company were observed to have a tide mark from one flood.

    3. Stoneshop
      Boffin

      Re: Wasn't a callout

      But I once came in to find the 9 inch deep false floor in the server room 6 inches deep in water - and everything still running with all the mains sockets immersed...

      On one occasion I was called to investigate network errors. This was thickwire Ethernet with vampire taps, and those were commonly chucked under the raised floor. Fine, as long as no one touched them.

      In this case, it wasn't a person or animal who had been at the cables. They were about 10cm deep in coolant fluid, which, being glycol-based, had also turned the linoleum (which was the standard office floor until that room got converted) into a gooey, custard-like substance.

      delivered in a 19in rack and the tail lift on the delivery lorry went pearshaped and it toppled over and 3 feet down onto the concrete

      Place I worked, years later, had just had their system and storage cab drop-shipped. Van wasn't positioned just right against the loading dock, so the driver pulled forward with the intention of repositioning. He did however have the cargo straps taken off already, so in came Sir Isaac stating "objects at rest stay at rest until acted upon by an external force". This external force was found to be gravity, as soon as the floor of the van had cleared the still-stationary racks.

  4. Lee D Silver badge

    Not "on-call" as such but I work support for schools.

    Interesting events have included the main server room and IT suite dying - a complete power-down. And the UPS on the racks giving up almost instantly and cutting out immediately. Really weird. Got things back up and it still happened several times. Replaced the battery. Replaced the UPS. Still kept happening.

    Eventually traced it to an upstream mains circuit where someone was plugging in a heated serving table (the kind of thing you get in canteens). We thought it was just overload because it was rated at some silly wattage. Except it still kept happening even with nothing else on.

    The serving table had two plugs on it, one for the heated base, one for the heat-lamps in its hood. And they plugged them into two separate sockets. Which, it turned out after I asked for it to be looked into, happened to be on entirely different phases. No wonder the UPS gave up! Had the electrician re-wire it, and that stopped the UPS blowing even if not the circuit getting overloaded. Am now waiting for the electrician to return to run a separate power feed to the IT kit that's unconnected to whatever the canteen want to do.

    Just glad I never did anything cocky like say "Oh, but the power's out so doing THIS should be safe..."

  5. Anonymous Coward
    Anonymous Coward

    Phase mixup

    We had one a while back where one of our customers was having work done on the mains feed - so they disconnected the mains feed overnight and manually patched in an emergency backup feed, running off the UPS in the meantime.

    About 10 minutes after they patched in the backup feed, someone noticed the air conditioning fans were spinning backwards. No worries: the sparkies just cracked open the air conditioning units and swapped two of the phases over, rather than fixing the root cause of the problem, which was that they'd actually patched in two of the emergency feed phases backwards.

    Another 10 minutes passed, and the UPS (which had - up to this point - been silent) got pissed off with being fed two of its phases backwards, and just switched off without warning...!

    Cue pandemonium for 3 hours while they a) fixed the phase issue, b) fixed the UPS, and c) we dragged all the servers back up and got them to a healthy state just in time for the next day's business...!

    1. Anonymous Coward
      Anonymous Coward

      Re: Phase mixup

      Exchangeable disk units that were wired incorrectly would rotate backwards. The symmetrical heads flew OK and the unit would work on that mainframe. It was only if the disk was moved to another machine that it couldn't be read - and vice versa.

      Card punch peripherals were not so forgiving: going backwards resulted in major damage to the innards.

      1. Dave 32
        Mushroom

        Re: Phase mixup

        Ever experienced a floating neutral? We have. We lost the neutral to a three-phase-fed building once. It probably had something to do with all of the computer equipment in the building: such kit tends to put a high harmonic content on the power lines, which stops the neutral currents balancing out, which results in a much higher than normal neutral current, which has been known to melt/fuse/blast neutral connections. The result is that the neutral connection inside the building starts floating, and all of the phase voltages go VERY wonky (a rough sketch of the arithmetic is below). I measured 117 Volt outlets producing anywhere between about 70 Volts and 180 Volts. Of course, we shut down all of the computers as rapidly as we could, and, quite surprisingly, we didn't lose any! We did lose a BUNCH of fluorescent lights, though. Yeah, that was, umm, fun.

        Dave
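
        For anyone wanting the back-of-the-envelope version of why a lost neutral does that: with the neutral open, the loads on the two legs of a 117/234 Volt feed end up in series across the full 234 Volts, so the voltage divides by impedance instead of being pinned at 117 Volts per side, and the lightly loaded leg gets the lion's share. A minimal sketch, with the load impedances invented purely for illustration:

          def open_neutral_voltages(z_a, z_b, v_line=234.0):
              """With the neutral open, the two legs sit in series across the full
              line-to-line voltage, so each leg's share is set by the impedance ratio."""
              v_a = v_line * z_a / (z_a + z_b)
              return v_a, v_line - v_a

          # Lightly loaded leg (high impedance) vs heavily loaded leg (low impedance):
          print(open_neutral_voltages(40.0, 15.0))  # roughly (170, 64) - one side soars, the other sags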

  6. Doctor Syntax Silver badge

    Thunderstorm over new year. Did no damage to anything except for the big UPS in the basement. All the servers ran without backup power for months until it was repaired.

  7. Anonymous Coward
    Anonymous Coward

    The details of this might be wrong because I was in a daze at the end of it:

    I used to be an engineer for a large satellite communications company. I was on nights and had been asleep for about two hours when the day engineer called me (mid-day?): "<insert country name> is off-line". Yes, pretty much the internet for half the country wasn't working and it was our problem. The duty engineer wasn't sure what to do; this was beyond his limits, and he knew I could probably solve the problem.

    I rushed to work, getting there in record time, and started the diagnosis. After a while the duty engineer was just getting on my tits with his worrying and the 'service manager' had arrived so I suggested the duty engineer go home. I couldn't find out why the satellite modems wouldn't lock, and I spent all afternoon and evening stripping down every last part of the facility looking for the fault while the service manager fielded calls from angry customers, politicians and royalty.

    By midnight and after swapping out damn near everything it started working again. The service manager said he couldn't see what I had done but it was all working again. I had been working for 12h straight on adrenalin, pepsi and sausage rolls. The service manager said he would go and get me a kebab after pointing out that it was now my shift until 8am...

    Over a decade later I wish I knew what it was that I did that fixed it and I also wish I knew how I made it to 8am.... I really f*ing wish I knew....

  8. Anonymous Coward
    Anonymous Coward

    Diesel locomotive backup generators

    I consulted at a company once that had a big datacenter with a loading dock off the back door. Off that dock there was a short hallway with very thick steel doors on either end that led to a huge concrete room housing massive banks of forklift batteries powering their UPS, and two diesel locomotive engines serving as backup generators. The datacenter only needed one to run the DC + HVAC, but I was told it used to require both back in the mainframe/mini days.

    One weekend they were doing some regularly scheduled maintenance/testing (which I was not involved with) on the UPS/generator setup. One of the locomotive engines threw a piston or some other important part, which impacted the other locomotive engine and caused it to spray debris that hit the ceiling, walls, electrical and HVAC equipment, along with a number of the giant batteries. Fortunately no one was in the room at the time it happened, or they almost certainly would not have survived.

    It took them most of the weekend to clean things up and replace or bypass damaged electrical components to the point where they could re-power the datacenter, then start bringing servers back up. This last part was my only involvement; I didn't do 'on call', but they needed all hands to help get things back up by Monday morning and I was happy to lend a hand and make some easy money.

    Their only backup at that point was the (remaining) batteries which would only cover a very minor interruption. It took several weeks but they were able to find someone to rebuild the first locomotive engine that started the whole mess and get it operational again. The second was declared a total loss. Fortunately they did not have any power outages during the time they were without backup generation.

    I wish I'd gotten a copy of the pictures from the guys who were involved, they were both scary and impressive.

  9. Anonymous Coward
    Anonymous Coward

    Many a slip

    A customer had ordered the smallest of a new range of mainframes. After many development delays the bottom of the product range kept getting pruned - so the customer's order was repeatedly upgraded to the next largest one.

    Eventually it was decided to fulfil the order by importing a compatible machine from the USA. The engineers installed it over a weekend and ran the commissioning tests ok.

    The customer was delighted with the bargain they were receiving - and of course that the long delay was finally over. All that was left to do was for the customer's electricians to take the necessary jury-rigged 110/240 volt transformer and wire it into a permanent circuit.

    The engineers returned in the morning to be told that the customer's electricians had connected the transformer backwards.

    1. DropBear

      Re: Many a slip

      At the first cursory read (hey, it's past midnight - and a couple of glasses - around here) I started wondering how the hell does one connect an AC transformer "backwards" - then it dawned on me they got a load of 480V instead of the 110V they were meant to take... ugh...
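
      Roughly what that works out to, assuming an ideal transformer and a 240 V supply (the numbers here are illustrative, not from the original story): call it ~480 V if you treat the ratio as 2:1, or ~520 V with the exact 240:110 ratio - either way far more than the kit wanted.

        # Ideal-transformer sketch: a 240:110 step-down wired backwards.
        mains_v = 240.0
        primary_v, secondary_v = 240.0, 110.0   # intended winding ratio
        ratio = primary_v / secondary_v          # about 2.18

        right_way = mains_v / ratio              # ~110 V - what the machine expected
        backwards = mains_v * ratio              # ~524 V - what it actually received
        print(f"expected {right_way:.0f} V, got {backwards:.0f} V")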

  10. Anonymous Coward
    Anonymous Coward

    Humidity

    The mainframe operators came in one Monday morning to find everything covered in a thin film of moisture. The A/C had been overdoing the humidity control.

    Regardless, the operators still pressed the power-on button - and the engineers spent several days with hair dryers drying off all the boards in the machine.

    1. JimC

      Re: Humidity/hair dryers

      That reminds me: in the flooded floor void incident I posted earlier we were left with a very damp void once the water had been drained out. But one of the Ops was into dogs, and specifically into showing poodles. You know, artistically cropped fluffy dogs. She had one *serious* hair dryer which was a big help in drying out the void.

  11. Anonymous Coward
    Anonymous Coward

    Week-end nice outage

    This happened in a DC in France which was hosting all of this international company's core systems. It was the weekend, Sunday around lunchtime if I recall.

    The company had invested big bucks to make this internal DC resilient to power outages, following a big cut. But, unfortunately, the whole UPS and powergen installation was looked after by the general facilities dept, made up exclusively of deadwood, and not by IT.

    As the powergen had never been fully tested, the first time it was really needed was always going to be messy. It sure was! As there was a leak in the intake or exhaust pipe (can't remember which), the whole thing was spitting fire up to around 2 metres above the end of the exhaust pipe! Nice show, nicely appreciated by the few who attended.

    Of course, since the powergen was not running OK, it failed to sync with the UPSes for the power feed, never took the load, and the whole DC went down.

    There ensued a number of clumsy reboots on nearly empty UPSes, and some severe data reconstruction the week after.

    This nicely kicked off, one month later, a project to re-shuffle all the cluster systems between this DC and another one.

  12. Stoneshop
    Holmes

    UPS problems

    “But on one occasion just at the end of the working day there was an outage and the (singular) static switch designed to throw load from mains to generator went bang. Cue a lot of running around shutting as much down as gracefully as possible before the UPS emptied, which wasn't very long!”

    Something similar happened at a site I was the resident engineer for. As the site had grown from just a site to a major network hub, it was decided that it should have a power backup system. Which, in itself, was a sound idea. Slightly less sound was the decision to test the finished setup by simply throwing the Very Big Switch so that the UPS would see a power failure and kick in. Which went all right initially: the power was now coming from a large bank of batteries through an inverter, and after half a minute or so the diesel started. Which then had to sync to the inverter output (this was the mid-1980s, power electronics weren't as sophisticated as they are now, and apparently it was easier to sync the generator to the inverter than the other way around). But the syncing didn't happen, the diesel kept hunting for the correct revs, and soon the batteries, beefy as they were, were depleted. With predictable consequences.

    Another site was also having a no-break installed. The guys from Perkins had just finished bolting the diesel down and wanted to check it out. So it was fired up, poked, prodded and listened to, and considered OK. The only item that was then found to be not OK was the shutdown button, which was simply not yet hooked up at that moment. A bit away from the "Emergency diesel stop" button was another one. Red. Which the Perkins technician pressed. This one was hooked up, and yes, it was THAT red button.

    There's also the matter that one's no-break should not only be powerful enough to keep one's machines running, including their cooling, but also to power the cooling for the UPS itself.

    1. Anonymous Coward
      Anonymous Coward

      Re: UPS problems

      "But the syncing didn't happen, the diesel kept hunting for the correct revs, "

      On a university interview day the electronics department were doing practical demonstrations. One showed how you watch the lights on a small AC generator to indicate sync with the mains - and then throw the switch to connect it.

      The demonstrator had a possibly apocryphal tale that one day someone threw the switch at the wrong moment. The generator then went head-to-head with the capacity of the UK grid system - came to an abrupt stop - and disintegrated.

      1. IanDs

        Re: UPS problems

        Similar incident in the heavy electrical engineering department when I was at uni -- two massive ex-submarine motors acting as a motor-generator set where the 3-phase mains should only have to supply the losses, used to evaluate electric motor efficiency under varying conditions.

        Question: "I wonder what happens if I reverse the field coil polarity?"

        Answer: Loud bang, power drawn from mains jumps from kilowatts to megawatts, whole of East Cambridge shuts down as the local substation trips.

        Not apocryphal, I can still remember the Scottish lab supervisor running in screaming "Which one of you stupid c*nts messed with the field coils?"

        1. Anonymous Coward
          Anonymous Coward

          Re: UPS problems

          I was involved in a similar "I wonder what that button does" incident when I was at college. I was doing a BTEC in engineering, and as part of the electrical side of the course we had an assignment to draw a schematic of everything in the main college electrical plant room. So off me and four or five mates went to said plant room, armed with A4 binders and pens. The room was pretty small and crammed full of breakers and other random bits of kit. Anyway, on one wall was a chuffing big breaker with a handle that looked like a one-armed bandit. Now, I knew that this was the main college breaker, but on the side of the box was a tiny little button - now, I wonder what that does? Why oh why I did it I don't know, but I pressed it and CLUNK: the breaker was thrown and everything was plunged into darkness! It was the fecking breaker's test button! The whole college was out, lecturers coming out into corridors wondering what was going on, etc. Took about 10 mins to get everything back on again (this was the late 80s). My mates backed me up and said my A4 binder had knocked the test button. For the rest of that term the college's bell, linked to the main college clock, ran 10 mins late!

        2. Dave 32

          Re: UPS problems

          I, too, got to synchronize an AC alternator with the mains while I was in engineering school. And, yes, there were stories of where idiots had gotten the connection/phasing/synchronization wrong and wrecked the alternator. But, that wasn't the worst incident!

          I was in the last engineering lab that got to see the power distribution equipment for the engineering building. You see, the power was brought into the engineering building as a 14.4 KV three-phase feed. It entered in conduits, went through a three-phase, oil-filled breaker/switch, then to the three-phase transformer, where it was stepped down to 117/234 Volts for distribution throughout the building.

          That three-phase breaker was a real piece of work. It was normally kept locked. And, to use it, after removing the lock, one operated the handle three times, which cocked a spring. At the end of the third stroke of the operating lever, it released the spring, which rapidly drove the contacts apart. With 14.4 KV (RMS) on them, and under a fair load, they would, of course, arc. But, the idea was that the contacts would fly apart so rapidly that the oil would quench the arc, in conjunction with the magnetic blowout device (where a magnetic field pushes the arc sideways, to increase its length, to help extinguish it).

          Well, some idiot had left the breaker handle unlocked. And, the latch inside the breaker had failed. And, some idiot had precocked it with one stroke. That created a disaster just waiting to happen.

          So, as the international graduate student, who was teaching the lab session, started waving his arms around as he explained the workings of all of the equipment, his elbow bumped the breaker handle. With the failure of the latch, and the partial cocking of the device, it caused the contacts to open by approximately one inch. Now, a one inch gap, at 14.4 KV under full load, will not interrupt the circuit. All it will do is create one h*ll of an intense arc, one that the oil will not quench, but which will merely boil (and carbonize) the oil, causing it to eventually pop the over-pressure relief valve, which causes most of the room to be sprayed with hot oil. Even worse, the magnetic blowout will not cause the arc to be extinguished, but will merely push it sideways, such that the arc between the circuit contacts turns into an arc between the incoming 14.4 KV line and the grounded frame. Uh-oh!

          This arc will consume megawatts (or more!) of power, which will not only overload the distribution system, but will create a very substantial phase imbalance in the distribution network. The result of this is that the upstream breakers will trip at the substation feeding the entire university campus. That wouldn't necessarily have been so bad, except that this was on a Wednesday morning, in the dead of winter, when the entire electrical system was under a maximum load condition. The sudden removal of such a significant load, in such a sudden and unbalanced manner, caused the transmission line to go off-line. The sudden removal of the transmission line load on the generating station caused it to go out of synchronization with the rest of the region-wide network, causing the entire generating station to go off-line. The cascade was impressive, resulting in a substantial portion of the state being blacked out.

          Even more impressive was the stack of paperwork and engineering reports that swirled around for weeks afterwards. Fortunately, no one was killed in the incident, although that was probably more due to pure dumb luck than anything (I am told that quite a few pairs of underwear were ruined!).

          And, yes, I can provide details. It happened in the late winter (February) of 1982.

          Dave

          1. Anonymous Coward
            Anonymous Coward

            Re: UPS problems

            Shit the bed, that made my pressed-the-test-button story look tame!

  13. Anonymous Coward
    Anonymous Coward

    I used to have a job taking care of a distribution warehouse system running on a 100A board - overrated to 300A.

    The feed cable used to regularly burn the paint off the wall.

  14. Little Mouse

    Is there a name for this?

    Not a specific incident as such, but when the time gets to Sod O'clock, you're carrying out a 'significant' repair/upgrade, you're the only person in the building, you simply have to have everything up and running by six, and fucking up doesn't bear thinking about - does anyone else change from a confident touch-typing machine to a two-finger hunt-and-peck scaredy-cat, looking up from keyboard to screen after every sodding keypress?

    1. Alan W. Rateliff, II
      Paris Hilton

      Re: Is there a name for this?

      In this situation I often find myself taking a ten-minute break: going back over terminal scroll-back, re-re-re-reading documentation, just walking away thinking over every step I just executed. Then I come back, take a deep breath, and exhale slowly as I hit ENTER or click [FINISH].

      I /always/ have backups on-hand in one form or three, just in case. But the last thing I ever want to do is restore backups and have to tackle the project again. Fortunately I have only a few times had to complete a project up against a hard deadline, and just as fortunately I have only had to restore backups a couple of times and come back later.

      On the big stuff -- read as customer projects -- anyway. Plenty of times I have totally screwed up my own installations and completely bricked machines (what is the equivalent of "bricking" a virtual machine?) But those are the fun times, when I can laugh while it happens versus shit myself skinny.

  15. G.Y.

    unloading

    We had a PDP11/55 delivered to the university, very late Friday night. No workmen to be seen. We finally strung some ropes from 2nd floor, put a wooden table in the computer's swing arc, told the truck driver to start driving; the computer box made it safely down; the table was matchwood.

  16. Anonymous Coward
    Anonymous Coward

    We've had a few incidents in our old server room (we have a nice new DC now). It's a 1970s building and the server room used to hold a PDP. I came in to open up one morning, opened the door, and was greeted by temperatures I had only experienced once before in my life, and that was leaving Singapore's airport terminal to go outside! It was BOILING: the walls were hot, the doors were hot, the racks were hot - the aircon had failed overnight. Went into the room: very strange smell. The aircon had just been serviced, and the engineer claimed it wasn't anything he'd done. Anyway, traced it to an APC UPS, the front of which was red hot!

    Not had an incident in the new DC yet, but before it was a DC it was a staff open-access computer room (actually, originally it was the terminal room for the PDP housed in the room next door). Anyway, the room above is a wet lab (it's in a laboratory) with various sinks and a Millipore water still in it, which has leaked on numerous occasions. It's only going to be a matter of time!

    LOL, I kid you not: just had lab services come in to say they were called out at the weekend due to a leak in one of the sinks in the lab above the DC! Luckily no water made it in!

    1. Alan W. Rateliff, II
      Paris Hilton

      One fine morning around 3am I rebooted one of my servers in co-lo. After 10 minutes it did not come back on-line, so I hit the PDU to power-cycle it. No alarm bells rang at this point since this particular machine was very, very cranky due to a bad temperature sensor on the motherboard. Most times killing the power for 20 minutes would bring it back online.

      By 4am the server was not up and I got a text message that another server had shut down while yet another was starting to shut down. Of course this had me very worried and I hopped in the car to head to the co-lo.

      The co-lo is through two fob-key doors, up a split flight of stairs in a back stair well. As I approached the door I felt a lot of heat. This brought back memories of all the fire safety videos in school and interstitial PSAs from Saturday morning cartoons: a hot door means fire! I sampled the air a few times and did not smell any smoke, so like any idiot would do I presented my fob to the sensor and opened the door.

      I thought I was going to pass out from the heat. I had not felt heat like that since a summer many years ago on a trip through Texas when my flip-flops melted to the asphalt parking lot of a truck stop. I got inside and found the air conditioning was not working and neither was the air handler. The thermometer on the wall at the door read around 160F. Servers were beeping, some had shut down already, and I had to get the heat out of there. But it is a natural heat-trap: windows covered over with foam insulation board, an inside door that I cannot open, and an outside door which just vents into a stair well with no exhaust ventilation.

      I started calling the co-lo operators and leaving nasty messages. I found two exhaust fans which had been closed up when the new dedicated A/C was (recently) installed and cut them open, only to find them stuffed with insulation and covered on the far end (this was the first time the A/C had been given a good running, due to hot weather). I had pulled down one of the foam insulation boards and was just about to chuck a stool out the window to ventilate the room when someone showed up. Once we got the temperature in the room down enough that the blower would turn on and the compressor would run, someone had to run a hose on the compressor to keep it from tripping until the A/C guy could turn up.

      Obviously the A/C had failed. We figured out from one of the servers' graphs that sometime around 8pm or so the A/C compressor failed and the temperature steadily rose from 78F to around 135F (inside the server), which held for a couple of hours until around 1:30am, when the blower motor went into thermal shutdown and the temperature in the server rocketed to about 180F before the graphs stopped. Tying into some of the other stories here, as I recall a contributing factor to the failure of the compressor was a reversed phase. I was too angry about the whole event to stick around and get the whole story.

      MRTG has a neat feature to trigger a script when specified variables hit specified values. I now have my UPS graphs trigger alarms in my office and text messages when temperature (amongst other variables) hits the danger zone. It is a co-lo, after all, and not a data center, is what I tell myself, and that I should have been watching that crap from the get-go.
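
      If memory serves, MRTG's way of doing this is its Thresh* directives: a ThreshMaxI limit plus a ThreshProgI script to run when the limit is crossed. The underlying idea is just "poll a value, compare it to a limit, run an alert command when it's exceeded" - here's a minimal standalone sketch of that idea in Python, where the reading file and the alert command are made-up placeholders rather than anything MRTG itself provides:

        import subprocess
        import time

        TEMP_FILE = "/var/run/ups_temp"   # hypothetical: wherever your poller writes the latest reading
        LIMIT_C = 35.0                    # hypothetical danger threshold
        ALERT_CMD = ["/usr/local/bin/send-alert", "UPS temperature high"]  # hypothetical alert script

        def current_temp():
            """Read the most recent temperature sample written by the polling job."""
            with open(TEMP_FILE) as f:
                return float(f.read().strip())

        while True:
            if current_temp() > LIMIT_C:
                subprocess.run(ALERT_CMD)  # fire the alarm / text-message script
            time.sleep(300)                # poll every five minutes, like an MRTG run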

    2. Alan Edwards

      A/C Fails

      I had a couple of -- "interesting" -- AC failures at our old office.

      Some bright spark decided that the ideal place to put the wall-mounted AC in the server cupboard was above the Meridian phone system's main box. The heat exchanger in the AC froze solid, which killed the AC. The heat from the servers rapidly melted the ice, which dripped water into the Meridian and killed that too.

      Shortly after that we got a proper lock on the server cupboard door, so none of us office plebs could get in any more. The first we knew of the next AC failure was when the partition wall that separated the server room from the office kitchen was hot to the touch.

  17. Stuart Castle Silver badge

    Love these threads..

    As for my experience, it is a little limited compared to some commenters, but here goes.

    When I was a student, our Unit had a nice, shiny new(ish) building right on the banks of the Thames. The main computer room, along with the inlet for the building's electrical supply, was in the basement. Which, being next to a river, flooded regularly.

    We also had a lab in the basement of another building on that campus. That wasn't on the banks of the Thames, but still flooded whenever it rained. Probably not a good idea to put Tower PCs in that lab, and we spent many hours cleaning motherboards and repairing PCs that had been in a flood.

    My best one, though, was when I was called in by a friend and ex-colleague to advise his directors on video streaming. The only time available for me to go see what hardware they had was a Saturday, so we went in to his work on a hot Saturday. Unfortunately, their server room was two 8-foot racks packed with servers, plus a few assorted PCs, in a tiny room with no windows and a small air-con unit that had failed. I have no idea how hot it was, as the thermometer in the room maxed out at 40 deg. C and the mercury had gone past that to the top of the tube. What I do know is that the bulk of their hardware had overheated and shut down, and the couple of servers still running had overheat warnings on their LCD displays. I also know that we were sweating like crazy almost the instant we went in there, and the only option available was to get every fan in the building, take them to the server room, and try to get some cooler air in there. After two hours the room was merely uncomfortably hot (as was the rest of the building), but it was cool enough that we were able to get the servers restarted. On the plus side, those two hours did give me a chance to go into the local O'Neills with my friend and his then girlfriend, have a Full English and essentially get paid for it.

    On the plus side for him, he suddenly got all the funding he needed to install the monitoring and alert system he'd requested a while before and was denied.

    Where I am now, I'm not too involved in the server side of things (although I do get involved if needed), and the few times we've had electrical problems, the UPSes have enabled the servers to shut down gracefully. Having said that, one time it happened because National Power had been drilling in the road outside and put a drill through our mains. That was fun for me. I was in the lift at the time, with four women. Something which isn't as fun as it sounds, as one was slightly claustrophobic and I spent the whole time trying to stop her mild anxiety turning into a full-blown panic attack. There was also another time when a colleague was working in our main server room and one of the phases blew. Apparently, he has never heard such a loud explosion. After his hearing returned, he (and his team) spent the next three days determining what machines we needed, testing them and plugging them into the other circuits.

  18. Roger Kynaston
    Pint

    a few years ago

    Not OOO, but still a heart-stopper at the time. Some workmen were doing things in the computer room. They were hammering away and hit so hard that they triggered the emergency cut-off switch.

    Most of it went down with a bang but a few things had standalone UPS so we spent a mad few minutes desperately shutting those down.

    As I remember nothing got seriously borked.

    Beer because lots of it is needed after such events.

    1. Anonymous Coward
      Anonymous Coward

      Re: a few years ago

      "Some workmen were doing things in the computer room. "

      One day workmen were drilling in the ceiling of the computer room - standing on a stepladder by the exchangeable disk drives. Bits of brick and dust had covered the tops of the units - on which the system was up and running.

      The engineer addressed the very large workman diplomatically: "Do you realise that a fragment of cigarette ash can cause a head crash?" The workman looked down, grunted "Is that so", and resumed drilling. Strangely, the disk drives were none the worse for the experience.

      In the same room there was a cable acting as a comms link to an upstairs Remote Job Entry terminal. It was just two lengths of flat twin cable. One day the link was found to have failed. On investigation it was noticed that the cable when crossing the floor disappeared into the crack between two false floor tiles - then reappeared at the other edge. Someone had lifted the tile - and probably had to jump on it to make it lie flat afterwards.

      The false floor was unusual in that it was a steel lattice - and the floor tiles had a steel backing plate. The cable trapped between a lattice bar edge and the tile's backing plate was effectively in a blunt guillotine. Only one cable had been severed - thus confusing the diagnostic as the only lights were on the remote terminal - and they seemed to indicate everything was being polled correctly.

      One week there was a spate of mysterious system crashes. In the weekend engineering maintenance slot they were diagnosed as being characteristic of "dirty earth" noise upsetting the cpu. With the cpu stopped you could see lights randomly being set. The culprit was quickly identified as the bank of several tape decks - which the engineers were working on. Open and close the tape door - and the cpu fell over. ----- Then in the middle of demonstrating it to the engineers the problem disappeared.

      The question to the other engineers "What did you just do?" elicited the answer "Lifted that floor tile".

      The unusually large number of tape decks meant that there was a bundle of thick cables under the floor. These had metal boxes part-way along them - which were grounded to the "clean" earth. One of the boxes sat on top of the cables - and touched the bottom of the aforesaid floor tile. The bottom of the floor tile was a sheet of unpainted steel. The false floor was supported on a matrix of unpainted steel bars - which for safety was grounded by the building's earth. A nice "clean" to "dirty" earth bridge - fixed by moving the tape deck cable box off its pile of cables.

  19. launcap Silver badge

    Many years ago..

    I worked at a big company with several mainframes and a nice generator house very close by. They had regular generator tests and all was well.

    Until one day when the mains power really did go down. The UPSes did their job, then the generators took over. For about 10 minutes. Then died. Cue much screaming and cursing as all the IBM mainframes went down hard..

    Subsequent investigation revealed that the main fuel tank (under a car park) had had a leak for quite a while. There hadn't been a policy of testing the level of fuel (the fuel gauge measured how much fuel had been put into the tank, not how much was actually there) so no-one had actually checked.

    Oops..

    After that, there was a swift change in procedure. And a large fine paid for environmental cleanup as all the soil under the fuel tank was now absolutely soaked with diesel fuel..

    1. regadpellagru

      Re: Many years ago..

      "Subsequent investigation revealed that the main fuel tank (under a car park) had had a leak for quite a while. There hadn't been a policy of testing the level of fuel (the fuel gauge measured how much fuel had been put into the tank, not how much was actually there) so no-one had actually checked.

      Oops.."

      One version of this, which I have heard several times, is:

      "refuelling fuel tanks is under facilities dept, people didn't budget it since it's rarely done, and no-one has asked for it, fuel is almost exhausted when the power cut happens". Same results.

    2. Cpt Blue Bear

      Re: Many years ago..

      A story I'm probably safe to tell now as the resulting insurance claim has been settled.

      As some of you may be aware, a couple of years back there was a bit of a problem with rainwater run-off in Brisbane. Basically the CBD was a couple of feet underneath it. A major Australian company, which will remain nameless, has a largish operation there. With the basement flooded, an impromptu meeting at Head Office was told:

      The local data centre is safe - it's located on the 8th floor.

      The battery backups will definitely keep the whole shebang running long enough for the diesel generators to come on line if the power is cut.

      These generators are on the roof, not in the basement which is flooded.

      The fuel tanks on the roof have some number of hours fuel before they need to be refilled. This will almost certainly not be enough to get through an outage.

      Where are the reserve tanks? In the flooded basement. Oops. Don't worry, we'll get one of the local IT bods to nip in and haul a jerry can or two up to the roof and top the tanks off. How exactly this was going to happen with the CBD effectively shut down was never explained.

      As it was, the question was moot because the generators did not power the aircon and the whole thing went into thermal shutdown within 20 minutes of the mains being cut off...

  20. kain preacher

    I think Dave 32 is the winner. Managed to bork parts of a state. Good job mate :)

  21. Luiz Abdala

    Nice lab after-hours incident...

    I don't work in IT, but several PCs were harmed during this incident. Here we go...

    I was visiting my sister in her experimental lab. A true lab, with mice, and electrodes, and of course, PCs. Well, they asked the local sparky to "upgrade" the lab to handle more things: another fridge for the experiments, more PCs, the usual lot.

    So we were there, 9pm on a Friday, when she was done and we were leaving. As she turned off the lights, all hell broke loose. EVERYTHING hooked to the mains crapped out, fried, or quit, except the few brand-new PCs with multi-voltage PSUs (multi-voltage PSUs were new back then). And all the wall sockets sparked at the same time. I found that odd, and asked for the multimeter left on the bench (by some lazy sparky).

    All the mains sockets that were supposed to be 110V were now 220V. But ONLY with the lights TURNED OFF. As you turned on the lights, the sockets would jump back to 110V!

    The amazing sparky had the lights mixed in with the wall sockets in such mysterious ways that he managed to make the wall sockets change voltage as the lamps were turned on and off!

    So, we just shut down the mains for that whole mess, and left a big red warning note that all the circuits were on the wrong voltage, while my sister left an e-mail for her teacher-supervisor saying that all the gear running at 110V was deep-fried. Luckily they had a spare fridge in another circuit, and she saved most of the experiments.

    1. Cpt Blue Bear

      Re: Nice lab after-hours incident...

      Many years ago a client refitted one of their workshops. This involved stripping out the false ceiling, the aircon ducting above it along with thirty years of cobbled together mains wiring.

      Watch yon sparky back a truck into the shop.

      Watch as he climbs onto the tray.

      Watch as he reaches up to cut old wiring.

      Watch as he gets a boot big enough to knock him off his feet.

      Turns out that the isolation switch on the distribution board was wrongly installed and only isolating one phase. Oh, how we laughed. Except, of course, for the poor bastard in hospital, who was probably only still alive because of the rubber tyres on the truck and the fact that the tray had sides.

      This incident taught me to sight the licenses of tradesmen and make at least a cursory check that they understand what they are working on.

  22. BernardL

    I'm really sorry about this, and with all possible respect to sparks and cleaners (both of my brothers are sparks)... But I never let either of them into my computer room.

    I did the weekly cleaning myself. Any time we had to have workers in the room, I stayed with them. And we shut everything down, sometimes even including the PABX.

    I was courteous, but firm. Apart from a lunchtime replacement of a failing RA82 disk drive, we never had any unscheduled downtime in the 9 years I was there.

    1. Anonymous Coward
      Anonymous Coward

      The Deuce computer had mercury delay lines which looked like large mushrooms. They had to be kept at a constant temperature - so had a short mains cable plugged into a 13amp socket embedded into a false floor tile. It was discovered that the cleaners were unplugging them in order to use the socket for their vacuum cleaner.

      Later on, some sites solved the vacuum cleaner electrical noise problem by having "clean" earth 13amp sockets with the ground/earth pin slightly rotated. Only a matching plug would fit - as visiting engineers discovered when wanting to use their test equipment.

  23. Developer Dude

    Back in the late 80s I worked on the QA staff for a well-known maker of voicemail software and hardware.

    Back in those days the largest hard drive you could get was half a gig and voicemail was hungry for disk space. We had the in-house corporate machine which had two large hard drives for a whopping total of one GB!

    This was also in the days when you kept a paper record of the HD defect mappings so you could set up the HDs again if you had to reformat them.

    One night when I was working late (I was overworked and underpaid) in the QA lab, I got a panicked call from the VP of the corp: the in-house voicemail system was down and customers were noticing. Not good.

    So I walk over to the "server room" (a large closet) and look at the voicemail server. I opened up the front and noticed that the HDs seemed a tad hot. I also noticed that someone had conveniently stuffed the many pages of paper containing the HD defect mappings between the HDs, preventing them from getting any cooling air. Naturally the drives failed.

    I worked into the wee hours of the morning to recover as much as I could, but a lot of people lost a lot of voicemails. The next day I wrote a long email regarding why such practices were foolish, to say the least.

  24. Anonymous Coward
    Anonymous Coward

    Makes me glad I'm on the software side of things, anyway...

    The first place I worked was growing rapidly - it was the beginning of the .com boom - and they needed to expand from a server-in-a-broom-cupboard (TM) to a proper server room with a real sys admin looking after things. Unfortunately the powers that be went to muppets-are-us to find the new sys admin, which, as I'm sure you can imagine, turned out to be a very bad idea. Within hours of his arrival he'd reconfigured the printers so well that not one in the entire building was working; this should have been a warning sign.

    Well, things bumbled along for a few months and the new sys admin, with a big budget, was spending like there was no tomorrow on shiny new hardware. There were servers, UPSes, AC, backup generators, the works. Not long after it had all been installed it got put to the test for real, as some overly keen workmen managed to put a digger bucket through the main power line to the whole business estate.

    Of course the power went off to all our machines but we sat there safe in the knowledge that the servers would be ticking over on the UPS and we were just waiting for the roar of the new generator to start. After a couple of minutes of eerie silence a few of us developers went to the server room to find out what was happening.

    It turned out our rocket scientist of a sys admin had fitted the UPSes and managed to charge them; he'd even managed to plug some machines into them. The problem was he'd only plugged in machines that were completely useless to the running of the company. I forget exactly which, but I seem to remember it was a selection of test servers that were wiped daily. The incoming data connection was fine but the router was off, so we couldn't communicate with the outside world. That didn't really matter though, because all the important machines had gone down hard and for some reason wouldn't come back up. It turned out the UPS had never been connected to the generator, but that was also beside the point as the generator had no fuel anyway.

    By the time we'd figured out what had gone wrong the sys admin had walked off and there wasn't enough power left in the UPS to do anything to save the situation. One of the developers had the good sense to switch off the test servers and we boiled the kettle with the remaining bit of power and made a cup of tea to wait for the lights to come back on.

  25. Anonymous Coward
    Anonymous Coward

    Anon for what should be obvious reasons ...

    We have a fairly nice in-house "server room" - which I upgraded from the previous "how much cable can you get in one pile" mess we used to have. At the time it ran at around 7kW, and we had a UPS.

    Well, we have a pretty reliable mains supply here - in some ways too reliable, as it makes "management" rather complacent and "repairs and renewals" hard to get funded. Anyway, I knew that our UPS would run us for a matter of seconds because the batteries were knackered - and management had been made well aware of that some years earlier! We had a power cut, the lights went out, and the server room went quiet. At least the phones didn't ring, as they were off as well :) When the lights came on, the server room didn't - the UPS had given up completely, so we went onto manual bypass (I did have the foresight to include a separate manual switch on the wall!)

    Eventually I persuaded the boss to buy us a second-hand UPS - but he wouldn't buy new batteries, so we got what it came with: definitely far from new, but with a little life in them. I'd tested it thoroughly, wired it in, and tested it as well as was possible before throwing the manual bypass switch to make it live. Then there came some "management discussion" which came down to "no you can't, we need to alert customers to potential outages, do it out of hours, blah, blah".

    Then, an unusual snow storm arrived - many in the NW of England will remember it from March 2013 when it dumped a huge amount of snow on us (along with strong winds). Well the lights flickered - I looked at my manager, and he looked at me, we looked at the lights, and it was obvious we were both thinking the same thing. Then the lights flickered again, we looked up, thought the same. Still management wouldn't let me throw the switch.

    Then the lights went out and I got to throw the switch *after* the server room had gone quiet.

    It turned out that the neutral wire on the 132kV lines to the local power station had broken, the wind had blown it across the phase lines, and eventually it had tripped out the circuit. The supply to our town is teed off that circuit, so we were off for a while while the DNO did some re-routing.

    And then there's a pearler from a customer site. Small rack, small number of servers, rackmount UPS in the bottom of the rack. The customer calls up one day and gives the helldesk a right ear-bashing along the lines of "the UPS is beeping, we didn't pay all that money out for c**p, get it fixed *NOW*". Logging into a server and querying the UPS revealed that the UPS was overloaded - so we called the customer back.

    "Has anyone done anything in the server room ?"

    Wait for it ... They have an upstairs that's not been "done out" as office space yet (new build, allowing for expansion). They had a large meeting and a few people were working temporarily in this "store space". Only they were cold, so plugged a fan heater into the nearest socket they could find - which was fed from the UPS.

    We never got an apology, and that issue was "glossed over" in their monthly "what IT issues did we have" document.

    And somewhere I used to work years ago, we had a power cut, so we ran a long extension lead out to the back and used the portable genny the landlord had. I'm working in the server room, and the UPS fans (old basic UPS) are changing note, and the unit is clanking a bit as breakers go in and out. Go out back, and the landlord had some maintenance to do - so he'd just plugged his angle grinder into the generator and was intermittently overloading it. Amazingly the server didn't crash!
