*Thunk* No worries, the UPS should spin up. Oh cool, it's in bypass mode

Whatever can go wrong will go wrong. It's a law most IT people would understand and perhaps even fear. It was my third day as the new network manager for a reasonably sized estate across several sites, most inhabited by weirdy beardies who had jobs like counting bats, frogs and other animals you may never have heard of. It …

Silver badge

Re: All your eggs in one voip

"There has got to be a joke in caressing a router within a cave, I just can't put my finger on it..."

You're not caressing it right. Use all your fingers, but very gently.

And say "I love you"....and maybe promise it something.

4
0
Silver badge
FAIL

It was my third day as the new network manager

You took this job without inquiring into such niggly little things like disaster scenarios and such BEFOREHAND

4
26
Silver badge

My one win over beancounters

In gov't service...

Whole rack full of Best(TM) UPS units with failed lead-acid batteries inside. Spent over one year fighting beancounters over purchasing replacements; the beancounters kept using "OMG!! They contain lead! Panic immediately! Oh, dear - the Californicators will all die of lead poisoning!!" As their excuse for inaction. Power failures and lost data? Oh, heck yeah. Did the multiple system failures help with the purchase? An emphatic "no".

So what I did was work with the vendor to create a new part number for something called a "self-contained DC power supply". Turns out that anything flagged as a battery is on the USA's "No buy" list. But SCDCPS? Good to go! That's how I became - at least until my next screwup - hero of the team.

49
0

Re: My one win over beancounters

We had some servers on the landlord's UPSen in the cabinets. We slowly got our management to migrate to our own UPS.

That was 2 years ago. The big landlord owned UPSen are still to this day showing failed battery, and the only work carried out has been to turn off the failed battery alarm...

14
0
Anonymous Coward

Re: My one win over beancounters

"self-contained DC power supply". Turns out that anything flagged as a battery is on the USA's "No buy" list. But SCDCPS? Good to go!

Always good to be creative. At a previous job, internal policy was that books had to be ordered via, and borrowed from, the central company library (which was at the other side of the country). Manuals, on the other hand, just required a simple PO. We bought lots of programming manuals, database manuals,...

I also remember someone who, following a major lightning strike, facetiously filled in a damage form with "cause: EMP", and then to general surprise it was discovered that they were insured against EMP but not against lightning strikes.

29
0
Silver badge

Re: My one win over beancounters

Insured against EMP? Nice!! I've got to go check to see if that applies. Be good to know.

For whatever it's worth, my homeowner's policy specifically excludes damage caused by nuclear war. Somehow I think my homeowner's cover is the least of my worries at that point.

24
0
Silver badge

Re: My one win over beancounters

Our big Eaton UPS is complaining about the batteries being old, so soon enough I'm going to have to replace them (fortunately my boss has already given me the go-ahead to spend money).

Theoretically I can put the UPS in bypass mode and just swap the batteries.

Practically, however, I'm expecting the entire UPS to shit the bed and take down all the power in the server room. Sod's law: if I prepare by shutting everything down, then the swap will go perfectly.

26
1
Bronze badge

Creative Purchase Orders

When New Scientist was worth reading, there was an interesting set of miscellaneous stories which included creative purchase order descriptions.

One laboratory got away with ordering a new Digital Signal Generator, for a non-trivial amount of money.

.

..

...

....

.....

......

.......

........

.........

..........

It was a piano.

23
0
Silver badge

Re: Creative Purchase Orders

At a place I worked, many years ago, a salutary tale was recounted to me. It was of a time when company beancounters decided to place severe restrictions on capital expenditure. A bit of a problem, since the various departments had a lot of test equipment and the older stuff needed to be replaced once it could no longer be repaired and/or brought into calibration. Without working test gear the departments could no longer perform the stuff they needed to do in order to make money.

Non-capital purchases, however, were fine. That year Maplin sold a lot more oscilloscope kits than usual. Factor in the time of a skilled technician or engineer to build the scope and it worked out more expensive, but it kept the beancounters happy (until they balanced the books at the end of the year).

Then there was the time (same company) when the beancounters decided cost-centre accounting was the big thing. They'd long had cost-centre accounting, in that they kept track, but now all cost centres had to run at a profit. Including the mail room.

Prior to this wonderful idea, the mail room delivered mail (much of it trade mags which people read to look for jobs as well as to learn of technical innovations) to the desk of each recipient. Afterwards, the mail for a department got dumped on a table and you had to look through it to find your own. You don't need to be familiar with Knuth's Sorting and Searching to realize how inefficient that was. But it was egalitarian: everyone, including the Chief Engineer of the department, had to search through the pile. Hooray! The mail department had cut its costs by sacking a couple of juniors who used to distribute mail. The extra costs imposed on every other department, however...

26
0
Silver badge

Re: Creative Purchase Orders

Made a note of that. I'm sure it'll strike a chord somewhere and stave off future problems, so long as I can be sharp about it.

7
0

Re: My one win over beancounters

I am very surprised if your string is all in series. If it is that's your big problem.

If it's not, which it almost certainly isn't, just isolate each shelf and replace a shelf at a time.

All you lose is some run time but no need to put into bypass.

Of course if you are relying on the register for your ppm you and your company are almost certainly screwed!
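
As a rough illustration of the "all you lose is some run time" point, here is a minimal sketch of how much runtime remains while one shelf is isolated. It assumes the shelves are identical parallel strings and that runtime scales linearly with the number of strings still connected. Both are assumptions made for illustration, and the function name and figures are hypothetical. Real lead-acid batteries lose runtime faster than linearly as the current per string rises, so treat the result as an optimistic upper bound.

```python
def runtime_with_shelves_isolated(full_runtime_min, total_shelves, isolated_shelves):
    """Estimate remaining UPS runtime (minutes) with some battery shelves isolated.

    Assumption (hypothetical model): identical parallel strings and runtime
    proportional to the number of strings still connected. Real batteries
    do worse than this at higher per-string current.
    """
    if total_shelves <= 0:
        raise ValueError("total_shelves must be positive")
    if not 0 <= isolated_shelves <= total_shelves:
        raise ValueError("isolated_shelves must be between 0 and total_shelves")
    connected = total_shelves - isolated_shelves
    return full_runtime_min * connected / total_shelves


# Hypothetical figures: 20 minutes at full strength, 4 shelves, 1 being swapped.
remaining = runtime_with_shelves_isolated(20, 4, 1)
print(f"Estimated runtime during the swap: {remaining:.1f} minutes")
```

If the estimate comes out shorter than the time the generator needs to start and take the load, that is an argument for scheduling the swap rather than doing it live.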

1
0
FAIL

Re: My one win over beancounters

Building shutdown, and I was advised from on high that there was no need for me to be at the remote site to do anything (mistake number one), and that putting the very, very large UPS into "Passthrough" before they did a controlled shutdown could be done by the site contact (it wasn't). The fact that this was an annual event left me assured that it would just be routine (mistake number two: never assume, it makes an ASS out of U & ME).

Sunday morning 3am messages from the Sysadmin team in India left on my desk phone regarding the servers not remotely waking up didn't get to me for some strange reason of me not sleeping at my normal desk Saturday night\Sunday morning.

On arrival at the remote site Sunday morning & ringing into the bridge as the sound of silence from the servers was deafening, the building power was on but the surge had tripped the UPS breakers & each battery had to be checked by the third party techs, before we got the go ahead to bring up the UPS & then bring everything else back up.

10
0
Silver badge

Re: My one win over beancounters

Nuclear war exemptions have been standard in many British policies since the '60s. I worked for a short time in car insurance; on Saturday mornings we had to man the phones as the switchboard girls were off, and because we were speaking to potential clients we had to have read the T&Cs on the policy.

4
0
Silver badge

Re: the mail for a department got dumped on a table and you had to look through it to find your own.

At the start of my IT career, data entry to the major business systems was done by trained typists who could type quickly, easily and accurately. By the end of my career, large quantities of that data entry were done by managers who could do none of those, at a vastly greater hourly rate...

10
0
Bronze badge

Re: Creative Purchase Orders

I worked a place where a manager wanted to reward his team and bought a soft drink dispenser. To get it past accounting it was officially a colour printer.

They only figured out when the costs of "ink" exploded.

3
1

Re: My one win over beancounters

Putting it in bypass mode is fine. The only issue is that nowhere in the manuals does it state how to take it out of bypass mode. Once you've changed your batteries you perform a battery test and if it completes successfully it will put the UPS back into ONLINE mode.

This also follows if you have an external bypass switch. After you've done maintenance on the UPS, switch from BYPASS to TEST, power up your UPS and put it into internal bypass. Switch the external switch to ONLINE and then do the battery test. The SMART UPS's need to monitor the load while in internal bypass to be able to supply the correct load when it goes to online. If you put it online and then switch the external switch back to online the UPS will fill its pants and go into overload alarm.

We once sent back a perfectly good UPS because the person responsible did not know the procedure and assumed we had just installed a faulty unit (replacing the old one which had failed)

5
0
Bronze badge
Facepalm

Priorities

Once when recovering services without a plan we asked the business for their priority list. Top of the list was the Management Information System (MIS).

Me: "Are you sure you want MIS back first?"

Senior Manager: "Yes, it's critical"

Me: "Really?"

Senior Manager: "Don't question me, it's top priority!"

Me: "So you want to be able to report that none of your staff are doing any work rather than not be able to report but know they are doing something?"

Senior Manager: "Maybe Workflow should be first priority then"

33
0
Silver badge
Coat

Re: Priorities

Senior Manager: "Maybe Workflow should be first priority then"

I call BS, no real Senior Manager would admit he was wrong that quickly!

28
0
Anonymous Coward

Re: Priorities

"Top of the list was the Management Information System (MIS)..."

I once did a piece of work for A Large City's transport authority during A Large Sporting Event about five or six years ago. At the time both operational and analytical workloads were running on the same database instances. It was known that, on occasion, big table scans could cause weirdness for those on the operational end of the system, like, say, passengers.

So the database administrator types put their heads together and decided to quietly pause all their MIS and reporting jobs for the two or three weeks of this particularly high profile event, lest the worst occur during this time of high load.

"Did anyone notice?" says I, "You've got twenty people wrangling these reports. Must be important!"

"Did they shite," says they.

11
0
x 7
Silver badge

more than just IT

this essay gives a fair assessment of the status in Lancaster after the 2015 storm.

https://www.lancaster.ac.uk/media/lancaster-university/content-assets/documents/blogs/lancaster-power-cuts-blog.pdf

It's not just IT departments which have to think in terms of disaster survival, but rather the whole of society.

We are all too reliant on technology with no backups

17
0
x 7
Silver badge

Re: more than just IT

more about the Lancaster outage

https://www.raeng.org.uk/publications/reports/living-without-electricity

8
0
Silver badge
Thumb Up

Re: more than just IT

Interesting read, thanks!

3
0
Silver badge

Murphy rules

Those that forget that Murphy rules will be in for a hard time.

8
0

In the old bank, in the vault

I was a customer of a bank in suburban Chicago, and I went into the bank, a new building with lots of glass, and found the power was off, and the tellers were using pen and paper and calculators to process transactions.

I went home, called my mother in law, who had just retired from the bank, and asked her why they did not have a UPS. She said they did, a large diesel, that was in a crate in the vault in the old bank building. It was too large to move through the basement hallway into its planned location. It fit the elevator and the UPS room.

They later excavated a hole in the parking lot, cut the foundation wall, and put it in sideways.

13
0

Re: In the old bank, in the vault

A generator is not a UPS.

A UPS provides carry through time whilst your gennie starts up and can carry the load.

A generator is a generator.

12
2
Anonymous Coward

Re: In the old bank, in the vault

I have been informed of a permanently running generator which acts as a UPS for a large metropolitan university used for keeping both IT systems and cryogenic experiments running in blackouts.

I was then informed what powered this never-ending device. A small test nuclear reactor.... fun times to find you lived not 2 miles away from a reactor.

7
0

Re: In the old bank, in the vault

Now THAT is how you ensure an uninterruptible power supply.

"Power cannot, under any circumstances, be cut to this facility. Do you understand me? Under no circumstances is a power loss acceptable. If there is a power cut, the loss to society, the organization, and science itself will be incalculable. My head will be on the chop just before my boss's head itself gets chopped, and that will be immediately postceding me swinging the axe at your neck. Do you understand?"

"What if there's a flood?"

"No, absolutely unacceptable."

"No, I mean a big, BIG flood."

"Did you watch the movie Deep Impact?"

"Aye."

"Remember that gigantic tsunami that took out New York?"

"I do."

"The rest of our facilities will stand up to that. Make sure the power does, too."

"What if there's a war?"

"Unless someone is dropping cruise missiles directly on our heads, the rest of the facilities will hold. The power had damn well better, too, ESPECIALLY since national power grids are likely to be targets in war!"

"Godzilla and Cloverfield go Sumo Wrestling through the township?"

"Short of Cloverfield getting suplexed through the roof, the facility will stand. The power supply has to, too."

"So, let me get this straight: not fire, flood nor famine or war can cut the power?"

"Right."

"You don't want to hear any 'Acts of God' clause stuff here, because you want a power supply that will stand up longer than the building itself."

"Now you're catching on."

"In other words, you need uninterrupted power right up to the Godzilla Threshold - basically, unless something other than a power failure has already terminated your facility's work, the power supply has to stand up."

"That's my requirements. No interruptions are acceptable."

"Right. Well then, fuck it mate, just fuck it; we're gonna have to install a reactor."

"Make it so."

2
0

it shouldn't take months to sort out a new circuit board

Hold on, didn't you say this was public sector?

9
0
Anonymous Coward

"it shouldn't take months to sort out a new circuit board"

Not only the public sector: work in any company with enough bureaucracy, and things like that really take months to procure. You need to find the approved vendors, ask for a quote, have the quote approved, then activate the purchase, wait for n approvals, wait for managers trying to offload it to someone else's budget; meanwhile procurement people change, the new ones find something wrong in one of the approvals and send them back, the one assigned to the task delivers a baby, nobody else takes responsibility for her task while she's away, you find someone who knows someone working there and owes you a favour, she calls her friend, who in some "oblique" way approves the purchase, only the quote is no longer valid because too much time has passed...

18
0
Mushroom

Re: "it shouldn't take months to sort out a new circuit board"

IIRC Somerset County Council, while I was there, required the generator to come on after an outage - it didn't.

Remedial work (replacing or refilling fuel) was carried out and, if I recall, when they did a test, it then caught fire.

10
0
Anonymous Coward

Re: "it shouldn't take months to sort out a new circuit board"

As someone who has spent most of my life working for various sized organizations of both flavours it seems to me beyond doubt that the public / private bit makes hardly any difference compared to:

a) how big the org is

b) how long it has been around

c) how many times it has been reorganized (including mergers & acquisitions)

7
0
Anonymous Coward

It took six months to get a piece of customer hardware repaired due to a series of cockups between supplier, finance, and admin. Actual time to repair once it was all signed off : two weeks.

4
0
Silver badge
Coat

Re: "it shouldn't take months to sort out a new circuit board"

@The Oncoming Scorn - "when they did a test, it then caught fire"

So it worked according to spec... they ordered an "Emergency Generator", and, when tested, it generated an emergency.

4
0
Silver badge
Pint

Yay, "This Damn War" has returned to life and joins the BOFH. Now we just need "DPM's Diary" to complete the unholy trio.

6
0
Bronze badge
Facepalm

Resistance to have a reliable backup.

I work adhoc for a guy who calls me in on network & server issues. Went to a site he had for 10 months that he didn't have the Domain Administrator password for. Booted off the magic USB stick, changed the password and then looked at the setup. The users were using Outlook and the previous techs were storing their PSTs on the Server. They had some very nice backup software but the backup machine was a 10 year old PC with a single 4TB HDD and no UPS and the power had failed and the backup had been down for ............. 10 months.

It took me 5 months to convince the payer of my invoices that this was not a good backup system coz every time the power went off the backup machine stopped and there was nobody with a clue in that office to know how to press the power button. When he finally went to the owner and suggested installing a NAS with a UPS, the owner complained that it was going to cost too much and he never had to pay so much for the previous tech guys. First, we had a copy of the previous company's invoices when we were trying to recover corrupted PST files on the server. Second, this business was purchased for $1,500,000 and this dick was complaining about $700 for a NAS with hardware redundancy and a UPS.

Similar situation at another client. His DOS based leasing software, dated from 1996, could no longer be reinstalled because the original programmer was no longer on the planet and they couldn't generate an activation key for it. Finally upgraded their 9 year old workstation to a real Server, NAS, UPS and an offsite backup. They got hit with Mr Crypto Virus but due to me cocking up the NAS permissions (I was still configuring it all remotely so had done a manual backup the previous night), crypto couldn't touch them and I had them back up and running in 4.5 hours with 2 hours of manual data to re-enter. This person complained all the while I was setting up this small network but 3 months later I received an email that said, "We dodged a bullet there, didn't we." He finally realised the importance of the backup and the minimal cost compared to what it could have cost. The NAS and HDDs cost less than $500 plus $38 for a new UPS battery for the NAS. Just a blip in the great game of life for his $1.4 Million turnover.

16
0
Silver badge
Devil

...not prepared to pay for a single week of handover time.

Uh, oh. Had this when I left a job, and there was actually a one month overlap with the guy supposed to take over. Every attempt from my side to do any sort of meaningful handover was blocked. It was "we have more important things to do", over and over, like a broken record.

Then, in my new job, I received an angry email from my old boss because no one even had a clue what was running where.

I took extra time and care with the wording of my utterly polite reply. --->

18
0
Holmes

Re: ...not prepared to pay for a single week of handover time.

Never had to use this fortunately but "Your right to ask me anything about your systems & procedures ended once I left the company" is my prepared response.

10
0
Anonymous Coward

Re: ...not prepared to pay for a single week of handover time.

The extent to which fools are prepared to shoot themselves in the foot is always depressing. I left a large employer at the beginning of a large reorg, which they were planning to follow with a platform migration. One of the reasons I left was because I was stressed to the roof as a result of being sidelined by fools. They couldn't work out who they wanted me to hand everything over to, so I spent the last month doing almost nothing but going through system and DR documentation, trying to think of everything and dot every single i and cross every single t on what I thought wasn't bad anyway. Three months after I left I got an enquiry to contract for a company who wanted someone to document all the systems at an unnamed organisation with remarkably similar technology in order to facilitate a platform migration.

We soon established, yes, we were talking about the same place. OK, I said, no problem, have they given you all the system and DR docs, I reckon I can give you everything you need in a couple of days from that working at home. There was silence. Not entirely to my surprise the sideliners decided to veto me as a contractor, but I hope whoever got the short straw did at least get to see all the documentation. It seemed, incidentally, to take them about 5 years to complete a platform migration I reckoned could have been done in 18 months.

4
0

Disasters I have seen (or seen dodged)

Went through two actual data center disasters at one company in San Francisco.

The first was when a construction project across the street drilled into a 16-inch gas main. Gas got sucked into our building's air intakes and we had an explosive concentration of natural gas in the machine room. It also transpired that residual oils from the gas compression pumps tended to get into the lines...and the oils were contaminated with PCBs.

The second incident was when a water line separated in the intake side of the water distribution unit that supplied the water-cooled IBM mainframes. The mainframe was on the 14th floor. The feed line was 1.5" soldered copper. It was being fed from a chiller and 10,000 gallon holding tank on the roof...of a 45 story building. (That's right: 30 stories of pressure head.) The shut off valves were under the false flooring and no one had ever told the machine operators where they were. The return drain lines ran above the false ceiling on the 13th floor. Some of the water went around the drain lines and the rest ran into the stairwell where it got into the power ducting for half the building, blowing out a 2 story high bus bar (a replacement was sent by air freight from Chicago...the closest place one could be located on short notice). Most of our offices were on the 13th floor and got soaked. There was water damage for 5 floors below us. The cause turned out to be a manufacturing defect in the distribution unit, so that company's insurance got all the bills.

At another company, there were UPS for the systems, backed up by a diesel driven generator. The company did a quarterly test in which they picked a suitable Saturday and pulled the power to verify that (a) the UPS would pick up and carry the load, and (b) the engine would kick in and the generator would take over the load before the batteries went flat.

9
0
Bronze badge

Re: Disasters I have seen (or seen dodged)

At another company, there were UPS for the systems, backed up by a diesel driven generator. The company did a quarterly test in which they picked a suitable Saturday and pulled the power to verify that (a) the UPS would pick up and carry the load, and (b) the engine would kick in and the generator would take over the load before the batteries went flat.

Ahh yes. I may have told this story before...similar set-up UPS, backed up by diesel generators. Local substation servicing said data-centre (only) decided to blow a phase on Saturday morning. UPS performed flawlessly. Diesel generators started up with no problem. All looking nice and dandy, until electricity supply company people talk to the operations manager and say how long the repair of the substation was going to take, working flat out.

The operations manager blanched. He knew (a) the capacity of the diesel tank and (b) how many gallons per minute the generators used. It meant the tanks would be making dry sucking sounds long before the substation was back. So he gets on the phone to his diesel supplier, looking for an emergency delivery. It turned out that such deliveries were not instant, and there was a gap that needed to be bridged.

The method arrived at through desperation was to get hold of a 44 gallon drum and put it upright on the back seat of his Alfasud*. We then hared off to the nearest petrol station and filled it with diesel, and hared back, then siphoned the contents into the generator tank. The round trip time meant we could just keep up with the generator fuel consumption. After the first couple of trips, other members of the operations team turned up, who were deputed to keep doing this until there was enough diesel in the generator tanks to hold over to the expected delivery, plus a reasonable margin.

As I wasn't actually an Ops team member, and there being no crisis for the bits of IT equipment I was responsible for (the Ops team were more than capable of shutting stuff down in a controlled manner, if necessary, and restart it in a controlled manner too.), I could wander off back home. When I came back on Monday, the generators were still running, a large diesel delivery had arrived as promised, and 'all' we had to do was wait for the substation to be handed over so we could go back to mains supply. This was done outside normal working hours in case of problems with the switch back. (There weren't any).

Not long afterwards, the diesel tanks were substantially increased in size.

I don't think he ever got the smell of diesel out of the Alfasud.

*This happened a long time ago.
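
The fuel arithmetic in this story (tank capacity against burn rate, and whether the drum shuttle can keep up) can be sketched as follows. This is a minimal illustration: the function names and every number are hypothetical, since the original post gives no figures beyond the 44 gallon drum, which is roughly 200 litres if those are imperial gallons.

```python
def hours_until_empty(tank_litres, burn_litres_per_hour):
    """Hours the generator can run on what is currently in the tank."""
    if burn_litres_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return tank_litres / burn_litres_per_hour


def fuel_gap_litres(tank_litres, burn_litres_per_hour, hours_to_delivery):
    """Litres that must be found elsewhere before the delivery arrives."""
    needed = burn_litres_per_hour * hours_to_delivery
    return max(0.0, needed - tank_litres)


def shuttle_keeps_up(drum_litres, round_trip_hours, burn_litres_per_hour):
    """True if one drum per round trip supplies fuel at least as fast as it burns."""
    if round_trip_hours <= 0:
        raise ValueError("round trip time must be positive")
    return drum_litres / round_trip_hours >= burn_litres_per_hour


# Hypothetical figures, not taken from the original post.
tank, burn, delivery_in = 400.0, 50.0, 12.0
print("Hours until empty:", hours_until_empty(tank, burn))
print("Litres to find before delivery:", fuel_gap_litres(tank, burn, delivery_in))
print("Drum shuttle keeps up:", shuttle_keeps_up(200.0, 3.0, burn))
```

Running the same sums against the expected mains repair time, before the outage, is what tells you the tank is too small.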

12
0

Re: Disasters I have seen (or seen dodged)

At least he knew. I was at a site for a DR test and the fuel ran out. Seems the two big generators had a smaller startup engine system, each with its own tank. Guess which tanks were checked for fuel. The only reason the main tanks had a small amount of fuel was that the delivery driver knew the difference between the tanks and the person ordering didn't.

4
0

new site built directly on a flood plain with the IT hardware in the basement.

I think they misunderstood what "planning for a disaster" meant.

6
0

Reminded me of the tube's 'control room flooded with wet concrete' story from 2014

You'd think it'd be a tale of months of disruption but, no, 24 hours later it was all fixed. I'd still like to read about how they did that.

http://www.bbc.co.uk/news/uk-england-london-25873252

5
0
Silver badge

Re: Reminded me of the tube's 'control room flooded with wet concrete' story from 2014

Read the comments. Concrete takes a while to set and you can include an inhibiting agent, in this case sugar - doesn't actually take a lot and with enough it'll stay liquid forever.

After that it's a matter of a shovel, and a whole lot of cleaning.

3
0
Silver badge

showing age again

During the big NE Merkin powerfail in 1965 ? there were stories that the hospitals in NY had their backup generators started by mains powered electric motors. The more things appear to change, the less the stuffups do

6
0
Silver badge
Pint

Correct Risk Analysis regarding UPS

Installing a UPS provides instantaneous back-up power *most* of the time, but not all of the time. Only a fool would assume that the UPS success rate is magically 100%. The percentage depends on the frequency of its use. If you have power failures every single day at 2:00pm, then your UPS will probably work correctly 99+% of the time. But if it's not been called into service for several years, then it's perhaps 50% or 75% when you need it.

And that's the good news. The bad news is that your UPS itself may catch fire, or perhaps gently smolder with acid pouring out onto the floor. UPS themselves can CAUSE their own disasters. Our office building has been evacuated twice "for real" and both were caused by the UPS.

A very wise man once posted the following about home UPS:

You’re concerned about your family’s safety. So you get a guard dog. The dog costs a fortune. It immediately poops on the floor. Then it chews off the entire left side of your Bang & Olufsen. It bites the postman’s fingers. It then sleeps through an actual burglary. And finally it eats one of your children.

This is the UPS experience: If they’re not preoccupied with smoldering their lead acid batteries, then they’re busy buzzing and arcing. Then they blow an internal fuse on the output, and your Great American Novel is suddenly lost, again, for the third time. Then there’s an actual power failure (Yay!), so they turn on their patented 387 volt offset square wave, and your PC is instantly corrupted. Meanwhile battery acid squirts out onto the ceiling, again. Then, while you’re out trying to buy a replacement PC, the UPS catches fire and burns your house down.

I’d happily pay $800 to not have one.

9
0
Silver badge

I left a few years later but the last I heard the company had spent several million pounds on a new site built directly on a flood plain with the IT hardware in the basement.

Could be worse. When I started my IT career about three lifetimes ago the shop I was in had sprinklers in the server room. Even being green to the gills that caught my attention right away. They were there for several years before someone finally convinced the bean counters that the risk of accidentally discharging sprinklers in a room with $4 million worth of IT hardware was worse than whatever the cost of a gas based fire suppression system was.

6
0
Silver badge

Beancounters and Managers

Some interesting debate about whether beancounters or management are to blame which, with respect, is missing the key point.

Yes, we need people to do the sums even in these spreadsheet days. Beancounters do have a role. But they should never, never, never, ever have senior management responsibilities.

A beancounter is like an office cleaner and should be respected and paid for doing the job. But you absolutely do not let them make important decisions.

11
0
Anonymous Coward

Been There, Done that....

I too have suffered a full-on power failure at our head office DC. A massive surge shortly before an outage in the local area blew a fuse on the UPS (which was recently serviced with a clean bill of health....); this knackered the whole system and the room went very dark and very, very quiet.... Back at my desk, it's fair to say a little bit of wee came out.....

To bring us back to life, we threw the UPS in Bypass, fired up the Generator manually and ran the server room on Generator Power until the UPS was repaired. Despite recovering the core apps in an hour or so, getting all of our services back and a clear monitoring screen took all day and night - but we were still on the Generator

Two days and 800 Litres of Diesel later, UPS was repaired and we were ready to cut back over to Mains Power, which was the single most nerve wracking experience of my professional life - was the UPS going to take the load, were all the 3 phases in sync?

Because the process of going from Mains > UPS > Gene is fully automated, we effectively broke that chain and had to reverse back in a totally un-tested scenario. But..it was either that or shut everything down gracefully and fail over the power (my preference at the time). For a 24/7 operation, the business took the risk and we got away with it after some buttock clenching moments

Funnily enough, shortly after that episode my Capex Budget request for a new UPS was miraculously approved....

9
0
Thumb Up

This is fascinating

I have had three DR situations in my DC life:

1) A planned DR, where the main European DC was shut down by flipping the main incoming power, forcing the UPS and standby Gen offline, bolting the doors, and shutting down all phones. "This building is on fire and will burn down. You are all dead. All tapes are destroyed. All documentation is destroyed. Now, let's see if your DR procedures work". I was at the standby site, where we partitioned the mainframes, and cleared a load of disk space. Meanwhile, people hired vans and went to the offsite tape store, someone got a list of emergency contact numbers and started telling people to go to the airport, whilst someone else went to Schiphol and found out that if you have a big enough credit on your Amex Black card then sir, that will do nicely, and a chartered jet is available at gate 27 in 1 hour. We had Europe up and running in 32 hours - target was 48. Back in the 80's this was a success.

2) 2 years later, a faulty bus bar in that backup DC arc'd and took out a meter of power distribution. UPS was fine, it took up the load. Standby generator kicked in, and we were all fine. Until we found out that the diesel had waxed, and the wax was now in the cylinder heads. The generator died. We had to hire a standby gen for a month whilst the busbar was fixed. Lesson learnt, and we drained the diesel tank once per year after that. It was an oil company, so you think that they might have known diesel waxed!

3) Wind forward a few years. I get a call from the CIO saying that the computer room has flooded, and could I drive 50 miles to oversee what was happening (he, of course, was unavailable). My panic level went in to the red. It was only getting in to the car that I thought: "hang on, the machine room is on the 4th floor - flooded HOW?". It seems we had water for fire suppressant and the pressurised pipes had failed, sending jets of water in to the Sequent, the AS/400, a couple of Vaxes and a Tandem. IBM had a team on standby for exactly this situation and got the AS/400 back quickly. The Sequent needed a couple of disks replacing, but no big deal. The Vaxes took a little longer (Digital were not as good as Big Blue on this occasion). Oh, the Tandem? Despite the power being out, despite there being water 3 inches deep in the machine room, the Tandem kept processing card payments, without stopping, glitching, or even noticing!

10
0


Biting the hand that feeds IT © 1998–2018