We don't use UPS. If we did we'd have huge UPSs and tiny computers

The heatwave-driven outage at the VLSCI supercomputing facility last week could have been worse than it was, with power cuts also a risk, the facility has confirmed. A senior systems administrator at VLSCI, Chris Samuel, has discussed the outage and the lessons learned with The Register. While the reason for the shutdown was …

COMMENTS

This topic is closed for new posts.
Bronze badge

“That said, we've always been very lucky with power around this area … it might be because of our proximity to [Melbourne] hospitals.”

Maybe - but since the hospitals have their own backup generators in case of power cuts, it wouldn't be sensible to rely on proximity and 'luck'.

3
0
Silver badge

What do UPSes have to do with this?

If the incoming water temperature is too high for their chillers to work, how does the presence or absence of UPSes help or hinder them?

1
0
Silver badge

Re: What do UPSes have to do with this?

The UPS discussion was in relation to possible power cuts. It's in the article.

8
0

Re: What do UPSes have to do with this?

They are two separate issues.

The facility draws so much power that the required UPS would be HUGE. The extreme temperatures, and consequent load on the electricity grid, meant that power cuts were a real possibility.

The actual issue they ran into which led them into shutting down parts of the facility was that the outside temperature was too high for the chillers to work efficiently, and it ended up leaving too much heat in the coolant system, which in turn meant the computers would not be cooled properly and would overheat.

1
0
Bronze badge

Coolers on the roof?

Roofs get really hot in sunny places. Would it make any difference if the coolers were placed in the basement instead? I don't know - I am asking.

2
0

Re: Coolers on the roof?

My guess is they're on the roof because it's easier and cheaper to get the heat away from the air con unit heat sinks: you just use the outside air. The issue is that if the ambient air temperature gets too high, not much heat is exchanged by the air con heat exchangers, which is what appears to have happened in the article.

If the heat exchangers were in the basement then you'd have less air to heat up, so you'd hit this problem sooner, and/or you'd have to find some way to get the hot air out of the basement, which could end up being expensive or technically challenging.
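A rough way to see why a hot day hurts: in a simple heat exchanger model the heat moved to the outside air scales with the temperature difference between coolant and ambient air. The sketch below assumes that simple Q = UA x deltaT model; the function name and every number in it are made up for illustration and are not figures from the article or from any real chiller.

```python
def heat_rejected_kw(ua_kw_per_k: float, coolant_temp_c: float, ambient_temp_c: float) -> float:
    """Heat moved from coolant to outside air, using Q = UA * (T_coolant - T_ambient).

    ua_kw_per_k is a hypothetical overall heat-transfer figure for the exchanger.
    """
    delta_t = coolant_temp_c - ambient_temp_c
    # No heat flows out if the air is as hot as (or hotter than) the coolant.
    return max(0.0, ua_kw_per_k * delta_t)


# Illustrative values only: same exchanger, same coolant temperature.
mild_day = heat_rejected_kw(50.0, 50.0, 25.0)   # 25 K difference
heatwave = heat_rejected_kw(50.0, 50.0, 44.0)   # 6 K difference
print(mild_day, heatwave)
```

With these invented numbers the same kit rejects roughly a quarter of the heat on the hot day, which is the effect described above. A real chiller has compressors and cooling towers in the loop, so treat this as a cartoon rather than a design calculation.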

7
0

This post has been deleted by its author

Re: Coolers on the roof?

"Roofs get really hot in sunny places. Would it make any difference if the coolers were placed in the basement instead? I don't know - I am asking."

Chillers could be in the basement (I tried to do this on a project without sufficient roof space), but it would involve a lot more piping. But chillers still have to obey the laws of thermodynamics, so the heat has to go some place. Probably cooling towers, which need to be outside and high up. Cooling towers are pretty effective in dry climates. In fact, from the article it is possible that they meant to say cooling tower instead of chiller. Or they are using "chiller" to refer to the chiller and cooling tower package.

1
0
Rol
Bronze badge

Re: Coolers on the roof?

A store of dry ice, for the odd emergency, wouldn't go amiss. Or tanks of liquid nitrogen.

If I was running the place and found temperatures were getting critical, I'd have been on the blower requesting deliveries of ice, to pour into the reservoir tanks.

Then again, wasn't the gas fridge a brilliant invention? Use the heat from the racks to feed into an Einstein–Szilard refrigerator, which makes ice and keeps it so until an urgent need for its cooling properties arises.

0
0
Silver badge

Re: Coolers on the roof?

Alternatively don't build your supercomputer in a country famous for being so XXXXing hot

2
0

Re: Coolers on the roof?

Quite possibly a cooling tower(s) is part of the system. This needs to be in the open. Also compressors generate a lot of heat so it is normal practice to put them on roofs, even in hot countries. They can always be shaded by a lightweight panel.

0
0
Bronze badge

Re: Coolers on the roof?

>If the heat exchangers were in the basement then you'd have less air to heat up

Depends on how the 'basement' is vented - a well constructed 'basement' - using the same principles that meerkats use to passively ventilate and cool their burrows, could be much more effective than a conventional roof mounted array of cooling units.

0
0
Silver badge

Re: Coolers on the roof?

Huh? Are you saying let's dissipate the heat INSIDE the building you're trying to cool down...?

Sorry, does not compute.

0
0
Silver badge

Argument seems illogical

So, they reckon they'd end up with tiny computers if they had UPSs? That makes no sense. Surely you spec the UPS to match the load? And anyway, in a data centre, a UPS is a 'switch over backup' which holds the fort whilst the backup generators come online.

So, realistically they'd need about 10 minutes of battery power maximum. Even for giant computers, the UPS systems won't be bigger than the computers themselves.

1
3
Silver badge

Re: Argument seems illogical

Batteries? Sounds more like a case for flywheel UPS. Big demand for a very short time to tide them over brief glitches. I've seen a synchrotron facility with > 1MW in flywheel UPS in a small room.

1
0
Silver badge
FAIL

Re: Argument seems illogical

I agree this argument makes little sense. Our UPS room is the size of a single person's office and provides about 5 minutes of power to a few hundred stock servers, but we have a bloody great generator to kick in within about 30 seconds of power loss.

To me it seems more like, we didn't bother investing in them as we thought we were OK, so cut it from the budget.

5
0

Re: Argument seems illogical

If they have no standby generator the UPS would have to run the cooling system to keep the system at temperature while it was doing the backups.

3
0
Anonymous Coward

Re: Argument seems illogical

"the UPS would have to run the cooling system"

You know, that mistake is more common than you might imagine...

Anon, as it happened in one of our departments: Main power went out, big UPS kept racks powered up but A/C off, racks died (and/or shut down on BIOS command) as room reached around 90C.

6
0
Bronze badge

Re: Argument seems illogical

Or a flywheel-assisted generator. (A large flywheel is kept running by a low power system, storing enough energy to start the diesel generator within seconds and provide power in the meantime.) I've seen systems that can take over power generation fast enough, and with matched phase, so the electrical systems won't even notice.

2
0

Re: Argument seems illogical

MAE-West (remember that?) suffered a massive outage in the 90s after a power cut: they got the generator running, but it didn't power the aircon, and the heat fried most of the equipment in the room. Rebuilding the equipment in the facility drained most big router vendors' spare parts stores for all of the USA.

(the story I heard was there was a gas leak so the fire dept. killed the power to the street, MAE-West ops people dragged the genset outside the exclusion zone, fired it up, got the NAP running again, but in the summer heat in San Jose, CA, the temps in the room quickly exceeded the operational specs for the routers and switches)

You also never EVER want your aircon on UPS, even if you could size the UPS that big. The motor load from the aircon does nasty things to the inverters in the UPS. You need the UPS sized to carry the compute load for the genset spin up time plus the clean shutdown time of the compute load if the genset fails to fire up. The 1.6 megawatt genset at my last job could spin up from complete stop to carrying load in under 5 seconds from the time it was signalled to start (or so the suppliers claimed - I don't think we set the transfer switch that aggressively)
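The sizing rule above (compute load only, for genset spin-up time plus clean-shutdown time) can be sketched as a quick energy calculation. This is a back-of-envelope illustration, not a sizing method from any vendor: the function name, the 25% margin and the example figures are all assumptions.

```python
def ups_energy_kwh(compute_load_kw: float,
                   genset_start_s: float,
                   clean_shutdown_s: float,
                   margin: float = 1.25) -> float:
    """Energy the UPS must store to carry the compute load (no aircon).

    Covers the genset spin-up time plus the time for a clean shutdown
    in case the genset never fires. margin is a hypothetical safety factor.
    """
    ride_through_s = genset_start_s + clean_shutdown_s
    return compute_load_kw * ride_through_s / 3600.0 * margin


# Made-up example: 1 MW of compute, 30 s genset start, 270 s clean shutdown.
needed_kwh = ups_energy_kwh(1000.0, 30.0, 270.0)
print(round(needed_kwh, 1))
```

Five minutes of a megawatt is on the order of 100 kWh, which is why the ride-through time, rather than the length of the outage, is what sets the battery size here.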

3
0
Bronze badge

Re: Argument seems illogical

> So, they reckon they'd end up with tiny computers if they had UPSs? That makes no sense. Surely you spec the UPS to match the load?

I took it to mean that in a world of limited budgets they preferred compute power over ups/backup power

5
0
Anonymous Coward

Re: flywheel

Yes. As I understand it, batteries make no sense nowadays for a data centre or a supercomputer even if you can't afford service interruptions. You're better off having a flywheel that can power the site for 20-30 seconds and a couple of redundant diesel generators that can start up in 10-15 seconds. (When one of them is going you can shut down the other.) Such equipment does exist as I have heard of it being used in state-of-the-art data centres.

0
0

Re: Argument seems illogical

The trouble with UPSs is that they are a little like good backup strategies and systems: if they are working, people won't notice them. As such, come budget time, they appear to be a (potentially) large expense.

It's only when they aren't there, or they fail when you need them, that you notice them.

Having said that, we have a UPS system that is good enough for our needs, as is our backup system (which uses both on and off site backups). Unfortunately, due to the buildings we occupy being grade 1 listed, it's not really practical to install a generator (or so I have been told).

1
0
Anonymous Coward

Re: Argument seems illogical

MAE-West is split between San Jose and LA. Hot in San Jose? It maybe gets a week above 90 in the summer.

0
0
Bronze badge

Would UPS/Auxiliary Power be cost justified?

A more pertinent point is whether the type of work a super computing facility performs merits the expense of a UPS and auxiliary power supplies. After all, by its very nature, compute-intensive work is essentially batch-based and, arguably, the money that would be spent on UPS, auxiliary power and the required accommodation is better invested in more powerful computers. That way, even if there is the odd outage of a day or two it's likely more work can be carried out. Of course, there is still the staff cost during the outage, but even if they can't find other work to do, the more powerful computers should increase their productivity during normal working.

Note, this is a very different model to mission-critical transactional systems where a business (and customers) can be heavily disrupted, or even stopped by an outage. In that case full UPS and auxiliary power is likely to be justified due to the disruption.

6
0
Bronze badge

Re: Would UPS/Auxiliary Power be cost justified?

The issue isn't so much the length of an outage but the problems arising from a dirty shutdown. As people have pointed out the main requirement is to provide sufficient power to either enable generators to kick in or to effect a clean shutdown.

Obviously, it is a business decision as to how long they wish to operate the datacentre when there is no power to the datacentre, offices and local area (so no comms).

0
0
JLH

HPC systems

Remember - these are HPC systems. Hi Chris!

Typically they draw a large amount of power per rack - but jobs can be halted and checkpointed if you need to turn the system off. It is not critical that they are up 100% of the time.

(To be clear - it is GOOD that they are up as close to 100% as possible, but it's not business critical and jobs can sit waiting in the queue to run later.)

On an HPC system you would tend to be more concerned about having UPS for your storage and head nodes (login / provisioning nodes).

That said, a UPS does give you power smoothing, so for that reason there are UPSes on all nodes on the systems I look after. However, we don't expect a long runtime - there is sufficient time to checkpoint jobs and shed the load by switching compute blades off.

5
0
Boffin

All they need is an HV DRUPS on the incoming supply. It would easily handle both the HPC load and the cooling load if the power failed. These beasts kick in to a diesel-backed genny in no time, they keep the incoming supply clean, and the load sees no break.

Just done a job recently for both hpc and its associated storage

0
0
Silver badge

Was their cooling/UPS decision based.....

.... more on the fact they are not handling some corporation's accounts or sales system?

If there are no contractual reasons for 99.999% uptime, then why would you build a system to support that?

If it goes down it goes down. No one dies. No lawsuits arrive.

0
0

Luck won't protect anyone from stupidity..

or maybe the 'victorian' eggheads don't have a problem with sudden outage, interrupted computation or loss of data. They can always go to the beach!

0
1
JLH

"or maybe the 'victorian' eggheads don't have a problem with sudden outage, interrupted computation or loss of data. They can always go to the beach!"

Look at my comment re. UPS for the storage - that is very desirable and yes data corruption is not at all wanted.

But re. sudden outage, HPC jobs can and will have this. The job should write a checkpoint solution every so often (*) and could be re-run from the last checkpoint if it fails.

These workloads consist of simulations - if one of the blades running the simulation fails, the whole run is likely to stop anyway.

(*) that is an interesting problem in itself - and is one of the reasons HPC likes big fast storage.
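For anyone who hasn't seen checkpoint/restart, here is a toy sketch of the idea: the job saves its state every so often, and a re-run picks up from the last saved state rather than from zero. Everything here is illustrative (a trivial "simulation" that sums numbers, a JSON file as the checkpoint, a hypothetical fail_at switch to fake the power cut); real HPC codes use their own formats and parallel file systems.

```python
import json
import os
import tempfile


def run_simulation(total_steps, checkpoint_path, checkpoint_every=100, fail_at=None):
    """Toy job: sums the step numbers, resuming from a checkpoint if one exists."""
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated power cut")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            # Write to a temp file, then rename, so a crash mid-write
            # cannot leave a half-written checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)
    return state["total"]


path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run_simulation(1000, path, fail_at=550)   # dies partway through
except RuntimeError:
    pass
result = run_simulation(1000, path)           # resumes from step 500
print(result)
```

The second run only repeats the 50 steps done since the last checkpoint. How often to checkpoint is the trade-off mentioned above: more often means less lost work but more time spent writing to storage.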

0
0
Silver badge

What about storing coldness in liquid nitrogen?

Just for the UPS time, squirt a bunch of liquid nitrogen/similar into the server room - might keep the machines cool enough to get the gennies running

0
0
Silver badge

Re: What about storing coldness in liquid nitrogen?

Then the ensuing condensation would blow the crap out of everything in there.

1
0
Silver badge

Re: What about storing coldness in liquid nitrogen?

There's a reason that nitrogen-cooled freezer wagons have instructions to leave the doors open for 30 minutes before going inside. The first thing you'd notice about being in a nitrogen atmosphere is when you faceplant into the floor from oxygen deprivation.

That said, dumping nitrogen into the coolant reservoir might be an idea for an emergency "we need 120 seconds to shut everything down nicely" solution.

0
0
JLH

Re: What about storing coldness in liquid nitrogen?

"That said, dumping nitrogen into the coolant reservoir might be an idea for an emergency "we need 120 seconds to shut everything down nicely" solution."

Good idea.

But you should have some sort of thermal monitoring anyway - hopefully shutting down automatically when the temperatures rise above a set threshold.
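A minimal sketch of that kind of threshold logic, assuming you already have temperature readings from somewhere (inlet sensors, BMC, whatever). The function name, the two thresholds and the "shed load first, then shut down" policy are all assumptions made for illustration, not anything VLSCI has described.

```python
def thermal_action(temps_c, warn_c=35.0, shutdown_c=45.0):
    """Pick an action from the hottest reading.

    warn_c and shutdown_c are hypothetical thresholds; real limits come
    from the hardware vendor's operating specs.
    """
    hottest = max(temps_c)
    if hottest >= shutdown_c:
        return "shutdown"
    if hottest >= warn_c:
        return "shed_load"   # e.g. stop starting new jobs, power off idle blades
    return "ok"


print(thermal_action([24.0, 26.5, 25.0]))   # all cool
print(thermal_action([24.0, 37.0, 25.0]))   # one hot spot
print(thermal_action([46.0, 37.0, 25.0]))   # past the hard limit
```

In practice this would run on a timer and call whatever the scheduler and node management tools provide to drain and power off nodes; those calls are left out because they differ from site to site.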

That's where old style mainframe 'halls' were good - high ceilings, lots of thermal mass.

BTW, Trox in the UK already do produce cooled doors cooled by CO2.

0
0
Mushroom

Why keep it running when there's no power

It's only a super computer.

Taking the costs of putting in UPS/generator systems to carry through a power cut of indeterminate length vs the costs of down time I suspect they decided it was not worth it.

Uptime requirements vary enormously...

0
0
Bronze badge

Even if they lose power for an average of over 3 days per year, which would be a pretty lousy power supply, that's 1% of their capacity lost. Could you really provide backup power - UPS+generator or flywheel+generator - to a computer for less than 1% of its purchase price? I doubt it.

For non-urgent batch jobs like most HPC, you get better overall throughput by shutting down for the power outage and putting your UPS budget into buying some more compute nodes to be faster the other 99% of the time.

Of course for any real time service like a bank or ISP, stamping out that last 1% of downtime each year is worth spending a lot of extra money on: being offline 1% of the time will cost you a lot more than 1% of your revenue. For batch work like this, though, just work slightly faster the other 99% of the time and you come out ahead overall.
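The arithmetic behind that argument can be sketched in a few lines. It assumes, as the comment does, that batch throughput scales linearly with how much of the budget goes on compute nodes; the function name and the 5% UPS cost figure are invented for the example.

```python
def yearly_throughput(ups_budget_fraction: float, downtime_fraction: float) -> float:
    """Relative work done per year, with 1.0 = whole budget on nodes and no downtime.

    Assumes throughput is proportional to money spent on compute nodes.
    """
    compute_share = 1.0 - ups_budget_fraction
    return compute_share * (1.0 - downtime_fraction)


downtime = 3 / 365   # three days of outage a year, a bit under 1%

no_ups = yearly_throughput(ups_budget_fraction=0.0, downtime_fraction=downtime)
with_ups = yearly_throughput(ups_budget_fraction=0.05, downtime_fraction=0.0)
print(round(no_ups, 4), round(with_ups, 4))
```

Under these assumptions the backup power only pays for itself if it costs less than the fraction of the year it saves, so the break-even point is a UPS/generator budget of under about 1% of the total.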

4
0
Silver badge

This is El Reg - we don't allow these sorts of clearly explained, well-reasoned comments here.

Can you add something bashing apple fanbois and Microsoft's purchase of Nokia?

1
0

I once worked in a facility where there was only enough UPS capacity for little more than half the heavy load systems, but there was generator capacity for everything. That was fine operationally because as long as half the system worked all the time, resilience was maintained. However, it did cause a secondary problem: when the systems were shut down improperly, the abrupt lack of cooling could shorten their lifespan. Fortunately it didn't happen often enough to make it a real problem for us, so we accepted the rare uncontrolled shutdown and recovered later. The cost of adding sufficient additional UPS capacity would have been significant.

0
0

Piggybacking on the hospitals' supply doesn't sound a very moral position

“That said, we've always been very lucky with power around this area … it might be because of our proximity to [Melbourne] hospitals.”

Whether or not they have their own generators, the power company is going to be unwilling to cut the hospitals off and clearly VLSCI are playing on this.

They should be paying for their own supply insurance if they need it, which clearly they do. The UPSs or whatever don't actually need to be sited right next to the machines, but they should be there and cooled. And the power company should be investing in a separate circuit so that the VLSCI can be cut off if that's what needs to happen.

0
0
Silver badge

Re: Piggybacking on the hospitals' supply doesn't sound a very moral position

That's probably a jokey comment rather than a real factor.

It's often how it goes though. I used to live out in a rural village and power cuts were quite regular happenings. I then moved to the city centre and I've had one power cut in 17 years. Infrastructure is everything.

0
0

Re: Piggybacking on the hospitals' supply doesn't sound a very moral position

Electricity suppliers build networks with multiple redundant connections to hospitals so that if a localised fault occurs the power remains on. So yes if you want stable power, locate yourself close to a hospital (or other critical infrastructure).

Basing yourself in a rural village with a single cable to the outside world is asking for trouble.

0
0