When purchasing server processors directly from Intel, Google has insisted on a guarantee that the chips can operate at temperatures five degrees centigrade higher than their standard qualification, according to a former Google employee. This allowed the search giant to maintain higher temperatures within its data centers, the …
Why not turn the cooling fans down?
Unless the chips are passively cooled which I seriously doubt, why not just slow the cpu fans down slightly?
@Why not turn the cooling fans down?
Sorry if I repeat others' comments;
The goal is not to increase the temperature of the CPUs but to reduce the level of air-conditioning (think power bills). To do this, Google allegedly wants Intel to warrant the CPUs for another 5deg higher max running temperature. The higher temp will be a result of increasing the air-con temp (reducing the cooling effect).
More joined up thinking.....
The next stage in 'green' computing considerations would be to build data centres in the north (where it's cold) and site them next to large commercial, retail, or domestic developments. Then in the autumn/winter they can sell the dumped heat to their neighbours, reducing their own and their neighbours' overall costs and consumption of energy.
The cost savings over even a short time should be enough to pay for any high speed data links that may need to be laid.
Taking it further, data centres should be built on the northern coasts so they can use the sea as a coolant fluid (not directly, but using a heat exchanger). This would take care of the situation in the summer when any neighbours wouldn't need the dumped heat. Since all generated heat eventually ends up in the sea anyway this should not be any more of an 'environmental problem' than it already may be.
4% per degree * 5 degrees
If you decided to run your servers 5 degrees warmer than recommended, then presumably you get a higher rate of failure. Ignoring the possibility that Google gets 'better' chips, maybe they just said to Intel -- "At X degrees, p0 percent of chips will fail per year. At X + 5 degrees, p1 percent of chips will fail. Just honor our RMAs on p0 chips, and we'll eat the difference." That means they lose on p1-p0 chips. But if they save 4% per degree on cooling, and 20% of the cooling bill is greater than the cost of losing p1-p0 chips, then why not?
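To put rough numbers on that trade-off, here is a minimal sketch. Every figure in the usage note (cooling bill, fleet size, failure rates, chip price) is made up for illustration, and the straight-line "4% per degree" scaling is just the assumption from the comment above, not anything Google or Intel has published.

```python
def cooling_saving_fraction(saving_per_degree, degrees):
    """Linear estimate: e.g. 0.04 per degree times 5 degrees gives 0.20."""
    return saving_per_degree * degrees


def net_annual_benefit(cooling_cost, saving_per_degree, degrees,
                       n_chips, p0, p1, chip_cost):
    """Cooling money saved minus the cost of the extra (p1 - p0) failures.

    p0 and p1 are annual failure fractions (0.02 means 2%), all hypothetical.
    """
    saved = cooling_cost * cooling_saving_fraction(saving_per_degree, degrees)
    extra_failures = n_chips * (p1 - p0)
    return saved - extra_failures * chip_cost
```

For example, with a hypothetical 1,000,000 annual cooling bill, 10,000 chips at 300 each, and failure rates going from 2% to 3%, net_annual_benefit(1_000_000, 0.04, 5, 10_000, 0.02, 0.03, 300) comes out positive, so under those invented numbers eating the difference would pay off.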
Intel should easily be able to warrant an additional 5 degrees to the maximum operating temperature. I have run various models of Intel CPU over the years at 100% load 24/7/356 in less than ideal situations (>40deg C ambient indoors temp, Inland Australia, where else!) without a single failure.. EVER! sure, the fans are screaming and the dust filters need unclogging from time to time but you'll be surprised how much abuse PC hardware can take.
I can't see why a giant like Google would need the chips certified though, with all the cash they make and the tax deduction they could get on the cost of the CPU you would think replacing the odd statistical failure wouldn't break the bank (Which is a good thing really since the banks have managed to break themselves without outside help).
*Fire, cos it's the worst that can happen!
Chip life vs System life
How long do Google plan on keeping a particular chip-generation in service anyway? If Google only want two good years out of the hardware, instead of three, and any Intel-Google contract reflects that in any guarantees, everyone could be happy.
Does not sound right
If you push an Intel CPU beyond the max temp it goes into thermal throttle. So you actually do not win anything from this as the average performance falls so you need more servers for the same task.
If Intel publicly guarantee a temp 5C higher than current, then everyone will expect to be able to run 5C higher than that, and so failure rates will increase, not looking good for Intel. Therefore they set their temp low to get a maximum yield and minimise returns.
It's the same with ladders. I understand they are usually tested to at least twice the recommended load, but I'd rather work at roof-level on a ladder with an average 50% safety margin than a ladder with an average 10% safety margin!
@ Anton Ivan itchianus
Thermal throttling kicks in at around 100 degC. CPUs are usually specced to run @ around 70-80degC
Running them @ 75-85 therefore would not see them reaching the limit for thermal throttling.
I have my Q6600 clocked @3.4GHz per core running @70degC under prime95. (100% load) The latest generation of CPUs are much more robust than previous generations. My old Athlon would start copping out @ around 55-60 C.
I am in fact surprised that Intel doesn't sell this point more.
I think you'll find that there's some difference between the absolute maximum temperature that an Intel CPU will run at at full chat and the operating temperature range that Intel are prepared to warrant for long-term reliability at high utilisation levels.
Don't Intel offer MilSpec versions of their chips anyway?
That's not right. Thermal throttling is activated at much higher temperatures than the "normal" working-temperature.
@ Anton Ivanov
And hence Google's request for chips that can run +5 degrees...
From a cooling perspective a chip with a 100W/70C heat load needs to be held to no more than about 50 Celsius above ambient. This puts the thermal resistance to ambient required for cooling at 21C (70F) at 0.5 C/W. If Google simply used better heatsinks they could get this down to 0.4 C/W, which means the same chip at 70C dumping to 27C (80F). Achieving 0.4 C/W is not that big of a problem. Bulky desktop heatsinks approach 0.1 C/W.
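The arithmetic behind those figures can be sketched as below, assuming the simplest possible model: a single lumped thermal resistance between the chip and the room air. The function names are mine, purely for illustration.

```python
def required_thermal_resistance(t_chip_c, t_ambient_c, power_w):
    """Thermal resistance (C/W) needed to hold the chip at t_chip_c."""
    return (t_chip_c - t_ambient_c) / power_w


def max_ambient(t_chip_c, r_c_per_w, power_w):
    """Hottest room air (C) at which the chip still stays at t_chip_c."""
    return t_chip_c - r_c_per_w * power_w
```

With the numbers above, required_thermal_resistance(70, 21, 100) is 0.49 C/W, i.e. the "about 0.5" quoted. Going the other way, max_ambient(70, 0.4, 100) works out to about 30C, so 0.4 C/W leaves slightly more headroom than the 27C mentioned; 27C corresponds to roughly 0.43 C/W.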
However just sticking servers in an air conditioned room (box) isn't an efficient way to cool. Hot and cold corridors only work if you limit mixing to the heatsinks.
Thermal throttle is somewhere 70C+
There are millions of computer users in tropical countries where indoor temperature can easily be 35 C and case temperature even higher, these work well with standard cooling solutions without reaching the throttling point.
Paris, because she knows all about getting hot without throttling down.
Our military spec tests run each board with passive cooling and vibration from -40C to +85C to the standard - and we go 20% further than that! (Well, it might go into a helicopter in Iraq or Siberia)
We do use Intel chips amongst others - But we have to qualify them ourselves
Good thermal management is much more than just obtaining chips that can run a bit hotter, or colder!
Helicopter - because I could tell you more but then .... ;-)
Power out must equal power in! Simple as that. It doesn't matter what temperature the chips run at, as long as what is put in is taken out to maintain the temperature. 35W in (electricity) needs 35W out (cooling) to maintain the current temperature; what temperature that is, is irrelevant.
Combined heat and power?
With winter approaching in chilly northern climes there must be more than a few who wouldn't mind being within thermal transport range of a disk farm. Battersea power station incorporated district heating years ago, so it can't be all that difficult.
Hmm, where did you say they are going to build the UK's new £12 billion spy-centre silo?
Thermal Throttle spec
The most efficient power usage would therefore appear to be to allow the server room to be as hot as possible up to the point that the chip heatsinks could not prevent the actual chip temperature climbing beyond around 70 degrees (or whatever) and thereby triggering thermal throttling.
If you ordered special chips that didn't start to throttle back performance until 5 degrees (C?) higher than usual you could allow the room to become hotter. Saving aircon power usage without losing performance. Perhaps this is what the Google-Intel deal is about.
It's not that simple. Power consumed by aircon isn't the same as heat energy shifted by the aircon. You can pump 3kW of heat out of a building using only 1kW of power consumed, and the efficiency of that pumping process *will* depend on the temperature differential between hot and cold sides.
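A minimal sketch of that point, using the textbook ideal (Carnot) limit for a heat pump. Real aircon units fall well short of this limit, so treat it only as showing the direction of the effect; the 35C outside temperature in the usage note is an assumed figure.

```python
def electrical_power_kw(heat_moved_kw, cop):
    """Power drawn by the aircon for a given coefficient of performance."""
    return heat_moved_kw / cop


def carnot_cooling_cop(t_cold_c, t_hot_c):
    """Ideal upper bound on COP when pumping heat from t_cold_c to t_hot_c.

    Temperatures are converted to kelvin; t_hot_c must exceed t_cold_c.
    """
    t_cold_k = t_cold_c + 273.15
    t_hot_k = t_hot_c + 273.15
    return t_cold_k / (t_hot_k - t_cold_k)
```

The 3kW for 1kW case above is a COP of 3: electrical_power_kw(3.0, 3.0) gives 1.0. And carnot_cooling_cop(25, 35) is larger than carnot_cooling_cop(20, 35), which is the sense in which a warmer server room makes the same heat cheaper to shift.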
Too hot to handle
When I get hot I like to strip down to my underpants. They should apply the same logic to servers. Yep, just fit them with a massive pair of underpants. Datacentres with rows and rows of rack-mounted servers, all wearing giant underpants.
If a manufacturer was particularly inventive they could incorporate extra features into the giant underpants - like a slot for your DVD drive, or openings for your hot-swap disks.
It could be the next Apple.
Re: Military Spec
Obviously standards are slipping, -40C/+85C always used to be Industrial spec, Military was/is -55C/+125C.
The industrial range seems a lot but when you've got passive cooling/no heaters you often find it isn't enough. In northern regions cold starts can be from <-40C if you left the kit out overnight. And in the Gulf solar heating + ambient air will take you up near 85C before you even turn it on.
The only problem is that it's much more difficult to find parts rated to the Military range and you have to make the cheaper industrial stuff work because that's all you've got. If you're lucky for the cold regions you'll have enough power for standby heaters, but for cooling it's more difficult - certainly I've had some very narrow operating margins to deal with requiring some interesting passive cooling solutions.
Getting 'special' bits is never really an option, and I doubt anyone would ever really trust something like a nominal 5C rating improvement on specific parts as it just isn't practical to do the qualification particularly if the rating is process related. Easier to just design a cooling solution where you can safely run right up to the standard rating with a high ambient, or accept the reliability trade-off of exceeding the rating if it's cheaper to replace/repair the hardware more often than to either cope with the environment or condition it to suit.
Why don't they go underground?
Once you get down below the frost layer (usually 36" of top soil), you're going to find that the earth's temperature is pretty much constant year round.
So if you build your data center using the earth as a heat sink, and then use other passive cooling techniques, you can run your data center and still reduce cooling costs.
The trick then is to make sure that even below ground, you are above sea level, away from flood prone areas, and that the land around the site is graded away from the building.
There are a couple of other passive things that could be designed in, and if you use solar power (black silicon when it becomes available) you can use this energy to power any auxiliary fans to help with the air flow.
I think that this qualifies as quite the silliest comment on El Reg so far. Bravo, that man!
Why Don't they go undersea?
Drop the containers (waterproofed of course) into the middle of the Atlantic where the temperature approaches Zero C and connect huge cables to a balloon in the upper stratosphere that would harvest wind energy from the jetstream and transmit the data to earth. Power and Cooling all for free. It would also have the added bonus of bringing down airliners mid-Atlantic and thus reducing carbon emissions. Win-Win.
HI, fine writing!
@ Robert Heffernan
"I have run various models of Intel CPU over the years at 100% load 24/7/356"
What about the other 9 (or 10) days of the year?
Flame, because you have been.
Off the wall commentary
1) Running processors hotter does decrease frequency. Device physics. It can also expose speed paths, causing data corruption (in some cases silent data corruption), as well as outright failures, not to mention the worst-case scenario, fire.
2) I fully understand wanting to run data centers hotter (save $$$'s on electricity bills).
3) What I completely fail to understand is how this fits in with Google's overall "green" policy and its commitment to fighting "global" warming. Unless of course they're just full of hot air like everyone else. Corporate green policies to me have become nothing more than marketing ploys designed to raise more greenbacks. Remember Google turning their screen black for lights-out day? (I still can't help but laugh at that.)
umm, haven't Google's techies heard about Intel's clock throttling? - if the CPU gets too hot it lowers its clock frequency to reduce its temperature - all the Core 2 line will run only warm to the touch with no heatsink present (tho they do slow to about 200MHz...) and as the latest Xeons are essentially based on Core 2, all Google's overheating plan will create is a lot of slower running chips...
outside air mixing
all else being equal, it wouldn't matter whether you ran the servers at 20C or 25C, you'd still be getting rid of the same heat!
if the building were sealed perfectly insulated so no heat entered or left the building through its walls, floor and ceiling, then only the air-con removes the waste heat.
however, if the outside air temp is cooler than the inside, you don't need coolers, just suck cold outside air in (clean to keep dirt out), and blow out the hot! You can do this in more northern latitudes where air temp is cooler, and so if Google can run their computer rooms at 25C instead of 20, they can make better use of "economisers".
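As a rough illustration of why the setpoint matters for economisers, here is a sketch that just counts the hours when outside air is cooler than the room target. The hourly readings in the usage note are invented, and it ignores humidity, fan power and any margin a real system would need between outside and inside air.

```python
def free_cooling_hours(outside_temps_c, room_setpoint_c):
    """Count readings where outside air is strictly cooler than the setpoint."""
    return sum(1 for t in outside_temps_c if t < room_setpoint_c)
```

With made-up readings of [10, 18, 22, 24, 27], a 20C room gets free cooling for two of the five hours, while a 25C room gets it for four of them.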
Frank: There's a difference for the local wildlife. Increasing the ambient temperature in the region by 1C could easily cause the local unique <x> to die out. Yes, a lot of the heat does end up in the sea, but not all within 300yds of the shore.
AC: Total power in = total power out, yes. However, the power lost through "normal means" rather than aircon is dependent on the temperature of the datacentre as much as anything else - therefore you can use "passive" cooling for a lot more if the place is warmer. To put it bluntly, you may be able to just use cooling fins and a water supply instead of needing aircon if you can run 20C hotter than ambient.
AC2: Running them hotter decreases the MAXIMUM, not the "real", speed. Clock your new CPU to 1GHz, and warm it up - it'll stay at 1GHz until it breaks.
This isn't rocket science people! (Despite the icon)
Shooting themselves in the foot
All Intel has to do is provide a spec spread where, towards the upper thresholds for any given clock frequency, those parts have a higher voltage spec. Voila, a mere 5C higher is attainable even in the worst case. All Google has done is fail to see the science in what it takes to meet their spec and how it would ultimately affect the parts offered under this agreement (if it is true at all; frankly it seems foolish because the CPU is not the most heat-vulnerable part in a server if your plan is to allow ambient temp to rise).
Unless they're overclocking, it is rather trivial to slap a stock heatsink on and have it stay cool enough with 80F ambient temps. At stock speeds most of Intel's products would stay cool enough even at 90F ambient unless these servers were ill designed, with especially bad airflow.
Paris, because even if she doesn't know what the "C" in 5C stands for, she understands pushing the limits.