Google has hit out at storage, memory and networking equipment makers with a grimace, a finger wag and a closing wallet. Two of the ad broker's leading data center researchers have published a paper chastising all of the aforementioned groups of hardware makers for failing to cater to the real needs of customers. Unlike chip …
Manage the hardware effectively
Instead of 10% to 50% usage, why not compress the load so that a set of servers runs at 75% to 90% usage and the rest really can spin down? The efficiency then comes purely from distributing the load differently. When the load gets too high, the other servers are brought up from idle.
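A minimal sketch of the consolidation idea. It assumes each service's load is known as a fraction of one server's capacity and can be placed on any server independently, which is a big simplification. It uses first-fit decreasing bin packing, a standard heuristic swapped in here for illustration. The function name `consolidate` and the 90% cap are hypothetical.

```python
def consolidate(loads, cap=0.90):
    """Pack per-service loads (fractions of one server) onto few servers.

    First-fit decreasing: place the largest loads first, each on the first
    active server with enough headroom, and open a new server only when none
    fits. Returns the total load on each active server.
    Assumption (hypothetical): loads are independent and freely movable.
    """
    servers = []
    for load in sorted(loads, reverse=True):
        if load > cap:
            raise ValueError(f"load {load} exceeds the per-server cap {cap}")
        for i, used in enumerate(servers):
            # Small tolerance so float rounding doesn't reject an exact fit.
            if used + load <= cap + 1e-9:
                servers[i] = used + load
                break
        else:
            servers.append(load)  # nothing had room: bring another server in
    return servers


# Ten servers each at 30% load, as in the 10%-50% scenario above.
active = consolidate([0.3] * 10)
print(len(active), "servers stay up;", 10 - len(active), "can spin down")
```

Real workloads move around, so a packing like this would need recomputing as loads change.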
...because bringing a *machine* up from idle would take even longer than bringing a disc up from idle. The article mentions "submillisecond" idles, which basically means highly erratic loads, where the machine is either "doing nothing" or "full out processing" while handling jobs taking as little as tens to hundreds of microseconds each. Sounds like database queries and on-demand web service apps to me. They want to 'sleep' components for the sub-millisecond gaps between these queries, to save power.
It's not just component manufacturers that don't keep up
Try getting some rack space. Your typical 42U rack comes with a measly 10A power allowance - try running a full rack on that! If we filled our racks the way we'd like, we'd draw about 25A, so instead we run them about 2/3 full and pay extra for 15A on each.
Note that we couldn't get more than 15A; they insist that's all the cooling allows for!
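The arithmetic behind those numbers, as a rough sketch. It assumes a 230 V single-phase UK supply (the comment gives only amps) and ignores power factor and derating.

```python
SUPPLY_VOLTS = 230  # assumption: UK single-phase supply; not stated in the comment


def feed_watts(amps, volts=SUPPLY_VOLTS):
    """Approximate real power available from a feed (power factor taken as 1)."""
    return amps * volts


def usable_fraction(allowed_amps, full_rack_amps):
    """Share of a rack's desired load that fits within the allowed current."""
    return min(1.0, allowed_amps / full_rack_amps)


for amps in (10, 15, 25):
    print(f"{amps} A -> {feed_watts(amps)} W")
print(f"15 A covers {usable_fraction(15, 25):.0%} of a 25 A rack")
```

A 60% share is in line with running the racks about two-thirds full.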
Does Mr Barroso (and even Mr Vance) have any idea how computers work?
Using Mr Barroso's metrics, core memory is 100% efficient. It uses no power at all while idle. Unfortunately it's hard to come by these days, and if you can get it, it doesn't offer quite the kind of performance that people have been used to for the last, oh, thirty years or so.
Meanwhile, the DRAM that makes modern computers work as they do is called DRAM because it is inherently *dynamic*, not static; it *must* be actively refreshed, very frequently, even when it is not actively being accessed, or the data evaporates. No big saving there then, not with today's technology anyway.
Disks were and are still mechanical devices. They're either up to speed, and working really quite nicely thank you (in the core memory days, a foot of 19" rack space got you 2.5MB; these days it gets a bit more than that while using very roughly the same number of kW), or they're not up to speed and relatively useless for a while. A few drives today have "quiet" modes which are probably also slightly lower power modes, but on the whole there's not much of a saving to be had unless you can shut disk drives down for significant periods of time and tolerate a wait when they wake up again.
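The trade-off described above is usually framed as a break-even idle time. Spinning down pays off only if the drive then stays idle longer than the extra energy of the spin-down/spin-up cycle divided by the power saved while stopped. A minimal sketch with made-up wattages, not figures for any real drive:

```python
import math


def break_even_seconds(p_idle_w, p_standby_w, e_cycle_j):
    """Idle time after which spinning down saves energy.

    p_idle_w:    power while spinning but idle
    p_standby_w: power while spun down
    e_cycle_j:   extra energy (above idling) spent spinning down and back up
    """
    saving_w = p_idle_w - p_standby_w
    if saving_w <= 0:
        return math.inf  # standby saves nothing, so spinning down never pays
    return e_cycle_j / saving_w


def worth_spinning_down(expected_idle_s, p_idle_w, p_standby_w, e_cycle_j):
    return expected_idle_s > break_even_seconds(p_idle_w, p_standby_w, e_cycle_j)


# Hypothetical drive: 8 W idle, 1 W standby, 140 J extra per spin-down/up cycle.
print(break_even_seconds(8.0, 1.0, 140.0))          # 20.0 seconds
print(worth_spinning_down(0.001, 8.0, 1.0, 140.0))  # a 1 ms gap: False
```

This only covers energy. The wake-up delay itself is the other half of the objection and isn't modelled here.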
Flash memory? That's got potential for disk-type storage as far as low power, moderate speed applications are concerned, but in a real read/write application there's a serious worry about wearing them out due to write cycle limitations (ReadyBoost users please note).
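The write-cycle worry can be put into rough numbers. This sketch assumes perfect wear levelling (every block worn evenly), so it gives an upper bound. The capacity, endurance rating and daily write volume below are hypothetical, not specifications of any product.

```python
def flash_lifetime_days(capacity_gb, endurance_cycles, writes_gb_per_day,
                        write_amplification=1.0):
    """Upper-bound lifetime assuming writes are spread evenly over all blocks.

    Total data the device can absorb = capacity x rated program/erase cycles;
    write amplification (>1) accounts for extra internal writes.
    """
    total_writable_gb = capacity_gb * endurance_cycles
    return total_writable_gb / (writes_gb_per_day * write_amplification)


# Hypothetical 4 GB stick rated at 10,000 cycles, written at 20 GB/day.
days = flash_lifetime_days(4, 10_000, 20)
print(f"{days:.0f} days (~{days / 365:.1f} years)")
```

Without levelling, a block that gets rewritten constantly can fail long before this bound is reached.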
Processors? Easy, already done, not much more to do there, is there?
Well, maybe there is. Northbridge and southbridge run hot. If you class the northbridge and southbridge as interconnects for the processor, and redesign them (eliminate them) by coupling the high-power, high-performance peripherals more closely with the processor, there may be some savings there. What could we use for that? Something like HyperTransport maybe for peripherals, except there's one big name vendor that doesn't seem to like HyperTransport much, which may be a bit limiting for the wider market.
Memory is a bit more of a problem: as already noted, DRAM consumes power or it forgets, but perhaps if you want the interconnect to use less power, you could use fewer pins for it. Oops, Rambus get upset if you do that without asking nicely 'cos they claim to hold the patents.
I reckon the first reply to this article, the one that says manage your workload to match the available capacity, and manage the capacity as required by bringing servers in and out, has it about right for the short term. If you accept that punters may occasionally see brief delays (which they'll happily blame on Windoze or t'Internerd), and occasionally a bit of power may be wasted unnecessarily, you can bring servers in and out of the server farm to match the offered workload reasonably quickly, and you don't have to break any laws of physics to do so. You might need some clever software, but Google are good at that anyway, aren't they?
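A sketch of the "bring servers in and out" approach. It assumes demand is measured in whole-server equivalents and that servers are interchangeable. The high and low watermarks are hypothetical. The gap between them (hysteresis) stops the pool flapping up and down on every small fluctuation. Scaling down one server at a time accepts a little wasted power in exchange for fewer delays.

```python
import math


def plan_pool(demand_series, per_server=1.0, high=0.90, low=0.60, active=1):
    """Return the number of active servers after each demand sample.

    demand is in the same units as per_server (e.g. server-equivalents).
    Scale up at once when utilization exceeds `high`; scale down one server
    at a time when it drops below `low`, but only if the remaining servers
    would still sit at or under `high`. Watermarks are hypothetical.
    """
    history = []
    for demand in demand_series:
        utilization = demand / (active * per_server)
        if utilization > high:
            # Bring enough idle servers in to get back under the high mark.
            active = math.ceil(demand / (high * per_server))
        elif utilization < low and active > 1:
            if demand / ((active - 1) * per_server) <= high:
                active -= 1  # retire one server; it can now be powered down
        history.append(active)
    return history


print(plan_pool([0.5, 2.5, 2.5, 1.1, 1.1]))  # [1, 3, 3, 2, 2]
```

In the last step utilization is below the low mark, but dropping to one server would overload it, so the pool stays at two.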
So, some nice concepts, Mr Google, but some connection with physical reality would have been useful too.
why servers aren't managed to run at 90%
When a load spike occurs at 90% utilization, the server will probably be brought to its knees and thrash, causing slow response times at best and triggering timeouts and failovers at worst.
When a load spike occurs at, say, 60% utilization or less, the slack capacity can often absorb it with little more than a slight hit in response time.
Of course, a non-spiky load would allow running at higher utilization, but that's only feasible with carefully distributed units of work that are ideally small relative to individual server capacity.
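The effect is easy to see with the textbook M/M/1 queueing model. This is a deliberately simplified assumption (random arrivals, one server, exponentially distributed service times), not a model of any real server. Under it, mean response time is the service time divided by (1 - utilization), so the same spike costs far more near saturation. The 10 ms service time is hypothetical.

```python
def mm1_response_time(service_time, utilization):
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time / (1 - utilization)


SERVICE_MS = 10.0  # hypothetical per-request service time

for base in (0.6, 0.9):
    spiked = base + 0.09  # the same modest spike in both cases
    before = mm1_response_time(SERVICE_MS, base)
    after = mm1_response_time(SERVICE_MS, spiked)
    print(f"{base:.0%} -> {spiked:.0%}: {before:.0f} ms -> {after:.0f} ms")
```

At 60% the spike adds a few milliseconds. At 90% it multiplies response time roughly tenfold, which is where timeouts start.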
Jeez, it seems like only yesterday that I had 128K of core memory and a couple of (?)RL11 disks - 10 megs each with a 1 minute spinup time. PDP11.
Wife used to complain when I fired up the disks, 'cos the lights in the house dimmed.
My two cents
If the Google guys are so smart, why don't THEY design lower-power hardware? I'm not a hardware engineer, but here's my two cents anyway:
Hard drives -- What consumes the most power in a hard drive? The motor to keep the platters spinning. Without the platters spinning, the drives are useless. And it takes a relatively long time to go from stopped to spinning (not to mention each transition cuts down on the remaining life due to wear).
Memory -- DRAM needs to be constantly refreshed or it "forgets" the data it was holding. There's no way around this. You might be able to use SRAM, but it's expensive, and it may actually be slower than DRAM at this point (haven't read much about SRAM in many years).
Networking -- You might be able to do something here, but keep in mind that latency is a killer. And if transitions to/from a lower-power mode result in lost packets or require resent packets, there may be a net loss instead of a net gain because of additional work for the device sending data and for the switches, etc., passing the data.
What about audio? I wouldn't mind shutting down my audio circuitry once I turn off my speakers.
One area nobody seems to mention (that I've noticed, at least) is video. In a data center, video is almost never used. And consider how much time your home or office computer sits with nobody paying attention to the video. Yes, we have power saving modes for the video output (the monitors as well as the video signal to the monitors). But what about power-saving features on video cards? We have high-powered graphics cards spending a lot of time in non-intensive graphic modes (typical O/S GUI). There's no reason to have 128MB or 256MB of graphics memory used/available when doing standard office work (with possible exceptions for CAD, etc). You also don't need a 500MHz graphics processor for standard office work. And considering that graphics cards are probably the second or third largest power consumer in a computer, it might be a good idea to see what can be done here. I imagine this is the reason some of the Tyan server boards still come with the ATI Rage XL 8MB video chipset.
The Tyan boards come with decrepit old chipsets because they are cheap, and no one cares about the video beyond the 20-30 minutes spent at first bootup verifying the installed image, and then maybe once or twice more in the machine's entire life for maintenance.
Maybe I'm being thick...
The platters in a hard drive are a lot heavier than the heads, right? So why not put the heads on a spinning assembly and fix the platters in place? Result: faster spinup, lower power usage in normal operation, and probably longer life due to less load on the motors. Even reduced production costs, maybe.
I could be wrong. Google: try it yourself :)
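One way to weigh the idea: for a given motor torque, spin-up time scales with the rotating assembly's moment of inertia, and the energy stored at speed is E = ½Iω². This sketch treats the rotor as a uniform solid disc (I = ½mr²) and, generously, models the head rotor the same way. The 72 g platter stack (three roughly 24 g platters), the 5 g head rotor and the 47.5 mm radius are hypothetical figures chosen for illustration.

```python
import math


def spin_energy_joules(mass_kg, radius_m, rpm):
    """Kinetic energy of a uniform disc spinning at `rpm`: E = 1/2 I w^2."""
    inertia = 0.5 * mass_kg * radius_m ** 2  # I = 1/2 m r^2 (solid-disc assumption)
    omega = rpm * 2 * math.pi / 60           # rev/min -> rad/s
    return 0.5 * inertia * omega ** 2


RADIUS = 0.0475  # m, roughly a 3.5-inch platter (assumption)
platters = spin_energy_joules(0.072, RADIUS, 7200)  # hypothetical 3-platter stack
heads = spin_energy_joules(0.005, RADIUS, 7200)     # hypothetical light head rotor
print(f"platters: {platters:.1f} J, head rotor: {heads:.1f} J")
```

Once up to speed, the stored energy isn't consumed at all; the motor's running power goes into bearing friction and air drag. So a lighter rotor chiefly speeds up spin-up, and any steady-state saving would have to come from lower friction and drag.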
You don't need to refresh the DRAM if the memory's not being used. The trick would be figuring out which memory is unused, or could be swapped out to save power. Since memory controllers are (becoming) integrated into the processors, that's not so infeasible. (And of course, libc needs to be capable of deallocating core when pages in the heap become empty.)
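A sketch of the bookkeeping this would need. Given the physical pages the OS says are live, it finds which DRAM banks hold any of them; the rest could in principle skip refresh. It assumes a simple contiguous page-to-bank mapping, which is hypothetical, since real memory controllers usually interleave addresses across banks. The function name is illustrative.

```python
def banks_needing_refresh(live_pages, pages_per_bank, n_banks):
    """Return the sorted list of banks holding at least one live page."""
    banks = set()
    for page in live_pages:
        bank = page // pages_per_bank  # contiguous mapping (assumption)
        if not 0 <= bank < n_banks:
            raise ValueError(f"page {page} is outside installed memory")
        banks.add(bank)
    return sorted(banks)


live = [0, 5, 2048, 2049]  # hypothetical live physical page numbers
keep = banks_needing_refresh(live, pages_per_bank=1024, n_banks=8)
print("refresh banks", keep, "; skip", 8 - len(keep))
```

Low-power DRAM parts already offer a partial-array self-refresh mode along these lines. The hard part, as noted, is getting the OS to pack live data into as few banks as possible.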
And why shouldn't hard disks be capable of working at 50% speed? Why should they have to go flat out every time I want to write a line to a logfile?
You used to have a PDP11 at home?
..Or wife at work?
Either way it seems to me an unusual configuration. What about interoperability ;-)
(PDP 11/23 - 2RL02 - 8 inch 20Mb HD - 128Kbyte, DeQna and all gloriously kept running at home)
I do like your train of thought, but I'm not sure you've quite got there yet.
You've easily replaced the rotation of the disk and got some advantages out of it. Now all you need to do is find an equivalent of moving the heads in and out (i.e. the equivalent of seeking to a particular cylinder). If you're doing this in a similar kind of way to today's drives, this mechanism needs to be able to move the array of read/write heads accurately and quickly in and out while rotating at 7200rpm for most IDE drives, or over twice that for high-end SCSI drives. That's not going to be easy, but modern hard drives aren't easy in any case; they're miraculous pieces of modern science and engineering, and it's a marvel that they work at all.
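To put "not going to be easy" in numbers: anything riding on the spinning assembly feels a centripetal acceleration of ω²r. This sketch assumes a 47.5 mm radius, about the edge of a 3.5-inch platter; the figures are illustrative and not from the comment.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def centripetal_g(rpm, radius_m):
    """Centripetal acceleration at radius_m, in multiples of g."""
    omega = rpm * 2 * math.pi / 60  # rev/min -> rad/s
    return omega ** 2 * radius_m / STANDARD_GRAVITY


for rpm in (7200, 15000):
    print(f"{rpm} rpm: {centripetal_g(rpm, 0.0475):,.0f} g at the rim")
```

High-end 15K drives typically use smaller platters, which eases this somewhat. Even so, a head actuator that must still seek to within a track's width under that load is the hard part.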
variable speed discs.
> And why shouldn't hard disks be capable of working at 50% speed?
One problem might be that the disc heads fly on a cushion of air above the platter using a ground effect. If you change the speed of rotation, you'd get a different flying height. This in turn would affect the area the drive reads and writes, so you'd end up with different-sized bits depending on the rotation speed.
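The bit-size effect follows from the head's linear speed over the surface, v = 2πr × rpm / 60. If the write clock stays fixed, each bit's length on the track scales with v. This sketch uses a hypothetical 30 mm track radius and 1 Gbit/s channel rate, and does not model the flying-height change. A real drive could instead scale its channel rate with speed to keep bit length constant, at the cost of proportionally slower transfers.

```python
import math


def linear_speed_m_s(rpm, radius_m):
    """Surface speed under the head at a given track radius."""
    return 2 * math.pi * radius_m * rpm / 60


def bit_length_nm(rpm, radius_m, channel_bits_per_s):
    """Length of track occupied by one bit at a fixed write clock."""
    return linear_speed_m_s(rpm, radius_m) / channel_bits_per_s * 1e9


for rpm in (7200, 3600):
    print(f"{rpm} rpm: {bit_length_nm(rpm, 0.030, 1e9):.1f} nm per bit")
```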
> Why should they have to go flat out, every time I want to write a line to a logfile.
When you say log file here, what do you have in mind?
If you mean something like a syslog file then fine; if you mean writing to a database's log file, then you are talking about a synchronous IO operation. The application is waiting for that IO to complete. That is why server customers are prepared to fork out a large fortune for 15Krpm drives, which are smaller and far more expensive than SATA drives. The SCSI or FC 15K discs are just so much faster latency-wise, and when you are waiting for the IO to complete, that's what counts.
OK, for writes you can usually lose that delay by using caching in a disc array. But a) that adds more power-hungry hardware, and b) it doesn't usually help reads unless you've got your entire working dataset in the cache.
I'm pretty sure this isn't just us, but a lot of our servers don't have graphics cards/integrated chips at all. There's remote administration via some terminal window/remote desktop jobbie or a serial port on the back of the box itself if all goes to pot and you can connect a terminal window through that.
I agree graphics cards are unnecessary, but I'm pretty sure a lot of server vendors have thought of that one already...
If Google is so smart
The data centres *are* the problem. Google is building data centres of heroic size to index the 98% of the Internet that is total crap. There is also the hefty resource devoted to spewing out ads on almost any page you might visit. Then there's Googlemail, which seems to be built on the premise of never deleting any messages, ever. Given how poor storage density is relative to such an awesome task, we can expect to see entire data centres built to house the petabytes of data containing trillions of spam messages, Amazon mail from ten years ago, ancient newsletters you never read and all kinds of other worthless crap. There is something inherently unintelligent about doing this. People like Google are the problem, as they seem not to see any value in being discriminating about the data we keep. The hardware is fine; it's the premise that's wrong.
I recently bought a pair of 500GB hard drives and read up on the reviews of several models.
Instead of going for the fastest model, I decided to go for the ones that drew the least power in the idle state, figuring that the drives would be running nearly continuously for the next 3-5 years (hopefully). The drives also had the lowest running temperature.
Interestingly, under heavy read/write the same drives weren't the most energy efficient.
So it goes to show that some manufacturers are trying to improve their green credentials a bit.
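Small idle-power differences add up over a drive's life when it runs around the clock. This sketch assumes 24/7 operation and 8,760 hours a year (leap days ignored). The figures are hypothetical: two drives each drawing 2 W less at idle, kept for 4 years, at a made-up tariff of 0.10 per kWh.

```python
HOURS_PER_YEAR = 24 * 365  # leap days ignored


def lifetime_kwh(watts_saved, years):
    """Energy saved by a constant power reduction over continuous running."""
    return watts_saved * HOURS_PER_YEAR * years / 1000


saved = lifetime_kwh(watts_saved=2 * 2.0, years=4)  # two drives, 2 W each (hypothetical)
print(f"{saved:.1f} kWh saved, about {saved * 0.10:.2f} at 0.10 per kWh")
```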
DRAM and drives
DRAM can be refreshed at a lower rate. It depends on the leakiness of the memory cells. Old low-density devices could retain data for several seconds, so there is scope for ultra-low refresh power.
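A rough way to see the scope. In the DDR2/DDR3 generations, every row must be refreshed within 64 ms, issued as 8,192 refresh commands per window (one every 7.8 µs). To first order, refresh power scales with how often that happens, so stretching the window cuts it proportionally. This sketch ignores fixed overheads and the fact that retention varies from cell to cell and with temperature (the standard rate is set for the leakiest cells). The longer windows below are hypothetical.

```python
REFRESH_COMMANDS_PER_WINDOW = 8192  # DDR2/DDR3 convention
STANDARD_WINDOW_S = 0.064           # 64 ms retention window


def refresh_interval_us(window_s=STANDARD_WINDOW_S):
    """Average gap between refresh commands (tREFI), in microseconds."""
    return window_s / REFRESH_COMMANDS_PER_WINDOW * 1e6


def relative_refresh_power(window_s):
    """Refresh power relative to the standard 64 ms window (first order only)."""
    return STANDARD_WINDOW_S / window_s


print(f"tREFI at 64 ms: {refresh_interval_us():.4f} us")
for window in (0.128, 1.0, 4.0):  # hypothetical longer retention windows
    print(f"{window * 1000:.0f} ms window -> {relative_refresh_power(window):.3f}x")
```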
The energy required to spin the platters in a hard drive goes into friction on the spindle. There are various technologies for improving the spindle bearings, but there's also the air drag on the platters themselves. Heads already have careful aerodynamics to make them fly close enough to the platters at the high surface speeds of 7200rpm drives, so there's no need to have air at full atmospheric pressure in there. If the drive casing were airtight, reducing the pressure in the platter chamber would cut this drag.
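To first order, the aerodynamic power lost by a spinning disc scales as P ∝ ρω³r⁵ (gas density, angular speed, radius). At constant temperature, halving the pressure therefore roughly halves the drag loss. This sketch returns ratios only and assumes a constant drag coefficient. That is a simplification: the coefficient itself depends on the Reynolds number, and the heads still need enough gas to fly.

```python
def relative_windage_power(density_ratio=1.0, speed_ratio=1.0, radius_ratio=1.0):
    """Drag-power ratio versus a baseline disc, using P ~ rho * w^3 * r^5.

    Each argument is new/old. Assumes the drag (moment) coefficient stays
    constant, which only holds approximately.
    """
    return density_ratio * speed_ratio ** 3 * radius_ratio ** 5


print(relative_windage_power(density_ratio=0.5))        # half pressure: 0.5
print(relative_windage_power(speed_ratio=4500 / 7200))  # slower spin: ~0.244
```

The strong dependence on speed and radius is also why slower or smaller platters cut drag so sharply.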
@ Dr Mouse
There is a minor problem with the balance of the head as it spins. Think of it like balancing a tire on a car.
If monitoring my PC's power consumption is any indication, the graphics card uses power variably. I've noticed changes in consumption between just having the desktop / office applications running versus running an FPS or MMORPG game. I'll have to watch and see if the consumption drops further when the screen blanks (the power meter is only hooked up to the PC, not the monitor).
How about multispeed drives?
It would be nice to have laptop drives that spin at one speed when on batteries and a faster speed when hooked to outside power instead of these pokey old 5200 rpm drives.
Multiple heads for multiple speeds
I think the "spin the heads instead of the platters" idea has some merit on paper, but in practice it is extremely difficult, because the forces on the bearing will vary unless the rotating head is very well balanced (so moving the head in and out would effectively require adding a counterweight moving in a similar pattern to balance the load on the bearing).
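The balance problem in numbers. A mass m at radius r on a rotor spinning at ω pulls on the bearing with force mω²r. A counterweight on the opposite side cancels it only while the two mass × radius products match. This sketch treats the head and counterweight as point masses on one line through the axis. The 2 g head and the radii are hypothetical.

```python
import math


def net_bearing_force_n(rpm, head_kg, head_r_m, cw_kg=0.0, cw_r_m=0.0):
    """Unbalanced radial force (N) from a head and an opposing counterweight."""
    omega = rpm * 2 * math.pi / 60  # rev/min -> rad/s
    return omega ** 2 * abs(head_kg * head_r_m - cw_kg * cw_r_m)


RPM = 7200
print(net_bearing_force_n(RPM, 0.002, 0.030))                # no counterweight
print(net_bearing_force_n(RPM, 0.002, 0.030, 0.002, 0.030))  # balanced: 0.0
print(net_bearing_force_n(RPM, 0.002, 0.040, 0.002, 0.030))  # head moved out
```

Once the head seeks outward, a fixed counterweight no longer cancels it, which is why the counterweight would have to track the head.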
To compensate for the potential problems in getting the head to "fly" at lower speeds (is this really such a big problem? The speed already varies depending on the distance from the axis), more heads with different profiles could be used - or the shape of the head could be adjustable with speed.
On a side issue, I have recently wondered whether it would be possible to implement RAID 0 on a multi-platter hard disk (with each platter working as a separate disk). It doesn't have any relation to the discussion at hand, but neither does Paris.
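The addressing side of RAID 0 across platters is simple striping. This sketch maps a logical block number to (platter, block on that platter) round-robin in fixed-size stripes, treating each platter as one disk as the comment suggests. It assumes hardware able to read and write through several heads at once, whereas in a conventional drive all the heads sit on a single actuator. The function name and stripe size are hypothetical.

```python
def stripe_location(lba, n_platters, stripe_blocks):
    """Map a logical block to (platter, block within that platter) for RAID 0."""
    stripe, offset = divmod(lba, stripe_blocks)
    platter = stripe % n_platters   # stripes rotate across platters
    row = stripe // n_platters      # full rounds of stripes before this one
    return platter, row * stripe_blocks + offset


for lba in (0, 8, 31, 33):
    print(lba, stripe_location(lba, n_platters=4, stripe_blocks=8))
```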
implements a low-speed idle (4500 rpm instead of the usual 7200)
Intel are a major cause
Despite the latest Core 2 Duo/Quad devices, if you want to run multiple processors it means Xeons with FB-DIMMs - and these run very hot and eat electricity! We have some dual-CPU/quad-core boxes (8 cores in total) with 8GB RAM, and they run very hot and eat loads of power.
If Intel adopted a better bus technology (which in theory they will do - a slight revamp of HyperTransport), they could make multi-processor, multi-core systems much more efficient and get rid of FB-DIMMs.
The huge FSBs that Intel push for are also very inefficient - power is mainly lost in the switching of logic states.
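The point about switching is the standard CMOS dynamic-power relation P = αCV²f (activity factor, switched capacitance, supply voltage, clock frequency). Power rises linearly with bus clock and with the square of voltage. The values below are hypothetical, chosen only to show the scaling.

```python
def dynamic_power_w(activity, capacitance_f, volts, freq_hz):
    """CMOS switching power: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * volts ** 2 * freq_hz


base = dynamic_power_w(0.2, 5e-9, 1.2, 400e6)      # hypothetical bus
faster = dynamic_power_w(0.2, 5e-9, 1.2, 800e6)    # double the clock
lower_v = dynamic_power_w(0.2, 5e-9, 1.08, 400e6)  # 10% lower voltage
print(f"{base:.2f} W, {faster / base:.2f}x, {lower_v / base:.2f}x")
```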
Back in the old days...
...I remember doing sysadmin stuff on VAX and AXP VMS. We never had dedicated servers but would set internal quotas on CPU, IO, Memory, etc, according to the specification of the project / software.
I remember some of our servers had alerts if the CPU went below 80%, as that would mean something had broken...
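A minimal sketch of that style of capacity planning. It checks that the per-project quotas fit within the machine and raises a flag when average CPU falls below an expected floor. It's a generic illustration, not VMS's actual quota mechanism. The project names, resources and 80% floor are hypothetical.

```python
def overcommitted(capacity, quotas):
    """Return resources whose summed project quotas exceed the machine's capacity."""
    totals = {}
    for project_quota in quotas.values():
        for resource, amount in project_quota.items():
            totals[resource] = totals.get(resource, 0) + amount
    return {r: t for r, t in totals.items() if t > capacity.get(r, 0)}


def cpu_too_low(samples, floor=0.80):
    """True if average CPU utilization has dropped below the expected floor,
    which on a fully planned machine suggests something has broken."""
    return sum(samples) / len(samples) < floor


capacity = {"cpu": 8, "memory_gb": 32}
quotas = {"billing": {"cpu": 4, "memory_gb": 16}, "reports": {"cpu": 5, "memory_gb": 8}}
print(overcommitted(capacity, quotas))   # {'cpu': 9}
print(cpu_too_low([0.90, 0.85, 0.60]))   # True
```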
I am not in sysadmin any more but I still apply the same capacity planning rigour to deployments now - even though it is very hard on bloatware like Windows and not easy on Linux...
The research is correct but the current "Throw another server at it" attitude is good for the tin vendors and also good for OS vendors. They have no incentive to change.
Just getting my coat...
Server hard drives
It is possible to build server-class hard drives that take less power but still give the same throughput performance and disc capacity as current designs, by using multiple head sets and actuators. With, say, four independent head sets spaced equally around the platters it would be possible to get 15kRPM performance out of a drive running at only 5400rpm. This would dramatically reduce the power requirements per drive.
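The latency half of that claim can be checked. Average rotational latency is half a revolution, and with k head sets spaced evenly around the platter, the nearest one is on average only 1/(2k) of a revolution away. This sketch models rotational latency only, not seek time or transfer rate. It assumes whichever head set is nearest can always serve the request.

```python
def rotational_latency_ms(rpm, head_sets=1):
    """Average wait for the target sector to reach the nearest head set."""
    revolution_ms = 60_000 / rpm
    return revolution_ms / (2 * head_sets)


print(f"15000 rpm, 1 head set:  {rotational_latency_ms(15000):.2f} ms")
print(f"7200 rpm, 1 head set:   {rotational_latency_ms(7200):.2f} ms")
print(f"5400 rpm, 4 head sets:  {rotational_latency_ms(5400, 4):.2f} ms")
```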
Drives like this would, of course, be quite costly and bulkier than the standard 3.5" form factor but the cooling and electricity cost savings over their expected service life might make up for that. They would unfortunately be more fragile though, having more moving parts to go wrong.
I expect we will see more flash-based SSD data storage rolling out into the datacentres as flash chip prices fall -- the "maximum number of writes" problem can be ameliorated by existing write-spreading algorithms and smart caching in DRAM plus preventative maintenance. Sadly, the extra storage capability per watt this will offer will inevitably be eaten up by more and more data storage requirements in the same rackspace.
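A toy version of write spreading (dynamic wear levelling). Every write of a logical block goes to the least-worn free physical block, and the old copy is erased and returned to the pool. It's a simplified sketch: real flash translation layers also deal with erase blocks larger than a write, garbage collection and moving static data, none of which is modelled here. The class name is hypothetical.

```python
class WearLeveler:
    """Map logical blocks to physical flash blocks, spreading erases evenly."""

    def __init__(self, n_physical):
        self.erase_counts = [0] * n_physical
        self.mapping = {}                   # logical block -> physical block
        self.free = set(range(n_physical))  # physical blocks ready for writing

    def write(self, logical):
        if not self.free:
            raise RuntimeError("no free physical blocks (need spare capacity)")
        # Pick the least-worn free block; break ties by block number.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical)
        if old is not None:
            self.erase_counts[old] += 1     # stale copy gets erased...
            self.free.add(old)              # ...and rejoins the free pool
        self.mapping[logical] = target
        return target


# Rewrite the same logical block 8 times on a 4-block device.
wl = WearLeveler(4)
for _ in range(8):
    wl.write(0)
print(wl.erase_counts)  # [2, 2, 2, 1]: wear spread instead of 7 erases on one block
```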