Show me the POWER!!!
(And give me something with which to cool it.)
Google is the modern data poster-child for parallel computing. It's famous for splintering enormous calculations into tiny pieces that can then be processed across an epic network of machines. But when it comes to spreading workloads across multi-core processors, the company has called for a certain amount of restraint. With a …
The workload should match the architecture?
Does the Pope shit in the woods?
>>We can only assume that Google prefers Intel Xeons to AMD Operons
Well, the difference isn't that great between Intel and AMD so I'm not so sure about the assumption. My guess is that they are blending the concept of cores with threads and architectures such as the T2, which are multi-thread multi-core, so you get (effectively, as the OS sees it) up to 64 cores per CPU, but the clock speed drops. The servers that they use it in, such as the T2000, are fantastic for webserver and MySQL workloads when there are lots of threaded processes, but not so good for big number crunchers or software optimised for <20 cores (such as Oracle data warehouse). This is the very reason that Sun (Oracle) have both types of architecture: choose the wrong platform for your workload (as management bean-counters who listen to sales reps do) and you'll leave the admins scratching their heads saying, "Why did you buy this?".
The other reason that I think the Intel/AMD comparison is erroneous is the ability to place a hypervisor on a multi-core frame: you can end up with multiple machines in the same footprint for less power, something that's not mentioned in the article.
In the end.. IBM is the only firm that can supply the chips with the power and integration that Google needs.. Well IMHO anyway..
They could also help their OS development too...
Does this mean IBM is going back to pushing the Cell architecture?
While this may seem like a joke... it's not.
"We can only assume that Google prefers Intel Xeons to AMD Operons."
Nice bit of trolling there :)
This sounds like Google are helping their supplier of CPU chips in a PR move against ARM that most likely then helps Google keep CPU costs down.
Google : "Chips that spread workloads across more energy efficient but slower cores, he says, may not be preferable to chips with faster but power hungry cores."
In other words, ARM cores. Yet ARM cores at 2.5GHz are very capable, not least because they will have up to 16 cores yet run on only a few watts of power, like the soon-to-be-released Cortex A15 range of ARM processors. These are a serious threat to AMD and Intel.
Also I don't totally buy into Amdahl's law because that was back in a time when mainframe manufacturers were trying to justify their mainframe costs (and their existence) against what they could foresee was the threat of small networks of small computers, which did end up wiping them out. Therefore these "power hungry cores", the mainframes, were killed off by the "more energy efficient but slower cores".
For example, that link to Amdahl's law says it was released at the AFIPS Spring Joint Computer Conference, 1967, along with IBM's name on it. It's an IBM-sponsored report. It's a PR move trying to justify their existence, and we are getting a replay of this kind of battle 40 years later, this time between the very low power CPUs like ARM and the power hungry x86 based cores.
You don't have to be a genius to work out that if you've got a limited amount of memory bandwidth and/or a limited amount of heatsinking per socket (or per system), most workloads would get more benefit out of a single higher performance processor than they would out of multiple lower performance processors with the same memory and thermal constraints. That's been a well understood fact ever since real OSes supported symmetric multiprocessing (1980s?), but since most Wintel folk generally don't understand real-computer real-OS concepts it's been convenient to overlook this while clock speeds have been going up.
But it's now several years since Intel hit the clock speed brick wall, they (perhaps understandably) haven't really got clock speeds any higher for a few years now.
Instead, they have to kid the market that multicore buys the customer some benefit, and being as they're Intel, very few people are prepared to stand up and challenge Intel's ridiculous claims.
So, thanks to Google for bringing this up again.
Amdahl's law has said about 6 CPUs is best for decades. We see this even on some of the fastest 8 core systems today. There are exceptions, like if you're busy chasing non-cached pointers every few cycles, as with poorly written Java on a SPARC T CPU, but most well written code runs fastest on systems with 6 CPUs.
out of a 24-core system (4 sockets, 6-core Opteron in each). Speed-up of 17 on 24 cores, with indications that more would still be quite efficient. Certain other operators actually achieved 22x speed-up on 24 cores (i.e. near linear).
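To put those numbers in Amdahl's terms, here is a rough sketch that inverts the law to ask what parallel fraction would produce a given speed-up on 24 cores. The function names are made up for illustration, and it assumes the textbook form of the law with no allowance for memory bandwidth or interconnect effects.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speed-up predicted by Amdahl's law for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)


def implied_parallel_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law: the parallel fraction that would yield `speedup` on n cores.

    From 1/S = 1 - p * (1 - 1/n), so p = (1 - 1/S) / (1 - 1/n).
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    # The two speed-ups quoted above, both measured on 24 cores.
    for s in (17.0, 22.0):
        p = implied_parallel_fraction(s, 24)
        print(f"speed-up {s:g} on 24 cores -> implied parallel fraction {p:.4f}")
```

Under that model a 17x speed-up on 24 cores corresponds to roughly 98% of the work being parallelisable, and 22x to better than 99%.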
Horses for courses
... well more an RT executive, truth be told.
Whilst Amdahl's Law cannot be revoked, our intended application (neuroscience) has P=1 (i.e. everything is parallelizable). And hence, (modulo interconnect performance) a million chips run a million times faster. By Amdahl's Law. Cool eh?
An 18 core ARM chip running at 233MHz? Pretty wimpy. (But can be powered from the USB port)
A million of them, with sufficient interconnect? Now that's what I call interesting ...
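For anyone who wants the P=1 claim spelled out, this is the usual textbook statement of Amdahl's Law, with P the parallelizable fraction and N the number of processors. It ignores interconnect cost, as the "modulo" above already concedes.

```latex
S(N) = \frac{1}{(1 - P) + \dfrac{P}{N}}
\qquad\Longrightarrow\qquad
S(N)\big|_{P=1} = N ,
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - P} \quad (P < 1).
```

So with P=1 the speed-up is exactly N, while for any P below 1 it is capped at 1/(1-P) no matter how many chips you add.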
While parallelisation can benefit a great many applications in general, there are also a great many algorithms which simply have to be serially processed, due to the next part relying on results from the prior code.
Plus the greater the number of processes/threads, the greater the overhead of managing them becomes.
Ray tracing is a classic example. While chucking cores at a single image being rendered will greatly increase speed, you quickly hit a wall, due to each pixel's colour being dependent on its adjacent pixels (and further), due to anti-aliasing, specular flare, and other physical effects that simply aren't known until the other nearby pixels have been calculated.
In this scenario, 2, 4, or even 8 cores will produce dramatic benefit when you split the image up into chunks and render each simultaneously, but each chunk then has to be 'stitched' together at the seams, and this generates more work than if the overall image had been done in one piece. Still a massive benefit overall.
But what happens when you have a 1:1 ratio of pixel to processor? The render re-hashing would be horrific, not just for the system, but also for the poor programmer who would have to figure out how to code for every pixel adjusting itself to its adjacent, while that pixel is also still adjusting?
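A toy model of the seam argument, for what it's worth. It assumes (my assumption, not a description of any real renderer) that each chunk must also compute a border of `halo` pixels around itself so the seams can be reconciled, and it ignores clipping at the image edge. The function name and the 1024x1024 image are made up for illustration.

```python
def redundant_work_factor(width: int, height: int,
                          tiles_x: int, tiles_y: int, halo: int) -> float:
    """Pixels computed across all tiles (each padded by `halo`) divided by image pixels.

    1.0 means no extra work; larger values mean more work is spent on seams.
    Simplification: edge tiles are padded as if they had neighbours on every side.
    """
    tile_w = width / tiles_x
    tile_h = height / tiles_y
    pixels_per_tile = (tile_w + 2 * halo) * (tile_h + 2 * halo)
    return pixels_per_tile * tiles_x * tiles_y / (width * height)


if __name__ == "__main__":
    # Hypothetical 1024x1024 image with a 2-pixel border per chunk.
    for tiles in (1, 2, 8, 64, 1024):
        factor = redundant_work_factor(1024, 1024, tiles, tiles, halo=2)
        print(f"{tiles:>4} x {tiles:<4} chunks -> {factor:.3f}x the pixel work")
```

With a handful of chunks the overhead is a few percent; at one pixel per chunk every processor ends up computing its whole neighbourhood, which is the wall described above.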
...when you read, "Google ops czar", and think, "...and kickbans emperor."
For example, rather than 6 wimpy cores, how about 1 butch core and 4 wimpy cores?
Of course operating systems would need work to allocate threads to the most appropriate core. That's already a problem in a way, since if the O/S puts two processor-hungry threads on the same core on a hyperthreading CPU, and leaves the other cores idle, it obviously gives far less performance than you'd get on a non-hyperthreading but otherwise equivalent CPU where that can't happen - I have no idea if any O/Ss are doing anything about that yet.
Still, if operating systems allow for it, it's a good system - a lot of systems will have a small number of particularly CPU-hungry threads, but many more much less demanding threads. Even that Amdahl chappy should be kept happy, since the inherently serial chunk of a workload has a logical home.
Having multiple identical cores per CPU is convenient in many ways (processor design, scheduling threads etc), but not IMO the best way to handle most real-world workloads.
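A back-of-envelope sketch of why the butch-plus-wimpy mix could win. Every number in it is an assumption for illustration: wimpy cores have speed 1, the butch core has speed 2, the serial chunk runs on the fastest core available, and the parallel chunk is spread perfectly over all cores. Scheduling overhead, power and memory bandwidth are ignored.

```python
def run_time(serial_fraction: float, serial_core_speed: float,
             total_parallel_throughput: float) -> float:
    """Time for one unit of work: serial part on one core, the rest across all cores."""
    parallel_fraction = 1.0 - serial_fraction
    return (serial_fraction / serial_core_speed
            + parallel_fraction / total_parallel_throughput)


def six_wimpy(serial_fraction: float) -> float:
    """Six identical cores of speed 1 (hypothetical baseline)."""
    return run_time(serial_fraction, 1.0, 6.0)


def one_butch_four_wimpy(serial_fraction: float, butch_speed: float = 2.0) -> float:
    """One fast core plus four speed-1 cores; the butch core takes the serial chunk."""
    return run_time(serial_fraction, butch_speed, butch_speed + 4.0)


if __name__ == "__main__":
    for s in (0.0, 0.1, 0.2, 0.5):
        print(f"serial fraction {s:.1f}: "
              f"6 wimpy = {six_wimpy(s):.3f}, "
              f"1 butch + 4 wimpy = {one_butch_four_wimpy(s):.3f}")
```

With these made-up speeds both layouts have the same total throughput, so they tie on perfectly parallel work, and the mixed layout pulls ahead as soon as there is any serial chunk at all.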
I thought this was pretty much common knowledge, as Cray himself explained that it is better to have 2 oxen than 1024 chickens pulling your stuff. The entire supercomputing industry didn't heed his warning and went down the parallelizing fury. This means that instead of having specialized processors, most of today's "supercomputers" are nothing more than heaps of inefficient x86 chips. Alas, the 1024 chickens that Cray warned us about.
However, more specialized hardware like the CellBE does have real advantages on multi-processing. I think that Google is actually looking for IBM iron, not Craptel/AMD stuff.