Intel's research team has unveiled a 48-core processor that it claims will usher in a new era of "immersive, social, and perceptive" computing by putting datacenter-style integration on a single chip. And, no, it's not the long-awaited CPU-GPU mashup, Larrabee. This processor, formerly code-named Rock Creek and now known by the …
Intel Is Delusional
Seriously, Intel needs to take a close look at IBM's big failure with its much ballyhooed Cell Processor that recently came to naught. I mean, who are they kidding? You don't design a parallel processor and then wait around for the software engineers to come up with a software model for it. This is as ass-backward as you can get. It should, of course, be the other way around:
First you come up with the right parallel software model and then you design a processor to support it. But that would be asking way too much from Intel's engineers. They have way too much invested in the failed paradigm of the last century to do the right thing. Maybe some unknown startup will see the writing on the wall.
How to Solve the Parallel Programming Crisis:
@Louis - Re: Intel is delusional
I'm sorry, but did you just spout that old fallacy that software engineers can't do parallel programming?
Multi-threaded processes in the server space have been around for over a decade. Multi-process architectures for several decades.
We can do parallel. If *you* can't then that's *your* problem and *you* need to up your game or move out of the software business. The fact that desktop apps and games don't fully take advantage of multiple cores is neither here nor there; the industry has been designing and building industrial/military grade software for multi-core machines for long enough.
By the way, I thoroughly disagree with that blog you reference too. The guy has clearly never heard of a thread pool and taking him as any sort of authority is ill advised.
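For anyone who hasn't met one, here is a minimal sketch of a thread pool. It is in Python purely for illustration; the function names (word_count, count_words_in_parallel) and the sample lines are made up for this example and come from neither the blog nor this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(line: str) -> int:
    """Hypothetical unit of work: count the words in one line."""
    return len(line.split())


def count_words_in_parallel(lines, workers=4):
    """Hand each line to a fixed pool of worker threads.

    The pool is created once and reused for every task, so no thread is
    spawned per item. pool.map returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(word_count, lines))


if __name__ == "__main__":
    sample = ["we can do parallel", "thread pools are old news", ""]
    print(count_words_in_parallel(sample))
```

Note that in CPython the global interpreter lock stops threads from running Python bytecode in parallel, so for CPU-bound work a process pool would be the closer analogue; the pooling pattern itself is the same.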
Louis is delusional
You should take a look at the rest of Louis' blog - such as the stuff on how all current physics is wrong, inertial movement doesn't exist[*] but proper physics can be found in the Bible. Yeah... right.
[*] Presumably, we think it does because some higher power keeps things moving when we throw them.
The blog mentioned is Louis' own.
Don't waste your time, he rejects all sane criticism/abuse and comes raving back for more (check his prior posts). The internet's whack-a-mole.
Louis, please stop being so greedy...
...and pass the bong already.
"This is not a product. It never will be a product."
IBM's Cell processor was actually a huge success... Blu-ray won... Wii/Xbox 360/PS3 are all Power boxes... and the next-generation PlayStation is a Power7 variant (the follow-on to Cell)
More cores is good....BUT you have to increase the performance of the cores vs. current cores.
SPARCIII to SPARCIV ....less performance per core....huge problem and oracle changed pricing to .75 to reflect the mistake of bad dual core
SPARC64-VI to SPARC64-VII ......only 80% of the performance per core of the prior generation....shows mistake of dual core to quad core without other advances to increase core performance. Too bad Oracle does not care about Fujitsu SPARC or they would have decreased the core factor to .5 also with T2+
....Net....we went to Power for all "Data Centric" applications and anything above 4 cores. Intel and VMWare have poor core performance and too much virtualization overhead.
Cheers from the UK.
48 core chip
Are you sure the xbox 360 is a Power box? Also Power7 is not a follow on to cell, it's a standard processor versus a highly specific parallel engine.
More cores is only good if you can use them, so the point about needing parallel development tools is valid. You also need a shed load of parallel developers and they don't exist.
Another way of solving the problem of efficient use of multicore is a product called MCOPt, which claims to manage resources better than the standard Linux OS, so you get more work through multi-core without re-coding. I know some people in two V-large semiconductor vendors who say it works very well. www.exludus.com
Yes, the Xbox360 is a PPC box, a 3 core PowerPC codenamed Xenon.
actual computing speed
It's clearly labelled a research project but I'd like to know what its best current performance is. They say the cores are simplified to in-order instruction issue and probably fewer instruction dispatches (wiki says Atom can do 2 issues per core, I'll bet top end processors can do 3 or more) so that's a significant constant slice taken off, then there's undoubtedly smaller caches (can't find spec for this) which will be a significant hit, depending on workload of course, and clock speed is also unspecified, probably for good reason. And from their press release "Application software can use this network to quickly pass information directly between cooperating cores in a matter of a few microseconds..." a few microseconds??? ouch!
My guess is that performance can be estimated from the power output and separately, number of transistors. 125 watts corresponds to a decent modern quad-core IIRC, and the number of trannies in my dual core is ~300 million, so I'd guess real world performance is equivalent to a decent spec quad core, very roughly. So while it's a laudable and very valuable piece of research, I doubt it's the rocket they're implying.
@Louis Savain: back again, troll?
So when the Sun has finally set who will come up with the brilliant ideas?
This is just an extension of the Sun Niagara type chip - small cores, doing more parallel streams of work - which is what 80% of all computerised tasks are anyway, or you can buy a honking big IBM or SPARC overengineered multi core chip with a limited future applicability. It was so successful it killed Sun's Rock ...
Louis - you are partially right - the software needs to be clever enough to use the chip capabilities. However - I do not think the M$-trained programmers can do that anymore.
I keep on wondering - when Sun has finally gone, where will the new ideas come from?
Surely you mean...
"and the path the company is treading is many-cored."
Surely you mean "magny-cours"...
With design heading towards more and more cores, this is surely going to lead to the development of complex neural networks, and with it a whole new world of programming with the ability to behave and respond much more like a human would. We got the physics card, how long till we get an add-on AI card with 4 of these babies on board?
Is this where the money went?
"..the compute-intensive Black-Scholes financial modeling app,.."
This is just a renaming of the 'Black-Hole' financial management system that banks have been operating for the past few years. If it gets automated, we can go into recession faster and more effectively.
This won't do AI
The cognitive & AI allusions are nonsense.
Or else 100 interconnected Z80s or Transputers or Pentiums would have AI.
There is no software/functional design spec yet for what is a real AI or Cognitive computer.
Maybe it's not extremely related to the article
But the CPU-GPU mashup, i'd call it SarrahBee instead
Seriously, anybody have an idea as to how much it would cost a company such as Intel to prototype a new chip in this way? Apart from the raw material costs and design costs, is it expensive these days to produce a single wafer of a custom design or is it something that is now relatively cheap?
Just curious really...
@Nick Ryan - how much?
It depends how much of the design is just replication of old fully-debugged stuff. If this is a bunch of P3-Celeron cores connected together with logic that's already well-tried on multi-core CPUs, it may be quite easy - little more than a matter of "joining the dots" with an interconnect layer. The more novel logic is needed on the chip, the more expensive it gets to develop.
The thing I'm wondering is why they don't or can't integrate some RAM alongside each core, because bandwidth between cores and RAM will be a bottleneck. Anyone know? Maybe the appropriate silicon process for RAM is incompatible with the process for CPU cores?
any resemblance to any existing processor is purely coincidental
40 cores on single die. i know where i've seen it before: S40 from IntellaSys (http://colorforth.com/S40.htm) and coming GA40 (http://colorforth.com/GA.htm).
granted, they're not IA-compatible (probably a Good Thing), so far are programmed in forth only and draw all of 9mW per core at full speed thanks to being vastly simpler than anything intel does. luckily they aim for different niche.
not TOO much resemblance, really....
and their core interconnect is vastly more elegant (in fact it also contributes to the low power consumption).
and GreenArrays makes a 144 core part,
and all of GA's designs do not require as small (read: expensive) fabrication techniques.
These will likely not ever run anything but Forth though, due to the nature of their design (but I want a dev board anyway)
Not going to be a commercial product?
"I only had to permute the options"
What ever happened to good old-fashioned plain English? Why couldn't he just have said "fanny around with"?
"Plain" not always best.
In this case, "plain English" is simultaneously more verbose and less descriptive. If it worked best for everything, then maybe engineers would already use it accordingly.
"My guess is that performance can be estimated from the power output and separately, number of transistors. 125 watts corresponds to a decent modern quad-core IIRC, and the number of trannies in my dual core is ~300 million, so I'd guess real world performance is equivalent to a decent spec quad core, very roughly. So while it's a laudable and very valuable piece of research, I doubt it's the rocket they're implying."
"Guessing" that throughput can be estimated from power output is exactly that - guessing. Firstly, power consumption is distinctly non-linear with increasing clock speed. As Intel found out, ramping up clock speeds to ever higher levels gets diminishing returns and a catastrophic reduction in energy efficiency. It's well known that if you back off the clock speed perhaps 20% you can double the number of cores for the same power consumption. Single thread speed isn't as good, but total throughput (with the right software) is much higher. That's without the fact that Intel have been making great strides on power efficiency in other areas besides just dropping clock speeds. Throughput per watt has increased enormously over the last few years, and as this is a new chip, I'd expect it to have gained some benefit.
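To put rough numbers on the 20%/double-the-cores rule of thumb, here is a back-of-envelope sketch. It ASSUMES the textbook approximation that dynamic power goes as C x V^2 x f and that supply voltage scales roughly linearly with clock, so per-core power goes roughly as the cube of the clock. That cubic exponent is an assumption for illustration, not a figure from Intel or the article, and the function names are invented here.

```python
def relative_power(freq_scale, cores=1, exponent=3.0):
    """Power relative to one core at full clock.

    ASSUMPTION: per-core power ~ freq_scale ** exponent, with exponent=3
    standing in for P ~ C * V^2 * f when V is scaled along with f.
    Leakage and uncore power are ignored.
    """
    return cores * freq_scale ** exponent


def relative_throughput(freq_scale, cores=1):
    """Ideal throughput relative to one core at full clock.

    ASSUMPTION: perfectly parallel software and no memory bottleneck.
    """
    return cores * freq_scale


if __name__ == "__main__":
    # Two cores, each clocked 20% lower than the original single core.
    print(round(relative_power(0.8, cores=2), 3))       # about the same power
    print(round(relative_throughput(0.8, cores=2), 3))  # well above 1x
```

Under that assumption two cores at 80% clock draw about 1.02x the original power for an ideal 1.6x throughput, which is in line with the rule of thumb; a different exponent shifts the numbers.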
Now there are big issues with the scaling of a 48 core chip, especially if it supports single consistent memory models. There are huge issues of memory bandwidth and maintaining cache coherence (given that the article alludes to cloud computing in a chip, it's very likely that it is not anticipated that it will be used with a single consistent memory model across all the cores).
Going back to the article, the impression given is that, somehow, truly intelligent machines will emerge through ever higher throughput processors. I think that's a nonsense - I've seen 25 years of work in the AI field, and precious little has come of it in terms of "true" intelligence. We have expert systems, writing, speech recognition and language translations and so on, but none of those come remotely close to what a human being can do, or the way that the brain works. It's going to take a fundamental breakthrough in the area of cognitive science before that is cracked (and I see little sign of it) and I don't think it is going to come just through faster CPUs owing their fundamental architecture to Von Neumann founded on logic.
@Steven Jones, @Ken Hagan
> "Guessing" that throughput can be estimated from power output is exactly that - guessing.
Yes, and more so without info on cache, clock, architectural details etc. It was a guess. I was trying to analyse their implication (not claim!) of how speedy it is. I'm aware of the clock/power tradeoff (though not the precise tradeoff; need to get Hennessy & Patterson - the 20%/double-the-cores ratio is higher than I'd expected)
> Throughput per watt has increased enormously over the last few years, and as a new chip, then I'd expect this one to have gained some benefit.
pretty much the same benefits as an equivalent process, hence my comparison with a modern quad-core - I *assumed* they were the same process. They didn't provide info on it and I'm aware it could have been an older process. I still suspect that the wattage/heat is the most reliable indicator of speed if I can interpret it properly...
> especially if it supports single consistent memory models
by my reading of the article ("The cores communicate by means of a software-configurable message-passing scheme using 384KB of on-die shared memory."), it doesn't. More load on the programmer, less on the hardware, and good thing too IMO. We need to go down that road. Also see my comment on microsecond comms overheads (I'll save you the effort: "ouch")
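For readers who haven't programmed in that style, here is a minimal sketch of message passing as opposed to sharing state. It is NOT Intel's API or the SCC's mechanism: the "cores" are ordinary Python threads, the mailboxes are queue.Queue objects, and every name in it is hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    """Stand-in for one core: it only ever sees messages, never shared state."""
    while True:
        msg = inbox.get()
        if msg is None:          # sentinel message: no more work
            break
        outbox.put(msg * msg)    # reply with the result as a message


def square_via_messages(values):
    """Send each value to a single worker and collect the replies in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for v in values:
        inbox.put(v)
    inbox.put(None)              # tell the worker to stop
    t.join()
    return [outbox.get() for _ in values]


if __name__ == "__main__":
    print(square_via_messages([1, 2, 3]))
```

The point is only the division of labour: the programmer decides explicitly what is sent where, and the hardware no longer has to keep every core's view of memory coherent.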
> (the AI crap) - we're agreed.
> If you are willing to accept (say) half the per-core performance, the per-core power consumption drops by an order of magnitude.
This confirms how little I know about this.
> And the programming model...
I'm not disputing that, just performance
> finally proves Sun were on the right track with Niagara.
I'm damn sure that general bulk-SMT architecture ***was*** right for the majority of computing. It seemed obvious to me that we were going down a wrong alley just by the incredible complexity of modern cores.
But I was purely trying to inject some reality into intel's marketing.
Cloud this, cloud that
WTF is it with relabeling things that have existed in various forms for years with the word 'cloud' in it?
So it's a chip. With some processors on it.
I'm going to wrap some Cat5 around a jam-jar and call it a cloud compatible wired networking device. Should be able to sell it to a few gullible managers!
Re: actual computing speed
"clock speed is also unspecified, probably for good reason"
I noticed that too, but bear in mind that single-core performance is currently achieved by tweaking absolutely everything as far as it will go, to the extent that the limit is set by heat dissipation. If you are willing to accept (say) half the per-core performance, the per-core power consumption drops by an order of magnitude.
And the programming model is going to be easier than GPGPUs, and reputable folks are already using the latter. According to the presentation, Intel already have Linux running on this chip. Can any of the GPU vendors make the same claim?
This chip could well be the commercial compromise that finally proves Sun were on the right track with Niagara.
Major Step Forward
Well done Mike Ryan (a software engineer from Intel Research Pittsburgh) - you invented a new word 'permute'. I guess it means travelling to and from work only split amongst 48 virtual vehicles... or something else...
We seem to have side-stepped the issue of multi-core programming by the recent phenomenal rise in virtualisation. The future will be huge datacentres run by the likes of Amazon. They're inclined to give you stuff cheap anyway, so if they get their hands on 48 core processors we'll get our cpu rental for pocket money.
They will see, and they will hear, they will probably speak, and do a number of other things that resemble human-like capabilities.
One word: storage.
So this processor will be small and able to pretend to be a human, but will be shaped like a 40-foot lorry!
... to sci-fi classic "Who Can Replace A Man?".
Congratulations on breaking the first rule of posting statements on the Internet. That rule is to always check your facts. Permute is not a new word - it's a verb meaning to change order or subject to permutation.
Totally agree with you, although I have actually patented that very idea you think you had first with the jam jar and Cat5 cable. I have already marketed it under the name "Exospheric Networking Device".
The Exospheric being the top notch, and price is related to that, the second model which is two yoghurt pots joined by a piece of string of length according to need, and named the Stratospheric is priced accordingly.
Only FTSE 100 companies can really afford these models, so I'm a bit loath to provide full technical details here.
@ac performance of cores
"More cores is good, BUT you have to increase the performance of cores vs. current cores"
So you seem to be saying that the performance of the individual cores in a multi-processor design is less than the performance of a single core.
There's no real reason why this should be the case. If you're talking about a multi-core from one technology and comparing it to another technology (different feature size, different architecture, different vendor) then yes, performance will be different, you're not comparing like with like. But that's a statement which is true for single processor chips too.
But if the core in a single chip is integrated with others to create a multi-core solution on a single die, then there should not be a performance degradation.
There might be an issue with heat dissipation with multiple cores which could lead to a reduction in the clock frequency, but that's tough, you have to live with that.
I'm not really sure what you're getting at.
To, say, contrast a multi-core Cell processor with a single or multi-core Intel processor would be an invalid and somewhat silly comparison.
Even on a multi-core design, the engineers are going to push the technology, run the thing as fast as possible, why wouldn't they? (unless you're specifically going for a lower power design to avoid using forced air cooling).
Performance of Single Core Vs. Multicore
The idea that a single processor is faster than the individual cores in a multi-core chip isn't really going to be an issue; it shouldn't be in most applications.
A single processor is blindingly fast these days, compared to days of old. Most applications don't require this level of performance, except games.
Enter the multi-core chip, where each core runs at a lower clock frequency but has the same architecture as the single-core chip. Many applications can naturally be structured to use multiple processes running concurrently, and so can be implemented on multi-cores.
Sure, existing applications will need to be re-designed perhaps, but new applications can be partitioned in terms of their functionality to benefit from multi-cores, with the net result being they will run faster than if they were implemented using a higher clock frequency single core.
So, I don't put much weight in the argument that single cores are faster. They are only faster because software is currently developed to be fastest on them. Applications developed for multi-cores will be faster than the single core solution.
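How much faster depends on how much of the application can actually run in parallel. A minimal sketch using Amdahl's law, which is a standard model swapped in here for illustration and not something the comment above relies on; the parallel fractions and the half-speed clock are made-up inputs.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core at the same clock.

    parallel_fraction is the share of the work that can be spread
    across cores; the rest stays serial.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def relative_speed(parallel_fraction, cores, clock_scale):
    """Speed of `cores` slower cores relative to ONE full-clock core.

    ASSUMPTION: per-core speed scales linearly with clock_scale and the
    cores are otherwise identical to the single fast core.
    """
    return clock_scale * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # 48 cores at half the clock, for a few hypothetical workloads.
    for p in (0.5, 0.9, 0.95, 0.99):
        print(p, round(relative_speed(p, 48, 0.5), 2))
```

With these inputs a workload that is only half parallel ends up slightly slower on 48 half-speed cores than on one fast core, while one that is 95% parallel comes out several times faster, so the claim holds for well-partitioned applications rather than for all of them.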
The Intel Core i7-975 Extreme Edition has 731 million transistors and 4 cores, which boils down to 183 million transistors per core.
This baby has 1.3 billion transistors and 48 cores, which means a hair more than 27 million per core. That kind of count throws us back to AMD K7 territory (1999).
Just a thought.
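For anyone who wants to check the division, a quick sketch using only the transistor and core counts quoted above; the helper name is made up. It lumps cache and interconnect transistors in with the cores, exactly as the comparison above does.

```python
def transistors_per_core(total_transistors, cores):
    """Naive average: total transistor count divided by core count."""
    return total_transistors / cores


if __name__ == "__main__":
    i7_975 = transistors_per_core(731e6, 4)   # Core i7-975 figures quoted above
    scc = transistors_per_core(1.3e9, 48)     # 48-core chip figures quoted above
    print(round(i7_975 / 1e6, 2), "million per core")
    print(round(scc / 1e6, 2), "million per core")
    print(round(i7_975 / scc, 1), "x ratio")
```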
Cache missing ...
I believe we're forgetting how many of those millions of transistors are used for cache in the top-of-the-line quad core CPUs. For example, 8 MiB of cache using 6-transistor 1-bit static RAM cells comes in at 402+ million transistors, just for the cells, never mind surrounding decoders, buffers, TLBs, etc.
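The 402+ million figure can be reproduced in a couple of lines. A minimal sketch assuming exactly what the comment states (8 MiB, 6 transistors per stored bit) and nothing else; the function name is invented.

```python
def sram_cell_transistors(mebibytes, transistors_per_bit=6):
    """Transistors in the SRAM cells alone.

    Ignores decoders, sense amplifiers, tags and ECC, so the real
    figure for a cache of this size is higher.
    """
    bits = mebibytes * 1024 * 1024 * 8
    return bits * transistors_per_bit


if __name__ == "__main__":
    print(sram_cell_transistors(8))  # 402653184
```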
Human or dancer?
"The machines we build will be capable of understanding the world around them much as we do as humans."
Any computer that doesn't feel satisfaction after having a good shit isn't much like a human.
Re: Quick comparison
"[...] a hair more than 27 million per core. That kind of count throws us back to AMD K7 territory (1999)."
Yeah, so? Over the same period, the clock speed of the chip has increased by about 20-fold, while the speed of the supporting memory chips has perhaps doubled. That would have given a 10-fold *drop* in the number of instructions dispatched per clock, had we not spent all those transistors in various ways to keep useful work being done in the absence of hard data. This new chip probably benefits from *most* of the increase in basic switching speed, but takes a very different approach to keeping useful work being done. It's a proven approach (especially for server or gaming workloads, but hardly restricted to those) and almost certainly more scalable.
Re: Are you sure the xbox 360 is a Power box?
Yes. Xbox 360 has a custom IBM PowerPC 3 core, dual thread, CPU design (something not wiki: http://hardware.teamxbox.com/articles/xbox/1144/The-Xbox-360-System-Specifications/p1).
Xbox 1 (the original) had some P3 / celeron half-breed thing from Intel (http://hankfiles.pcvsconsole.com/answer.php?file=135).
Is this the precursor to skynet?????
More likely it is being designed to be able to cope with the up & coming m$ bloatware ( next gen ).
I can't believe we got this far down the page before the similarities of that picture to the broken CPU from the Terminator classic were raised.
Looks like nothing less than the dawn of total mechanical domination to me.
Sun SPARC 64bit T2 with 64 Threads Today or Intel IA-32 with 48 Cores tomorrow...
'The SCC's 48 IA-32 cores were described by Rattner as "Pentium-class cores that are simple, in-order designs and not sophisticated out-of-order processors you see in the production-processor families - more on the order of an Atom-like core design as opposed to a Nehalem-class design."'
The in-order design means the IA-32 cores will sit idle more time than they are working. Sun attacked that issue with Niagara by having 4 and then 8 threads per core - giving the CoolThreads T2 processor 64 threads of fully compliant SPARC instructions.
I am curious to see what the throughput performance difference is between a 64 thread T2 and 48 core Intel SCC.
Will people be interested in lots of 32 bit cores when the world has been moving to 64 bit for a decade?
I don't see the benefit...