IT shops buy current products, but they always keep an eye one or two generations out to assure themselves they aren't buying into a dead-end product. Which is why the makers of chips and other components that go into systems, as well as the system makers themselves, are forced to talk about the future when what they really want to do …
around to the Cell processor design. Keep up, boys, keep up.
And moving into ......
...... cell processor cells design, AC, ....... for that added edge that guarantees lucrative mutually shared advantage in future[s] markets. .......... with what is in essence, a virtually dynamic SMARTer chip which grows with/in use.?!
Which really would require Intel to up its game to remain at the table with its upgraded stakes and posted antes. And that is always a personnel issue/head matter in every business.
I'm not so sure. The Cell processors don't feature a shared instruction decoder (nor shared schedulers).
I completely agree with the sentiment that, perhaps contrary to intuition, shorter pipelines are always better for non-HPC tasks.
Actually, the opposite of Cell
Cell has one general purpose core, and many special purpose coprocessor elements. Bulldozer has many general purpose cores, and two FP coprocessor elements. If AMD put GPUs in the mix with a dual-core Bulldozer module, that would be more like Cell.
The Bulldozer core structure is a lot like UltraSPARC T2, which has two integer pipelines and one FPU per core, and the aborted Rock processor, which had four integer pipelines and one FPU per core.
Actually, by using existing Opteron integer pipelines, and putting four of them into a core and sharing the L1 cache, Bulldozer is philosophically very similar to the UltraSPARC T2, which basically replicated two existing UltraSPARC T1 pipelines in the core with a shared L1 cache.
Great article, finally gives us all some good insight into what Bulldozer really is!
special HPC version
Generally a great article; however, I'm not convinced by the 'special HPC version'.
If only a single integer unit were taken out of a 16-core (8-module) chip, wouldn't it be more economical to release this as a 15-core chip? We've already seen AMD harvest parts by disabling a single core in the Phenom X3 line.
'Doze that laptop
Great technical article on Bulldozer. Took me right back to my processor design coursework and the Transputer, oh heady days that they were...
But seriously, how long until I can have a laptop with the 16-core HE Opteron? Better still, will this prompt Apple to look at AMD processors? Then I could have a Snow Leopard with 16 hearts (although by then we'll probably be on Clouded Leopard, Marbled Cat or some other Felidae species).
Clouded Leopard... How... Prophetic.
My Quad core PC still hangs for a few seconds while Windows decides it needs to refresh something or other.
I'm sure another 12 cores will make no difference.
Ditto startup - that always seems to wait, rather than really running stuff in parallel.
These delays are most likely due to I/O latency, specifically network or hard disk, so throwing more CPU at the problem will not help. Windows having a more streamlined (or more multi-threaded) I/O model would help. Or you could just run SSDs.
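One rough way to tell whether a stall is CPU-bound or I/O-bound (a sketch in Python; the workloads and timings here are illustrative stand-ins, not a Windows diagnosis): compare wall-clock time against CPU time. If the process barely accrues CPU time while the wall clock keeps running, it is waiting on I/O, and extra cores won't help.

```python
import time

def cpu_bound():
    # Busy loop: wall time and CPU time should come out roughly equal.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def io_bound():
    # Sleeping stands in for waiting on disk or network: the process
    # accrues almost no CPU time while the wall clock keeps running.
    time.sleep(0.2)

def profile(fn):
    w0, c0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - w0, time.process_time() - c0

wall, cpu = profile(io_bound)
print(f"io_bound:  wall={wall:.3f}s cpu={cpu:.3f}s")  # cpu near zero: more cores won't help

wall, cpu = profile(cpu_bound)
print(f"cpu_bound: wall={wall:.3f}s cpu={cpu:.3f}s")  # cpu tracks wall: cores could help
```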
They ought to have multiple threads per core; not two, like Intel's Hyper-Threading, but eight or more. Because if you don't have a long pipeline, that means you're not cutting the instructions into small enough pieces.
Without having to devise faster transistors, just by keeping more of them busy, one could achieve a clock frequency four times higher by cutting the instructions into four times as many parts. Unless, of course, they already took that as far as it could go back in the days of the Pentium 4.
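A toy back-of-the-envelope model (the numbers are mine, purely illustrative) of why cutting instructions into four times as many pipeline stages doesn't buy a 4x clock: each extra stage adds a fixed latch/register overhead, so cycle time shrinks more slowly than 1/stages and the clock saturates, which is roughly what capped the deep-pipeline Pentium 4 approach.

```python
# Hypothetical figures: total logic delay per instruction is fixed,
# but every pipeline stage adds a fixed latch overhead.
LOGIC_NS = 4.0   # assumed total logic delay per instruction, ns
LATCH_NS = 0.15  # assumed per-stage register overhead, ns

def clock_ghz(stages):
    # Cycle time = (logic split across stages) + per-stage latch overhead.
    cycle_ns = LOGIC_NS / stages + LATCH_NS
    return 1.0 / cycle_ns

for stages in (5, 10, 20, 40):
    print(f"{stages:2d} stages -> {clock_ghz(stages):.2f} GHz")
```

With these numbers, going from 5 to 20 stages (4x the cuts) yields well under 4x the clock.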
Seriously. Why have you made this top story?
Not an expert, but it's too easy to assume that one can simply increase the number of pipeline stages. Pipelines can already be quite long, and each stage has to take about the same time to execute or the pipeline stalls. Execution latency, for example, is highly variable, from a simple shift to an integer square root (some PPCs had this, I think), and a memory fetch can vary by nearly two orders of magnitude depending on whether it hits the first-level cache or misses everything and goes to RAM. Also, longer pipelines mean longer bubbles. I don't think AMD's engineers would miss something so obvious.
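The "longer pipelines = longer bubbles" point can be sketched with a toy CPI model (the parameters are illustrative assumptions, not measurements): a mispredicted branch flushes the in-flight work, so the average penalty grows roughly with pipeline depth.

```python
# Toy model: base CPI of 1 plus the average branch-flush penalty.
# branch_freq and mispredict_rate are assumed, not measured values;
# the flush penalty is approximated by the pipeline depth itself.
def cpi(depth, branch_freq=0.2, mispredict_rate=0.1):
    return 1.0 + branch_freq * mispredict_rate * depth

for depth in (10, 20, 31):  # 31 is roughly Prescott-era Pentium 4 depth
    print(f"depth {depth:2d}: CPI ~ {cpi(depth):.2f}")
```

So a deeper pipeline may clock higher yet retire fewer instructions per cycle, which is the trade-off the comment is pointing at.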