Re: Beginning of the end for Intel?
Hello Ken Hagan,
"Intel's original floating point model hasn't seen light of day for about a decade, having been superseded by SSE2. Both integer and floating point arithmetic models have been evolving since the mid-90s with MMX and various other TLAs. A modern desktop chip also devotes more than half its area to an integrated streaming processor that owes nothing to x86."
It's OK now, but the evolution of SSE has been a bit rubbish. It took them absolutely ages to include some fairly fundamental instructions like a fused multiply-add.
MMX / SSE was for a long time an ever-changing target and was consequently very hard to develop for. About the only practical way to use it was through Intel's IPP/MKL libraries, where Intel had put in the effort to account for the different versions of SSE that your application would encounter in the field. And those cost money. Not using them meant taking on the huge job of writing versions of your software for SSE2, SSE3, SSE4.1, SSE4.2, etc. Unsurprisingly, very few did.
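To give a feel for the bookkeeping involved, here's a rough Python sketch of the sort of dispatch you end up writing. Everything in it is made up for illustration (the kernel names, the `pick_kernel` helper, the feature strings); real code would query CPUID, and each kernel would be a separately written and separately tested chunk of intrinsics or assembly rather than a stand-in.

```python
from typing import Callable, List, Sequence, Set, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_baseline(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain scalar fallback that works on any CPU."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical stand-ins: in a real code base each of these would be a
# hand-written implementation targeting one SSE level.
def dot_sse2(a: Sequence[float], b: Sequence[float]) -> float:
    return dot_baseline(a, b)


def dot_sse41(a: Sequence[float], b: Sequence[float]) -> float:
    return dot_baseline(a, b)


# Ordered newest first, so the best supported kernel wins.
KERNELS: List[Tuple[str, Kernel]] = [
    ("sse4.1", dot_sse41),
    ("sse2", dot_sse2),
]


def pick_kernel(cpu_features: Set[str]) -> Kernel:
    """Return the newest kernel whose required feature the CPU reports."""
    for feature, kernel in KERNELS:
        if feature in cpu_features:
            return kernel
    return dot_baseline


# Assumed feature set for an older machine; a real program would detect this.
dot = pick_kernel({"sse2"})
result = dot([1.0, 2.0], [3.0, 4.0])
```

Every new SSE revision means another entry in that table, another implementation behind it and another configuration to test, which is exactly the job most people declined to take on.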
In comparison, AltiVec (the equivalent to SSE on PowerPC and POWER processors) was right first time. Motorola put the right instructions into it and didn't keep changing it. So people actually wrote software to use it. For example, in the overlap between PowerPC and Intel Macs, Photoshop was far quicker on PowerPC because Adobe had actually exploited AltiVec pretty well.
Itanium was slightly popular in the high performance computing world because it always had a fused multiply-add in it. I saw the addition of FMA to x64's vector instruction set as the signal that Intel had truly given up on Itanium; there was absolutely nothing left to recommend Itanic over x64.
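For anyone wondering why the HPC crowd cares so much: a fused multiply-add computes a*b + c with a single rounding at the end, instead of rounding the product and then rounding the sum, and it does it in one instruction rather than two. Here's a small Python sketch of the numerical difference. Note that `fma_exact` is my own helper that emulates the single rounding in software using exact rationals; it is not a hardware FMA, and the input values are picked purely to make the effect visible.

```python
from fractions import Fraction


def fma_exact(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: form a*b + c exactly, then round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Chosen so that the exact product a*b = 1 - 2**-108 does not fit in a double.
a = 1.0 + 2.0**-54 * 0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

# Separate multiply then add: the product 1 - 2**-54 rounds to 1.0,
# so the small term is lost before the add ever sees it.
unfused = a * b + c        # 0.0

# Fused: the exact product is kept until the single final rounding.
fused = fma_exact(a, b, c)  # -2**-54
```

The separate multiply and add throws away the entire answer here, while the fused version keeps it. In long dot products and matrix multiplies those lost bits add up, on top of the throughput gain from halving the instruction count.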
"To be fair, everyone else's chips are the same. x86 lost the ISA wars against the RISC chips, but Intel responded with the ISA-less Pentium Pro and ISA hasn't mattered since then."
Almost, but importantly, not quite everyone. ARMs are ARMs: there's no microcode (at least not in the same sense as x86's). You get 48,000 transistors running the ARM opcodes, and there's no real instruction translation.
It's important because of the transistor count - only 48,000. An equivalent x86 core needs several million to get the same performance (translation, pipelines, etc.), so it's not surprising that ARM wins on power consumption.