Why the name
After John Poulson (http://en.wikipedia.org/wiki/John_Poulson), a corrupt English 1970s property developer?
Everyone else might have pretty much abandoned the Itanium processor, but Intel and Hewlett-Packard – who co-designed the 64-bit processor – remain firmly committed. That's mainly because HP has a captive audience of HP-UX, NonStop, and OpenVMS customers that spend billions of dollars a year on systems and therefore make it …
Named after Pat Poulson, the comedian. Itanic's always been a joke.
I thought Intel used place names for chip codenames, as you can't trademark a place name. So I asked Google Maps, and it says there's a place called "Poulson, Accomack, Virginia 23409, United States".
Link to Google Maps: http://goo.gl/4aBSq
There are some issues with it, though. It will still not be competitive with Power7, and by 2012 expect Power7+ to be available, with Power8 knocking on the door in 2013. The chip speed will be 2.4GHz, which is a decent upgrade from the 1.73GHz (roughly a 39% increase).
The problem with socket compatibility is HP is stuck with still only having 5 QPI links. The 8-socket blade does not have the SX3000 chipset, so chip hopping is a major problem. The Superdome2 only uses 3 of the QPI links, and one of those is for redundancy. The current chips are starved for bandwidth, and putting twice the cores per chip with only 50% faster QPI links makes the architecture worse.
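The "makes the architecture worse" point is simple arithmetic on link bandwidth per core. A minimal Python sketch, assuming the number of usable links stays fixed and that all cores share them evenly (it ignores caches, which absorb much of this traffic in practice):

```python
def per_core_bandwidth_ratio(core_multiplier: float, link_speedup: float) -> float:
    """Relative link bandwidth available to each core after a chip change.

    core_multiplier: factor by which cores per chip grow (2.0 = doubled).
    link_speedup: factor by which each link gets faster (1.5 = 50% faster).
    Assumes the same number of usable links, shared evenly by all cores.
    """
    return link_speedup / core_multiplier


# Twice the cores sharing links that are 50% faster.
ratio = per_core_bandwidth_ratio(core_multiplier=2.0, link_speedup=1.5)
print(f"Each core gets {ratio:.0%} of its previous link bandwidth")
```

On those assumptions each core sees three quarters of its former link bandwidth; whether that actually hurts depends on how much of the workload's traffic leaves the chip.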
Any word on hardware virtualization? HP's Integrity Virtual machine is a dog.
Looks like this chip might actually deserve the one license per core that Oracle has punished Tukwila with.
".....It will still not be competitive with Power7...." Why? Simply because you say so? Please, indulge us with some of the technical and analytical wizardry that led you to that conclusion; otherwise I'll have to conclude it was just the inner Troll talking.
"......The problem with socket compatibility is HP is stuck with still only having 5 QPI links....The current chips are starved for bandwidth......" Que? Sorry, but I'm enjoying lots of bandwidth niceness with my Tukzilla BL870c i2s; I'm getting about four times the application performance compared to an old Superdome partition (I suspect the bottleneck is the SAN, not the blades). <Sniff, sniff> I smell Troll manure....
".....HP's Integrity Virtual machine is a dog...." Again, no evidence to back up that assertion; would you care to explain which areas you think IVM is lacking in? Strange that you're also working through the IBM FUD list. We had the IBM Elmers in not too long ago (worryingly for hp, their "predictions" for Poulson hardware were spot on; there seems to be a bit of a leak somewhere!). They sang their new FUD song as follows: "Power is better 'cos we say so, QPI has "limitations", IVM is poo, have you heard about the new Oracle pricing?" TBH, it was hard not to fall asleep; I was hoping they'd come up with a new act for Poulson rather than a vague rehash of the Tukwila FUD. Maybe you should help them along, you all seem to need new material.
Interesting reply and the FUD seems to be working as your reply was lame.
We don't have any Itanium left, so I cannot add to the conversation.
Good to see you still hold the torch.
Montecito cores from 2008, so not that old. I reckon we could replace a Montecito SD partition with a BL860c i2 quite easily, but we're "factoring for growth" (AKA, making the salesman happy!). And the truth is I think the Elmers are getting a bit boring nowadays, they just don't seem to put the effort in they used to. Whilst they all talked a load of cobblers, some of them were at least entertaining with it.
The introduction of soft error recovery in the FP units and caches is going to sound very tasty to the supercomputer-builders.
..and looks like it'll be good enough to keep HP in the UNIX game.
But how it will stack up against a POWER7+ remains to be seen. I have no doubt, though, that it'll smack an Oracle 2012 SPARC offering around.
The move to using shared caches is another significant change. Previous versions have had private caches per core.
Surprised there is no TLB marked on the chip layout.
I will confess to being pretty impressed. For years we kept hearing about how the really good design elements were being delayed for IA64. But it seems Intel kept the faith after all, and what looks like a range of significant design streams have all come in at once.
Seriously, if an x86 chip was announced with this amount of progress from Intel we would be utterly gobsmacked, and AMD would crawl away to die. But it is the IA64, and sadly no one cares. At least for now.
What people tend to miss is that the expected topping out of the value of the x86 architecture, which prompted the initial work on IA64 as a replacement, has pretty much come to pass. However, the question that may be more important is whether anyone cares either. Nobody really cares that desktop performance is moribund. The action is elsewhere. In a decade we might live in an environment where ARM and Itanium are the most important architectures, and where their uses essentially don't overlap. Clouds and portables might be the reality. In the high-performance area it would actually be nice to see Itanium get a second chance. SGI might still be smarting from the disaster that came with their first foray with it, but a next-generation UV with these would be something to see. The Altix 4700 series was a very nice machine. A 12-wide instruction path should allow for some stunning speed on many numerical codes. Couple that with a ccNUMA machine with a few thousand cores and you get some serious grunt.
...it's what they haven't.
The improvements in core power usage look good, but they will have to be backed up by clock speed increases in the released products to gain ground on Power or continue to provide a reason for keeping Itaniums when a x86 may provide a more cost effective solution.
The cache and QPI increases will allow existing Itanium shops to continue to grow but they won't provide the performance to grow the (already shrinking) Itanium customer base.
Should I hazard a guess that these Itaniums will also be delivered after the 22nm Westmere-EX refresh?
/coat, as I'm already late to this party.
tick tick tick tick tick
"a reason for keeping Itaniums when a x86 may provide a more cost effective solution."
The reason people buy Itanium boxes is largely because they need the functionality of HP-owned (but probably not HP-developed) software that simply *isn't available* on x86-64. Any performance and scalability differences are increasingly irrelevant now both systems use HyperTransport and the x86-64 memory capacity increases way beyond the land of sensibility for most people's apps. It'd be interesting to see the real detail on the RAS differences too, because ever since IA64 was introduced and this particular hand-waving commenced, they've mostly been insubstantial (or worse).
Hardware cost-effectiveness comparison is way down the line when the OS you need is Tandem NSK or DEC VMS, or maybe even if you have a substantial dependence on HP-UX. If NSK or VMS were ported to a decent x86-64 server range, IA64 would vanish in weeks; modern NSK and VMS have few hardware dependencies and have already been ported across more than one chip transition.
Yes, it's the Intel variant thereof (QPI); I just like to remind folks.
Admit it, folks. Intel serves incredible Kool-Aid. Yum yum. The slides look great. The technology appears awesome and it finally sounds like Itanium is going to catch up. But wait a second! Let's put on our reality caps here, folks. When has Intel delivered Itanium on time and as planned? Show me a single Itanium CPU that has come out in the expected time frame with the expected technologies!? Every single Itanium, from Merced to the latest Tukwila, has slipped by at least 2 years (Tukwila slipped 5 years if you look here: http://www.theinquirer.net/inquirer/news/483/1038483/intel-tukwila-suffers-from-political-squabbling), and you believe Poulson, with a major technology shift, core changes, etc., will be delivered next year in 2012? Has Tukwila really started shipping across the HP Integrity product line yet? Last time I checked, even Superdome2-32 still isn't out beyond science-project installations and Superdome2-64 is MIA. In a year's time, Xeon will be at 12 to 16 cores, with 3-5x more performance, running leading OSes like Linux, Solaris, etc., and with Itanium now stuck in the HP-UX rut (Linux and Windows have been abandoned), who the hell is going to invest in this architecture besides those handcuffed to HP? I can't see why Intel continues investing in this science project, especially when it's HP making the revenue on systems. As an Intel shareholder, I am definitely disappointed in Intel throwing away money on Itanium. It's time to just bury the thing.
Intel makes very large margins on these chips. HP makes very solid margins on the Itanium based systems. Customers use these systems for various reasons, but mostly because they do something that 1) is important and can't be done on cheaper boxes or 2) can't easily or inexpensively be moved to something cheaper.
Sure, Intel could dump it, but why? They're making $$ on it, as is HP. Will they be on time? History would say no, but supposedly THIS is the time when they're going to hit their schedule.
Maybe you could circulate a petition among your fellow shareholders asking them to drop Itanium development?
Intel is arguing that applications and operating systems won't need to be recompiled... That's true, the instructions remain the same, but what about the move from a 6-wide to a 12-wide instruction pipeline?
Will the application be able to submit 2*6-wide instructions to the same core without any modification?? From what I understand, the EPIC architecture mostly relies on the compiler optimising the code at compile time so that it issues instructions in parallel, so I guess an application compiled for 6-wide words will get about half the performance of the same application recompiled to use 12-wide words!!
If my conclusions are right, we will again see the principal weakness that affects the EPIC architecture: binary compatibility. Customers will need to adopt the latest version of HP-UX and wait for ISVs like Oracle to certify their applications, which will also need to be the latest versions (the latest Oracle DB, for example). We all know that the market takes a considerable time to certify and adopt the latest version of anything... and for mission-critical applications, which is the case for most Itanium servers, clients are even more conservative.
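The recompilation worry can be illustrated with a deliberately simplified issue model. It assumes in-order issue that stops at every stop bit (IA-64 marks the end of each instruction group with a stop), ignores stalls and any second hardware thread, and assumes the old compiler never formed groups wider than the old 6-wide machine. The group sizes are made up:

```python
import math


def cycles_to_issue(group_sizes: list[int], issue_width: int) -> int:
    """Cycles needed to issue a sequence of instruction groups.

    Simplified in-order model: each group (the instructions between two
    stop bits) must finish issuing before the next group starts, and at
    most `issue_width` instructions issue per cycle. No stalls, no SMT.
    """
    return sum(math.ceil(size / issue_width) for size in group_sizes)


# Hypothetical binary scheduled for a 6-wide core: no group wider than 6.
old_binary = [6, 4, 6, 3, 5, 6]
# Hypothetical recompile of the same 30 instructions into wider groups.
recompiled = [12, 10, 8]

for width in (6, 12):
    print(width, cycles_to_issue(old_binary, width), cycles_to_issue(recompiled, width))
```

In this model the old binary takes 6 cycles at either width and only the recompiled code drops to 3 cycles on the 12-wide core. Real hardware is messier; in particular, a second hardware thread could fill the spare issue slots, which this model leaves out.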
"Will the application be able to submit 2*6-wide instructions to the same core without any modification??"
Assuming the application runs more than one thread, it should still do pretty well.
BTW, what are the odds that this focus on low power consumption / heat emission is to head off ARM's entry into the server market?
If you're going to take the pain of switching architecture due to those points, you might as well switch to EPIC to get the performance boost as well.
"BTW, what are the odds that this focus on low power consumption / heat emission is to head off ARM's entry into the server market?"
I'm not sure there is a lot of sense in trying to scale a 130+ Watt processor (Itanium) back to compete with ARM (<2 Watts).
The power usage optimisations mentioned in the article resemble those already included in 2-3 year old Core iX x86 chips. Some of the optimisations are necessary just to get a working chip at 32nm; some, such as power gating, allow lower power usage at idle or when the workload is limited to some of the cores.
"Probably less than a 20 per cent boost, since the relationship between clock speed and heat is logarithmic, not linear."
I think that's wrong. The equation for heat (power) produced by a circuit is:
P = C * V^2 * F
P = heat produced
C = capacitance of a circuit
V = voltage
F = frequency
So heat increases with the square of the voltage but LINEARLY with frequency, not logarithmically as the article says.
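The formula quoted above is easy to check numerically. A small Python sketch with arbitrary unit values, where only the ratios mean anything (any activity factor is folded into C, as in the formula as written):

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Switching power P = C * V^2 * F, with any activity factor folded into C."""
    return capacitance * voltage ** 2 * frequency


# Arbitrary baseline; only the ratios below mean anything.
base = dynamic_power(capacitance=1.0, voltage=1.0, frequency=1.0)

clock_up = dynamic_power(1.0, 1.0, 1.2) / base    # 20% higher frequency
voltage_up = dynamic_power(1.0, 1.1, 1.0) / base  # 10% higher voltage

print(f"20% more clock:   {clock_up:.2f}x power")    # 1.20x: linear in F
print(f"10% more voltage: {voltage_up:.2f}x power")  # 1.21x: quadratic in V
```

The model holds voltage fixed. In real chips a higher clock often also needs a higher voltage, so total power rises faster than linearly once both change.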
What have threads got to do with it?
The point about EPIC and parallelism is that the parallelism has to be discoverable by the compiler. Successful modern processors don't do that, they let the compiler do the best it can and let the silicon do the rest at run time. Apleszko has the right idea; wider issue is of no benefit unless you recompile.
As for Itanic being a potential performance-per-watt competitor for ARM: er, I don't think so. Whereas ARM as a processor for low wattage moderate-performance server (and desktop) boxes is not just possible, if it wasn't for Intel strongarm techniques (hello Dell), it would be happening right now.
Well, I guess I was taking it too literally, that it won't be -necessary- to recompile, and assuming that the wider pipeline can be loaded with instructions for another thread - but now I think back about it, it's the execution units that get shared and each thread would run in a separate pipeline.
Yeah, ok, that's a second 'brain-fart'... at least my coding today has been more successful...
The compiler, true, does some guesswork at potential parallelism, but the programmer has to write his program threaded to begin with to take full advantage of parallel processing. The nice thing about threads, I can run 5 threads in one program on a single-core computer and get normal performance (minus overhead). But, I can run the same 5-thread program on a quad core with HT and the OS (key word) will assign each thread (hopefully) to the various cores available to it, essentially allowing each thread to run in parallel as opposed to time-slicing on a single core. This is why a recompile isn't necessary: the previous Itanium chips were multi-core already, thus the compiler already optimized for multi-core. More cores simply provide more places for the OS to assign threads. The program itself should (SHOULD!) be intelligent enough to determine the max number of true threads it can take advantage of.
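Sizing a worker pool to the CPUs the OS reports is the usual way a program does what is described here. A minimal Python sketch; `square` is a stand-in task. Note that in CPython the global interpreter lock stops CPU-bound threads from running Python code truly in parallel, so real number crunching would use processes (`ProcessPoolExecutor`) instead; the pool-sizing logic is the same:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(pending_tasks: int) -> int:
    """Pool size: no more workers than tasks or logical CPUs, at least one."""
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(pending_tasks, cpus))


def square(n: int) -> int:
    return n * n


tasks = list(range(5))  # e.g. the five threads from the example above
with ThreadPoolExecutor(max_workers=worker_count(len(tasks))) as pool:
    # The OS scheduler, not the compiler, decides which core runs each worker.
    results = list(pool.map(square, tasks))
print(results)
```

The point of the design is that the same binary adapts at run time: on a single core it gets one worker, on a bigger box it gets more, with no recompile.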
I do agree with the concept of ARM in a server environment, though. It likely would have happened already if there weren't roadblocks. However, it will likely take a bit of engineering to put ARM cores in an environment worthy of HPC. Perhaps they'll adopt HyperTransport for inter-core/CPU communications?
The reason why PC continues to rule the roost is that all that ugly legacy (and various improvements on it like ACPI, EFI, etc) allow a general purpose OS to install, boot and run.
The reason why Sparc, Itanic, PPC, etc are around is that they have similar legacy in the form of a good firmware loader and a well defined and artificially limited architecture. This allows a general purpose OS to install, boot and run.
The reason why ARM is nowhere near that is because there is no legacy and no architecture limitations either. Every licensee has done their own wild thing and unless your kernel has been compiled for the correct hardware and you know the magic incantations to describe IRQ routing, DMA, framebuffer offset, etc you are not getting anywhere. That does not make for a good server platform. Sysadmin staff, software developers, packages, tools, etc all have to be trained to a specific variety and so on which defeats most of the advantages of having better performance per Watt and per cubic cm of datacenter space.
I'm not sure why ARM is being considered as a competitor at all. Basically there are two approaches to throughput. One is adding many more, but slower, threads. That's what the SPARC T series does. The other approach is to have fewer, but faster, threads. That's what Power, SPARC64 and Itanium are all about (albeit all have a second hardware thread per core to eke out a bit more throughput, although sometimes that's counter-productive and has to be turned off). The latter are very good at running high-throughput databases, optimising response times and so on, but are premium priced and use disproportionate amounts of electricity. However, for some sorts of workloads they are the only feasible approach.
Any ARM-based server is going to be in the same market segment as the SUN T series. That is lots of power-efficient hardware threads, albeit implemented as one thread per core (unlike the 8 of the T series). ARM can do that as the cores are very compact and power efficient, but it is never going to give startlingly good single-thread performance (but it ought to beat the T series).
Note that the Intel x86/x64 is closer to the Power/Itanium/SPARC64 characteristics than ARM or T series.
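The trade-off between the two approaches can be put into numbers. A toy Python comparison with made-up figures, not measurements of any real T-series, ARM, Power or Itanium part; it assumes every thread is kept busy and that a single request cannot be split across threads:

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    threads: int
    per_thread_speed: float  # work units per second for one thread


def throughput(d: Design) -> float:
    """Aggregate work per second, assuming every thread is kept busy."""
    return d.threads * d.per_thread_speed


def response_time(d: Design, work: float) -> float:
    """Seconds for one request that runs on a single thread."""
    return work / d.per_thread_speed


# Hypothetical designs with the same aggregate throughput.
many_slow = Design("many slow threads", threads=64, per_thread_speed=1.0)
few_fast = Design("few fast threads", threads=16, per_thread_speed=4.0)

for d in (many_slow, few_fast):
    print(d.name, throughput(d), response_time(d, work=8.0))
```

Both designs move the same total work per second, but a single request finishes four times sooner on the few-fast-threads design, which is why the latter wins where response time matters.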
A lot of Intel's fab plants used to have VMS systems for their factory automation. If this is still the case, then Intel's reason for producing new Itanium chips may be simple self-interest, as moving automation systems from one system architecture to another can involve a lot of VERY expensive outages.
"A lot of Intel's fab plants used to have VMS systems....." A little birdie tells me that AMD also have hp Superdomes running their fabs! If it's true then that's real irony!
"the compiler already optimized for multi-core."
No it didn't. It can't. The compiler can only optimise at the instruction scheduling level for a single core (which is what EPIC is about) for parallelisation that can be spotted AT COMPILE TIME. Multiple independent threads (even several copies of the same thread) running across multiple cores cannot be guaranteed AT COMPILE TIME to be executing in parallel to the extent that the parallelism can be exploited in an EPIC compiler on an EPIC architecture, even if it made sense (which it doesn't - it's twelve instructions per cycle **on the same core**).
"Every licensee has done their own wild thing and unless your kernel has been compiled for the correct hardware and you know the magic incantations to describe IRQ routing, DMA, framebuffer offset, etc you are not getting anywhere. "
Very true, though arguably it hasn't really held Linux/ARM back much to date. Fortunately for ARM and their potential customers, Windows 8 on ARM will presumably mandate some level of hardware consistency, and as long as it's a sensible standard (?) it will eliminate much of this problem. That will make life so much simpler, especially for Linux on ARM, where most of the code already exists and has been shipping for years, despite these irritating little differences getting in the way. Windows has some catching up to do. See also tried, tested, proven stuff like RedBoot etc. for platform-independent, architecture-independent boot support (no idea what Windows on ARM will mandate, but once they do...).
Intel should dump it because if they were to invest the same money in their AMD64 clone instead, they'd get a better RoI than with IA64.
HP should dump it because if they were to invest the same money in printer ink or Proliant Servers instead, they'd get a better RoI than with Integrity.
"Intel should dump it because if they were to invest the same money in their AMD64 clone instead...." And since when have Intel been short of a quid or two? I don't hear any funding issues regarding Intel teams working on Xeon, and cross-stream developments like the QPI work taken from the Itanium stream seem to have boosted Xeon as well, so I'm confused as to why you think Intel would gain by killing Itanium. Intel wants a chunk of the installed SPARC-Slowaris base left high-and-dry by the Sunset, and developing Itanium and Xeon gives them two bites of the apple and at the same time keeps the pressure on IBM's Power.
".....HP should dump it because if they were to invest the same money in printer ink or Proliant Servers instead...." Again, where do you see any shortage of investment? In fact, hp has climbed to being the number one IT company by pursuing a diverse portfolio, not just the most profitable. That diversity allows hp to shunt profits and cross-sell amongst different lines, meaning hp can usually take at least one deal from the table even if the prime deals have gone. Having a narrow portfolio is what killed Sun when that portfolio lost its value. Just think for a second what might have been if Sun had bought Lexmark and NetApp when they were cheap and Sun was flush with cash; they would have had a fallback for when their server range was in trouble.
"the programmer has to write his program threaded to begin with to take full advantage of parallel processing. " (and the rest).
You're missing the difference between instruction-level parallelism (which is what EPIC works with) and thread-level parallelism (which is what everybody else does).
An EPIC compiler looks at a block of code (at compile time) and attempts to find instruction level parallelism within that block. Anything not related to that block (such as other parts of the same program, other threads, whatever) is not a candidate for parallelisation at that stage.
A modern non-EPIC multi-core or SMT processor with a multi-threaded app can detect parallelisation opportunities AT RUN TIME that an EPIC processor cannot see and cannot exploit.
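The compile-time side of that distinction can be sketched as a toy greedy list scheduler for one basic block. It assumes one-cycle latencies and visits instructions in program order, and it is far simpler than a real EPIC compiler (no speculation, predication or software pipelining); the instruction names are hypothetical:

```python
def schedule_block(deps: dict[str, list[str]], width: int) -> list[list[str]]:
    """Greedy list scheduling of a single basic block.

    `deps` maps each instruction to the instructions whose results it reads.
    Dict order is taken as program order, so producers must appear before
    their consumers. Every instruction is assumed to take one cycle.
    """
    cycle_of: dict[str, int] = {}
    groups: list[list[str]] = []  # groups[c] = instructions issued in cycle c
    for instr, producers in deps.items():
        # Earliest cycle after all producers have produced their results.
        cycle = max((cycle_of[p] + 1 for p in producers), default=0)
        # Slip to a later cycle if that one is already full.
        while cycle < len(groups) and len(groups[cycle]) >= width:
            cycle += 1
        while len(groups) <= cycle:
            groups.append([])
        groups[cycle].append(instr)
        cycle_of[instr] = cycle
    return groups


# Hypothetical block: two loads feed an add, then a multiply and a store.
block = {
    "ld_a": [],
    "ld_b": [],
    "add": ["ld_a", "ld_b"],
    "mul": ["add"],
    "st": ["mul"],
}

for width in (1, 6, 12):
    print(width, schedule_block(block, width))
```

Going from 6-wide to 12-wide changes nothing for this block: the dependency chain, not the issue width, sets the length, and the scheduler can only fill slots with instructions it can see in the block.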
Not quite true that competitor processors don't do instruction level parallelism. All the top-end processors like Power are super-scalar and have some degree of support for run-time instruction level parallelism. However, it takes a lot of silicon to analyse instruction dependencies on the fly and perform parallel and out-of-order processing. Those super-scalar processors are what might be called IPIC, where the parallelism is implicit, rather than EPIC, where the compiler generates code with instruction parallelism explicitly flagged. However, even on traditional processors, clever compilers could optimise for super-scalar processors.
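On a super-scalar core the equivalent dependency check happens in hardware at run time rather than in the compiler. A minimal Python sketch of the simplest case, pairing two adjacent instructions in an in-order dual-issue design; out-of-order cores do this across a much larger window, which is where the silicon cost comes from. The `Instr` type and register names are illustrative only:

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str              # register written
    srcs: tuple[str, ...]  # registers read


def can_pair(first: Instr, second: Instr) -> bool:
    """Check a hardware dual-issue stage would make every cycle.

    The second instruction may issue alongside the first only if it neither
    reads the first one's result (read-after-write) nor writes the same
    register (write-after-write). Simplified: ignores memory dependences,
    structural hazards and register renaming.
    """
    raw = first.dest in second.srcs
    waw = first.dest == second.dest
    return not (raw or waw)


a = Instr("r1", ("r2", "r3"))  # r1 = r2 op r3
b = Instr("r4", ("r5", "r6"))  # independent of a
c = Instr("r7", ("r1", "r4"))  # reads r1, so must wait for a

print(can_pair(a, b), can_pair(a, c))
```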
Also, EPIC processors can support multi-threaded apps too. A modern Itanium processor has two threads per core (like its direct competitors) and the thread counts can go quite high if you are willing to pay the price.
At the moment it looks like HP got the EPIC bet wrong in that there isn't any great advantage to be had. However, support for Itanium and HP-UX is all about competing with Oracle/SUN and IBM on the top-end market and exploiting legacy.
AC 01:27 here
"Not quite true that competitor processors don't do instruction level parallelism. " (and the rest)
All very fair comment, I was trying to be briefer than usual, and at this level of detail, it doesn't always work (especially at that time of day).
For a historic view on the difference between instruction level parallelism and thread level parallelism, and why superscalar beats EPIC, have a look at