The Hot Chips 23 symposium on high-performance chips kicks off at Stanford University next week. The makers of processors for smartphones, desktops, servers, and networking gear are polishing up their powerpoints to amaze and daze each other from August 17 through 19. On the traditional server front, Intel is on deck to talk …
Not sure how much sense this Sun virtual thread stuff will make
The point of hardware threading is mainly (as I understand it) to hide the many latencies, most especially memory latencies. If you cycle through (say) four hardware threads then you have a reasonable chance that an outstanding delay will have been resolved by the time you get back to the first thread (external memory accesses probably being an exception).
If you simply turn off all but one thread, it will sit and do much thumb-twiddling at assorted traffic lights while the others don't get anything done either (because you disabled them), so I wonder how much of a boost this might be. Possibly much smaller than the marketing drones would like to present it.
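To put rough numbers on that argument, here is a minimal toy model, not a description of any real core. It assumes every thread alternates between one cycle of work and a fixed three-cycle stall, and that the core can switch to another ready thread at zero cost. The function name `utilization` and all of its parameters are made up for illustration; real memory stalls are far longer and far less regular.

```python
def utilization(num_threads, compute_cycles=1, stall_cycles=3, total_cycles=10_000):
    """Fraction of cycles in which the toy core issues an instruction.

    Each hardware thread repeats: work for compute_cycles, then wait
    stall_cycles for data. Switching threads is assumed to be free, and
    the core picks the next ready thread in round-robin order.
    """
    ready_at = [0] * num_threads                # cycle at which each thread may issue again
    remaining = [compute_cycles] * num_threads  # work cycles left before the next stall
    next_thread = 0
    busy = 0
    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Thread t now stalls; it is ready again after stall_cycles.
                    remaining[t] = compute_cycles
                    ready_at[t] = cycle + 1 + stall_cycles
                next_thread = (t + 1) % num_threads
                break  # at most one instruction issued per cycle
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "thread(s):", utilization(n))
    # 1 thread(s): 0.25
    # 2 thread(s): 0.5
    # 4 thread(s): 1.0
```

Under these assumptions a lone thread keeps the core busy only a quarter of the time, while four threads hide the stall completely. Whether a single "critical" thread runs faster in practice depends on how much it stalls, which this sketch cannot tell you.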
Also... priority inversion, anyone?
(I am not a CPU designer; any who are may correct me.)
I expect that critical threads will only be used for highly optimised code that does minimal memory access and can be accommodated in the first level cache.
I'm sure they created this feature after carefully analysing real-world applications; it wouldn't have been dreamt up by the marketing department. In this era of slowing CPU progress, optimisations like this could offer considerable performance & competitive advantage (or help them catch up, as the case may be).
Yes, that is correct. With many threads AND the ability to switch threads in one clock cycle, you can efficiently hide latencies. Normal cpus take 100s of clock cycles to switch threads, which means they cannot mask latencies.
For instance, studies by Intel show that a typical x86 server cpu waits for data 50-60% of the time under full load. This means an x86 cpu running at 3GHz is actually doing work corresponding to a cpu running at roughly 1.2-1.5GHz.
That is the reason normal cpus have big caches, complex prefetch logic, etc - to try to minimize latencies. CPUs have reached high GHz, but RAM is still slow. Thus, if you have a 5GHz cpu and the RAM is 1GHz, the cpu has to wait for RAM all the time. But if both the cpu and RAM run at 1GHz, the cpu need not wait. Thus, high clocked cpus are not really meaningful. 5GHz POWER6 cpus using 1GHz RAM are really pointless. Even IBM seems to understand this now, as IBM has decreased clock speed and increased the number of cores.
So, how successful is the Niagara approach? Well, the Niagara idles only 5-10% of the time under full load, waiting for data. That is much better than 50-60%. Thus, the Niagara at 1.6GHz competes with, and in some cases outperforms, much higher clocked cpus. In fact, Niagara holds several world records today, beating much higher clocked x86 and POWER7 cpus.
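The arithmetic behind those figures can be sketched as below, using only the numbers quoted in this thread (3GHz x86 stalled 50-60%, 1.6GHz Niagara stalled 5-10%). It assumes, crudely, that useful work scales with clock speed times the fraction of time not spent waiting; `effective_ghz` is a made-up helper, and the model ignores instructions per cycle, core counts and everything else that matters in a real benchmark.

```python
def effective_ghz(clock_ghz, stall_fraction):
    """Clock speed scaled by the fraction of time the cpu is not waiting for data."""
    return clock_ghz * (1.0 - stall_fraction)


# (label, clock in GHz, (low stall fraction, high stall fraction)), as quoted in the thread
chips = [
    ("x86 @ 3GHz", 3.0, (0.50, 0.60)),
    ("Niagara @ 1.6GHz", 1.6, (0.05, 0.10)),
]

if __name__ == "__main__":
    for label, clock, (low_stall, high_stall) in chips:
        worst = effective_ghz(clock, high_stall)
        best = effective_ghz(clock, low_stall)
        print(f"{label}: {worst:.2f}-{best:.2f} GHz of useful work")
    # x86 @ 3GHz: 1.20-1.50 GHz of useful work
    # Niagara @ 1.6GHz: 1.44-1.52 GHz of useful work
```

On this back-of-envelope view the two end up in the same ballpark despite the near 2x gap in nominal clock speed, which is the point being made.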
The funny thing is that Niagara has a tiny cache, because it hides latencies very well. Thus, Niagara is the fastest in the world in some cases, without big caches. What does that prove? It proves that Niagara is not cache starved. If it were cache starved, it would never beat 5GHz cpus. You need 14 (fourteen) POWER6 cpus at 5GHz to match four (4) Niagara T2+ cpus at 1.6GHz in official Siebel v8 benchmarks. How is that possible if the Niagara is cache starved?
Conclusion: the ability to hide latencies can be very valuable.
Crysis LAN Party...