Intel's 45nm 'Nehalem' processor design will incorporate the second generation of the chip maker's SSE 4 technology. For now, the company is calling the post-'Penryn' Streaming SIMD Extensions instruction set SSE 4.2. Nehalem's implementation of SSE 4 essentially matches that of Penryn. The key additions centre on the …
Logic behind HT
I'm not sure I get that one. Hyper-Threading was very useful before multi-core CPUs appeared, yet it scaled pretty badly. It added performance on 1- or 2-CPU machines, but you wouldn't see any gain in performance with 4 CPUs.
I just wonder how that's going to work - and actually improve performance with Quad Core CPUs.
Re: Logic behind HT
HT allows two threads to use different areas of the execution core. Hypothetically, one thread could be running an integer operation while another runs a floating-point operation at the same time. You are right that there is less to gain from HT in a multi-core CPU, but I suppose it's as Intel says: "on getting the job finished more quickly, the better to improve power efficiency." To me HT isn't that important, and it doesn't bother me one bit that my Core 2 Duo doesn't have it, but I guess it doesn't hurt to have it either.
Re: Re: Logic behind HT
HT works as you suggest, but there is more to it than that.
Up till now each core has had 3 separate instruction pipelines. On a single thread, the instructions would be analysed and then split so that independent ones (those that don't rely on a previous result) could run on different execution cores. I believe this also happened with branches in some cases, where both sides of the branch would be evaluated while waiting for the result of the branch test to come back.
On the NetBurst architecture the execution pipeline was very long, so it took a long time to get branch results back. Splitting the instructions into independent ones was also harder, because results took so much longer to come back, which made the dependency chains longer. As a result, quite a lot of the time one or more of the execution cores sat idle. HT could take advantage of this because the second thread is always independent of the first, so the instructions of the two threads could be interleaved.
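To make the idle-slot argument concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the function name `simulate`, the issue width of 3, the result latency of 4 cycles and the strictly in-order issue are all made up for illustration and are not real NetBurst or Core figures. Each thread is a list of flags saying whether an instruction depends on the one before it.

```python
def simulate(threads, width=3, latency=4):
    """Toy in-order core: return (cycles, slot utilisation).

    threads: list of lists of bools; True means the instruction needs the
    result of the previous instruction in the same thread.
    width and latency are illustrative assumptions, not real CPU figures.
    """
    pos = [0] * len(threads)       # next instruction to issue, per thread
    ready_at = [0] * len(threads)  # cycle when a thread's last result is ready
    total = sum(len(t) for t in threads)
    issued = 0
    cycle = 0
    while issued < total:
        slots = width
        # Fixed priority: thread 0 gets first pick of the slots each cycle.
        for tid, instrs in enumerate(threads):
            while slots > 0 and pos[tid] < len(instrs):
                depends = instrs[pos[tid]]
                if depends and cycle < ready_at[tid]:
                    break  # stalled, waiting on the previous result
                pos[tid] += 1
                ready_at[tid] = cycle + latency
                slots -= 1
                issued += 1
        cycle += 1
    return cycle, issued / (cycle * width)


if __name__ == "__main__":
    chain = [True] * 8    # every instruction waits on the previous one
    free = [False] * 9    # no dependencies at all
    print("one dependent thread :", simulate([chain]))
    print("two dependent threads:", simulate([chain, chain]))
    print("one free thread      :", simulate([free]))
    print("two free threads     :", simulate([free, free]))
```

In this model two dependency-bound threads finish in the same number of cycles as one, because the second thread simply fills slots the first leaves empty. Two threads with no dependencies take twice as long as one, because the slots were already full and there is nothing for a second thread to pick up.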
However, the newer Core architectures have a much shorter execution pipeline, so there is less idle space in the execution cores for Hyper-Threading to take advantage of. Now, though, they're adding a fourth execution core, which they must feel leaves enough spare slots to support another thread.
For power management, this should also mean that if the CPU/OS detects that two threads run happily on the same core, the additional cores can be shut down, saving power.
Also, as we write more and more core/thread-hungry apps, HT will be just as useful for multi-core as it was for single-core. It's probably pretty handy in a server environment too, where the cores are often IO bound and so having more hardware threads is a bigger win.
Forget Nehalem ...
Intel still haven't managed to get the previous generation into production, despite promises. Where is the Q9450, for instance? Has ANYone got one, apart from engineering samples?