* Posts by Andrew Harrison

1 publicly visible post • joined 11 Apr 2008

Sun's UltraSPARC T2+ servers ship full of Niagara Viagra

Andrew Harrison

Re: 128 threads, just not at once

Actually, Sun did not prove the T2+'s performance with only a Lotus Notes benchmark; it in fact published leading results for SPECrate (int and fp), SAP, SPECjapps, SPECjbb and SPECompM2001 as well as the Notes results.

FYI, SAP is a data center application. In most big data centers you will also find large numbers of email servers and even larger numbers of web/app servers. From the numbers posted by Sun, it would appear that the T2 is ideally suited as a platform for the workloads that currently occupy at least 70% of the servers in most data centers.

The only drawback of the old T1 architecture, FP, seems to have been fixed, and the addition of a more comprehensive crypto unit (it would be nice if it supported bigger block sizes for MD5, sigh) makes it very suitable as a platform to replace more specialist systems as well.

Relating to a point you made about memory throughput in an earlier anti-T2 posting: Sun also published a STREAM result for the T5240, which comes in at 30GB/s. This is interesting because the smallest Itanium-based server that can better this number is the rx8640 with 4 cells and 32 cores; a 2-cell rx8640 only comes in at 2/3 of the throughput of a T5240. I don't have to remind you that an rx8640 is a 17RU server or, put another way, you could put 8 T5240s in the same space as a single rx8640 and still have 1RU left.
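For anyone who wants to check the rack-space arithmetic, here is a rough sketch using only the figures quoted above. The 2RU height of a T5240 is inferred from "8 in 17RU with 1RU left" rather than stated outright, and the aggregate bandwidth figure is just eight boxes added up, not a published result.

```python
# Rack-space arithmetic using only the figures quoted in the post.
RX8640_RU = 17        # height of one rx8640, as quoted
T5240_COUNT = 8       # T5240s said to fit in the same space
SPARE_RU = 1          # rack units said to be left over
T5240_GBPS = 30.0     # STREAM figure quoted for one T5240

# Inferred, not quoted: height of one T5240 in rack units.
t5240_ru = (RX8640_RU - SPARE_RU) // T5240_COUNT

# Simple sum across eight independent servers (illustrative only;
# this is not a measured or published number).
aggregate_gbps = T5240_COUNT * T5240_GBPS

print(f"Implied T5240 height: {t5240_ru}RU")
print(f"Summed STREAM bandwidth in {RX8640_RU}RU: {aggregate_gbps:.0f}GB/s")
```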

CPU clock rates have risen rapidly, but memory latency (not bandwidth) has not improved at the same rate, requiring vendors to configure larger and larger L1/L2/L3 caches to try to reduce the rate of cache misses, so that the whole core stalls less often in a given period.

The T2 does not have a large cache because it does not need one: if a thread stalls, only 1/128th of the available processing resource sits idle. On a 2-socket Itanium server a stall consumes a minimum of 1/4 of the total available CPU resources, which is much worse, hence the need for huge caches.
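The two fractions above can be reproduced with a few lines. This is only a sketch: the 2 sockets x 8 cores x 8 threads layout for the T2+ box and the 2 sockets x 2 cores layout for the Itanium box are my assumptions, chosen because they give the 128 and 4 used above, and `stalled_share` is a made-up helper name.

```python
from fractions import Fraction

def stalled_share(units: int) -> Fraction:
    """Share of the machine idled when one schedulable unit stalls."""
    return Fraction(1, units)

# Assumed layouts (not spelled out in the post, but consistent with its 1/128 and 1/4).
t2plus_threads = 2 * 8 * 8   # sockets * cores * hardware threads per core
itanium_cores = 2 * 2        # sockets * cores; a miss is treated as stalling the whole core

t2plus_share = stalled_share(t2plus_threads)
itanium_share = stalled_share(itanium_cores)

# How many times more of the machine a single stall costs on the Itanium box.
relative_cost = itanium_share / t2plus_share

print(f"T2+ stall share:     {t2plus_share}")
print(f"Itanium stall share: {itanium_share}")
print(f"Relative cost:       {relative_cost}x")
```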

What you do need is a lot of memory throughput, which the T2 has, more per core in fact than the rx8640. The STREAM results for the rx8640 come in at about 1.28GB/s per core with 16 cores, as opposed to 1.93GB/s per core for the 16-core T2.
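As a sanity check, the per-core figures can be multiplied back out to see whether they line up with the "2/3 of the throughput" claim made earlier. The sketch below uses only the numbers quoted in this post; the variable names are mine.

```python
# Per-core STREAM figures and core counts as quoted in the post.
CORES = 16
RX8640_GBPS_PER_CORE = 1.28   # 2-cell, 16-core rx8640
T5240_GBPS_PER_CORE = 1.93    # 16-core T5240

# Multiply back out to whole-system throughput.
rx8640_total = RX8640_GBPS_PER_CORE * CORES
t5240_total = T5240_GBPS_PER_CORE * CORES

# Fraction of the T5240's throughput delivered by the 2-cell rx8640.
ratio = rx8640_total / t5240_total

print(f"rx8640 (16 cores): {rx8640_total:.2f}GB/s")
print(f"T5240  (16 cores): {t5240_total:.2f}GB/s")
print(f"rx8640 / T5240:    {ratio:.2f}")
```

The ratio lands close to 2/3, so the per-core and whole-system numbers are at least consistent with each other.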

Digital observed back at the end of the 1990s that a typical Oracle application caused the system to spend 70% of its total available CPU cycles stalled waiting on main memory. Despite increases in L1/L2/L3 cache sizes, the relative CPU/memory latency has got worse, so this ratio is unlikely to have changed for the better.
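To put a rough number on why that 70% matters, here is a toy model, not a measurement. It assumes every hardware thread is stalled 70% of the time and that stalls are independent of each other, which real workloads will not obey exactly, and it uses 8 threads per T2 core. The function name is made up for illustration.

```python
def p_all_stalled(stall_fraction: float, threads_per_core: int) -> float:
    """Probability that every thread on a core is stalled at the same instant.

    Toy model: assumes each thread stalls independently with the same probability.
    """
    return stall_fraction ** threads_per_core

STALL_FRACTION = 0.70   # Digital's figure as quoted in the post

single_threaded_core = p_all_stalled(STALL_FRACTION, 1)  # conventional core
eight_thread_core = p_all_stalled(STALL_FRACTION, 8)     # T2-style core

print(f"1 thread per core:  idle {single_threaded_core:.0%} of the time")
print(f"8 threads per core: idle {eight_thread_core:.1%} of the time")
```

Under those assumptions a single-threaded core has nothing to run 70% of the time, while an eight-thread core has nothing to run under 6% of the time.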

The T2 is a very elegant sidestep of this issue. I would be tempted to suggest that you would be better off not knocking what you don't understand.