Matt Bryant
Oh, it is always a pain in the *** trying to explain technology to business people. The worst thing is when they haven't done the basic computer architecture courses at university. And it doesn't help when they apply unsound logic either.
Matt, I suggest you talk to some people who know computer architecture. Talk to people at a university; they are probably less biased than the average HP or IBM sales rep you talk to, and they know a lot more. It is quite stupid of IBM to state that "one Power core is faster, ergo the Power CPU is faster". That just doesn't add up, logically. But I understand that you think it sounds fair and that you have problems using logic, as you have clearly not studied logic. Otherwise you wouldn't have said these ignorant things about CPUs.
Actually, the things you say are so weird that it makes one wonder. For instance:
----------------------
""Matt, stop showing your ignorance...." Anyone still pushing Sunshine after the Sunset, the massive decline in SPARC sales and the current uncertainty as to what current Sun hardware will even be available in a year's time, really hasn't got the right to accuse anyone of ignorance."
If you really believe that the best technology always wins, then you are quite naive. For instance, have you heard about VHS vs Betamax? No? You haven't heard about Windows vs Unix? No? *sigh* If SUN's Niagara boxes don't sell, even at one fifth of the price of a Power box and despite higher performance - what does that mean? That the SUN tech is bad, or that the sales division is bad? Hmmm... Let me see, it must mean that the SUN tech is bad, right? With furious marketing you can put lipstick on a pig and outsell anything else.
-------------------------------------
""....The design of CMT makes smaller caches possible...." Nope, the design of T2/T2+ means you have to make do with small cache split between all the cores and being flushed continuously as you switch between stalled threads. And every time there is a cache miss it's off to RAM ( relatively slow), local disk (very slow) or the SAN (extermely slow!). Which is why T2/T2+ can only shine with wheiner-threaded apps or light loads like webserving."
You haven't read what I wrote. Or you did, but didn't understand. Well, it is not really that hard to understand (I hope). It doesn't require an M.Sc., actually. I will do my best to explain this again. Listen carefully.
A CPU will ALWAYS suffer from cache misses. There is no way of avoiding them. The only way to avoid cache misses entirely is for the CPU to use psychic powers.
Fact: CPUs will ALWAYS suffer from cache misses. Intel says a normal 2GHz x86 server CPU idles 50% of the time under full load because of cache misses. This idling occurs because RAM is much slower than the CPU.
Now there are two strategies to deal with this fact.
A) You try to minimize cache misses by using large caches and complex prefetch logic. Then you can maybe decrease the idling from 50% down to 45%. But I doubt even that, as Intel has applied both these techniques and still an Intel CPU idles 50%. The higher the frequency, the more idling. A 5GHz CPU idles maybe 70%? Don't know, just a guess. I haven't seen studies on 5GHz frequencies.
B) You don't try to minimize cache misses at all. You KNOW there is nothing you can do to avoid them. Why not work around the problem instead of trying to minimize misses, which is totally futile? It is a lost battle; chip makers have been trying to avoid misses for decades now. And the company with the most research resources, Intel, is still stuck at 50%. If Intel's CPUs are idling at 50%, despite all that research, there is nothing you can do against cache misses. This is a fact. The CPU would need psychic powers to foresee the future and avoid cache misses. This is impossible.
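To make the idling argument concrete, here is a toy model in Python. It is only a sketch: the function name and the numbers (200 cycles of work between misses, 100 ns to reach RAM) are my own assumptions, picked so that the 2GHz case lands on the 50% figure quoted above. They are not measurements of any real CPU.

```python
def idle_fraction(clock_ghz, cycles_between_misses, miss_latency_ns):
    """Fraction of wall time a single-threaded core spends waiting on RAM.

    Toy model: the core does a fixed number of cycles of useful work,
    then stalls for one full memory access, and repeats.
    """
    # A clock of X GHz executes X cycles per nanosecond.
    compute_ns = cycles_between_misses / clock_ghz
    return miss_latency_ns / (compute_ns + miss_latency_ns)


# Assumed workload: 200 cycles of work per miss, 100 ns RAM latency.
for ghz in (1.0, 2.0, 5.0):
    print(f"{ghz:.0f} GHz: {idle_fraction(ghz, 200, 100):.0%} idle")
```

In this model RAM latency stays fixed while the compute time between misses shrinks as the clock goes up, so the idle share grows with frequency. That is the "higher the frequency, the more idling" point, under these assumptions.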
Instead of fighting this fact, work around it. There WILL be cache misses. So we will use a new revolutionary technique which no one has thought of yet: try to mask the cache misses. As soon as there is a cache miss, switch thread at once and continue working with another thread. (This is the key point: at once. Normal CPUs take hundreds of cycles to switch thread, so they can just as well wait for the data from RAM - it is equally slow.) So do not try to avoid cache misses; do some useful work while you wait. This is a unique and new solution.
You can never win against cache misses, by the laws of probability and mathematics. Don't fight them; do some useful work instead of idling and waiting. This is extremely clever. After decades of research into route A) you are still stuck at 50% idle. Route A) is legacy, and you need a new solution.
Because of this new solution, you also don't care about cache misses. They will occur. You cannot fight them. And because they will occur, you don't care about large cache sizes or complex prefetch logic, because both of these legacy techniques are only useful when you try to fight cache misses. But SUN doesn't fight them.
THIS is the reason Niagara doesn't have large caches or complex prefetch logic. Niagara doesn't combat cache misses, Niagara works around them.
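The switch-on-miss idea can be sketched as a back-of-the-envelope formula. This is a simplification under stated assumptions, not a description of the real Niagara pipeline: switching threads is assumed to cost zero cycles, every thread behaves identically, and the cycle counts are made up for illustration.

```python
def core_utilization(hw_threads, work_cycles, stall_cycles):
    """Fraction of cycles a core does useful work when it can switch,
    at zero cost, to another ready hardware thread on every cache miss.

    Each thread alternates `work_cycles` of computing with `stall_cycles`
    of waiting on RAM (assumed values, identical for all threads).
    """
    # One thread keeps the core busy work/(work+stall) of the time.
    # Extra threads fill the stalls until the core is fully busy.
    busy = hw_threads * work_cycles / (work_cycles + stall_cycles)
    return min(1.0, busy)


# Assumed: 100 cycles of work, then a 700-cycle wait for RAM.
for n in (1, 2, 4, 8):
    print(f"{n} thread(s) per core: {core_utilization(n, 100, 700):.1%} busy")
```

With these assumed numbers a single thread keeps the core busy only one cycle in eight, while eight threads cover each other's stalls completely, and no bigger cache was needed to get there.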
If Niagara had large caches and complex prefetch logic, little would be gained. You would have had to spend large amounts of transistors for no benefit at all. Then Niagara would be a power hog like the Power6 CPUs. Niagara would use 500 Watt and still perform lousy - just as Power6. Now, Niagara uses about 100 Watt (which is less than Intel's server CPUs) and still Niagara owns Power6.
Ok, have you finally understood the point of the Niagara approach? It is like a Gatling gun (many, many fast small bullets), instead of making larger and larger rifles - Magnum, Mega-Magnum, Mega-Mega-Magnum - just like Power6 does. The new solution is: many smaller, faster bullets rather than one big bullet.
Niagara doesn't fight cache misses. It works around them. It is impossible to fight cache misses. Niagara doesn't need a large cache because it doesn't fight cache misses. A large cache is only needed when you have another strategy: to fight cache misses.
I suggest you read this again. Slooooowly. Just like a Power6. It surely is a pain to explain tech to someone ignorant. But this is not really hard to understand; the thing is that Niagara uses a new technique, not the old legacy technique with large caches and prefetch logic.
Matt, besides complaining that Niagara has no large cache, you can also complain that Niagara has no elaborate prefetch logic. For a legacy CPU, not having complex prefetch logic is really bad. But I hope you understand now that neither is needed in Niagara's new unique solution.
--------------------------------------
Another thing. You state that you have benchmarked Niagara and it turned out not to suit your needs. That is fine. Niagara is not always the best route for all problems. If you can use many small, fast bullets, then you use Niagara. If you want to use one large bullet, then you are better off with a legacy CPU like Intel or Power6.
But all benchmarkers need to know this:
The Niagara CPU has to be loaded EXTREMELY high to shine. Several benchmarkers have loaded a Niagara CPU with a small test workload, and then Niagara sucks. But when you load Niagara with large workloads, it never chokes but continues to work happily - this is due to its new architecture. If you only utilize a few threads, the Niagara will seem to suck. I have read several reviews that show this. They used a test workload and concluded that Niagara was the slowest in town. But when you load Niagara far beyond the point where the legacy CPUs choke, the Niagara just doesn't care. It can handle enormous workloads, far more than any legacy CPU. That is the point of using Niagara: load it up enormously, far more than any legacy CPU, and you will see it perform extremely well. Load it up with only a few threads, and you have never seen its potential.
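Why a small test workload is misleading can be shown with a toy saturation model. Everything here is hypothetical: the two "chips", their thread counts and their per-thread rates are invented for illustration and are not benchmark data for Niagara or Power6. The model also assumes each hardware thread runs at a constant rate no matter how many others are busy.

```python
def throughput(runnable_threads, cores, threads_per_core, per_thread_rate):
    """Work completed per unit time when `runnable_threads` software
    threads are offered to a chip. Threads beyond the available hardware
    thread slots simply queue and add nothing.
    """
    slots = cores * threads_per_core
    return min(runnable_threads, slots) * per_thread_rate


# Hypothetical chips (assumed values, not measurements):
#   "big":  2 cores x 2 threads, each thread fast (rate 4.0)
#   "many": 8 cores x 8 threads, each thread slow (rate 1.0)
for load in (1, 4, 16, 64):
    big = throughput(load, cores=2, threads_per_core=2, per_thread_rate=4.0)
    many = throughput(load, cores=8, threads_per_core=8, per_thread_rate=1.0)
    print(f"{load:>2} threads offered: big={big:5.1f}  many={many:5.1f}")
```

In this sketch the "big" chip wins easily with one or four threads, the two tie at sixteen, and the "many" chip only pulls ahead once the load goes well past the point where the "big" chip has run out of hardware threads. A benchmark that stops at a handful of threads never reaches that region.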
This is the reason it wins over the Power6 CPU on many benchmarks. For instance,
ORACLE, SAP, spec_int, Lotus Notes, etc.
Here are just a few world records with the older T2 CPU, where Power6 bites the dust.
http://johnjmclaughlin.blogspot.com/2007/10/utrasparc-t2-server-benchmark-results.html
And one last thing: the benchmarks above are not a case of "one carefully crafted bench makes Niagara faster than Power6". They are not carefully crafted benches from SUN. These benches are valid, and they are specified by other companies such as Oracle, etc.