"Select" customers who use Sparc-based Solaris systems are being asked to participate in a beta program for Oracle's next-generation of Sparc T4 systems. And briefly, thanks to a typo in a blog post, it also appeared that those Sparc T4 machines had slipped into 2012 – but worry not: they're still on schedule. According to a …
I don't trust Oracle's hardware commitment
Hardware is a tough business. Just look at how many hardware companies have gone out of business. Given Sun's lack of investment and bad investments over the years, there is no way Oracle can reverse the problem in under 3-5 years. There is a business for existing customers and it does look like Oracle is at least investing in trying to keep them. Unfortunately, given the price increases on maintenance, Oracle-only maintenance, the lack of support for UltraSPARC, the end of SPARC64 and the breaking of their binary compatibility, we won't even talk to Oracle about hardware.
re: I don't trust Oracle's hardware commitment
Again, with the badly formed FUD. "breaking their binary compatibility" What? Are you kidding? Please give a reference for your lies. You make unsubstantiated statements and never back them up.
Between you and that Kebab guy, I find it difficult to be as irritated with MB as I once was.
Binary compatibility of SPARC
I think perhaps that Allison Park took his mouth a bit too full. AFAIK there is full forward binary compatibility between the Tx processors and SPARC64 or the other SPARC processors.
But what I hear from our Solaris group is that you need to retune your software stack when going from SPARC64 or SPARC IV or ... to the Niagara, and sometimes it's a ... real ... shitty deal, to put it bluntly.
Again, that is not so much the fault of the Niagara architecture, which is kind of beautiful if used for the right workload, but more the fault of the Sun and Oracle sales teams, who have sold their clients an architecture which didn't really fit.
Furthermore, you have the whole business of Solaris 11 dropping support for rather new "old" SPARC processors. So it's not like customers don't have enough to be angry about.
I agree with Billl:
"...You make unsubstantiated statements and never back them up..."
Allison, you should back up your claims, or else you are just trying to start false rumours, that is: FUD.
I always back up my claims, to the point that people here complain that I post links too often (to white papers, benchmarks, etc). I think you, Allison, should do the same if you want us to believe your words.
Oracle is absolutely committed to hardware
When you sell the hardware and the software, the lock-in is much more effective than if you only control one of the two.
The kicker is that to get people into the lock-in, you need the hardware to perform, something Sun/SPARC has struggled with for a few years. The T3 chips are great if your workload is heavily multithreaded, but if your app needs single-thread performance, it's pointless to have all those extra threads.
Oracle need something which can compete with IBM's p795; the SPECint rates show the 795 being 3 times faster than a fully loaded M9000 over 2 cabinets. Even allowing for the vagaries of benchmarking, that's a pretty awful comparison for Oracle.
Actually you are being too kind.
It's actually 3.6 times the performance. So you are being too kind.
And the current T3 chip does not match the best of the pack, like POWER7 and Westmere-EX, unless the workload can take advantage of the built-in accelerators in the T3.
Furthermore, you might say that the T4 is too little, too late, and this slippage only makes it worse, because there is nothing about improved throughput at a per-chip level on the Oracle roadmap for the T4, so the performance gap will only get larger.
Furthermore, yet another SPARC roadmap slip is not what Oracle needs right now. If the roadmap that Larry put out less than a year ago is already starting to slip, and to slip for processors that were supposed to be shipping this year, then I guess the HP and IBM sales guys will have a field day.
is the T4 still going to be only 4 sockets?
I have seen SPARC64 is end of life and the new m-class will be T chips, but the real question is can Oracle make the T chips get past 4 sockets? Everything in their roadmaps assumes they will be able to scale to 32 sockets where so far they have only been able to get to 4 and most of their systems engineers have left.
I will have to consult the great Oracle
I'm not sure where you get the view that SPARC64 is EOL? The old SPARC64 chips Fujitsu used to ship (sun4us) are not supported in Solaris 11 and haven't been shipped for some time, as far as I'm aware. The M-Series servers will continue to use SPARC64 technology although they'll need some speed bumps to stay even vaguely competitive.
You are correct in that the T4/T5 chips will need to scale past 4 sockets in order to replace the M-series servers, I don't know if that's the goal of Oracle but it would make sense, provided they can get the performance (particularly single-threaded performance) out of them. The biggest problem in scaling T-chips is keeping all those cores and caches in sync across many sockets and keeping the performance penalty of cross calls to a minimum.
RE: Actually you are being too kind.
Yeah, if I was a Snoreacle server customer I wouldn't be too upset at a few months slippage, but I'd be very upset to be told the new T cores are only 3 times faster per core on single-threaded apps! I've tested the CMT servers with heavy single-threaded apps and they really suck, so three times faster than really slow is still damn slow.
re: is the T4 still going to be only 4 sockets?
EOL? Lies, lies and more lies. Show your source Allison. No lying anymore.
Fujitsu SPARC64 roadmap
There are no future SPARC64 chips. No 45nm, No DDR3, No Octocore, No future.
It is at its "end of life"
Interesting that the url is the same as before but Fujitsu changed the title to say history.
re: Fujitsu SPARC64 roadmap
Fujitsu not showing a proper roadmap is not the same as being EOL. Are you really in this industry? From what Oracle and Fujitsu say, they will both be using an Oracle design moving forward -- at least that's what I believe they've said. SPARC64 has not been EOL'd and SPARC in general is far from EOL'd.
Not showing the proper roadmap?
Fujitsu is showing the only roadmap that exists for SPARC64. End of the road, the last chip, no future, no DDR3, no 45nm, no octocore. As far as end of life... that depends on your definition of EOL... sounds to me like it's on life support, ready for a massive heart attack.
I read in the comments in this site, that there are no roadmaps beyond IBM POWER8(?). Does that also mean that POWER is EOL, you reckon? So, if people posted "POWER is EOL, because there are no roadmaps beyond POWER(8)?" - would you say this is a substantiated claim?
Matt, you should know by now that CMT servers are not for single-threaded apps, but for highly threaded code. Even Sun/Oracle says so. If you use CMT servers for workloads they are not designed for, don't be surprised by the result.
It is like using CMT workloads on Itanium and drawing a general conclusion that Itaniums are slow. That would not be a fair conclusion, would it?
RE: @Matt Bryant
"Matt, you should know by now that CMT servers are not for single threaded apps...." That's the whole point - CMT is not just bad at single-threaded performance, it's downright awful! For years Sun denied it, now Oracle are having to face the fact that the majority of business apps have a heavy demand for single-threaded performance.
".......It is like using CMT workloads on Itanium and drawing a general conclusion that Itaniums are slow......" The difference is you are unlikely to need to run a CMT app on Itanium, but you are certainly going to need good single-threaded performance with the current and planned generation of business applications. The fact that Snoreacle are redesigning the CMT line in a desperate attempt to increase single-threaded performance just shows that Snoreacle have finally twigged to this simple fact.
"...For years Sun denied it, now Oracle are having to face the fact that the majority of business apps have a heavy demand for single-threaded performance..."
No, this is not correct. Sun has never denied that CMT servers have weak single-thread performance. Sun has been very clear on it: CMT is for high throughput with many light threads. Those were the official words from Sun. Everyone knows this, including you and me.
It would be very dumb of Sun to claim CMT servers have good single-threaded performance, when anyone just needed to look at benchmarks to invalidate that claim. That would destroy its credibility and would be marked as an outright lie from Sun.
I would like you to post a link where Sun claimed that CMT servers have good single-threaded performance. There are no such links. Can you find links where Oracle claims that? There are no such links either.
SPECint_rate is not single strand, but we will see... if only Oracle decides to publish any industry standard benchmark :)
RE: Reading skills ...
"Matt, you've read the " up to 5 times the performance" statement of Rick Hetherington in the article above or did you just looked at the pictures ?...." Sorry, jeorg, that must have been the male bovine manure filter kicking in and automatically reducing the doubtlessly inflated marketeer claim down to what is probably the best a customer will actually see in a one-in-a-million case, best-for-CMT setup (i.e., not what happens in real life). Don't worry, I usually like to bench kit in my own environment with my app stack and my own data (as that's really the only benchmark that matters to my company), and the view held by our board of anything Slowaris-related is so poor there is very little chance of me being asked to do that, so you can carry on bleating the Sunshine figures from Rick if you like.
Your own "comparison" with Xeon and Pee7 neglects to include the fact that both lines will have moved on to faster versions by the time the new CMT chip arrives (if it does). You also forgot to mention that both already scale far further than the CMT offerings, which means the new CMT design will be stuck fighting (and losing) the same edge-server/webserver niche that Xeon has been killing it in for years. And then we have you extrapolating an existing SPARC CPU with an imaginary marketing figure - how many times have you Sunshiners been proven wrong on those tricks? Remember the claims that UltraSPANKed IV was going to be soooooo uberfast? Yeah, that worked out - not! Let's not mention the predictions for UltraSPANKed V, that would only make you Sunshiners cry.
Well, you are doing a bit too much number magic here.
Single strand going up to 5 times faster is not the same as taking the per-core performance (8 threads/strands) of the T3 and then multiplying that number by 5. It's taking the performance of one thread/strand of the T3 and multiplying that by 5.
IMHO the Maximum per thread performance of the T4 will most likely be in the range of 12-15 SPECintRate2006. (that is 5 times the performance of the max throughput of a single T3 thread (which again AFAIR is ~x2 the throughput of a single thread when all threads are running))
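To make the chain of assumptions in that estimate explicit, here is a minimal Python sketch that works backwards from the guessed T4 range. All three inputs are the commenter's own rough figures (the 12-15 range, the "up to 5 times" claim and the roughly 2x lone-thread advantage), not published results, and the function name is purely illustrative.

```python
from typing import Tuple

# Assumed inputs, all taken from the rough estimate in the comment above.
T4_ESTIMATE_RANGE = (12.0, 15.0)  # guessed T4 maximum per-thread score
T4_SPEEDUP = 5.0                  # "up to 5 times" single-strand claim
SOLO_VS_LOADED = 2.0              # a lone T3 thread is ~2x a thread on a fully loaded core


def implied_t3_scores(t4_score: float) -> Tuple[float, float]:
    """Return the (lone-thread, loaded-thread) T3 scores implied by a T4 estimate."""
    solo = t4_score / T4_SPEEDUP
    loaded = solo / SOLO_VS_LOADED
    return solo, loaded


for t4 in T4_ESTIMATE_RANGE:
    solo, loaded = implied_t3_scores(t4)
    print(f"T4 {t4:.1f} -> T3 lone thread {solo:.2f}, T3 loaded thread {loaded:.2f}")
```

In other words, the 12-15 guess only holds if a T3 thread scores roughly 1.2-1.5 when every thread on the chip is busy.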
Which might sound really not good if you look at your examples with other architectures. But... I actually think that is kind of OK. Why ?
Because the single strand performance of a Westmere-EX with Hyperthreading enabled isn't the same as the per core performance either. And the same goes for POWER7 with SMT enabled.
Both processors, when running with SMT/Hyperthreading enabled, favour throughput over single-strand throughput. And there is a price to pay for that. So their single thread/strand throughput isn't equal to specintrate2006 divided by the number of cores either.
Try having a look at the specint2006 score of POWER6, where there actually are numbers that you can use, because the specint2006 scores didn't use autopar.
IBM System p 570 (4.7 GHz, 1 core-1 Thread) 21.6 specint 2006
Now running on 2 cores with specint2006rate (4 threads) the score becomes:
IBM System p 570 (4.7 GHz, 2 core-2 Threads) 60.9 specint2006 rate.
Hence doubling the number of cores gives almost a factor of 3 in performance. So using your math this would have given POWER6@4.7GHz a specint2006 score of 30.5. (without autopar) Which it clearly doesn't have.
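The arithmetic behind that POWER6 comparison can be checked with a few lines of Python. This is only a sketch using the two scores quoted above; the helper name is made up for illustration.

```python
# Figures as quoted in the comment above (IBM System p 570, POWER6 at 4.7 GHz).
SINGLE_CORE_SPECINT = 21.6  # specint2006, 1 core, 1 thread
TWO_CORE_RATE = 60.9        # specint2006 rate on 2 cores


def naive_per_core(rate: float, cores: int) -> float:
    """Split a rate score evenly across cores (the shortcut being criticised)."""
    return rate / cores


naive = naive_per_core(TWO_CORE_RATE, 2)
scaling = TWO_CORE_RATE / SINGLE_CORE_SPECINT
overshoot = naive / SINGLE_CORE_SPECINT

print(f"naive per-core estimate: {naive:.2f}")
print(f"rate score vs single-core score: {scaling:.2f}x")
print(f"naive estimate overshoots the measured score by: {overshoot:.2f}x")
```

Dividing the rate score by the core count lands at about 30.5, roughly 40% above the measured single-thread score, which is the point being made: with SMT in play, rate divided by cores is not a single-thread number.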
The difference is (at least for POWER7; I haven't read up on the latest enhancements for Hyperthreading, so I'll keep from making faulty statements on that one) that POWER7 is able to allocate all the resources (execution slots) to a single thread, because it's running a fairly clever implementation of SMT, and is thus able to get pretty close to the maximum per-thread throughput it is actually capable of. The fine-grained (round robin) approach of the T3 can't do that trick, but from what I understand about Yosemite Falls (T4), it can do much the same thing as POWER6/POWER7 and thus get both good per-thread throughput and good per-chip throughput.
Now, on the other hand, IMHO it's still too little, too late...
I think you might be double-counting the benefits. The SPEC report for T3-1 shows that 127 copies were needed to get a 166 peak rating. That's about 1.3 per thread. If the T4 is 5x faster running a single thread per core, that works out to a rating of 6.5 per core, not 53.
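The same back-of-the-envelope calculation as a short Python sketch, assuming the figures quoted in the comment (166 peak rating from 127 copies) and taking the "5x" claim at face value:

```python
# Figures as quoted in the comment above for the T3-1 SPEC report.
PEAK_RATE = 166       # peak rating
COPIES = 127          # benchmark copies run to reach that rating
CLAIMED_SPEEDUP = 5   # claimed single-thread speedup for the T4 (assumption)

per_thread = PEAK_RATE / COPIES
t4_single_thread = per_thread * CLAIMED_SPEEDUP

print(f"T3 score per copy/thread: {per_thread:.2f}")
print(f"T4 with one thread per core at 5x: {t4_single_thread:.2f}")
```

With one thread running per core, the per-thread figure is also the per-core figure, which is why the result comes out near 6.5 rather than 53.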
RE: But to be honest ...
"....I just think that T4 will solve the single-thread performance issue many people had with T3 and before....." I don't have a problem with the single-thread performance of the CMT processors, but that's because we're not stupid enough to use it for solutions where a good single-threaded performance is required. Well, actually, we're just not stupid enough to buy CMT, full stop, but that's more because we have a bigger problem with lack of faith in the Snoreacle ability to deliver; a chronic failing to accept the ridiculous price difference between a Snoreacle CMT server and a Xeon one from hp or IBM that does more for less with our app stacks, let alone the way Power and Itanium urinate on CMT like a waterfall; and a general disbelief that there is any value in accepting the awful support experience compared to that offered by even Dell.
".....And that's a good thing ..." Yeah, for hp, IBM, Dell.....
"....a chronic failing to accept the ridiculous price difference between a Snoreacle CMT server and a Xeon one from hp or IBM that does more for less with our app stacks, let alone the way Power and Itanium urinate on CMT like a waterfall;..."
I don't get it. "Ridiculous price difference"?
In Siebel v8 benchmarks, you needed six (6) POWER6 servers to match one Sun T5440. One IBM P570 cost 413,000 USD. The T5440 cost 76,000 USD. I agree the price was ridiculous: in Sun's favour.
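Taking those prices and the six-to-one server count at face value (they are the commenter's figures, not verified here), the gap can be worked out in a couple of lines of Python:

```python
# Figures as quoted in the comment above; treated as assumptions, not verified prices.
P570_PRICE_USD = 413_000   # one IBM P570
P570_COUNT = 6             # P570 servers said to match one T5440
T5440_PRICE_USD = 76_000   # one Sun T5440

ibm_total = P570_PRICE_USD * P570_COUNT
ratio = ibm_total / T5440_PRICE_USD

print(f"six P570s: {ibm_total:,} USD")
print(f"price ratio vs one T5440: {ratio:.1f}x")
```

On those numbers the IBM configuration costs about 2.5 million USD, a bit over 32 times the price of the single T5440.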
Regarding the CMT cpus, you know the T3 has several world records today, faster than POWER7 or Itanium. Here are some world records:
And the TPC-C world record was done on CMT CPUs. Would you still say that Power and Itanium urinate on CMT? I mean, Itanium reached 4 million tpmC. CMT reached 30 million tpmC. Rather, the SPARC CMT solution is about 7.5x faster. So how can Itanium urinate on CMT?
I don't get it.
Funny how you keep pulling 4-year-old hardware out of your hat when you have to compare against your favourite Oracle products. It is, to be quite honest, ridiculous.
And your link here is to a benchmark made by Oracle, controlled by Oracle...
And then there is the TPC-C cluster benchmark, again and again, which you keep comparing to non-clustered results. You aren't being serious.
With regards to Round robin and thread switching on the Niagara.
Well, you are quite right that the newest version of the Niagara family of processor cores won't do round robin in a way that leaves them idle 7/8 of the time (2x4=8 threads) if only a single thread is active.
They use an LRU scheme, which they call Least Recently Fetched (LRF): effectively a round robin that, if more than one thread is active, dispatches among the active threads on each of the two execution units.
You can even call it a bit of a hybrid fine and coarse grained multithreading.
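The selection policy described above can be illustrated with a toy Python model. This is a sketch of the idea only, not the actual hardware logic: the class name, the cycle counter and the choice of four threads per execution unit (taken from the 2x4=8 figure above) are all assumptions made for illustration.

```python
from typing import List, Optional


class LrfSelector:
    """Toy least-recently-fetched (LRF) thread picker for one execution unit."""

    def __init__(self, n_threads: int) -> None:
        # last_fetch[i] is the cycle at which thread i was last picked; -1 means never.
        self.last_fetch: List[int] = [-1] * n_threads
        self.cycle = 0

    def pick(self, active: List[bool]) -> Optional[int]:
        """Return the active thread that was fetched longest ago, or None if all are idle."""
        self.cycle += 1
        candidates = [t for t, is_active in enumerate(active) if is_active]
        if not candidates:
            return None
        # Ties (threads never fetched yet) are broken by thread index.
        chosen = min(candidates, key=lambda t: (self.last_fetch[t], t))
        self.last_fetch[chosen] = self.cycle
        return chosen


def run(active: List[bool], cycles: int) -> List[Optional[int]]:
    """Run a fresh 4-thread selector for a number of cycles with a fixed active set."""
    selector = LrfSelector(len(active))
    return [selector.pick(active) for _ in range(cycles)]


print(run([True, True, True, True], 8))      # all threads active: plain round robin
print(run([False, False, True, False], 4))   # one active thread gets every slot
print(run([True, False, False, True], 4))    # two active threads alternate
```

With every thread active the model degenerates into a plain round robin, and with a single active thread that thread is picked every cycle instead of waiting out the slots of idle threads, which is the behaviour the comment describes.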
But the Niagara core (pre T4) is still a much much simpler core than for example the POWER7 core, and one thread cannot dynamically take up all the resources in the core, as a single thread can on the POWER7 core, if it needs to. Which leads me to your remark.
Yes you are quite right, that the average per thread throughput of a POWER7 will be the throughput divided by the number of cores divided by the number of threads. Sure, this is something that we have discussed before.
BUT you can still get really, really good single-threaded throughput if nobody else is using the resources in the core. I would estimate that the single-threaded throughput of the POWER7 is around 26 specint2006, if you divide out and adjust for SMT giving around 80% or something. And you don't need to disable SMT or anything like that; you just need the condition where one thread is executing alone on a physical core.
And that is the real difference between POWER7 and T1/T2/T3 cores. You get both the good throughput and the good single threaded throughput. Sure you don't get both at the same time looking at a per core level.