SPARC immune to Meltdown
but susceptible to Spectre.
59 posts • joined 25 Aug 2015
Meltdown is a really serious design flaw in x86 cpus. SPARC is immune to Meltdown. It doesn't matter if you run Java on top of SPARC or any other software - Meltdown cannot happen.
All cpus are vulnerable to Spectre, which is a much more difficult bug to use and not a big worry. However, Meltdown is a big worry.
Mainframe cpus are slow. Mainframe I/O is superior. That is because Mainframes have a lot of I/O co-processors. For instance, one Mainframe had 296,000 I/O channels. x86 does not have that many co-processors. But, OTOH, if you added co-processors to x86, it would be faster on I/O as well.
Java can in theory be faster than C/C++ because the JVM is adaptively optimizing. With each run the JVM optimizes more and more. C/C++ is optimized once, at compile time, and never again. This means C/C++ must target the least common denominator, and not use vector instructions etc. The JVM can turn on vector instructions if it discovers them in the cpu. Another example of an optimization that C/C++ cannot do: assume you run a large for loop in Java; the first time you iterate over one subclass, the next time you iterate over another subclass - the JVM can optimize for different subclasses on each run. C/C++ cannot do that.
It is obvious that if you optimize continuously, it is better than optimizing once in the beginning, yes? All large and fast stock exchange systems are written in Java or C/C++ with sub-100-microsecond latency and enormous throughput - for instance the INET system that NASDAQ uses on Wall Street is written in Java. The secret to getting speed with Java is to never trigger the garbage collector. That is done by preallocating a lot of objects and reusing them all the time, so no object is ever killed. If you ever trigger the GC, performance goes down. Realtime Java should be avoided for utmost speed.
Yes, SPARC M7 was typically 2-3x faster. Look back here:
I did not know about the new Intel 8180 cpu, thanks. As the SPARC M8 is faster than the M7, we have to wait and see the results of M8. I am confident M8 will be faster.
x86 cannot compete with SPARC as SPARC M7 is typically 2-3x faster than x86, up to 11x faster on some Enterprise workloads such as databases. Now the new SPARC M8 is twice as fast as M7, which means x86 is really slow in comparison. For instance, the SAP benchmark top spots are all dominated by SPARC. x86 is far below.
It is not a stupid question, it is a good question.
The answer is that primes get rarer and rarer, and there will always be large gaps where there are no primes:
I believe it is more efficient to examine a certain number that you suspect is a prime (looking like Mersenne numbers) and check if it is prime, instead of checking many numbers in a row. There is no way to point out a prime number, and no one knows if they are more frequent in some "areas" than others. So, we just basically check a random number - which takes a long time to do.
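To make the "pick a Mersenne-looking candidate and test it" idea concrete, here is a minimal sketch of the Lucas-Lehmer test, the standard primality test for numbers of the form 2^p - 1. The function name and the list of small exponents are my own choices for illustration, and the test assumes the exponent p is itself prime.

```python
def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test: return True if 2**p - 1 is prime.

    Assumes p is itself a prime number.
    """
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs odd p
    m = (1 << p) - 1  # the Mersenne candidate 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Check a handful of small prime exponents.
small_primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31)
exponents = [p for p in small_primes if is_mersenne_prime(p)]
print(exponents)
```

Only some prime exponents give a Mersenne prime (2^11 - 1 = 2047 = 23 * 89 does not), and each test costs p - 2 big-number squarings, which is why single candidates take so long once p gets into the millions.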
"...While this is understandably startling for the layman, a mathematician would just immediately propose the opposite assumption ("1=2") and drive it straight into the nearest impossibility using only rigorous logic on the way...."
Bertrand Russell held a lecture once, where he said:
-If you assume one single erroneous piece as a fact, then you can prove anything!
-I don't believe you. If you assume that 1=2, can you prove that... you are the pope??
-The proof is trivial. I and the pope are 2 different persons, which means we are 1 person by (the faulty) assumption. Hence, I am the pope.
This is wrong. DIF/DIX disks do not protect against data corruption sufficiently. Have you ever looked at the specs for disks with DIX/DIF? All enterprise hard disk specs say something like "1 irrecoverable error on every 10^17 read bits", Fibre Channel, SAS, etc - all equipped with DIX/DIF. The moral is that those disks also encounter corruption and when that occurs - they cannot repair it. Also, these disks are susceptible to SILENT corruption - corruption that the disks never noticed. That is the worst corruption.
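To put the "1 irrecoverable error on every 10^17 read bits" spec in perspective, here is a back-of-envelope sketch. The 10 TB disk size and the helper name are hypothetical, and the calculation simply takes the quoted spec at face value as an average rate.

```python
BITS_PER_ERROR = 10**17  # quoted spec: one irrecoverable error per 10^17 bits read


def expected_errors(bytes_read: int) -> float:
    """Expected number of irrecoverable read errors for a given amount of data read."""
    return bytes_read * 8 / BITS_PER_ERROR


# How much data can be read, on average, before one error shows up?
petabytes_per_error = BITS_PER_ERROR / 8 / 10**15
print(petabytes_per_error)  # 12.5 (decimal petabytes)

# Hypothetical example: reading a 10 TB disk end to end once.
print(expected_errors(10 * 10**12))
```

So on average one irrecoverable error per 12.5 PB read. That is rare for one disk, but not for a large array that is read and scrubbed all the time.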
ZFS detects all forms of corruption and repairs them automatically if you have redundancy (mirror, raid, etc). DIF/DIX disks can not do that. Even if you have a single disk with ZFS, you can provide redundancy by using "copies=2" which makes all data duplicated all over the disk, halving disk storage.
"...ZFS is based around RAID5/6, which is frankly does not scale..."
This is plain wrong. A hardware raid card can only manage a few disks, so hw raid cards do not scale. OTOH, ZFS utilizes the disks directly, which means you can connect many SAS expanders and JBOD cards, so a single ZFS server can manage 1,000s of disks or more - you are limited by the number of ports on the server motherboard. ZFS scales well because it can use all the JBOD cards. A single hw raid card cannot connect to all other hw raid cards - hw raid does not scale. ZFS scales.
In fact, the IBM Sequoia supercomputer has a Lustre system that uses a ZFS pool with 55 Petabytes of data and 1TB/sec bandwidth - can a hardware raid card handle a single Petabyte? Or sustain 1TB/sec? Fact is, a CPU is much faster than a hw raid card. So a server with TBs of RAM and 100s of cores will always outclass a hw raid card - how can you say that ZFS does not scale? It uses all the resources of the entire server.
Regarding high RAM requirements, if you google a bit, there are several people running ZFS on a Raspberry Pi with 256MB RAM. How can that be?
Also, you can change OS and servers without problems with ZFS. Move the disks to another server, change OS between Solaris, Linux, FreeBSD and MacOS. You are free to choose.
ZFS is the safest filesystem out there, scales best and is most open. Read the ZFS article on Wikipedia for research papers where the scientists compare ZFS against other solutions, such as hw-raid, and conclude that ZFS is the safest system out there. CERN has released several research papers saying the same thing - read the Wikipedia article on ZFS.
Actually SPARC M7 is up to 11x faster than x86 on business workloads. It is the world's fastest cpu, typically 2-11x faster than Xeon and POWER8.
SPARC M8 will again double the performance. x86 has no chance.
Look at the largest SAP benchmarks, all top spots are SPARC and Solaris. x86 does not scale, it stops at 8 sockets (16-socket results are bad). If you need the largest business workloads, you must go to large RISC 16- or 32-socket servers. Just check SAP and see that x86 scales badly. All top spots are SPARC on large 16- or 32-socket servers.
I doubt any filesystem except ZFS provides good data integrity with checksums. Btrfs is a piece of shit, so it doesn't. In fact, Lustre was rewritten to not use ext4 as backend because of data integrity issues and scalability problems. Instead Lustre is using, you guessed it, ZFS as backend. Lustre is better than Ceph, some people say.
Come on, there is a lot of work in creating APIs too. You can create a lot of classes that interact in many different and disparate ways, but how should they interact and work together? Interaction and usage is not trivial, but has to be considered well.
Typically, an API should be orthogonal and simple, and not duplicate functionality. This requires thought and design. This design work is worth something, right? You should be getting paid for all that work, right? It is like mathematics: to design an axiom system takes many years. First the mathematical objects have some properties, and after many tries the mathematicians finally distill the functionality into some axioms - which takes many many years. For instance, in Euclidean geometry the mathematicians spent hundreds of years debating the parallel axiom - should it be included in the axioms or not? I.e. mathematicians talked about the minimal API for Euclidean geometry - that took hundreds of years and a lot of hard work to nail the important stuff and get rid of all the fluff and unnecessary stuff. An API (i.e. axiom system) is not trivial to design, ask any mathematician.
I mean, it is like creating a new wonderful dish, say, a hamburger. After 100s of hours trying out different combinations, you settle on a modern burger. The first burgers were meat + bread. Today, after 100s of years of evolution, the top quality burgers are much different, compare a Shake Shack burger to meat + bread - what a difference!! Sure, you can copy a Shake Shack burger easily today, but there has been a lot of design and thought put into the Shake Shack burger. The same with APIs - it is easy to copy them when they are done - but on the way there was a lot of trial and error. APIs define the classes, and you can design classes in other ways - but the Java classes are now "obvious". Back in time, it was not obvious at all.
If I were to design a kernel and the ABI/API from scratch it would take me years. But now I can instead look up common practice and copy typical kernel ABI/API functionality and that would save me many years of research. The same with APIs. They are easy to copy, but to design them requires a lot of hard work. And you should be paid for hard work, yes?
According to STREAM ram bandwidth benchmarks, Intel typically does 60 GB/sec in real life benchmarks. 85 GB/sec is more a theoretical limit.
SPARC M7 does 130GB/sec in same benchmarks.
When you compare how x86 scales, vs Unix - the x86 falls flat. It is only recently that the first 16-socket x86 servers have arrived on the market. Earlier the largest x86 server was an ordinary 8-socket Oracle/Dell/HP server. When you benchmarked x86 Linux vs Solaris on similar hardware, the Solaris server was faster despite using lower clocked cpus (check for instance official SAP benchmarks). Linux scales badly on x86 with more than 8 sockets.
My point is that these large x86 servers are first generation and do not perform well. For instance, Solaris 11 and IBM AIX had a rewritten memory system to handle large RAM servers. I doubt Linux can do that, even though you can put 32 TB in a server - Linux cannot drive it. Linux scales badly cpu-wise too, and cannot utilize 16 sockets well. Why should Linux utilize 32 TB RAM well?
So, I say that Unix/RISC scales far better than Linux/x86. Just check the benchmarks, typically x86 has bad scaling. For instance, SAP benchmarks. All the top SAP or TPC scores belong to Unix/RISC. Linux on 16-socket x86 is far below and not near the top. So I don't agree. Linux/x86 gives too bad performance on large 16-socket servers with large RAM. It is no match for Unix/RISC.
SPARC M10-4S goes up to 64TB RAM. And 64-sockets. It is not a cluster, it is a business server.
And remember, the Linux SGI UV3000 servers are clusters, as they only run clustered workloads. The latency to nodes far away is terrible, which means they can only do HPC number crunching work with little to no communication between the nodes. The opposite is business workloads, which communicate a lot between the nodes - so business workloads stop at 16- or 32-socket servers because latency will kill them if the server has more cpus.
It is interesting to note that going from 4-socket to 16-socket Intel Xeons achieves a near perfect scalability. If we quadruple the 4-socket result, it would achieve 900,000 max-JOPS. But the 16-socket server achieves 86% of that, i.e. 777,000. That is close to 100%. We draw the conclusion that max-JOPS scales almost linearly.
This means the SPARC M7 should also scale close to linear. This means a 16-socket Oracle M7-16 server would achieve 16 * 160,000 max-JOPS = 2,560,000 max-JOPS with perfect scaling. However, we will not achieve perfect scaling, instead we only achieve 86% of the max performance, which means that the M7-16 server should achieve 86% of 2,560,000 = 2,210,000 max-JOPS in practice.
So compare the 16-socket x86 jbb2015 result of 777,000 max-JOPS to the projected 16-socket SPARC result of 2,210,000 max-JOPS. So who believes that "x86 is indeed superior"???
Let us compare the jbb2015 x86 results versus SPARC M7. Many consider the crit-JOPS as the more important and representative of real life workloads.
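The extrapolation can be reproduced in a few lines. This is only a sketch of the arithmetic: the three input scores are the ones quoted in this post, and the big assumption (mine, not a measured result) is that SPARC M7 going from 1 to 16 sockets keeps the same efficiency x86 showed going from 4 to 16 sockets.

```python
# max-JOPS figures quoted in the post
x86_4_socket = 225_000    # 4-socket Xeon result
x86_16_socket = 777_000   # 16-socket Xeon result
sparc_1_socket = 160_000  # 1-socket SPARC M7 result

# Observed x86 scaling efficiency from 4 to 16 sockets
ideal_x86_16 = 4 * x86_4_socket            # 900,000 with perfect scaling
efficiency = x86_16_socket / ideal_x86_16  # roughly 0.86

# Assumption: a 16-socket M7 server scales with the same efficiency
ideal_sparc_16 = 16 * sparc_1_socket       # 2,560,000 with perfect scaling
projected_sparc_16 = efficiency * ideal_sparc_16

print(f"efficiency: {efficiency:.1%}")
print(f"projected M7-16 max-JOPS: {projected_sparc_16:,.0f}")
print(f"projected advantage over 16-socket x86: {projected_sparc_16 / x86_16_socket:.1f}x")
```

Keep in mind the 2.2 million figure is a projection, not a published SPECjbb2015 result.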
- 16-socket HPE Superdome reaches 777,000 max-JOPS and 85,000 crit-JOPS. It uses 2.20 GHz, 20 cores Intel Xeon E7-8890 v4.
- 4-socket Synergy reaches 225,000 max-JOPS and 51,000 crit-JOPS. It uses 2.40 GHz, 20 cores Intel Xeon E7-8894 v4.
- 1-socket SPARC M7 reaches 160,000 max-JOPS and 60,000 crit-JOPS.
I don't really see how anybody can consider x86 any good at all? Now imagine the Oracle 16-socket SPARC M7-16 server. It crushes everything on the market.
HPE Superdome, 16-socket results:
SPARC M7, 1-socket:
"Java ... still doesn't play well with the OS leading to poor performance, unpredictable run-time behaviour"
Your Java skills are antique and outdated. In theory, adaptive optimizing compilers are faster than static compiling. Say you have Java code that runs a certain operation on a large list of one type of objects, the JVM will optimize for that type. In the next iteration, the large list contains another type of objects, and the JVM will adapt and optimize for that other type. C++ cannot do that kind of optimization.
When you compile C++, you target a least common hardware denominator (no vector instructions, etc) so you cannot use fancy hardware instructions. But the JVM will adapt from cpu to cpu, turning on vector instructions or what not. C++ can never do that kind of optimization. So in theory Java is faster than C++.
In practice, all the world's fastest stock exchanges with sub-100-microsecond latency and huge throughput are developed in Java or C++. If Java had latency problems with garbage collection, then the stock exchanges such as NASDAQ on Wall Street would not use Java. Many ultra low latency high frequency traders are using Java, or C++. So you are wrong, Java is among the fastest platforms out there, rivaling C++.
The secret to getting low latency in Java is to preallocate a lot of objects and keep on reusing them. That way, garbage collection is never triggered. In effect, you turn off GC. This is used a lot in trading.
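The preallocate-and-reuse pattern looks roughly like this. It is a minimal sketch written in Python for brevity, not Java, and the Order and OrderPool names are hypothetical. The point is only the shape of the idea: all objects are created up front, and the hot loop borrows and returns them instead of allocating new ones.

```python
class Order:
    """A reusable message object (hypothetical example type)."""
    __slots__ = ("symbol", "qty", "price")

    def __init__(self):
        self.reset()

    def reset(self):
        self.symbol = ""
        self.qty = 0
        self.price = 0.0


class OrderPool:
    """Fixed-size pool: every Order is allocated once, at startup."""

    def __init__(self, size: int):
        self._free = [Order() for _ in range(size)]

    def acquire(self) -> Order:
        # Raises IndexError if the pool is exhausted, instead of allocating.
        return self._free.pop()

    def release(self, order: Order) -> None:
        order.reset()
        self._free.append(order)


pool = OrderPool(1024)
seen = set()
for i in range(10_000):
    order = pool.acquire()  # no new object is created here
    order.symbol, order.qty, order.price = "XYZ", i, 1.0
    seen.add(id(order))
    pool.release(order)

print(len(seen))  # number of distinct Order objects touched in the hot loop
```

In a JVM the same structure keeps the hot path free of `new`, so the collector has nothing to clean up while trading is running.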
Girls, girls! Stop fighting. Here are SPARC M7 spec cpu 2006 rate results:
The score is 1,200 for SPECint_rate2006 peak. And 832 for SPECfp_rate2006 peak.
For this new Fujitsu SPARC M12, two of these cpus achieve 1501 SPECfp_rate2006. Which means one of them should achieve 750 SPECfp_rate2006. This means that SPARC M7 still looks to be the fastest cpu, even considering this new SPARC cpu. Anyway, it seems that SPARC is the top holder in SPECcpu2006, no matter if from Oracle or Fujitsu.
I read that IBM projects that POWER9 will be 2x as fast as POWER8, which means that POWER9 might be able to compete with SPARC M7. As of now, two POWER8 cpus are slower than one SPARC M7 in three out of four SPECcpu 2006 benchmarks - if you can trust the link above. For database benchmarks, SPARC M7 is not 2x as fast, but up to 15x faster than POWER8.
It depends on the workload if the OS can utilize all these threads. We distinguish between two different scaling: scale-up and scale-out.
-Scale-out workloads run all in parallel (embarrassingly parallel workloads), there is not much communication going on between the threads. This is HPC cluster number crunching territory. Typically they run a tight for loop on the same grid of points, solving the same PDE over and over again, integrating in time. Everything fits in the cache. All these servers are clusters, such as SGI UV3000, supercomputers, etc. These clusters have 10,000s of cores, as they are a bunch of PCs sitting on a fast switch. They are cheap, if you buy a large cluster, you just pay the price for an individual PC x the number of nodes.
Because all the workload fits into a cache, you never go out to RAM. Cpu cache is 10ns, and RAM is 100ns. Typically one scientist starts up a huge HPC task which takes several days to complete. So one user at a time.
-Scale-up workloads have a lot of communication going on. They typically run business ERP workloads, such as SAP, Databases, etc. These workloads always serve many users at the same time, thousands of users or more. One user might do accounting, another payroll, etc. This means the data of all these thousands of separate users cannot fit into a cpu cache. So business workloads always go out to RAM. That means 100 ns latency or so.
Say the cpu runs at 2 GHz. If you have 100 ns latency as you always go out to RAM, that means the 2 GHz cpu effectively slows down to 10 MHz (one memory access per 100 ns). I don't know if you remember those 10-20 MHz cpus, but they are quite slow. So business workloads (communicating a lot, waiting for other threads to synch) serving thousands of users - have large problems with scaling up. Business servers max out at 16 or 32 sockets. Every cpu needs a connection to the other cpus for fast access, and with 16 or 32 cpus, there will be a lot of connections. Say you have 32 sockets, then you need (32 over 2) connections. That is 32*31/2 = 496 connections, that is quadratic growth. That is very messy. Going above 32 sockets is not doable, if you require that every cpu connects to every other (which you do, for fast access). Look at all the connections for this 32-socket SPARC server:
So large business servers max out at 16 or 32 sockets. Clusters cannot run business workloads. The reason is clusters have far too few connections. Clusters typically have 100s of cpus, or 1000s. You cannot have a direct connection from cpu to cpu with that many cpus. So you cheat, one cpu connects to a group of other cpus. So going from one cpu to another takes a long time, because you need to locate the correct group, and then go to another cpu, and another, etc until you find the correct cpu. There are many hops. And if you try to run business workloads on a cluster, performance will drop far below 10 MHz. Maybe down to 2 MHz. And that is not doable.
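A quick back-of-envelope check of the two numbers in this argument, as a sketch. It assumes the worst case where every operation waits for one full memory access, and a full mesh where every socket has a direct link to every other socket. The function names are mine.

```python
from math import comb


def effective_mhz(latency_ns: float) -> float:
    """Operations per microsecond (i.e. MHz) if each one waits latency_ns for memory."""
    return 1_000 / latency_ns


def full_mesh_links(sockets: int) -> int:
    """Direct socket-to-socket links needed so every pair is connected: n*(n-1)/2."""
    return comb(sockets, 2)


print(effective_mhz(10))   # cache at ~10 ns  -> 100.0 MHz
print(effective_mhz(100))  # RAM at ~100 ns   -> 10.0 MHz

for n in (8, 16, 32, 64):
    print(n, "sockets ->", full_mesh_links(n), "links")
```

Doubling the socket count roughly quadruples the number of links, which is the quadratic growth that makes big scale-up servers so hard to build.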
So, clusters are scale-out servers typically having 10,000s of cores and 128 TB RAM or so. They are exclusively used for HPC workloads. Supercomputers belong to this arena. They typically run Linux.
Scale-up business servers typically have 16 sockets or so. This arena belongs to RISC such as SPARC / POWER / Mainframe running Solaris, AIX, or IBM zOS. There is neither Linux nor x86 here. The reason is Linux does not scale well, and x86 does not scale well either. The largest x86 business server was until recently 8 sockets. Look at all the business benchmarks, such as official SAP. All top SAP spots belong to SPARC. x86 comes far far below. Business workloads scale badly, so you need extraordinary servers to handle them, such as old and mature RISC servers. RISC has scaled to 32 sockets for decades. x86 not so. The largest scale-up business server on the market is Fujitsu M10-4S, which is a 64-socket Solaris SPARC server.
Linux does not scale well on business workloads, because until recently there did not exist large business servers beyond 8-sockets - so how can Linux scale well when there does not exist large x86 business servers?
The business arena belongs to RISC and Unix. One IBM P595 POWER6 server cost $35 million. Yes, one single server. Business servers are very lucrative and cost very much. Scalability is very very difficult and you have to pay a hefty premium. Business servers do not cost 1 PC x 32 nodes. No, the cost ramps up quadratically, because it becomes quadratically difficult to scale.
"... Linux is the dominant operating system for servers. Almost all high-performance computing runs on Linux. And the majority of mobile devices and embedded devices rely on Linux under the hood...."
Well, this is not really true. In the large business server segment, Linux does not even exist. That segment belongs to IBM Mainframes, SPARC, POWER. There are no large business servers based on x86, because they did not even exist until just recently. When you need large business workloads, such as SAP, databases, etc - you have no other choice than Mainframes, SPARC, POWER. Just check the SAP world records, the top spots all belong to RISC. The fastest x86 is a very new 16-socket server (which is a redesigned HP Superdome Unix server) and that SAP number is quite bad. Also, the various TPC benchmarks all belong to RISC. There are no x86 servers on the top.
If you need good scale-up performance, you need Unix running on RISC. Linux does not scale well on scale-up servers. Linux scales fine on clusters with many small nodes, but does not scale beyond 4-8 sockets on a single server.
Show me a good SAP benchmark running on Linux. You will not find any. Why? Because such scale-up business servers need a well scaling OS. And Linux is not one of them.
Regarding the T5 / M5 generations. I don't agree. You seem to argue that the only thing of importance is the basic building block; the core. In that case, Intel has basically used the same Core2duo core since many years back (decades?) - and using your argument, Intel has not released a new generation for decades(?).
So using your definition, AMD Bulldozer was a generation, and now Ryzen is the next generation. And Intel P4 Netburst design was a generation, and Core2duo was the next generation - which Intel is stuck at even today. Some people don't agree with your viewpoint. Intel has released several generations since Core2duo.
Sure, you could say that if Intel released a new generation, the cpu performance would see a significant increase - but all Intel cpu releases the last decade are within 10% of each other - which means Intel is basically using the same design with only minor modifications. However, Ryzen is a big improvement over Bulldozer which implies Ryzen must be a new generation.
But many people don't agree with this. They say Intel has released many new generations since Core2duo, whereas you would claim Intel is still on the same basic Core2duo generation. It is possible to use the same building block and come up with totally new cpus. It is not clearcut what a "cpu generation" is, and nothing says that your definition is the correct one.
Regarding performance benchmarking. I think it is a bit funny that you dismiss all benchmarks now that SPARC has the crown. When POWER had the performance crown, I am quite sure that you insisted the competition should accept the POWER benchmarks. But now, all benchmarks should be rejected. I do believe that if POWER ever would take the crown again, you will insist that benchmarks are a valid way of comparing cpus against each other. But not now, because SPARC is fastest. What do you call such a behaviour?
"....But, if anybody truly believes Oracle has got something an order of magnitude better than anyone else in REAL world use cases, they're deluded. Genuine order of magnitude advances like that are as rare as rocking hose s**t...."
Well, you know that SPARC has a different view than other cpus? You do know that 1.2 GHz SPARC T1 was 50x (no typo) faster than 2.4GHz Intel Xeons on certain webserving loads with many light threads? You know that four 1.6GHz SPARC T2+ was as fast as fourteen (14) 5GHz POWER6 in official SIEBEL v8 benchmarks?
And now SPARC M7 is 11x faster than the competition on database workloads - is Oracle trying to fool us, are we deluded? Well, I suggest you study the DAX. It is a coprocessor in SPARC M7 which handles all database workloads. I hope you do know that specialized hardware is easily 10x faster than a general cpu doing it in software? Compare GPU to CPU. So why would it be surprising that a DAX coprocessor designed specifically for databases, is 10x faster than a general cpu doing it in software? There are various benchmarks out there, and in every case where DAX is used, the SPARC M7 is several times faster. This is consistent in every single benchmark.
Here are some benchmarks where DAX is used accelerating Java Streams. One external company rewrote parts of their engine to use DAX and got a 6-8x boost.
Apache Spark (Big Data) gets 6x faster with DAX than without.
"...You can't ignore the number of cores. Does it matter if the server has a smaller number of sockets and more cores per cpu, or more sockets and fewer cores per cpu?..."
Sure, but SPARC M7 cores are faster than POWER8 and x86 cores. Just check the benchmarks. For instance, one SPARC M7 cpu with 32 cores is faster than two 18-core Intel Xeon E5v4 cpus. Often SPARC cores are 2x faster than the competition.
"...Finally, if raw power was really so different as you claim, why has Oracle not succeeded in the HPC stakes? You will notice a distinct lack of Sparc based supercomputers..."
Oracle has explicitly said they are avoiding the HPC market because the market is so small. The lucrative market is high end business servers. For instance, one single IBM P595 server with 32 cpus that took the old TPC-C benchmark cost 35 million USD list price. One single server. When you build a large HPC server, it takes many years of R&D and you get one or two customers (try to export to Russia). If one customer backs out, you are toast. You are vulnerable. On the other hand, the market for high end business servers is huge in comparison and you just assemble your high end servers and sell them for a huge profit. That is why SGI and all the other HPC vendors are desperately trying to leave HPC and get into the scale-up big business market - that is where the big bucks are. That is Oracle's playground. But clusters cannot run business workloads, so SGI has a very hard time trying to build a large business server with as many as 16 or 32 sockets.
Regarding HPC. Now the largest HPC servers are slowly going to ARM cpus. Does this mean that ARM cpus are better than SPARC M7 and POWER8? Nope. Large HPC servers have other requirements than business servers (mainly performance vs wattage). HPC will not use DAX or encryption or what not. But if you really need high number crunching performance, well, SPARC M7 is fastest in the world on SPECcpu2006 workloads as well. And other number crunching workloads as Machine Learning, Neural Networks, etc etc. So if you need pure number crunching, SPARC M7 is much faster than POWER8 and x86. And if you need business workloads, SPARC M7 is many times faster as well.
Regarding Oracle software refusing to use POWER8 functionality such as encryption. Well, I have never discussed that. I only talked about benchmarks and performance. It is true that full encryption only slows down SPARC M7 something like 2-3%, this is shown by different benchmarks. How much slower does POWER8 get when turning on full encryption?
You sound like IBM when SPARC T5 arrived and was faster than anything IBM POWER could offer. IBM general manager for POWER systems, says about the performance race:
"...Companies today, Parris argued, have different priorities than the raw speed of chips..."
When IBM has released a good cpu, performance matters a lot and IBM releases benchmarks all over the web. And when the competition has good cpus, performance "was like 2002–not at all in tune with the market today" and IBM pretends performance does not matter at all.
Well, if you need the most extreme performing business servers, you have no choice but to go to SPARC M7. At minimum it is 2-3x faster. For business workloads SPARC M7 is typically 5-10x faster, sometimes 15x faster than the best server the competition has. What do you think offers more value and is cheapest? One single SPARC M7 cpu outperforms four POWER8 cpus in SPECjEnterprise2010 for instance. You do the math.
Yes, you are right, it was not six SPARC cpus in five years, it was five SPARC cpus in five years. I wrote it hastily and it was late. And regarding generations or not, you claim that T5 and M5 are the same cpu, just in different configurations - I don't agree. The M5 has 6 cores and it was influenced by the M4 (which was the successor to Fujitsu M3) and these 6 cores have 3.9 billion transistors in 643 mm^2. Whereas the T5 has 16 cores with 1.5 billion transistors in 511 mm^2, and is influenced by the Niagara cpus. There are quite some differences. Sure, they might share some building blocks, but they are built totally differently.
Anyway, let us say you are right - how many other cpu vendors have "only" released three generations in 5 years? How often do we see new POWER generations?
"...Now, don't get me wrong. The latest chips from Oracle are pretty good and have their uses. They're nowhere near as far ahead as suggested here and I do note that the links are to the usual Oracle marketing FUD rather than anything independent...."
Well, I suggest you look at some of the 30ish world records, you will see there are verified independent benchmarks. For instance SPECcpu2006, SPECjbb2015, SPECjEnterprise2010, SAP, SPECvirt_sc2013, OLTPbenchmark, STREAM triad ram bandwidth, etc etc etc. And also other benchmarks such as different databases, neural networks, hadoop, PageRank, SHA, AES, etc etc. The benchmarks are very diverse, not just business workloads such as SAP or databases.
And in all cases, the SPARC M7 is typically 2-3x faster. Even in these verified SPECxxxx benchmarks. Some of the benchmarks are 11x faster (database workloads). Just search for these benchmarks on that link I gave you, or go to each separate SPECxxx website and search there instead. You will see it is not Oracle FUD. These official SPECxxxx benchmarks are validated by others. Here are only some collected records: https://blogs.oracle.com/JeffV/entry/sparc_m7_arrives_breaks_records
"...one of the reason the Oracle Sparc processors are better is that the software is specifically written not to use the cryptographic accelerators in other chipsets!! ..."
In different benchmarks, Oracle shows time and again that performance only drops 2-3% when SPARC turns on encryption. We also see that x86 performance drops drastically when x86 turns on encryption only. Typically SPARC M7 is 5-10x faster in benchmarks such as AES. If SPARC turned on both encryption and compression, extrapolating the numbers, M7 would maybe drop performance 4-5% in total, whereas x86 would halve performance again.
Regarding POWER8, it is funny that Oracle has not benchmarked against POWER8. So you might have a point. If POWER8 benchmarks show that performance drops only 2-3% when turning on encryption, that is just as good as SPARC M7. Do you know how much performance typically drops when POWER8 turns on encryption and/or compression? SPARC M7 runs compression and encryption practically for free, so you can always turn them on. Regarding x86 we see that performance drops catastrophically. Regarding POWER8, Oracle has not proven anything about performance loss. Maybe you can chime in here? Anyway, encryption or not, SPARC M7 is typically 2-3x faster than POWER8; in SPECjEnterprise2010 one single SPARC M7 cpu is faster than four POWER8 cpus.
So, it seems that Oracle has not benched encryption/compression against POWER8 - so your statement is not really correct in that "Oracle omits using the POWER8 encryption chip".
Today the world's fastest cpu is SPARC. Oracle has released six generations of cpus in five years. Each being minimum 100% faster than the previous generation. We don't talk about 5-10% faster (as Intel). Today the SPARC M7 is typically 2-3x faster than the fastest Intel Xeon or POWER8, all the way up to ~15x faster on database workloads. Here are 30ish world records, where SPARC M7 crushes x86 and POWER8. The coming POWER9 will only be 2x faster than POWER8, which means POWER9 will be slower than the current SPARC M7. And if you also turn on encryption and compression on all these benchmarks, expect x86 and POWER8 scores to go down to 25-33% of these numbers, whereas SPARC M7 gets a penalty of 2-3% in benchmarks. So if you want to use encryption and compression, SPARC M7 cpu is not typically 2-3x faster but it is 6-9x faster typically, all the way up to 45x faster on database workloads.
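Here is how the 6-9x and 45x figures fall out of the claims above, as a rough sketch. The retained-performance fractions are the ones asserted in this post (rivals keep about one third, SPARC keeps about 97%), not measurements of mine, and the function name is made up for illustration.

```python
def advantage_with_encryption(base_advantage: float,
                              sparc_retained: float = 0.97,
                              rival_retained: float = 1 / 3) -> float:
    """Scale a baseline speed advantage by how much each side keeps with encryption on."""
    return base_advantage * sparc_retained / rival_retained


# Ignoring the small SPARC penalty reproduces the round numbers in the post.
for base in (2, 3, 15):
    print(base, "->", round(advantage_with_encryption(base, sparc_retained=1.0), 1))

# Including the ~3% SPARC penalty gives slightly lower figures.
for base in (2, 3, 15):
    print(base, "->", round(advantage_with_encryption(base), 2))
```

If the rivals dropped to 25% instead of 33%, the multipliers would be 8-12x and 60x, so the post uses the more conservative end of its own range.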
For instance, last week Oracle released the Exadata SL6 (Sparc Linux 6) which is based on the SPARC M7 cpus, so performance has increased considerably when compared to the x86 version. The only difference between the new Exadata SL6 and the other one is that one uses SPARC and the other uses x86. And both run Linux. And the price is identical. But one single SPARC M7 can database scan 48 billion rows per second, achieving 143 GB/sec throughput, whereas two E5v3 achieved 20 GB/sec and were far slower. So if you need extreme performance, you have no other choice than to use SPARC M7.
Well, instead of you ducking all the hard numbers and claiming your opponents are Trolls, how about you show us some benchmarks where IBM POWER8 is faster than SPARC M7? Put up, or shut up? Do you have any benchmark links that prove POWER8 is faster than SPARC M7? No? Then how can we believe that POWER8 is faster when there are no benchmarks out there? It makes you look like a FUDer?
-Trust me, POWER8 is faster than SPARC! Do I have benchmarks or any evidence at all, to back this claim up? No, you just have to trust me.
This is not really credible, eh? Scientific method is to back up your claims with some kind of evidence. And as of now, there is no evidence at all. That makes you look like a liar and FUDer, right? If you can show us benchmarks, then fine, then POWER8 is faster - but you can not show us hard facts. Because they don't exist, POWER8 is way slower. Wishful thinking does not convince anyone?
Today the POWER8 is slower than the previous generation E5v3. Intel has just released the E5v4 which is faster, so POWER8 is lagging behind more and more. Here we see several benchmarks where POWER8 is slower than x86:
I doubt POWER9 will be faster than x86, because POWER8 is slower than x86. Why would POWER9 be faster?
Even if POWER9 is twice as fast as POWER8, it will still not be enough to outperform SPARC M7. SPARC M7 is typically 2-3x faster than POWER8 and the fastest x86 cpu. SPARC M7 is all the way up to 11x faster. Oracle SPARC servers are always >2x as fast as the previous generation. In four years Oracle has released five generations of SPARC, and each generation doubled performance. Oracle does not rest, soon M8 will be released, again doubling performance. POWER is slowest, and most expensive. Why does IBM not switch over to x86 completely and give up the hardware market, as IBM has done with almost everything else? IBM has already sold off all the hardware divisions except POWER and Mainframe.
"...I'm always on the lookout for excitement, but I don't think my house is big enough or my electricity supply powerful enough to run an IBM mainframe...."
Well you can emulate an IBM Mainframe on a laptop, using the open source TurboHercules. An old 8-socket Nehalem-EX would give you 3.200 MIPS, which is a decent midsized Mainframe. If you got the latest 8-socket x86 you would get something around 10.000-15.000 MIPS. And software emulation is 5-10x slower than running native code, so if you ported the Mainframe software to x86, then the latest 8-socket x86 server would actually give 50-75.000 MIPS - which is what IBM's largest Mainframe gives today. In other words, Mainframe cpus are not really that fast. You need 20ish Mainframe cpus to match one 8-socket x86 server. But Mainframes are extremely expensive for the performance you get.
"...[POWER] always seems to have been better at certain niche things than contemporary Intel chips...."
-POWER6 was several times faster than x86 and cost 10x more.
-POWER7 was 20% faster than x86 and cost 3x more.
-POWER8 is slower than x86 and still costs more than x86.
For instance here we see that x86 demolishes POWER8 in SPECcpu2006 benchmarks:
That is the E5v3 x86 cpu. Intel just released E5v4 which is faster than E5v3, so x86 just strengthens the performance lead. I don't see why IBM manufactures slow cpus? POWER is slowest in the arena, and the most expensive. How does it add up? How can IBM sell the slowest servers for the highest price? I understood it earlier, when IBM was faster, but times have changed. IBM is slowest. For a long time POWER was on the verge of being terminated, for a long time there were no road maps. Today's POWER8 and POWER9 disappoint on all fronts. Just look at the 25ish benchmarks in the link above, and see that POWER8 is slowest in almost all (all?) benchmarks.
"...Sun found that out the hard way with Niagara (and later T3). Looked nice on paper, but in the real world (except for some quite specific workloads) it turned out not such a great idea ...."
The Niagara T1-T3 were niche processors. In that niche they excelled. An article said that a Niagara T1 cpu running at 1.2 GHz with 8 cores was 50x faster than an Intel 2.4 GHz dual core server. No typo, 50x!!! The workload was a web server, serving many lightweight clients.
Today the SPARC M7 is typically 2-3x faster than the fastest x86 cpu and POWER8. It is all the way up to 11x faster than x86 and POWER8. The SPARC M7 can encrypt data practically for free: encryption costs 2-3%. Whereas on x86 performance will typically be halved or worse, leaving hardly any horsepower to do useful work on x86 when using encryption.
Here are 25ish benchmarks where SPARC M7 is 2-3x faster, such as SPEC2006, Hadoop, SAP, Neural Networks, Specjbb2005, etc etc:
(Funny thing is that if you look a bit on that site, you will see that SPARC M7 is twice as fast at SPECjEnterprise2010 as the stated record.)
There is also an IBM Mainframe emulator called TurboHercules. An old Nehalem 8-socket will give 3.200 MIPS which is a mid sized IBM Mainframe. Today's x86 cpus are maybe 3x faster than Nehalem as they have 3x more cores. Of course IPC and other improvements will boost performance much more than 3x, but let's stay on the pessimistic side and count with 3x faster.
So, a new 8-socket x86 server will give ~10.000 MIPS under software emulation with TurboHercules. If someone ported the IBM Mainframe software to x86, it would run 5-10x faster, as software emulation incurs a penalty of 5-10x. That means an 8-socket x86 today would give somewhere between 50-100.000 MIPS, which is on par with the largest IBM Mainframe (which has around 75.000 MIPS I think). In other words, IBM Mainframes cost $millions but are slower than x86.
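The estimate above can be written out as a short Python sketch. The function name and parameters are invented for illustration; the 3.200 MIPS baseline, the 3x scaling, the 5-10x emulation penalty and the ~75.000 MIPS Mainframe figure are the rough numbers from the post, not measured values.

```python
def native_mips_estimate(emulated_mips, hw_scaling, penalty_range):
    """Estimate native MIPS from an emulated figure.

    emulated_mips -- MIPS measured under emulation on the old server
    hw_scaling    -- how much faster the new hardware is assumed to be
    penalty_range -- (low, high) slowdown factor assumed for emulation
    """
    low_penalty, high_penalty = penalty_range
    scaled = emulated_mips * hw_scaling      # emulated MIPS on new hardware
    return scaled * low_penalty, scaled * high_penalty


low, high = native_mips_estimate(3200, 3, (5, 10))
LARGEST_MAINFRAME_MIPS = 75_000              # "around 75.000 MIPS I think"

print(f"emulated on new hardware: {3200 * 3} MIPS")
print(f"estimated native range:   {low} to {high} MIPS")
print("Mainframe figure inside range:", low <= LARGEST_MAINFRAME_MIPS <= high)
```

With these inputs the emulated figure is 9.600 MIPS (rounded to ~10.000 above) and the native estimate is 48.000 to 96.000 MIPS, which is where the 50-100.000 range comes from.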
We all agree that Oracle products cost money, but Oracle products are mostly for large demanding customers that lose more money by going for an inferior alternative than by going for the best. For instance, Oracle Database is the best, the SPARC M7 is up to 10x faster than x86 or POWER8 on database workloads, and on many server workloads it is 2-3x faster:
OTOH, IBM charges much much more for their high end POWER8 gear than Oracle, while being much slower. My point is that if you look at what you get for the Oracle price, it can be worth the price for customers with the largest demands. In that area, there is no competition and Oracle is the best. Show us one single benchmark where SPARC M7 is not the best. Likewise for the Oracle Database.
The Mainframe cpus are much slower than a high end x86 Xeon cpu. Typically a high end x86 cpu is 2x faster than the fastest Mainframe cpu. Considering the largest IBM Mainframe sporting 24 sockets, there is no way 24 of those slow Mainframe z13 cpus can replace more than an 8-socket x86 server - if we talk about cpu performance. I/O wise the Mainframe is much faster than x86, but cpu wise, Mainframes do not stand a chance.
IBM claims a mainframe can replace 1.500 x86 servers. It turns out that all the x86 servers are old antique Pentium3 cpus with 256MB RAM and they all idle. At the same time the Mainframe is fully loaded at 100%. But what happens if a few of the x86 servers start to work? The Mainframe cpus could never keep up with a loaded x86 cpu. There is a reason IBM never releases benchmarks comparing x86 to Mainframes - because Mainframes are slower. But you pay much more for the abysmal cpu performance. No, if you need cpu performance, you don't go to Mainframes. If you need I/O you go to Mainframes. Mainframes don't stand a chance against an 8-socket x86 server today, which costs a fraction.
No, the Rockhopper Mainframe goes only up to 10TB RAM, which is not the equivalent of a Xeon. Intel Xeon 8-socket E7v3 goes up to 12 TB RAM, surpassing IBM Mainframes.
BTW, I don't understand why any new customer would want to use IBM Mainframes to run Linux? x86 servers are much much cheaper, and the Intel cpus are at least twice as fast as Mainframe cpus. So, Mainframe cpus are much slower, and Mainframes cost very much more than x86. So what is the use case for a Mainframe running Linux over x86 servers? I don't get it. Sure, Mainframes have much better RAS and I/O throughput, but is that worth paying $millions for?
There are workloads that only big 16/32-socket scale-up servers can handle, typically enterprise business software workloads such as SAP, databases, etc. In those workloads, scale-out clusters can not replace scale-up servers. Scale-out clusters can only handle embarrassingly parallel workloads, such as clustered HPC number crunching workloads, and not all problems are easily parallelizable. For instance, look at the official SAP benchmark list, there is not a single cluster. The top record spot is held by a Unix SPARC server with 32 sockets scoring 844.000 saps. (SAP Hana is a cluster, used for analyzing static read-only data in RAM, so it is great for clusters. It is not fit for normal SAP OLTP workloads, it is only fit for analyzing static data).
Only recently (a couple of months ago) have there been new large 16/32-socket x86 server offerings by SGI (UV300H) and HP (Kraken). Until then, the largest scale-up x86 servers were ordinary 8-socket servers sold by Oracle, HP, etc. There is a new breed of x86 servers now out on the market, the first generation of 16/32-socket servers ever. All earlier large x86 servers out there have been scale-out clusters, such as the SGI UV2000 server with 10.000s of cores, which resembles a small supercomputer cluster. And SGI UV2000 customers have used it for clustered workloads.
The first generation of x86 scale-up servers will perform very poorly, while the vendors try to get rid of bugs and identify bottlenecks. It takes a couple of generations before the x86 scale-up servers will start to perform ok. They don't stand a chance against large Unix servers and Mainframes today. For instance, SAP Hana is a clustered workload, and is certified with SGI UV300H. I want to see non-clustered workloads running on the scale-up UV300H - and expect it to perform very poorly compared to large Unix boxes on scale-up workloads such as SAP. The problem with SAP is that it scales awfully badly, because it is a non-clustered workload. So going from 500.000 saps to 600.000 saps is a huge and very difficult step. Because SAP scales badly, and the SGI UV300H is first generation, I expect the SGI UV300H to score quite badly on SAP compared to large Unix boxes.
"...[Sparc M7] Memory bandwidth of 160 GB/s per socket with 4 memory controllers servicing 32 cores versus 230 GB/s per socket with 2 memory controllers servicing 12 x POWER8 cores. Speaks for itself...."
Well, mr PowerMan, here is some information about IBM's superior memory bandwidth claims for POWER8. So, maybe you should not believe the IBM marketing FUD. It seems that Oracle SPARC M7 160GB/sec is faster than IBM POWER8 230GB/sec in practice:
"...IBM says the sustained or delivered bandwidth of the IBM POWER8 12-core chip is 230 GB/s. This number is a peak bandwidth calculation: 230.4 GB/sec = 9.6 GHz * 3 (r+w) * 8 byte. A similar calculation is used by IBM for the POWER8 dual-chip-module (two 6-core chips) to show a sustained or delivered bandwidth of 192 GB/sec (192.0 GB/sec = 8.0 GHz * 3 (r+w) * 8 byte). Peaks are the theoretical limits used for marketing hype, but true measured delivered bandwidth is the only useful comparison to help one understand delivered performance of real applications.
The SPARC T7-4 server delivered over... 2.3 times the triad bisection bandwidth of a four-chip IBM Power System S824 server...."
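The peak calculation in that quote is easy to reproduce. Here is a minimal Python sketch; the function and parameter names are invented for illustration, and the factor 3 is simply taken from the quote's "3 (r+w)" without assuming more about what it stands for.

```python
def peak_bandwidth_gb_s(link_ghz, rw_factor=3, bytes_per_transfer=8):
    """Theoretical peak bandwidth as in the quoted formula:
    clock (GHz) * read+write factor * bytes per transfer."""
    return link_ghz * rw_factor * bytes_per_transfer


# POWER8 12-core chip: 9.6 GHz * 3 * 8 byte
print(f"{peak_bandwidth_gb_s(9.6):.1f} GB/sec")

# POWER8 dual-chip module (two 6-core chips): 8.0 GHz * 3 * 8 byte
print(f"{peak_bandwidth_gb_s(8.0):.1f} GB/sec")
```

This reproduces the 230.4 and 192.0 GB/sec figures. Note that nothing measured goes into the formula, which is the point the quote makes: it is a theoretical limit, while the 2.3x triad result is a delivered number.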
Here are some Sparc M7 world records. It is typically 3x faster than the fastest x86 v3 CPU, and IBM power8. Going all the way to being >10x faster for database workloads. Btw, it is almost twice as fast as IBM power8 on STREAM memory bandwidth too.
Hardly impressive on SAP vs Power8???
Well you need three (3) IBM Power8 CPUs to match one single Sparc M7 in SAP benchmarks. If you consider power8 CPU to be fast, well the M7 CPU is three times faster. Isn't that impressive? Which CPU is the fastest in the world today? Sparc M7, beating everyone. It turns out that Intel Xeon is faster than the latest and newest power8 in some benchmarks, so power8 is worst in class, rendering it obsolete.
If memory bandwidth were a problem, the M7 could not be 2-3x faster than power8 in benchmarks. For instance you need more than two power8 CPUs to match the M7 in spec2006 CPU benchmarks. How is that possible if bandwidth is a problem? IBM has lost big time, worst in class. IBM better exit the CPU market with these inferior and slow products.
If we talk about the "actual high end, the top500 supercomputers", then no, top500 is not high end. Supercomputers are just clusters, and clusters can not run business enterprise software (as explained by SGI). The high margin lucrative market is business servers, with as many as 16/32 sockets. For instance, one single 32-socket IBM POWER P595 server used for the old TPC-C record cost $35 million. No typo. These large scale-up servers cost very much money, millions. Whereas a cluster is basically the cost of a bunch of nodes and a fast switch - very cheap. And a large cluster such as SGI UV2000 with 10.000s of cores and 64TB RAM can never run business enterprise software. You need a large scale-up Unix server with 16/32 sockets such as SPARC or POWER. Until a couple of months back, there did not exist larger x86 servers than 8 sockets. It is very difficult to build large 16-socket servers, x86 has tried for decades and failed. Now recently SGI released their UV300H which has 16 sockets, but I suspect performance is awful as it is the first generation 16-socket server, whereas SPARC goes up to 64 sockets today. And 64 sockets beat 16 sockets.
Regarding the shrinking Unix market: it is true that the Unix market shrinks, but the market for Oracle engineered, tailor made black boxes designed to run business software such as Oracle databases is growing very fast. That market is growing whereas the Unix market shrinks. The only time you need large Unix servers today is if you are going to run very very large workloads on business enterprise software, such as SAP. Check the SAP benchmarks, it is RISC all the way at the top. SPARC has the top spot with 840.000 saps, whereas the best x86 server has 320.000 saps. x86 hardly scales beyond 8 sockets, which is nothing compared to 64-socket SPARC servers.
So if you need extreme enterprise business performance, or extreme RAS reliability, you must choose SPARC/POWER. Otherwise, x86 is fine for the low end.
BTW, the largest POWER8 server is E880 which scales up to 16-sockets and 16TB RAM. The largest SPARC is Fujitsu M10-4S with 64-sockets and 32TB RAM (soon 64TB). This year the Oracle SPARC M7 will be released with 32-sockets, 1.024 cores, 8.192 threads and 64 TB RAM. It can tackle the largest business workloads. Nobody else can, POWER8 can not, x86 can not.
There are two different versions of the HANA database everybody talks of. The first is the clustered version that runs on x86 nodes. Maybe all nodes can aggregate 32TB RAM or so, in total. Add in compression and you can handle large databases from RAM, very fast. These RAM databases are (almost) exclusively used for reading, that is, analyzing data just like a data warehouse. Oracle TimesTen is also a RAM database that is only used for reading data, analysis. RAM databases typically have very rudimentary locks, or no locks at all - as they are designed for reading data, not transactions. Scale-out RAM database, read only.
HANA also has another database, one used for storing data. It is just a traditional vanilla normal database, for transactions. And this scale-up database is not clustered. It is only used on a large scale-up server, such as the SGI UV300H with 16 sockets. It stores all data on disks, not in RAM. Nobody talks about this traditional transaction database for disks.
So a HANA installation has two databases, one for storing data on disk, and one for caching data in RAM for analysis.
But, Oracle will release the SPARC M7 server this year. It is a single scale-up server with up to 64TB RAM. And if you apply compression, say 10:1, you can analyse very large databases from RAM. And a single scale-up server with 64TB RAM is faster than a scale-out cluster with 64TB RAM. So why don't people just use a single scale-up server instead of this HANA cluster?