Re: Switching from big iron to x86 virtualisation
"...Another point to consider with respect to your assertion that you will get good scaling from unmodified binaries on an M9000 is that it relies on running a huge number of threads to achieve high system throughput at the expense of a big hit in single-thread throughput. Legacy binaries are usually tuned to run on high-single-thread throughput systems, so I would expect to see better scaling and throughput from a system with a high-throughput cores (eg: Xeon) vs low-throughput cores (SPARC Tx)...."
Jesus. You are just totally off. You haven't understood much. It is the opposite: the M9000 is old and uses the old SPARC64 cpu, which has 2 strong threads per core and low throughput. The new "SPARC Tx" has very high throughput cores, with many threads. Xeon has low throughput and SPARC Tx (T5, etc.) has high throughput. In comparison, Xeon has few strong cores and few threads, while SPARC T5 has many threads to boost throughput. This is just the opposite of what you believe.
Relative to each other, this is how they compare:
Xeon: low throughput because it has few strong threads
SPARC64: low throughput because it has few strong threads
SPARC Tx (T5): high throughput because it has many weaker threads.
You are stating the opposite. Just check some of the world record benchmarks here, and see that SPARC T5 is SEVERAL times faster at high throughput server workloads (including SPECint2006):
https://blogs.oracle.com/BestPerf/entry/20130326_sparc_t5_speccpu2006_rate
.
"....Probably because people like Oracle have their customers by the short and curlies. $legacy_vendor gets more margin if they can convince you that buying one of their new boxes will cost less than a new $legacy_vendor license + $competitor's box. The $legacy_vendor can set the license for the competition's box to cancel out the price difference AND they can maintain a nice fat margin because there is no competition (that's the whole point of legacy lock-in)...."
Wrong again. The Oracle database runs on Linux too. In fact, from what I have heard lately, it is mainly developed on Linux. So it would be very easy to migrate from an Oracle database running on a very, very expensive 32 socket server to a cheap 32/64/128 socket Linux SGI cluster running Oracle database. But no one is doing that. Why? And Oracle is not famous for cutting prices when they have you locked in; they are expensive. Many companies want to migrate to other databases because of the high license costs. If you did not know this, I doubt you have worked on Wall Street as you claim.
Why does no one migrate from a very expensive 32 socket Unix server to a cheap SGI cluster running the same software? No vendor lock-in exists, because you migrate from Oracle to Oracle. Your explanations of why no one does this are logically unsound. I will tell you the answer: as the kernel developer explained, these cheap Linux clusters cannot run large Oracle database configurations, which require SMP servers, and that is why no one migrates from expensive Unix SMP servers to cheap Linux clusters. Clusters cannot handle huge database configurations. The worst case RAM latency is more than 10,000 ns in clusters, and the performance of a database would grind to a halt.
.
"...I doubt [SGI cluster] is cheap...."
Wrong. Yes, the SGI cluster is cheap. Check the prices. It can in no way compare to the 32 socket IBM P595 server used for the TPC-C record, which cost $35 million list price. I am convinced the SGI cluster costs about as much as a few x86 cpus and a fast switch and not much more, or maybe twice that. You can buy several SGI clusters for the price of one 32 socket IBM server.
.
"...[SGI UV1000] it's not a cluster, it runs a *single* instance of the OS against shared memory. A single process can use every single byte of memory in that system. The same is not true of a cluster...."
You are wrong again. For instance, the ScaleMP Linux server with 1000s of cores shows the same characteristics: it runs 8192 cores and loads of RAM, and it runs a single Linux kernel image. But it is actually a cluster. Just because it runs a single image does not mean it is not a cluster. It can only run HPC workloads, just like the SGI cluster. Both clusters consist of several smaller nodes, connected to look like one giant server running a single kernel image:
http://www.theregister.co.uk/2011/09/20/scalemp_supports_amd_opterons/
"... vSMP takes multiple physical servers and – using InfiniBand as a backplane interconnect – makes them look like a giant virtual SMP server with a shared memory space....The vSMP hypervisor that glues systems together is not for every workload, but on workloads where there is a lot of message passing between server nodes – financial modeling, supercomputing, data analytics, and similar parallel workloads. Shai Fultheim, the company's founder and chief executive officer, says ScaleMP has over 300 customers now. "We focused on HPC as the low-hanging fruit".... "
Check the SGI and ScaleMP workloads: they are all HPC workloads. For a reason. No customer runs a large database configuration such as Oracle on them.
.
"...With SMP all memory is remote, so you are operating at worst case (but uniform) latency all the time, by contrast with NUMA you get the best possible latency for local accesses and the same worst possible latency for remote accesses as you would have with an SMP box...."
True. The Oracle servers are SMP-like and have very low worst case latency, 500 ns or so. In effect you treat them like a true SMP server. The SGI and ScaleMP clusters have latency of 10,000 ns or much higher. These must be treated as clusters, and can only run cluster software. That is why all SGI and ScaleMP customers are running HPC workloads.
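To put rough numbers on this, here is a back-of-the-envelope sketch in Python. Only the 500 ns and 10,000 ns worst case figures come from what I wrote above. The 100 ns local latency, the remote access fractions and the function name are my own assumptions for illustration, not measurements of any real machine.

```python
def average_latency_ns(local_ns, remote_ns, remote_fraction):
    """Weighted average memory latency for a mix of local and remote accesses."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


LOCAL_NS = 100.0             # ASSUMPTION: illustrative local RAM latency
SMP_WORST_NS = 500.0         # worst case figure quoted for the SMP-like server
CLUSTER_WORST_NS = 10_000.0  # worst case figure quoted for the clusters

# ASSUMPTION: hypothetical fractions of accesses that land on remote memory
for remote_fraction in (0.01, 0.10, 0.50):
    smp = average_latency_ns(LOCAL_NS, SMP_WORST_NS, remote_fraction)
    cluster = average_latency_ns(LOCAL_NS, CLUSTER_WORST_NS, remote_fraction)
    print(f"{remote_fraction:.0%} remote: SMP-like {smp:.0f} ns, cluster {cluster:.0f} ns")
```

Under these assumptions, the more of your accesses that are remote, the more the cluster figure pulls away from the SMP-like one. A database with little locality is the case where that matters, while an HPC job that keeps its data local to each node mostly avoids it.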
.
"...All the M9000 does is hide the latency from your code by putting your code to sleep and running another thread whenever it has to hit main memory. Xeons achieve a similar trick with HyperThreading..."
Wrong again. You are mixing up the M9000 cpus with the SPARC Niagara cpus. The Niagara cpus hide latency by switching to another thread when a thread stalls on a cache miss, and they have many cores and many threads to be able to achieve very high throughput. The M9000 has the old SPARC64 cpu, with only 1-2 threads. The old SPARC64 is very similar to older x86 or older IBM POWER cpus: few cores, 1-2 strong threads. All cpus were built like this long ago. Then came the SPARC Niagara and changed everything with many cores and many threads, and now every cpu is similar to Niagara with many cores and many threads: POWER7, Xeon. The SPARC64 is not good at high throughput; it has few strong threads, not many threads.
So, you don't know much about the M9000 or SPARC64 cpus. You are mixing them up. That might explain why you are so off.
.
"....I'm done. I can't see much point in writing stuff for someone who shows no evidence of being able to read and learn...."
I have proved that you are wrong in many (every?) parts of your reasoning. It is you who needs to read and catch up.