As niche supercomputer-maker SiCortex works on the next generation of its line and watches the IT marketing machine gearing up for Intel's impending Nehalem-based Xeon EP, the company says that Chipzilla isn't moving in the right direction for high-performance computing (HPC) workloads. "The major improvement is …
Stick with MIPS, little SiCortex! Any bastion against x86. If they want to change processor architecture, maybe they should consider PPC, as IBM is using.
Bread & butter computation vs Supercomputing
Intel may like to play in the supercomputing field, but massively parallel jobs (which is supercomputing's area) are far less common in the commercial world than your generic database/Java app server tasks, which is where Intel makes most of its cash.
There seems to be an enormous amount of engineering effort going into cramming more cores on a chip when the usual workloads don't require processes on different cores to talk to each other. In other words, a network load-balancer and blades with lower core counts are a more cost-efficient solution. There's no point having more cores than your memory bus can service, and there's no point paying to engineer more core capacity when the cores don't need to be in the same box or on the same blade. Apache talks to your app servers; individual Apache threads don't talk to each other. Your app servers talk to a database, not to each other. Vertical scaling (bigger iron) can always fail as your workloads increase. If at that point your application can't be split up, you are up the creek without a paddle. Designing for horizontal scaling (spreading the load: replicated databases, load-balanced Java app servers, etc.) is a more sustainable architecture. You can always vertically scale your horizontal units.
Which is really more cost-effective, a 32-core x86 monster or 8 dual-CPU, dual-core blades/2RU boxes? And what happens if you need another 8 cores?
There's certainly scope for mainframes' massive IO in business, but parallel workloads are altogether more unusual. It's rather like a bus saying a car is rubbish at transporting people. Cars do have lower capacity, but if you want to go somewhere the bus isn't going, the bus isn't useful.
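The horizontal-scaling argument in the comment above can be sketched as a toy round-robin load balancer. This is only an illustration: the class `RoundRobinBalancer`, the helper `distribute`, and the backend names such as `blade-0` are hypothetical and do not describe any real product. Requests are independent, so adding capacity means adding one more backend rather than replacing the box.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Toy dispatcher: hands each request to the next backend in turn."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._next = cycle(self._backends)

    def add_backend(self, name):
        # Horizontal scaling: grow the pool by one unit.
        # The rotation restarts from the first backend after a change.
        self._backends.append(name)
        self._next = cycle(self._backends)

    def route(self, request):
        # The request content is ignored; requests never talk to each other.
        return next(self._next)


def distribute(balancer, request_count):
    """Route request_count dummy requests and count hits per backend."""
    return Counter(balancer.route(i) for i in range(request_count))


if __name__ == "__main__":
    # Eight hypothetical blades, matching the 8-box example in the comment.
    balancer = RoundRobinBalancer([f"blade-{i}" for i in range(8)])
    print(distribute(balancer, 32))
    balancer.add_backend("blade-8")  # "what happens if you need more cores?"
    print(distribute(balancer, 9))
```

A real balancer would also track backend health and session affinity; the sketch leaves both out to keep the scaling point visible.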
Does anyone care about the CPU in HPC?
I don't. All my HPC apps are in a higher-level language (C, C++, and even (shudder) Fortran (77, 90) (cue flames)). All I need is a good compiler. The same holds for all HPC apps I know. What is far more important than the CPU is the interconnect architecture. Designing for shared memory or different types of distributed systems requires different coding at the abstraction level of high-level languages.
Depending on your interprocess communication bandwidth (rather than interprocessor; it is NOT the same on all systems), you can afford more or less communication in the algorithm. Load balancing and task distribution depend critically on how the CPUs are connected, much less on the kind of CPU. I have code that works quite happily on my dual-core laptop, quad-core desktop, a quad-core Opteron, a 16 CPU MIPS system, and (previously) on our old CRAY SV1e (RIP) with 32 processors, all essentially shared memory machines. It did not perform as well on the 4 x 4 core Xeon machine due to the memory-bandwidth limitations inherent in the design, but I would love to run it on a Nehalem. Run the same code on a cluster and it would CRAWL.
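A rough way to see why the same code can fly on shared memory and crawl on a cluster is a back-of-envelope model: time per step is compute time divided across processors plus a communication cost made of per-message latency and bytes moved over the link. The sketch below is an assumption-laden illustration, not a measurement: the `Interconnect` type, the function names, and every bandwidth, latency, and workload figure are made-up placeholders chosen only to show the shape of the effect.

```python
from dataclasses import dataclass


@dataclass
class Interconnect:
    name: str
    bandwidth_bytes_per_s: float  # sustained bandwidth between processes
    latency_s: float              # fixed cost per message


def parallel_time(serial_work_s, procs, messages, bytes_moved, link):
    """Estimated time for one step: perfectly divided compute plus communication."""
    compute = serial_work_s / procs
    communication = (messages * link.latency_s
                     + bytes_moved / link.bandwidth_bytes_per_s)
    return compute + communication


def speedup(serial_work_s, procs, messages, bytes_moved, link):
    """Serial time divided by the modelled parallel time."""
    return serial_work_s / parallel_time(
        serial_work_s, procs, messages, bytes_moved, link)


# Hypothetical figures, for illustration only.
SHARED_MEMORY = Interconnect("shared memory", 10e9, 1e-7)
CLUSTER = Interconnect("commodity cluster", 100e6, 50e-6)

if __name__ == "__main__":
    # One second of serial work, 16 processes, a communication-heavy algorithm.
    for link in (SHARED_MEMORY, CLUSTER):
        s = speedup(1.0, 16, messages=1000, bytes_moved=1e9, link=link)
        print(f"{link.name}: modelled speedup {s:.2f}x")
```

With these placeholder numbers the shared-memory case still speeds up while the cluster case ends up slower than running serially, which is the behaviour the comment describes. An algorithm designed for a cluster would move far fewer bytes per step.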
I couldn't agree more.
"And it is often better to have Chipzilla as an ally than an enemy." ..... Oh, most definitely, but not as a parasitic cancerous growth.
not anti Intel
Intel need new targets for their next long-term plan to justify high prices - I'd count the statement as constructive public criticism.
Horses for Courses
Intel chips are fine for the desktop, and for small clusters. The new Nehalem is essentially an upgrade, albeit a welcomed one. Another approach is needed for large clusters, and I will watch SiCortex with interest.
Of course a workload horizontally balanced across many machines is exploiting multi-threading, just not on shared memory. Pretty well anything that will load balance over multiple machines will load balance on a single machine, assuming the software allows for multiple instances in a single OS (we have plenty like that). With NUMA architectures, affinity, improved memory bandwidth and larger caches, it can scale very well, as generally there's not much interference between the different instances. If you can't run multiple instances in a single OS, there are always VMs (albeit there are overheads).
Once you start getting into the many, many tens of app servers, then other costs weigh in - all that configuration work, multiple instance installs, config management and so on.
Now this is not to say that a few big boxes are always better, just that there is a balance to be had unless you are Google and can afford to spend giga-bucks on engineering your applications and infrastructure. In many enterprises, the ability to run fewer, larger boxes with VMs or fewer boxes in a horizontally scaled app has some very considerable cost advantages up to a point.
Mem and Comm BW
Michael H.F. Wilkinson states it clearly and precisely: what matters is bandwidth -- both between processes and between the arithmetic unit and memory. It isn't about raw FLOPS, GHz, or IntelInside.
(And in fact, bandwidth and power are the whole point of the SiCortex architecture, though in its setting applications routinely scale to hundreds and even thousands of processors.)
In all the time I spent talking to customers and prospects, the choice of the instruction set (MIPS vs. x86) was never much of an issue.