Server virtualization has spent the past several decades moving out from the mainframe to Unix boxes and then out into the wild racks of x64 servers running Windows, Linux, and a smattering of other operating systems in the corporate data center. The one place where virtualization hasn't taken off is in high performance …
They kind of do it...
...with grid software. Grid software automatically dispatches jobs to whatever nodes 1) are able to run the workload (right hw, o/s, etc.) and 2) have the capacity to run it (free cpu, memory). The grid manager node constantly monitors client systems to make sure they're still alive and doing the tasks they are supposed to be doing. If one of the worker bee nodes dies, the grid manager can send its tasks to another node and even reboot the failed system. It's pretty cool stuff that hasn't really caught on in commercial data centers yet. It's not quite hypervisor-based virtualization, but it can provide many of the same benefits: higher hardware utilization, lower management effort, and, ultimately, more bang for the buck.
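For illustration, here is a minimal Python sketch of the dispatch-and-failover behaviour described above. It is not how any particular grid product works; the `Node` and `Job` classes, the first-fit placement rule, and the `handle_failure` helper are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    os: str
    free_cpus: int
    free_mem_gb: int
    alive: bool = True
    jobs: list = field(default_factory=list)


@dataclass
class Job:
    name: str
    os: str
    cpus: int
    mem_gb: int


def can_run(node, job):
    """A node qualifies if it is alive, matches the OS, and has spare capacity."""
    return (node.alive and node.os == job.os
            and node.free_cpus >= job.cpus and node.free_mem_gb >= job.mem_gb)


def dispatch(job, nodes):
    """Place the job on the first qualifying node; return that node or None."""
    for node in nodes:
        if can_run(node, job):
            node.free_cpus -= job.cpus
            node.free_mem_gb -= job.mem_gb
            node.jobs.append(job)
            return node
    return None


def handle_failure(dead, nodes):
    """Mark a node dead and re-dispatch its jobs; return the jobs left unplaced."""
    dead.alive = False
    orphans, dead.jobs = dead.jobs, []
    return [job for job in orphans if dispatch(job, nodes) is None]


# Hypothetical two-node grid and one job.
nodes = [Node("n1", "linux", 4, 16), Node("n2", "linux", 8, 32)]
job = Job("render", "linux", 4, 8)
first = dispatch(job, nodes)             # first fit: lands on n1
unplaced = handle_failure(first, nodes)  # n1 dies, the job moves to n2
```

A real grid manager would detect the failure itself (heartbeats, timeouts) and would usually pick nodes by policy rather than first fit; the sketch only shows the matching and re-dispatch steps.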
VMware customers ARE able to publish performance stats. The old ESX 2.x EULA restricted this, but the restriction was removed in ESX 3.x. The reason for the restriction wasn't some conspiracy to protect VMware, but the fact that most end-user customers didn't understand virtualization well enough to generate and publish meaningful statistics.
HPC implies jobs too big for virtualisation to help
If your job is small enough that it can share a computer, you might as well run it on the computer on your desk; once your unit of allocation is the computer (or if you're using Blue Gene at Julich, the rack with 1024 CPUs in it), there seems little point in virtualisation.
Clusters, particularly small clusters which were bought for One Big Job and then sit in the corner unloved, tend to be quite idle, but the right response to that is, at the low level, shutdown -hp and wakeonlan, and, at the high level, politics such that your individual research groups find it better to contribute funding to the university, corporate or national Bloody Big Cluster and then submit their jobs to that.
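For anyone who hasn't scripted the low-level half of that, here is a rough Python sketch of what a wakeonlan tool does: it builds a Wake-on-LAN "magic packet" (six 0xFF bytes followed by the target MAC address repeated 16 times) and broadcasts it over UDP. The function names, the example MAC address and the broadcast address are placeholders; port 9 is a common convention rather than a requirement.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + raw * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast to wake a sleeping node."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


# Hypothetical MAC address; building the packet does not touch the network.
packet = magic_packet("00:11:22:33:44:55")
```

A scheduler hook could call `wake()` for a powered-down node when the queue fills up, assuming the node's NIC and BIOS have Wake-on-LAN enabled.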
Oil and Water
Given that HPC seems to me to be more about one or two functions (modelling, massive calculations etc) spread across lots of CPUs, while virtualisation is almost the opposite in that it tries to consolidate as many different functions (mail, applications, file and print etc) onto as few CPUs as possible, the description of oil and water is quite accurate.
As HPC software is supposed to be multithreaded, running on lots of tin, I'm not sure you'd want to waste even a small percentage of CPU cycles on a hypervisor.
It might be a means of using boxes with silly numbers of CPU cores with HPC operating systems that don't scale to large numbers of CPUs particularly well (you could, for example, run lots of Windows Compute VMs, each with two or four vCPUs, on a great big 8 CPU quad core box).
All a bit geeky to me though.
virtualization and jitter
One thing that kills HPC at scale is OS jitter. This becomes a huge issue with large core counts: when you run parallel jobs they need to run in sync, so if any core lags (running another process, handling interrupts, etc.), the other (n-1) cores waste cycles waiting for it. You absolutely can't have a CPU moonlighting as part of another job!
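A back-of-the-envelope way to see why this gets worse with core count, as a Python sketch under simplifying assumptions of my own (each core is hit by jitter independently with probability `p_jitter` per synchronized step, and a hit adds a fixed `delay`): the step only finishes when the slowest core does, so the chance that a step is delayed is 1 - (1 - p)^n.

```python
def p_step_delayed(n_cores: int, p_jitter: float) -> float:
    """Probability that at least one of n independent cores is hit by jitter."""
    return 1.0 - (1.0 - p_jitter) ** n_cores


def expected_step_time(n_cores: int, p_jitter: float,
                       work: float, delay: float) -> float:
    """Expected duration of one synchronized step.

    Every core waits for the slowest one, so a single jitter hit anywhere
    adds the (assumed fixed) delay to the whole step.
    """
    return work + delay * p_step_delayed(n_cores, p_jitter)


# Hypothetical numbers: a 0.1% chance of a jitter hit per core per step.
for n in (1, 16, 1024, 16384):
    print(n, round(p_step_delayed(n, 0.001), 3))
```

With these made-up numbers a single core is almost never delayed, while at a thousand cores most steps are. Real jitter is neither independent nor fixed-size, so treat this as intuition only.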
OTOH, virtualization could add nice 'checkpointing' facilities and simplify scheduling on HPC clusters. But virtualization would need to be implemented across the cluster as a whole, with negligible overhead (a 5 - 10% performance loss is what loses you the bid in HPC circles). And it's solving a different problem than in enterprise virtualization: adding / removing cores "dynamically" is already a solved problem in HPC, done with schedulers such as PBS and LoadLeveler.
A friend of mine works at a company that occasionally runs a computation or two. This week, it was geotagging and transforming some aerial photos. About a week's work on an 8-core machine.
He was asking me how much overhead he could expect in a Xen paravirtualized guest instead of running on bare metal. According to my experience, if the load is not IO-bound, there should be very little overhead.
Off he went and did some testing. The result? On the Xen guest, the job ran 10% FASTER. I couldn't believe it. There is still no explanation for this, but he is in contact with some Intel engineers about it... At least he was on Thursday evening, when we last talked about it.
Virtualization is what you make of it. In the app server world the trend is certainly to consolidate in the manner you describe. But virt can be used to decouple the hardware from the OS the clients (people doing computations) use in a compute environment.
only 40 pct or so runs on a hypervisor, negligible indeed ;-)
Starting from Power5 (tm), all POWER hardware runs a hypervisor, whether you want it or not.
You can't get to your hardware (cpu, memory, adapters) directly in those IBM Power systems.
Solid proof that virtualization does not need to reduce performance if you do it right.
Well, what is cheaper?
Virtualization and HPC can benefit from each other if the complexity of administering a cluster is way more expensive than the performance loss. This is seldom the case, yes.
Now, replying to the "Negative overhead" comment: There _can_ be a reason for actual speedup when using virtualized hardware, even more so when using paravirtualized devices. If the number of network transactions you do is very high, and your VMs use the paravirt network devices, the network streams might be passed back and forth between your instances faster than if they were to be properly sent to a NIC. This holds even at gigabit speeds: a NIC operating at gigabit can use this speed for bursts of information, but the latency of the signal travelling probably 2 or 3 meters to the switch and back can be measurable.
Then again, maybe your app could use less network, synchronize less often, and gain performance.
... and another thing ...
I worked for an HPC vendor who provided a lot of systems to the oil industry for seismological survey analysis. These systems typically ran only one job to avoid losing cycles to anything else, so the hypervisor overhead would be a showstopper. The other problem they had was simply keeping the CPUs fed. The I/O requirements are huge (the disk controller was their previous generation supercomputer) and the systems have specially designed buses to cope, so virtualised devices wouldn't be good enough.
The CPU time on these systems is shatteringly expensive so any time it isn't actively working on your job it's very bad. In some cases we rewrote parts of the kernel for customers to improve the performance of a particular application because that was all it would ever run.
The University where I work is trialling the use of virtualisation running on top of labs full of Windows desktop machines to provide UNIX HPC grid functionality without the user interruption or control/management hassle of dual booting large numbers of PCs. By all accounts it appears to be a reasonably functional and easily managed model for augmenting our other grid nodes (for suitable workloads, of course.)
getting the most out of HPC
I remember a talk I went to a few years back given by the guys who run our HPC farm. We have hundreds of users. One way to optimize performance is to run multiple jobs on each CPU. In their testing they found that about 5-6 jobs per CPU is optimal, depending on the processor.
Say each job takes 1 hour and you have 5000+ jobs in your queue, with each user submitting hundreds of jobs to your farm. The user may expect his/her batch of jobs to take many days. Typically the holdup in any one job is disk I/O. The idea of multiple jobs on single CPUs is that if one job gets held up because of disk I/O then the other jobs running on the CPU increase their share of the CPU.
1 job per CPU * 6 CPUs = 7 hours
6 jobs per CPU * 1 CPU = 6.5 hours
The key ingredient in this approach is a decent kernel scheduler (no prizes for guessing what OS we use/don't use). Anything like virtualisation is pointless CPU overhead hogging suggested by management types who want to use buzzwords without really knowing what they're on about.
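To make the overlap idea above concrete, here is a deliberately idealized Python sketch for a single CPU. It does not try to reproduce the 7 and 6.5 hour figures quoted above. The assumptions are mine: each job needs `cpu_min` minutes of CPU and `io_min` minutes of disk wait, jobs sharing a CPU overlap their I/O waits perfectly, and context-switch cost and disk contention are ignored. Under those assumptions a batch of k jobs takes max(k * cpu, cpu + io).

```python
def batch_time(jobs: int, cpu_min: float, io_min: float) -> float:
    """Idealized time for `jobs` jobs sharing one CPU with perfect I/O overlap.

    The CPU must do jobs * cpu_min of work in total, and no job can finish
    sooner than its own cpu_min + io_min, so the batch takes the larger of the two.
    """
    return max(jobs * cpu_min, cpu_min + io_min)


def makespan(n_jobs: int, jobs_per_cpu: int,
             cpu_min: float, io_min: float) -> float:
    """Time for one CPU to clear n_jobs when run in batches of jobs_per_cpu."""
    full, rest = divmod(n_jobs, jobs_per_cpu)
    total = full * batch_time(jobs_per_cpu, cpu_min, io_min)
    if rest:
        total += batch_time(rest, cpu_min, io_min)
    return total


# Hypothetical 1-hour job: 50 minutes of CPU, 10 minutes waiting on disk.
one_at_a_time = makespan(6, 1, cpu_min=50, io_min=10)  # 360 minutes
six_at_once = makespan(6, 6, cpu_min=50, io_min=10)    # 300 minutes
```

The saving is exactly the I/O wait that would otherwise leave the CPU idle; once the CPU is saturated, piling on more jobs per CPU gains nothing in this model, and in practice it starts to cost memory and cache.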
HPC will be the making of Virtualisation!
Without a doubt, the current and next generation of virtualisation hypervisors have little to offer HPC because the management objectives are currently orthogonal, with virtualisation focused exclusively on sharing resources. But there are common requirements for provisioning, configuration, acquisition and release.
Virtualisation will be mature when it offers benefits to HPC:
1. When the hypervisor is not just a master scheduler, but allocates dedicated resources in the way PR/SM partitioned Amdahl mainframes more efficiently than VM/CMS.
2. When the hypervisor can act as a loader and hand the full machine to the client OS in the way DOS did for Windows or V=R in VM/ESA with a wakeup handle in the hypervisor aware client OS.
3. When the hypervisor can discover peers and hyper-hypervisors on discovered networks, and discover topologies and capabilities.
When the time comes that the hypervisor is embedded like the BIOS, supporting VM, partitioning and booting, the hypervisor will be the kernel that even HPC is built on.