datacentre orchestration - already solved
Your problem of "datacentre orchestration" has already been solved by VMware with Distributed Resource Scheduler.
M$ is no doubt busy trying to reverse-engineer it now without infringing the patents.
Lower costs are the basic attraction of all enterprise technologies, and virtualisation promises that in spades. In particular, it reduces hardware maintenance costs via what is now a fairly simple process of packaging physical servers up and hosting several of them on one large server. The technology can also lower energy …
From the article: "More critically, shoehorning an I/O-intensive job such as a big database application into a VM can lead to problems if you haven’t done the sums first and made sure that the host’s I/O capabilities, along with all the downstream technologies, are up to the task."
I'd go as far as to say that shoehorning ANY I/O-intensive job into a full VM (as opposed to a chroot-type VM) is a recipe for disaster. Typical performance degradation on disk-I/O-intensive tasks ranges from 40% for something relatively CPU-bound (e.g. a Linux kernel compile) to an eyewatering 2,000% (yes, that's 20x) on something heavily disk-bound (e.g. SQL databases). Anybody telling you otherwise is probably a virtualization salesperson.
Chroot-type VMs, OTOH, come with overheads that amount to a rounding error: fractions of a percentage point. Examples of such are:
Linux: VServer, OpenVZ, LXC
FreeBSD: Jails
Solaris: Zones
Linux VServer even comes with a unique feature to dedupe files into copy-on-write hard-links across guests, which not only reduces disk usage but also memory usage by unifying the memory used by the merged shared libraries. Inodes for the same DLL in all the guests end up being the same, so they get mmap()-ed into the same block of memory. It's a bit like memory-deduping, only more efficient and more natural. By leveraging this you can achieve some truly insane resource overbooking ratios (hundreds:1 has been achieved) with negligible loss of performance.
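The mechanism behind that unification is ordinary hard links: once two guests' copies of a library have been merged into a single inode, the kernel's page cache (and hence mmap()) shares the backing pages automatically. A minimal Python sketch of the inode-sharing part follows; the file names are made up for illustration, and note that VServer's real unification also marks the links copy-on-write, which a plain `os.link` does not:

```python
import os
import tempfile

# Simulate two guests holding "copies" of the same shared library.
# File names are hypothetical; only the hard-link behaviour matters here.
d = tempfile.mkdtemp()
guest1_lib = os.path.join(d, "guest1_libexample.so")
guest2_lib = os.path.join(d, "guest2_libexample.so")

with open(guest1_lib, "wb") as f:
    f.write(b"\x7fELF" + b"\x00" * 60)  # stand-in for real library contents

# Unify: replace guest2's copy with a hard link to guest1's inode.
os.link(guest1_lib, guest2_lib)

s1, s2 = os.stat(guest1_lib), os.stat(guest2_lib)
assert s1.st_ino == s2.st_ino   # one inode behind both directory entries,
assert s1.st_nlink == 2         # so both guests mmap() the same pages
print("shared inode:", s1.st_ino)
```

Because both directory entries resolve to the same inode, any page of that file already in memory for one guest is simply reused when another guest maps it.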
Lovely post, Gordan :-) I didn't know whether to laugh or cry when a couple of members of staff told me how virtualization would actually improve the speed of our largely I/O-bound workload, when the tests we'd done showed it running at about 30% of the speed of the bare metal.
Someone needs to muzzle those salesdroids before they do some real harm. Oh well, maybe we can solve it with much faster hardware.... Or maybe not.
Actually, I have no problem whatsoever doing a 'sh*tload' of I/O in a POWERVM environment.
Entry number 3 on this list:
http://www.ideasinternational.com/Benchmark-Top-Ten/SPC-1-SPC-1-E
And here is the executive summary:
http://www.storageperformance.org/benchmark_results_files/SPC-1/IBM/A00083_IBM_Power-595_with_PowerVM_SSDs/a00083_IBM_Power595-PowerVM-SSDs_executive-summary.pdf
Sure, there is an overhead, but the whole idea of virtualization is increasing utilization. And virtualized workloads often run faster than native ones... why? Because they can utilize resources that would otherwise have been dedicated to other virtual machines. It's actually quite simple... if you understand the concept.
// Jesper
"...And virtualized workloads often run faster than native ones.."
Could you please explain this again?
I mean, you explained that IBM can virtualize heavy I/O without problems. And you also explained that virtualized workloads often run faster than native ones. Your comments sound a bit strange to me. Can you explain a bit further?
It's virtualized versus partitioned/stand-alone.
If you have one _partitioned_ server hosting 4 virtual machines, each of which has:
4 CPU cores.
2x SAN ports running at 4 Gbit.
2x 1 Gbit ports running at 1 Gbit.
32 GB of memory.
Then each partition will only be able to use the resources allocated to it.
Hence 4 CPU cores of processing power,
8 Gbit of SAN bandwidth,
2 Gbit of network bandwidth, and
32 GB of memory.
If, on the other hand, you virtualize things in the following way (using POWERVM terms; feel free to translate into other virtualization products' notation), each virtual machine gets:
8 virtual CPU cores.
2 virtual HBAs.
2 virtual network cards.
40 GB of virtual memory.
Now the physical resources from before are virtualized; underneath, to serve the virtual machines, you will normally have:
16 physical CPU cores.
2x 4x 1 Gbit Etherchannels (8 Gbit)
2x4x4 Gbit SAN bandwidth (32 Gbit SAN bandwidth)
128 GB of physical memory.
Hence you potentially have:
twice the processing power (8 virtual cores versus 4 physical),
four times the network bandwidth,
four times the SAN bandwidth, and
25% extra memory.
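The arithmetic above can be sketched in a few lines. The numbers are copied straight from the post; "virtual" here means the ceiling one busy VM can reach by borrowing shared physical capacity while its neighbours are idle:

```python
# Per-VM allocation in the hard-partitioned setup (figures from the post).
partitioned = {"cores": 4, "san_gbit": 8, "net_gbit": 2, "mem_gb": 32}

# What one busy VM can reach in the virtualised setup: 8 virtual cores,
# the full shared SAN and network pipes, and its 40 GB of virtual memory.
virtual = {"cores": 8, "san_gbit": 32, "net_gbit": 8, "mem_gb": 40}

for resource in partitioned:
    gain = virtual[resource] / partitioned[resource]
    print(f"{resource}: {gain:g}x")
# cores: 2x, SAN: 4x, network: 4x, memory: 1.25x
```

Of course, these ceilings only hold while the other VMs are genuinely idle; if every guest gets busy at once, everyone falls back towards the partitioned figures.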
It's very simple: you let your active virtual machines use resources that other virtual machines aren't really using, hence driving up utilization of the physical resources.
Simple, and this is something that many people do on many different virtualization platforms every day.
// Jesper
We read here of "virtualisation" every few days.
Given its minor position, eclipsed by numerous other branches of IT, many more relevant and almost all more interesting, you have to ask why.
Gravity is important, too, but I don't benefit from reading about it several times a week.