In previous research projects we’ve examined the impact of service level monitoring and management, and found positive benefits in having a range of associated SLAs in place. No surprises there, of course. However, the main finding was that beyond a certain number, it didn’t matter how much ‘extra agreement’ you have: there was …
So many words
... so little actionable information.
A lot of organisations have tested virtualisation. Most have found some areas where it makes sense (such as providing test and development sandboxes). Many have discovered it's not the be-all and end-all of solving their problems.
Specifically, it does NOT deliver an easy way of increasing your resilience or scalability. All these virtual environments still sit on the same hardware, with the same systemic points of failure, bottlenecks and performance limitations. As for "best practices", to paraphrase Dilbert's observation:
If everyone's doing it, it's no longer best practice
Totally agree: virtualise the dev and functional test servers, but keep the production and performance test servers physical, except possibly for non-critical or marginal apps.
Same old: virtualisation is not a silver bullet but an extra tool in the box.
SLAs mean less virtualization
You are so right about SLAs being important in a virtualized environment. Unfortunately one of my major SLAs to the business is around response time of applications. How do you enforce/control this in a virtualized environment? There are too many other influences over the performance of my application when you virtualize it. I'm not just talking about memory and CPU resources being dedicated to my app here, but also disk/SAN resources (these are heavy database applications) - nothing is available out there to guarantee these resources.
If I mix development and test VMs with my SLA-guaranteed DB VM, what's to stop the dev VM from hogging my SAN connectivity when the DB VM needs it? VMware only cares about server resources.
If I can't deliver on the SLAs, I'm out of a job, so apart from the easy low-hanging fruit I'm not about to virtualize "real" applications just yet.
How many nines would sir like after the decimal point?
How do you manage service levels...?
Simple, you just lie.
Anyone who's used shared services knows that the smiles and promises last only until you sign.
It's so costly when you're the little guy...
Over-provisioning in smaller organisations is not only possible, but a necessity. When working with virtualisation, you still need spare hardware in case a node goes down.
Let me run you through an example, using the infrastructure of the company I work for. We have two types of network: one at each of our production sites, and one at our head office. Head office contains all the usual centralisable things. The production sites contain servers and services that simply can’t be centralised. (No surprises here.)
Our head office manages to cram all of its services into X physical servers. (Somewhere around X*25 VMs on those X servers, but only X physical boxen are active at a time.) For this, we keep a “cold spare” copy of our fastest VM server sitting around on the rack. If something goes boom, we move the VMs over to that server. We also have over-provisioned space on the existing servers so that if a second server should fail, we could absorb the hit by spreading the VMs of the second failed server across the cluster.
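For anyone wanting to sanity-check the "absorb a second failure" headroom on their own kit, here is a minimal sketch, not the poster's actual tooling. It assumes a hypothetical model where each host has one capacity figure and each VM one load figure in the same unit (say, GB of RAM), and it uses first-fit decreasing to spread the failed host's VMs over the survivors. The host names, VM names and numbers are all made up for illustration; the cold spare handling the first failure is not modelled.

```python
from typing import Dict, Optional

# Hypothetical model: VM name -> load in one resource unit (e.g. GB of RAM).
# Real placement also depends on CPU, storage and network, ignored here.
Host = Dict[str, float]


def place_vms(vms: Host, free: Dict[str, float]) -> Optional[Dict[str, str]]:
    """First-fit decreasing: place the biggest VMs first, each on the first
    host with enough free capacity. Returns {vm: host}, or None if any VM
    cannot be placed."""
    free = dict(free)  # work on a copy so the caller's figures are untouched
    placement: Dict[str, str] = {}
    for vm, load in sorted(vms.items(), key=lambda item: item[1], reverse=True):
        for host, room in free.items():
            if room >= load:
                free[host] = room - load
                placement[vm] = host
                break
        else:  # no surviving host had room for this VM
            return None
    return placement


def absorb_failure(cluster: Dict[str, Host], capacity: float,
                   failed: str) -> Optional[Dict[str, str]]:
    """Spread the failed host's VMs over the survivors' spare capacity."""
    survivors = {
        host: capacity - sum(vms.values())
        for host, vms in cluster.items()
        if host != failed
    }
    return place_vms(cluster[failed], survivors)


# Hypothetical three-host cluster, every host with 64 units of capacity.
cluster = {
    "host-a": {"mail": 16, "dc1": 8, "print": 8},
    "host-b": {"sql": 16, "erp": 16},
    "host-c": {"web": 8, "dc2": 8, "files": 8},
}
print(absorb_failure(cluster, 64, "host-b"))
```

First fit happily piles both displaced VMs onto host-a; if you care about balance after a failure, choosing the survivor with the most free room instead is a one-line change.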
Our production sites can fit everything they need onto Y physical servers. To take advantage of a little extra performance that we don’t strictly *require,* but is nice to have, we spread the load to Y*2 physical boxes. Like the head office, we keep a spare around. Again, the spare swaps in for the first failure, and in a pinch we can collapse our sites from Y*2 active physical servers into Y.
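The production-site arithmetic can be sketched as a quick aggregate headroom check. This is an illustration under stated assumptions, not the poster's numbers: Y = 3, 100 capacity units per host and 270 units of total load are all hypothetical, and the check assumes identical hosts and a spare at least as big as the host it replaces. It also ignores how individual VMs pack onto hosts.

```python
import math


def required_hosts(total_load: float, host_capacity: float) -> int:
    """Fewest identical hosts whose combined capacity covers the load."""
    return math.ceil(total_load / host_capacity)


def tolerable_failures(active: int, spares: int,
                       total_load: float, host_capacity: float) -> int:
    """Host failures the site can take before capacity runs out,
    assuming each spare is swapped in and is at least as big as the
    hosts it replaces."""
    return max(0, active + spares - required_hosts(total_load, host_capacity))


# Hypothetical production site with Y = 3: the load fits on 3 hosts,
# but is spread over 2 * 3 = 6 active hosts, plus one cold spare.
Y = 3
load, capacity = 270, 100
print(required_hosts(load, capacity))                # 3
print(tolerable_failures(2 * Y, 1, load, capacity))  # 4
print(f"{load / (2 * Y * capacity):.0%} average utilisation")
```

With those made-up figures, the spare covers the first failure and collapsing from 6 active hosts to 3 covers three more, which is the "spare first, then collapse Y*2 into Y" pattern described above.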
Is it ideal? No. But we can’t really afford to host somewhere in the realm of 25 times as many physical systems in our various micro-datacenters either, so virtualisation was the only option for us. We do not have a “big boy” budget. Everything we do is whiteboxed, and we are running ESXi (for lack of funding to purchase VMware’s management tools). When we deployed our VM infrastructure, buying SANs of adequate speed (10Gb iSCSI or Fibre Channel) for head office and all production sites was simply not an option, so we use local storage on the physical server nodes. This means that yes, if a server goes boom we have to move everything by hand. For a small business like ours, that means up to 4 hours of downtime (worst case), with an average of 45 minutes, any time a physical server vomits up a stick of RAM or drops a disk.
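To put those outage figures in "how many nines" terms, here is a rough sketch. The 45-minute average and 4-hour worst case come from the post; the rate of four host failures a year is a purely hypothetical assumption, and the result only describes the VMs on the failed host, not the whole estate.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability(failures_per_year: float, minutes_per_failure: float) -> float:
    """Fraction of the year the affected VMs are up."""
    downtime = failures_per_year * minutes_per_failure
    return 1 - downtime / MINUTES_PER_YEAR


def whole_nines(avail: float) -> int:
    """Number of leading nines, e.g. 0.9995 -> 3. Needs avail < 1."""
    # The small epsilon stops float error turning 0.999 into 2.999... nines.
    return math.floor(-math.log10(1 - avail) + 1e-9)


# Hypothetical: four host failures a year (an assumption, not from the post).
for label, minutes in (("average", 45), ("worst case", 240)):
    a = availability(4, minutes)
    print(f"{label}: {a:.4%} ({whole_nines(a)} nines)")
```

Under that assumption the average case lands at three nines and the worst case at two, which is the kind of number worth weighing against the licence cost below.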
Counting both spares and active systems, we have around $N worth of virtualisation server hardware. Rough math says that if we wanted to get all the VMware management software to run our gear, we’d be asked to give VMware north of $N + $30K. (That doesn’t mean $30K in addition to the cost of the hardware. It means the VMware management software would cost $30K more than the hardware it would be managing!)
45 minutes of downtime every now and again, as well as the cost of a few spare chunks of hardware and a little over-provisioning, is something we can live with. The cost of the software to “elegantly” solve virtualisation provisioning issues isn’t.