The road to hell, they say, is paved with good intentions, and never more so than when it comes to virtualisation. Many companies embark on virtualisation because they think it will make IT better, cheaper and faster. There is no denying that it helps initially, reducing costs through consolidating servers and making other areas …
never seen sizing be reasonably effective
I have never, ever seen sizing be reasonably effective. Well, I take that back: I remember one application I supported that had a very basic (but very intense) workload, and when we refreshed the servers we did a bunch of testing under high load to see where we could get. But even with that testing we couldn't get more than about 70% CPU usage out of the boxes (8 cores total at the time, about 3 years ago) because of thread contention in the app, though it was impressive to see upwards of 3,000 requests a second being processed in a sustained manner; I had not seen that level of performance before. That company had another app that wouldn't scale beyond one CPU core, so it was easier to put ESXi on some of our new hosts and run 8 copies of it (far simpler than figuring out a good way to run 8 copies on the same OS instance). Not scaling beyond 1 CPU core was not a big deal when the app first came out in the days of dual-socket single-core, but when we bumped to 8 cores, well, it was more of a sore point.
Anyways, back to the original point: short of that one app, I've never seen hardware sized effectively. In the vast, VAST majority of cases the CPU sits idle at 10-20%, with maybe spikes to 30%, whether it is physical or virtual. Even the CPUs of VM hosts sit idle at low levels! It's all about memory capacity (not even memory performance) and cramming more of these low-usage apps onto the same systems, so I want more cores to run more VMs.
One developer asked me yesterday how heavily loaded one of our apps was on our VM infrastructure; he was going to make a change and wasn't sure if it was going to cause them harm. We have 2 VMs for each of these apps, and the VMs at the moment have 4 vCPUs on them ("just in case": our new VM cluster is still new and we're still measuring the performance after moving out of Amazon). So I looked, and guess what? The average CPU usage on those 4 VMs was 0.25%, not even 1%. There were some spikes to 10% when some batch jobs run, but otherwise 1% CPU.
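To put those percentages in vCPU terms, here is a rough sketch of the arithmetic. The function names and the 2x headroom factor are hypothetical, chosen only for illustration; this is not how any particular hypervisor or sizing tool works.

```python
import math


def busy_vcpus(vcpus: int, cpu_percent: float) -> float:
    """vCPUs' worth of work actually being done at a given utilisation."""
    return vcpus * cpu_percent / 100.0


def suggest_vcpus(vcpus: int, peak_percent: float, headroom: float = 2.0) -> int:
    """Smallest whole vCPU count that covers the observed peak times a
    safety factor. The headroom value is an assumption, not a standard."""
    return max(1, math.ceil(busy_vcpus(vcpus, peak_percent) * headroom))


# Figures from the comment: 4 vCPUs per VM, 0.25% average, 10% batch spikes.
print(busy_vcpus(4, 0.25))     # average load in vCPUs
print(busy_vcpus(4, 10.0))     # batch-job spike in vCPUs
print(suggest_vcpus(4, 10.0))  # suggested vCPU count with 2x headroom
```

Even the batch spike comes to well under half a vCPU, so under these assumptions a single vCPU would cover it with room to spare. Measured peaks over a longer window would be needed before actually shrinking anything.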
No org I've ever worked for has had any idea how to size things, so frequently things are sized to budget / power / rack space rather than the app. I'm not going to go out and buy a bunch of servers that have 2 CPU cores each and 8GB of RAM because that's all the app can use; it's a waste of power. Run more apps on the same boxes (at one company I had 10+ different Ruby apps running in different Apache instances on the same small collection of servers).
At one company I was at about a decade ago now, it was not uncommon for us to be forced to double server capacity literally overnight after a major software release because the software was so bad (and they had neither the time nor the ability to perform adequate performance testing). The server counts in the early days were small. That same company spent a bunch of cash on their Itanium-based Oracle HP-UX servers, only to have them hover at 80% I/O wait.
Having worked for companies that develop and operate their own in-house software, I suppose I'm in a rather different position than someone at a company running off-the-shelf stuff with a fixed number of users.
Then there were the days of fighting Oracle latch contention where the experts say you can throw all the hardware at it that you can buy and it won't fix the problem. Gotta fix the queries.
The lack of resource pooling at all levels, whether it is storage, memory or CPU, is the most critical failing of the public clouds (especially Amazon). Some enterprise cloud players offer resource pooling, but it seems limited to their enterprise agreements, not their on-demand stuff.
At least with virtualization if you make a capacity mistake it's a lot easier to correct (up to a point at least). Which is one of the main things I love about 3PAR - I can change the data distribution on the back end at any time, I can make a mistake and fix it later without impacting the applications in any way.
I've grown more attached to the older term of 'utility computing' rather than 'cloud'. At least when it comes to infrastructure. True cloud, from a provider standpoint is very, VERY complicated. Utility computing on the other hand is really simple. I think people think too much about cloud when for the most part utility is all they need.
Re: never seen sizing be reasonably effective
I can wholeheartedly agree with you.
We have a 5-node ESX cluster, with each node consisting of a quad-socket Opteron 6174 (48 cores total) and ~96 GB of RAM. We have ~80 VMs running on the entire cluster at any given moment. The processors average right around 7%, but memory usage is easily averaging 65-75%. Our systems are running quite nicely, but there's a lot of wasted processor time on these nodes.
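A back-of-the-envelope sketch of why memory, not CPU, is the limit on a cluster like this. It assumes, purely for illustration, that any extra VMs would look like the current ~80 (so usage scales linearly) and takes 70% as the midpoint of the quoted 65-75% memory figure. The function name is hypothetical.

```python
def vms_until_full(current_vms: int, utilisation: float) -> int:
    """VM count at which a resource hits 100%, assuming usage scales
    linearly with the number of VMs (a simplifying assumption)."""
    return int(current_vms / utilisation)


current_vms = 80
memory_limit = vms_until_full(current_vms, 0.70)  # memory at ~70% today
cpu_limit = vms_until_full(current_vms, 0.07)     # CPU at ~7% today

print("memory runs out at about", memory_limit, "VMs")
print("CPU runs out at about", cpu_limit, "VMs")
```

Under those assumptions memory fills up roughly ten times sooner than CPU, which matches the "lots of wasted processor time" observation.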
anon to protect my paycheck. :)
This man is on the ball.
It's really sad when companies get converted to the virtualization mantra, but only halfway.
Small businesses are really vulnerable to this because usually they can't afford the best sysadmins to help them design a good virtualized infrastructure.
They get sold boxes and start migrating their physical servers to virtual ones (they like this, their running costs decrease and they don't need to buy as many boxes) but the wheels start coming off when hardware failures come knocking.
A customer was horrified when I explained to him that he would have had to buy at least another box with the same specs, plus the network infrastructure to go with it, to get his servers in a cluster. Apparently he thought that the box handled clustering by itself (!) and provided self-redundancy (using faerie magic, apparently). Another one was mystified that VMware includes vMotion, which you can configure to migrate running VMs to other nodes automatically and in case of failure, but only if you pay for the license. Seems like sending his plods to VMware training was not budgeted or considered. If you don't know VMware, please go with Hyper-V. It may not have as many bells and whistles, but it's simple and plods adapt better to it.
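The "at least another box" point can be sketched as a simple N+1 check: can the remaining hosts hold all the VMs' memory if one host dies? This is a minimal illustration with a hypothetical function name. It ignores hypervisor overhead, CPU, storage and whatever admission-control rules a real cluster product applies.

```python
def survives_host_failure(hosts: int, host_mem_gb: float, used_mem_gb: float) -> bool:
    """True if the VMs' total memory still fits after losing one host.

    Assumes identical hosts and that memory is the only constraint."""
    remaining_capacity = (hosts - 1) * host_mem_gb
    return used_mem_gb <= remaining_capacity


# A single box has nowhere to fail over to, whatever its specs.
print(survives_host_failure(1, 96, 50))
# A second box with the same specs gives the VMs somewhere to go.
print(survives_host_failure(2, 96, 50))
```

The memory figures here are made up for the example; the point is only that with one host the remaining capacity after a failure is zero.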
Don't virtualise on the planning
I’ll declare my interest straight away: I work for a small IT firm that helps businesses virtualise their IT infrastructure. The reason for the declaration is that virtualisation is an area where good IT VARs can genuinely help businesses, because they can bring a unique item to the virtualisation party: experience.
Over the 6 years that we’ve been helping businesses virtualise we’ve seen two key reasons why some virtualisation projects fail.
1. Ineffective project planning
Virtualisation is a beautifully disruptive technology. But we’ve seen projects fail because of poor planning. Application availability, CPU usage, storage volumes and media type have been estimated on a best-endeavours basis, and those estimates have undermined the project.
2. The business case has been sold on cost savings alone
OK, so everything at some point has to come down to cost savings. We even have an online virtualisation calculator that helps small businesses identify potential cost savings from virtualisation.
But those cost savings can soon get eaten up by VM sprawl. The biggest benefit of virtualisation is that it helps you build a flexible, agile infrastructure.
Good IT VARs can bring their experience of working with many other businesses to help mitigate project failure.