Buying new physical servers has always taken time and effort. Unfortunately, virtualisation has created the perception that provisioning virtual machines is quick, easy and, very unfairly, free of charge. How has this expectation changed the processes needed when new physical servers have to be acquired? Ask …
I saw the title and misread it as:
Poisoning - How do you approach it?
(You don't owe me a new keyboard as it's my fault for not reading what's written.)
We've been running VMware for about 12-18 months, and the funding has usually come from one major project. As an architect, you find over time that new projects realise they can save money and time by using virtual servers. It's then our job to control the sprawl and to fund new hardware without penalising any single project. It's a difficult situation that we're only just beginning to understand, and we're still developing the processes to control it.
"Oh, just stick it on a VM" is a sentence we are hearing more and more from PMs and SDMs...
Just because it's virtual doesn't mean it isn't real!
Just because you can power on a VM in 5 seconds doesn't mean a thing. The workloads they generate are real - FACT!
IT typically funds CAPEX-intensive architectures like virtualisation by robbing Peter to pay Paul. Management always has its pet projects, and the only way to "divert" their funds is to oversize the requirement and hopefully replace some of the old crap in the DC (shh! it's a secret). As for saving OPEX to fund CAPEX: OPEX budgets are typically under the control of the ops folks, and they NEVER have enough! You can do a million TCO calcs and no ops manager is ever going to believe them or loosen his grip on the funds. Any idiot can spend $1m on hardware; ops has SLAs that they get hammered on.
Until IT management and the CxOs of the world have the foresight to realise that spending money on the latest and greatest gear actually saves money (i.e. embracing Moore's Law, which is more than 3 decades old), they will never see a return on investment. Yes, it's a paradox. Get with the program! Keeping a server for 5 years is ABSURD!!!
In my opinion, (non-developer) IT has really broken down into a few generalised roles.
The human cron job: somewhere there needs to be a human cron job running around doing garbage collection wherever possible on old or unnecessary VMs. (S)he should also be reclaiming unused storage from the storage arrays and physical space in the data centers. This is an efficiency expert who repeats the same efficiency-creating algorithms on a regular basis, because they are always required.
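The human cron job's garbage-collection pass can be sketched as a simple filter over a VM inventory. This is a minimal sketch, not a real tool: the `VM` record, its fields, the `reclaim_candidates` function and the 90-day idle threshold are all hypothetical, and in practice the inventory would come from your hypervisor's own reporting.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class VM:
    # Hypothetical inventory record; real data would come from hypervisor reports.
    name: str
    owner: Optional[str]  # None means nobody has claimed the VM
    last_active: date


def reclaim_candidates(vms: List[VM], today: date, max_idle_days: int = 90) -> List[VM]:
    """Return VMs idle for longer than max_idle_days, or with no recorded owner."""
    cutoff = today - timedelta(days=max_idle_days)
    return [vm for vm in vms if vm.owner is None or vm.last_active < cutoff]


# Hypothetical sample inventory for illustration only.
inventory = [
    VM("web01", "alice", date(2009, 6, 1)),
    VM("test-old", "bob", date(2009, 1, 10)),
    VM("orphan", None, date(2009, 6, 20)),
]

if __name__ == "__main__":
    for vm in reclaim_candidates(inventory, today=date(2009, 6, 30)):
        print(f"review for reclamation: {vm.name}")
```

The point is that the check is cheap and mechanical, which is exactly why it has to be re-run on a schedule rather than done once.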
Cloudherders: cloudherders are systems administrators, network administrators, storage administrators, virtualisation experts, etc. Their job is to take the equipment they are provided and make the best possible use of it. Cloudherders also need to be capable of regularly producing a report saying "we will need X amount of capacity by Y date, if current growth holds." They are the crew of the great ship datacenter; they keep her in repair and on course.
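The cloudherders' "X capacity by Y date, if current growth holds" report can be sketched as a straight-line projection. This is a minimal sketch assuming growth is linear and usage is sampled once a month; the function names and the figures are hypothetical, and a real forecast would also need to account for seasonality and the "big projects" only the captain knows about.

```python
import math
from typing import List


def _monthly_growth(monthly_usage: List[float]) -> float:
    """Average growth per month between the oldest and newest samples."""
    if len(monthly_usage) < 2:
        raise ValueError("need at least two monthly samples")
    return (monthly_usage[-1] - monthly_usage[0]) / (len(monthly_usage) - 1)


def months_until_full(monthly_usage: List[float], capacity: float) -> float:
    """Estimate months until usage reaches capacity, assuming linear growth.

    Returns math.inf if usage is flat or shrinking, 0.0 if already full.
    """
    growth = _monthly_growth(monthly_usage)
    headroom = capacity - monthly_usage[-1]
    if headroom <= 0:
        return 0.0
    if growth <= 0:
        return math.inf
    return headroom / growth


def capacity_needed(monthly_usage: List[float], months_ahead: int) -> float:
    """Projected usage months_ahead from the latest sample, if growth holds."""
    return monthly_usage[-1] + _monthly_growth(monthly_usage) * months_ahead


if __name__ == "__main__":
    usage_tb = [40, 43, 46, 49, 52]  # hypothetical storage usage, oldest first
    print(months_until_full(usage_tb, capacity=70))  # 3 TB/month growth, 18 TB headroom
    print(capacity_needed(usage_tb, months_ahead=12))
```

With these sample figures, 3 TB per month of growth and 18 TB of headroom give 6.0 months until the array is full, and 88.0 TB needed a year out.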
The Captain: (S)he is in charge of the whole mess, and must be able to make sense of the whole mess. (S)he must take the efficiencies found by the human cron job and the requirements from the cloudherders, and must be high enough up the corporate food chain to be plugged into any "big projects" coming down the pipe that will cause spikes in demand. Their job is to secure funding, design and prototype new systems, oversee the overall network design and architecture, deal with suppliers, vendors, and inter- and intra-corporate management tantrums, coddle the egos of their staff, and have backup plans for their backup plans, which are themselves backed up with backup plans. (Depending on the size of your organisation, the captain may or may not have a first mate. In smaller organisations the captain may also be required to be a cloudherder or human cron job.)
So, how do you deal with capacity issues? Well, you can't look at it department-by-department, project-by-project any more. The corporation is a ship, and any good captain knows that issues with one area of the ship affect all the others. Capacity is now measured company-wide, and must be dealt with as such. In some cases (such as multi-corporate "clouds", or even "cloud provisioned services" bought from the likes of Amazon), capacity is something that extends even beyond the boundaries of the corporation itself. Then you must start asking questions such as "fantastic, I've got a 100Mbit fibre pipe to my ISP, but who am I sharing a router with, what's the backhaul from that router to the nearest C/O, and what kind of capacity constraints am I faced with trying to access my data, which is now in locations A, B and C?"
Each of the roles in IT infrastructure is vital; they can't be neglected and they can't be marginalised. It just isn't a "one piece of metal per task" world anymore. (There are too many tasks, and not enough DC space or cooling.)
Your mileage will vary; the above is my personal take based entirely upon my experiences in the industry.