Server virtualization – what could possibly go wrong?

Indisputably, server virtualization has a lot going for it. The challenge does not lie in its faults, given that no technology is perfect. Rather, when we researched this topic we found that the level of competence and/or experience around virtualization is not particularly high. So what could go wrong? Top of the list - given …

COMMENTS

This topic is closed for new posts.

my 2c...

Plenty of situations where a dedicated machine still wins (mostly for high performance applications), but virtualising is almost becoming a matter of course for us here.

On patch management, the ability to snapshot a machine before patching/upgrading it is pretty valuable. It's the ultimate rollback - as we all know only too well, sometimes backing the patch out just plain doesn't work. Sending the whole machine back in time (if applicable) can help with this, and it's another handy tool in the box.
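As a rough illustration of that snapshot-then-patch workflow, here is a toy sketch. The dict standing in for a machine, the function names and the health check are all invented for the example; a real hypervisor snapshot captures far more than a deep copy and would be taken through the hypervisor's own tools.

```python
import copy


def patch_with_rollback(vm_state, apply_patch, is_healthy):
    """Snapshot, patch, and restore the snapshot if the health check fails.

    vm_state is a plain dict standing in for a whole machine (an
    assumption for illustration only).
    """
    snapshot = copy.deepcopy(vm_state)
    try:
        apply_patch(vm_state)
        if is_healthy(vm_state):
            return vm_state, "patched"
    except Exception:
        pass  # a patch that blows up is treated like an unhealthy result
    return snapshot, "rolled back"


def bad_patch(state):
    """Hypothetical patch that upgrades the OS but breaks the application."""
    state["os_build"] = "sp2"
    state["app_running"] = False


vm = {"os_build": "sp1", "app_running": True}
vm, outcome = patch_with_rollback(vm, bad_patch, lambda s: s["app_running"])
print(outcome, vm)
```

The point is only the ordering: the snapshot is taken before anything changes, so the rollback does not depend on the patch being able to uninstall itself.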

The isolation virtualisation brings is also pretty handy: think of Dept A not being able to upgrade their system at the weekend because it requires a reboot, while Dept B run their batch jobs at the weekend and so can't have the system interrupted. Where these two systems would perhaps have required two physical machines, they can now be consolidated onto one.

We have systems where vendors won't let us apply Service Packs to the OS because they know their software has a bug against it, so holding back a separate VM for one upgrade is now possible, rather than risking the whole physical machine, which might run several systems.

Abstraction is superb - the knowledge that I can take a couple of VHDs over to any other Hyper-V/Virtual Server/VMware etc host and just fire them up in a DR scenario is smart. It's quicker to configure and fire up a VHD than it is to install a bare-metal OS, install Backup Exec, begin the restore etc etc etc...


Virtual Security

I'm glad that you're going to look at security, because it is often overlooked in the rush to virtualise; the points that you've made about 'virtual server sprawl' highlight that there are risks, not least of which is the potential for forgotten test applications lurking unpatched and unattended in the virtual environment. Additionally, having a coherent and manageable way of managing the real and virtual environments is critical. We have the good fortune of working with Stonesoft, who have developed technologies that secure real and virtual environments from a single, centralised management console, and we are very aware of how unusual their offering is.

dp


Can't think of a decent title.

This is quite interesting. One of the biggest problems I have found with virtualisation, at both large corporates and smaller organisations, is skill level.

Some small shops with IT teams fall for putting all their eggs in one basket by having only one guy trained in the underlying technologies. This is bad practice in any case, but with virtualisation, probably more so. In a physical world, you would expect a server support tech to understand the hardware and OS an app resides on - in virtual environments, server support techs looking after VMs should also have at least a basic knowledge of the hypervisor layer between the tin and the VM.

Equally, I've seen large shops send a few people on a training course and expect them to design, implement and migrate onto a virtualisation platform without any prior experience. Trouble is, that lack of experience shows itself by complaints about lack of performance, reliability, inability to use features etc., resulting in virtualisation gaining a bad reputation within a company. Sometimes, it really is worth spending on an experienced third party outfit to set it up.

Patch management is relatively straightforward, both for guest OS and host virtualisation, so this isn't such a big issue.

Licensing is a big can of worms - be it operating systems or applications. Server OSes are fairly straightforward (at least since MS released Hyper-V and got civilised with VMware). Apps are more complicated, particularly where OEM licenses are concerned - if you P2V a server with an OEM'd software package, watch the small print, particularly if the app is licensed per CPU or per core (Oracle for example). There are all sorts of questions - for example, how does vMotion/rapid migration impact licensing - do you have to license all servers in a cluster that the app may run on?
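To put rough numbers on that last question, here is a small sketch of the arithmetic. The host names, core counts and the "license every core the VM may land on" rule are assumptions made up for illustration; actual terms differ by vendor and need checking against the contract.

```python
# Hypothetical cluster: host name -> physical core count (illustrative only).
CLUSTER_CORES = {"esx01": 8, "esx02": 8, "esx03": 16}


def cores_to_license(cluster, eligible_hosts):
    """Cores needing a licence if every host the VM *may* run on must be covered."""
    return sum(cluster[host] for host in eligible_hosts)


# VM pinned to a single host: only that host's cores count.
pinned = cores_to_license(CLUSTER_CORES, ["esx01"])

# VM free to migrate anywhere in the cluster: every host's cores count.
roaming = cores_to_license(CLUSTER_CORES, CLUSTER_CORES.keys())

print(f"pinned to one host: {pinned} cores")
print(f"free to migrate:    {roaming} cores")
```

Under that assumed rule, letting the VM roam the whole cluster quadruples the licensed core count in this example, which is why pinning licensed apps to a subset of hosts is sometimes worth considering.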

Virtualisation can simplify management, but only if it is designed and implemented properly, both technically and procedurally at the outset - but I agree with the point made about integrating with what's already there - virtualisation is a big enough upheaval on its own.


security/bugs thought

first off, thanks to the people who answered my last questions, pointing out that there was more than consolidation to consider.

However I've been concerned with possible interactions of VM behaviour with software requirements, esp. databases in my case. They must flush stuff to disk in the right order and that must be guaranteed by hardware (not doing so leaves a window of possible data corruption, which is very small if running on well-managed hardware but it's still there, and not all hardware is well managed - I think we've all seen critical data on servers shoved into cupboards with no UPS and/or no firewall).

I finally found a statement on the VMWare site which said their bare-metal hypervisor would maintain exactly the same disk characteristics (can't remember the exact phrase) so at last I had an answer - but did anyone else even realise there was a question? I don't think so, and the question as to these characteristics on non-bare-metal hypervisors is still open (anyone running a DB like this? I'm curious). People are buying into this without being critical enough. I'm not saying VMWare is inadequate, just that it's early days and people need to be less accepting.

Which brings me onto another question regarding actual bugs. This is the third time I've posted this: <http://communities.vmware.com/message/967522> "WARNING: VM bug & SCSI disks causes file data corruption (SQL2005)!" (this is reported in workstation, not server. It's not clear whether it's been resolved).

A technology would have to be very, very mature before I put critical stuff on it. VMWare is damn good but I don't know if we're at that stage yet.

An aside. A respondent mentioned that exchange & sql server on the same machine would fight for memory. SQL server's memory is trivially controllable via gui. This set me off wondering if just getting familiar with their packages would lift some of the problems people are having. I've seen so many headaches because people just didn't know what their stuff could do.


eggs and baskets

First of all, virtual machines are nothing new. They've been the main, if not only, way that mainframes have been managed for years, or decades even. While it's very easy just to click a new VM into existence, the mainframe world has never had any major problems with VM proliferation. This could be because the world of mainframes is one of slow, considered administration based on a stable systems architecture. The mainframe world also comes with the tools to manage and document VMs, to ensure everyone can see exactly what's going on.

The main danger posed by virtualising everything is not understanding the risks. Running 10 VMs on one host means that if (when?) that single piece of hardware blows a fuse then you haven't just lost 1 service, you've lost a whole bunch - so the impact of a hardware failure grows by a factor of the number of VMs each box is hosting. This is not so much an issue for mainframes, where a hardware fault is quite rare - but for a commodity level server, built down to a price-point, the chances of the lights going out are higher and not well understood.

It's not just the physical hardware either. Consider other shared components such as the network. If the cleaner trips over the 1000Bt that's carrying all the traffic from all the VMs, then they too all fail at once. Same goes for storage connections too. All the reliability risks get multiplied.
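One way to put numbers on the eggs-in-one-basket effect is sketched below. It assumes every host fails independently with the same probability over some period, which is a simplification, and the 5% figure is invented purely for the example.

```python
def outage_profile(num_services, host_failure_prob):
    """Compare one-service-per-box with all services on one host.

    Assumes hosts fail independently with the same probability over
    some period; both are simplifying assumptions.
    """
    p = host_failure_prob
    return {
        # Expected number of services down is the same either way.
        "expected_down_separate": num_services * p,
        "expected_down_consolidated": num_services * p,
        # Chance that every service is down at the same time.
        "all_down_separate": p ** num_services,
        "all_down_consolidated": p,
    }


profile = outage_profile(num_services=10, host_failure_prob=0.05)
for name, value in profile.items():
    print(f"{name}: {value:.3g}")
```

Under these assumptions the average number of services lost does not change; what consolidation changes is that the failures arrive together, so the chance of losing everything at once goes from vanishingly small to the failure rate of a single box.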

Talking of which: a secondary risk, that's been in play for a long time now is an effect we see on storage arrays - people (managers, mainly) having no appreciation of how the users of shared resources interact with each other. All too often the link between allocated resources and physical hardware is too abstract, and consequently is viewed as simply a bottomless pit. I've seen situations where a production database slows down terribly for no apparent reason. After a *lot* of investigation it turns out to be because another business unit in another building has mirrored their backup staging area onto someone's production spindles. All completely different volumes, of course. However no-one made the connection down at the metal level. I can see much more of this sort of problem occurring in cloud-computing land, where no-one really knows which physical resources are being used by which production systems.
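The "no-one made the connection down at the metal level" problem is essentially a missing lookup from volumes to physical disks. A minimal sketch of that lookup follows; the volume and disk names are invented, and a real array would need its layout exported from the vendor's own tools.

```python
from collections import defaultdict

# Hypothetical volume -> physical spindle layout; all names are invented.
VOLUME_SPINDLES = {
    "prod_db": {"disk01", "disk02", "disk03"},
    "backup_staging_mirror": {"disk03", "disk04"},
    "file_share": {"disk05", "disk06"},
}


def shared_spindles(layout):
    """Return {spindle: sorted volume names} for spindles backing more than one volume."""
    users = defaultdict(list)
    for volume, spindles in layout.items():
        for spindle in spindles:
            users[spindle].append(volume)
    return {s: sorted(v) for s, v in users.items() if len(v) > 1}


for spindle, volumes in sorted(shared_spindles(VOLUME_SPINDLES).items()):
    print(f"{spindle} is shared by: {', '.join(volumes)}")
```

Keeping even a crude inventory like this up to date is what lets someone spot that two "completely different volumes" are competing for the same disk.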


@Pete 2

They're all good reasons Pete, and to be honest there were quite a few of them in existence at my current place of work; however proper planning, management and implementation negate the vast majority of the problems raised.

Network and storage connections.... On a critical box I'd always use teamed NICs anyway; on our ESX servers we have 4x1Gb pipes bonded with LACP with fully managed and carefully routed cables to a redundant core switch. Using a FC SAN I'd never put bare fibre onto a server. Again you'd have redundant paths (2 HBAs pointing to 2 FC switches in a mesh config), use armored fibre and ensure it's wrapped in a protective shielding.

Storage is interesting, but again it's all down to planning. If you have your DAEs partitioned into multiple LUNs using various RAID levels as appropriate then you at least know which LUN is being used by which VM guest, and can easily see what other VMs are using it as well. That would be the same as in a physical environment if you are using a SAN... just the nature of the beast. Proper planning, so you know what resource you're likely to need, and not overloading the LUNs prevents these issues. Virtual or not.

Although I can see exactly where you're coming from. We run our Citrix farm within VMware, which spans three server rooms across a 6 mile square area. (Factory) Should a user have a problem he could be on any one of the 6 Citrix Servers, which in turn could be on any VM Host. And of course without looking all I know is that the physical disk is on one of 6 DAEs!!! (And I'm now implementing HA on our firewall and leased lines which means "virtualising the firewall"...!!!)

Putting all your eggs in one basket isn't a good idea normally, however if you can mitigate the risk whilst keeping the benefits (consolidation, management and cost) then it seems silly not to take that route.


Don't see the point of virtualisation

At least not on unix systems where a properly set up app or daemon should not interfere with any others running on the same OS so why bother with virtualisation? Why duplicate and run a new OS instance just to run a new app instance FFS? Emperor's new clothes or what. Ok with Windows which makes a dog's dinner of its filesystem and registry installing just about anything but proper OSes should not require a 1-1 OS to app relationship. I thought we'd left that behind with DOS.


IBM Power6

Still has the best hypervisor. I am a big linux cheerleader, but still, if you have limitless cash, or an "arrangement" with big blue, vmware kinda sucks. more koolaid? no thanks, full up.


Used a VM server for about 6 years

It's been based entirely on open source Linux host and Debian guest software and it has been rock solid for this period. This is a much cheaper way to rent a hosted server than using a physical machine. As the entire software stack is open source, licensing isn't an issue. As far as patch management is concerned, the upstream server host ISP (Bytemark) which owns the hardware gives a choice of custom built guest kernels to users such as myself, and on the very rare occasions when a security patch has to go into one of these due to remote exploit potential, guest users are informed in advance when a reboot will be required. I apply patch updates and routine upgrades to the guest non-kernel OS and applications myself using apt-get which automates the whole business pretty well.

This approach brings the cost of operating Internet servers down to similar prices to high end mobile phone contracts. I'm a very satisfied user.


@boltar

There are plenty of reasons why you might choose to virtualise UNIX (or any other OS) rather than co-host applications. Firstly there is the issue of housekeeping and patch management - anybody who has worked on a co-hosted environment in a service-critical environment will know about the problems of co-ordinating outages for things such as patch management, introducing OS upgrades or some configuration changes that can only be done by bouncing the machine. Yes, there are times when the hypervisor needs such treatment, but with the ability to move VMs dynamically, downtime for that can be reduced.

Then there is the issue of isolating problems. Unix is actually not very good at that - a badly behaved application can bring the whole machine down. Anything from filling up swap to forking too many processes can, even if it doesn't crash the machine, bring throughput to a virtual crawl. Anybody who knows IBM mainframes will know that there are far tighter workload management controls, but that's a far more rigid and less fluid environment. UNIX, Linux and so on are simply not like that - it's a strength, and it's a weakness at the same time. Doing your workload management at the hypervisor level can allow you to greatly limit the impact of badly behaved applications. Then there is the convenience factor of facilitating consolidation - yes, you might consolidate a dozen Unix apps onto a single co-hosted environment, but then you've got to sort out all those version and library differences, the clashing naming standards, the shared configuration files, the kernel settings. It's often more trouble than it is worth.

Also, virtual machines work particularly well in development and test environments. VMs can be bounced, libraries changed and the like without impacting everybody else.

Now this is not to say that co-hosting doesn't make a lot of sense. However, I'm much less convinced about co-hosting unrelated applications. For larger organisations it often makes sense to develop consolidation strategies that allow you to present "appliance-like" services. It's perfectly feasible to produce farms for co-hosting databases, J2EE environments, web-services. In that case you have virtualisation at a different level - that of software services. It's much more efficient in hardware utilisation terms than having a VM per application instance. You can then have an environment which is optimised for running a given type of workload.

Eventually it seems likely that everything will be virtualised by default - look at the mainframe arena. However, that's not instead of co-hosting, it's as well as.

As for my main problems with (machine) virtualisation. Firstly there's config management, especially insofar as it affects software licensing, management, performance and capacity planning. If you are going to move your apps all over an ESX farm you had better have a way of dealing with all those issues (and the software licensing one can really bite you in the backside - there are plenty of companies out there that will seek to optimise their revenue through licensing models that don't recognise the reality of virtualisation). Then there is the support problem - I've lost count of the number of suppliers that don't support virtualised environments. Some of it is FUD, some of it is real. Then there is the issue that machine virtualisation can be inefficient and ingrain bad practice. One thing that VMs do is chew up memory and disk space as every OS carries a considerable overhead of both. One of the major problems is that none of the OSes that you are likely to run on VMware and the like will share code pages. For those that have used CMS on VM, that was specifically engineered so that different instances of the guest OS would share code pages. Not much chance of that with Windows or Linux (unless things have changed; IBM gave up on doing the same with Linux under VM, but if it were possible it would save huge amounts of memory).
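The per-OS overhead point can be put as simple arithmetic. The figures below (12 apps, 1 GB each, 0.5 GB per OS instance) are invented for illustration, and the model ignores any page sharing the hypervisor or guest might manage.

```python
def memory_needed_gb(num_apps, app_gb, os_overhead_gb, one_os_per_app):
    """Total RAM for num_apps identical apps, with or without an OS image each."""
    os_instances = num_apps if one_os_per_app else 1
    return num_apps * app_gb + os_instances * os_overhead_gb


# Illustrative numbers only: 12 apps at 1 GB each, 0.5 GB per OS instance.
per_vm = memory_needed_gb(12, 1.0, 0.5, one_os_per_app=True)
co_hosted = memory_needed_gb(12, 1.0, 0.5, one_os_per_app=False)
print(f"one VM per app: {per_vm} GB, co-hosted: {co_hosted} GB")
```

With these made-up numbers a VM per app needs 18 GB against 12.5 GB co-hosted; the gap is exactly the duplicated OS images, which is what shared code pages would claw back.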


Updates of the virtual o/s

We use multiple redundant ESX machines with SAN backends. The biggest challenge we see is updating the ESX o/s and SAN firmware. Especially the SANs - we set up an Openfiler box just so we can update the SANs. Factor: time!

Ian.


@boltar

@boltar, I agree. I think VMs are much more exciting for windows shops (due to Windows architectural flaws essentially). But there are cases where perhaps different divisions within the co want "their own" machine where VMs could be useful. Although, chroot jails, and other kernel-level variants of this, make VMs not so necessary even in this case.

The nice case would be if there's a failover setup on it -- so if a machine is acting up, the VMs automatically transfer. I think this is not commonly set up though outside the mainframe world (and perhaps Suns).


VM's

Ulrich Drepper's memory white paper explains clearly why virtualization sucks from a performance viewpoint.

Also, tuning in some VPS images can be difficult, as familiar kernel-dependent utils like vmstat don't work, being replaced by the /proc/user_beancounters summary.

Still, there's no way I could afford the capacity I get from my VPS hosting deal if I had to buy/manage the physical box.

A weakness or exploit against the virtualization manager can lead to large numbers of logical services (or hosts) being affected - wasn't there a report here recently about this?

Being able to re-image a VM and snapshot is very convenient though and virtualising the os is much cleaner from an end user perspective than solaris / bsd jails.


Anonymous Coward

LPAR Sprawl on POWER

We have seen a lot of LPAR sprawl for our POWER-based systems. We migrated 30+ Sun servers to three IBM POWER5 boxen running 30 AIX LPARs, and in just a few years we have grown to over 100 AIX LPARs running on three POWER5 and two POWER6 boxen. We could never have afforded to do that many LPARs if we ran only one OS image per box.


Patch this

I also disagree with the idea that multiple VMs make for hassle when patching - if I'm going to break bind9 and postgresql updates on the same day then I'd far rather have them affecting one "machine" each at a time, so I have multiple VMs for my own little network here, let alone at work. (Note, as long as there's network, you can migrate a guest from one machine to another using VirtualBox or VMware, as well.)
