Think your VMware snapshots are all good? Guess again if you're on Windows Server 2019

A compatibility issue between VMware's ESXi hypervisor and Windows Server 2019 will leave some customers unable to safely snapshot their virtual machines. A Register reader tipped us off this week that the newest edition of Windows Server is causing some admins to encounter show-stopping errors when making snapshots of their …

  1. From the States

    Bad link?

    The link to the workaround article doesn't work.

    1. Coen Dijkgraaf
      FAIL

      Re: Bad link?

      Yes, the link on ElReg has a space in it that should not be there.

      Correct link is https://kb.vmware.com/s/article/60395

      1. diodesign (Written by Reg staff) Silver badge

        Re: Re: Bad link?

        Thanks - it's now fixed. Don't forget to email corrections@theregister.co.uk if you spot anything wrong so we can fix it right away - we may not have time to read comments until hours after a piece is published.

        C.

    2. Ken Moorhouse Silver badge

      Re: Bad link?

      The link to the workaround article wasn't backed up properly.

      1. A.P. Veening Silver badge

        Re: Bad link?

        No, it was spaced out ;)

        1. chivo243 Silver badge
          Coat

          Re: Bad link?

          Like Candy and Ronnie?

  2. hellwig Silver badge

    Obligatory Ignorant Comment

    So, what are the use cases for this sort of feature? Are systems these days so overly complex they have no mechanism to suspend themselves "safely"? Reminds me of the old email server no one is allowed to reset because no one knows if it will come back up or not.

    Using a virtual machine should make disaster recovery easier as it's not hardware dependent, but now the software itself is so touchy it has to be loaded to the exact same machine state?

    This seems like the opposite of a recovery plan.

    1. rcxb Bronze badge

      Re: Obligatory Ignorant Comment

      Are systems these days so overly complex they have no mechanism to suspend themselves "safely"?

      It's a single check-mark to get consistent data from the entire VM with only a fraction of a second of slowdown during a snapshot, rather than having to do all kinds of application-specific commands to ensure they're all good.

      The alternative (and I would think a passable workaround) is to just snapshot the VM's memory as well, so it comes back up and running in exactly the same state, with all that dirty data still in memory.

  3. This post has been deleted by its author

    1. Anonymous Coward

      Re: Backups

      No one said they were?

    2. diodesign (Written by Reg staff) Silver badge

      Re: Backups

      Yeah yeah yeah. Backups was mentioned just twice and just to avoid every sentence featuring the word snapshot. We like to vary the language a little to make it interesting and less monotonous.

      But since we're being pedanted to death on this, we'll just kill off 'backup'.

      C.

    3. Throatwarbler Mangrove Silver badge
      Headmaster

      Re: Backups

      PROTIP: Many VM backup solutions use snapshots in conjunction with VM quiescing to provide an application-consistent backup.

      1. Anonymous Coward

        Re: Backups

        PROTIP: just use Hyper-V for Server 2019

        1. Anonymous Coward

          Re: Backups

          PROTIP: Use as little Microsoft Technology as possible.

          Microsoft appears to have loved to deliberately introduce incompatibilities into their products at the expense of competitors. They've done this for decades and I've experienced it firsthand with Novell and Windows. A constant battle to work around the new road blocks that they put in competitors' way, accidentally of course; it's only because they don't have time to test other vendors' flawed solutions.

        2. Anonymous Coward

          Re: Backups

          That'll be the problem then. Hyper-V features bu^Wmessing with the expected behaviour.

          Option 1 remove all the Hyper-V features that are in the way

          Option 2 provide a script that does work so vmware can call the damned thing.

          1. DavidRa

            Re: Backups

            It's called VSS, but it's new technology (only available since Windows 2003) so I'm not surprised some vendors haven't gotten around to fully supporting it yet.

            Normally I'd expect installing VMWare Tools to provide the conduit between "host wants an application-consistent snapshot" and "call VSS function to quiesce IO properly". It's probably less than two hundred lines of code including proper error checking (I say 200 because I expect the bare call is probably ... 5).

    4. Nate Amsden

      Re: Backups

      Sort of a misleading post..

      The snapshots are used as part of the backup process to have a consistent point in time to get the data. Your link specifically says "This is because the snapshot is used as part of the data movement process to a backup file or a replicated VM."

      At the end of the day you have to determine what you are trying to protect against, and then devise a backup strategy if possible to protect from that.

      For my Linux VMs (of which I have around 800 in production on vSphere), I don't do any VM-level backups; I just back up the data that we need (at present 99% of it via NFS to HP StoreOnce). Actually I've never needed VM-level backups, as I have always felt that is sort of wasteful, especially if you are backing up a bunch of systems that are fairly identical to each other, as in the case of web servers etc. MySQL servers have custom scripts that use Percona xtrabackup to export the data safely to another storage system.

      Snapshots absolutely can be backups (short term generally). I rarely use VM-level snapshots (and 99% of the time when I do, I power the VM off first to make a consistent snapshot faster). As can, gasp, RAID be a backup (protects against disk failure). Storage snapshots (especially on file storage) are great for restoring files that were lost accidentally. That is a backup, I mean especially for those time windows where some data could be created and then destroyed between major backup windows: have rotating snapshots happening every 5 minutes for X hours, every hour for X days etc.
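
      A tiered rotation like that can be sketched roughly as follows. This is only an illustration, not any storage vendor's actual policy engine: the function name `snapshots_to_keep` and the defaults `fine_hours=4` and `hourly_days=2` are made-up stand-ins for the "X hours" and "X days" above.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(timestamps, now, fine_hours=4, hourly_days=2):
    """Return the snapshot timestamps a two-tier rotation would retain.

    Tier 1: keep every snapshot younger than `fine_hours`.
    Tier 2: keep the earliest snapshot of each clock hour younger than `hourly_days`.
    Anything older than that is eligible for deletion.
    """
    fine_cutoff = now - timedelta(hours=fine_hours)
    hourly_cutoff = now - timedelta(days=hourly_days)
    keep = set()
    seen_hours = set()
    for ts in sorted(timestamps):
        if ts >= fine_cutoff:
            keep.add(ts)
        elif ts >= hourly_cutoff:
            hour = ts.replace(minute=0, second=0, microsecond=0)
            if hour not in seen_hours:
                seen_hours.add(hour)
                keep.add(ts)
    return sorted(keep)
```

      The point of the second tier is that the recovery window stays long while the number of retained snapshots stays small.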

      Some folks don't see a backup as a backup unless it is distributed off site(sometimes to more than one site) and at least semi regularly tested. That certainly qualifies as a very good backup.

      But very few have the resources and/or budget to commit to that level of assurance (certainly none that I have worked for nor any that my immediate friends have worked for). I have been involved with several near data disasters caused by software and hardware failures, many of which involved more than 24-48 hours of downtime. In every case to date, at the end of the day the companies opted not to invest significantly more (either in software/hardware or in staff time) to make the backups more robust. In most cases there was some data loss as a result of the failures, though never complete data loss.

      I have to believe that many of the folks touting extremely robust backup processes that are fully tested, off site, encrypted etc etc etc are most likely dealing with a very small amount of data in a simple environment. Or are in a fortunate position to have a massive budget available for such a system. In either case I'm sure it is a tiny tiny minority of environments out there.

      Too much emphasis in my experience gets put on offsite backups, as if a nuke is going to hit the facility that has your data. Or a big flood or something. This is so incredibly rare. The likelihood of a software failure causing massive data loss (perhaps triggered by a hardware failure) by contrast is quite common.

      In nearly 20 years of working with data centers I've only been hosted with one that had a full facility outage. There was a fire in the electrical room that took the site down for I think almost 72 hours. I wasn't hosted in the facility at the time as it had a previous poor track record for power outages. But the point is even if the systems were down for 72 hours (they had generator trucks on site for several months following while they rebuilt the electrical systems), the systems weren't lost. They were down for up to 3 days (including "big" name sites like Bing travel which had no backups at the time apparently), but they came back. That is also literally the only facility I've worked with that ever had a complete power outage. Though where I have the authority I choose good facilities. Having such an outage is terrible of course but it's not a permanent loss.

      By contrast I recall an article here on El Reg about a similar fire in the electrical room at another facility, I think it was Terremark at the time. They built a good facility, the article said customers never noticed any issues, and they were able to resolve the issue with the fire department with no impact whatsoever.

      1. Wellyboot Silver badge

        Re: Backups

        +1 - Just for a comment that's twice as long as the article!

        I'd upvote for it being a reasoned point of view as well if I could.

        1. Fatman Silver badge
          Thumb Up

          Re: Backups

          <quote>I'd upvote for it being a reasoned point of view as well if I could.</quote>

          Since I agree (with the 'reasoned point of view' part) with your point, I cast an up vote on your behalf.

      2. Anonymous Coward

        Re: Backups

        Sorry to say, but your 20 years of experience doesn't matter any longer, because modern cyber threats have completely changed the game. Welcome to 2019 - and good luck with those snapshots, once some malware gets a hacker inside your network perimeter :(

        1. Nate Amsden

          Re: Backups

          You apparently misread my post.. My comment regarding 20 years had to do with the need for offsite backups. Ransomware doesn't take down a facility it just encrypts data.

          Snapshots certainly can help recover from such an event depending on how they are used. For example, if ransomware encrypts a file share, that data is easily recoverable provided you catch it before your snapshot policy starts removing the last snapshots taken before the ransomware hit. I recall reading about some ransomware attacking Windows VSS. I should clarify that when I talk about storage, it's about purpose-built systems; generally those don't run Windows.

          Snapshots aren't for everything, certainly, but they can be a powerful tool. I just wish NAS appliances had the ability to do read-write snapshots for data testing (NetApp does, I believe; I'm not aware of any other vendor that can).

          As for security intrusions into the network, the best policy for that in my opinion is OFFLINE BACKUPS, in addition to whatever dedupe appliance or cloud backup or whatever. Store the data where it requires physical human interaction to get to it (best example is rotating tape that is physically removed from the drive), and make sure the intruder cannot wipe out your data because they compromised user or admin credentials.

          I remember at a previous job that had everything in a public cloud, realizing that with my admin credentials it would literally be just a few commands (probably in a bash for loop) to wipe out all data and all backups. Now think of the news articles where cloud credentials have been leaked online. So keep a copy of your backups offline if you want to protect against that kind of scenario.

      3. big_D Silver badge

        Re: Backups

        I agree with you in part. But on the other hand good backups are important and not that expensive, in context.

        Veeam, as mentioned in the article, is fairly cheap, compared to the outlay for VMWare licenses, Windows Data Center licenses and the hardware. We use a two-stage backup, the VMs are backed up to a NAS and the NAS is backed up to external drives which are swapped out daily and stored off-site. All-in-all, the backup solution probably costs less than 10% of the total cost of the VMWare infrastructure.

        At home I snapshot my VMs, copy them onto spinning rust, which in turn is synced to a NAS and backed up to Carbonite, along with all my important data.

      4. don't you hate it when you lose your account Bronze badge
        Pint

        No comment

        As you pretty much covered it

  4. O RLY
    Headmaster

    Application quiescence

    To be pedantic, application quiescence is not a feature of the hypervisor, but requires a tool within the guest's OS to commit the data residing in memory and cache to persistent storage so that a storage snapshot, such as a *-snap.vmdk file in vSphere or a snapshot on a storage array, can be used as a quick recovery tool (Application-consistent snapshot). In Windows, that tool is Volume Shadow Copy Services (VSS). The pain of trying to restore snapshots with VSS is much less than the pain of restoring without VSS.

    1. Bronek Kozicki Silver badge
      Paris Hilton

      Re: Application quiescence

      I think this is also a pre-requisite for live VM migrations, right?

      1. O RLY

        Re: Application quiescence

        Not on vSphere. vMotion and Storage vMotion do not require VSS or any other guest tools, including VMware Tools.

  5. Donn Bly

    I thought that issue had been addressed...

    I noticed the snapshot issue in my environment when I moved a couple of Server 2019 VMs from an ESXi 5.5 host to a 6.5 host, but the errors went away when I upgraded the VMware Tools to a newer version. I'm currently using version 10.3.10 (build 10346) on my Server 2019 VMs and no longer getting any errors on snapshots.

  6. Anonymous Coward

    But the real issue is...

    VMware vs Hyper-V ... no doubt this problem that significantly impacts VMware's customers is nothing more than an unfortunate mistake by MS.

  7. james7byrne

    backup products like VEEAM will not help this

    VEEAM uses an API call to quiesce the application running in the VM. If you quiesce an application via local script or VEEAM, the result will be an inconsistent snapshot. Microsoft will need to fix Windows 2019 so applications can be properly quiesced. It is incorrect to say a third party backup application like VEEAM will fix the problem, it won't.

    1. asdasdasdasdasd

      Re: backup products like VEEAM will not help this

      Thank you, I was baffled about how using Veeam was a workaround for this issue.

      1. AndyandtheVMs

        Re: backup products like VEEAM will not help this

        To be a bit more precise: Veeam gets a consistent snapshot by working directly with the OS and consistency frameworks like VSS to bring everything into a consistent state. Then Veeam triggers a standard snapshot without quiescence (it is already consistent).

        The pre- and post-script engine from Veeam does not leverage VMware Tools quiescence either.

  8. DontFeedTheTrolls Silver badge
    Boffin

    Business Continuity Maturity - what scenarios are you attempting to protect against and can you recover from those scenarios in a timely manner.

    "Backup" is an all-encompassing word that means very little these days. If all you can trumpet is "yes we have backups" then you're probably screwed. You need multiple documented options appropriate to the failure scenarios. Restoring might be one of those options.

    1. A.P. Veening Silver badge

      A backup is worthless without a proper, functioning restore.

    2. Mark 110

      Most backups haven't been thought through. But it's relatively cheap to send tapes somewhere and feel warm and cosy even though if you thought it through you are just making tapes and paying people to handle them. They don't do anything.

      I was at a FTSE top 50 savings investment org a few years ago overseeing service assurance for a data centre migration transformation. Anyway.

      Techies proposed losing tape. Made sense. Business managers got diarrhoea cos they thought 6 years of tape backup covered their asses.

      On investigation the app guy Jerry could remember one occasion in 20 years where he had gone yeah ok I will get a tape and have a look if what the business want is there. It wasn't.

  9. Anonymous Coward

    Same issue exists for Domain Controllers running on AWS EC2

    This is confirmed by both Microsoft and AWS and has been in progress of being fixed for the last year with no end in sight.

    1. stiine Silver badge

      Re: Same issue exists for Domain Controllers running on AWS EC2

      Do what I do, albeit in vmware, not aws. My backup script checks several things:

      1) is this a paired system (yes, my DCs are all paired)

      2) is this a power-off-before-backup system (yes, it's the only /safe/ way to back up a DC)

      If #1 is true and if #2 is true, I set a lock and use vmware to shut down the vm prior to the backup.

      This keeps at least 1 domain controller online at all times (at each site) and even with the added shutdown/reboot delay, it's still quicker than an online backup.
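
      The two checks and the lock can be sketched like this. It is only a guess at the shape of such a script, not the actual one: the `VM` fields, the per-site lock set, and the step names are all hypothetical, and no real VMware call is made. It also assumes that a VM failing either check just gets an ordinary online backup, which the description above doesn't spell out.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    site: str
    paired: bool            # check 1: a partner DC exists at the same site
    power_off_first: bool   # check 2: must be shut down before backup


def plan_backup(vm, locked_sites):
    """Return the ordered steps for one VM; every step name is illustrative."""
    if not (vm.paired and vm.power_off_first):
        # Assumption: anything failing either check gets a normal online backup.
        return ["backup_online"]
    if vm.site in locked_sites:
        # The partner at this site is already down for its backup, so wait.
        return ["defer"]
    return ["lock_site", "shutdown", "backup_offline", "power_on", "unlock_site"]
```

      Holding the lock per site is what stops both halves of a pair from being powered off at once.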

      With the advent of dedupe storage arrays, and the caveat that you have to build all of your VMs from the same image, the total space to do incremental backups is, for my usage, negligible.

      To answer another question posed above. I believe that Reduxio has the ability to revert a disk to a point in time. The issue becomes that you then have to have separate LUNs for each virtual machine.

      Also, as has been pointed out above (and before, and forever after), if you haven't successfully restored from a backup, you can't consider it a backup.

  10. RLWatkins

    Remember DR DOS? Word Perfect...?

    I recall asking on this very forum, a couple of years back: Why would anyone migrate a working VM to Hyper-V?

    In keeping with a long history of motivating migration to their own products by creating pain for customers of other vendors through subtle - or sometimes blatant - forms of sabotage, it looks as if Microsoft have devised a reason for people to migrate their VMs to Hyper-V.

    It astonishes me that people tolerate this kind of thing, and have done for decades.

    1. Anonymous Coward

      Re: Remember DR DOS? Word Perfect...?

      You may find the reason is that a) Hyper-V is free and b) the required client-side tools are essentially built into Windows.

    2. Nyle

      Re: Remember DR DOS? Word Perfect...?

      I remember, they did this to Novell constantly as well. Their favorite trick is to offer the product for "free" by bundling it with Windows. It's a Microsoft anti-competitive practice that apparently rarely falls foul of regulators/legislators.

      Then if that fails to steal enough market share, as you mention, ooops, we introduced a bug, or oooops, does that new API change that works better for us not work well for your product. Strange, well, I guess you better rewrite your product. We've already incorporated that into ours, why are you so behind our superior offering.

      1. Mark 110

        Re: Remember DR DOS? Word Perfect...?

        When I was at Unilever 10+ years ago they were hyper-v-ing their whole estate and ditching oracledb for db2

  11. Anonymous Coward

    whew

    so I read this, and we have a handful of 2019 servers in our vmware environment utilizing another snapshot based solution for backups. tbh, I panicked a little bit, until I read the last bit in the VMWare KBA under resolution:

    "

    OR

    Use an MBR disk layout instead of GPT while provisioning the machines.

    "

    I don't know about you guys, but all our stuff is still MBR, there's no compelling reason for us to move to GPT drives yet. (we're a relatively small shop, not super enterprise level where we need VMs with drives that measure in the 2TB+ range)
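
    For reference, the 2TB figure falls out of the MBR format itself: each partition entry stores its start and length as 32-bit sector counts. A quick worked check, where the helper names `mbr_max_bytes` and `needs_gpt` are made up for illustration:

```python
LBA_BITS = 32  # MBR partition entries store start and length as 32-bit sector counts


def mbr_max_bytes(sector_bytes=512):
    """Largest size a single 32-bit sector count can describe."""
    return (2 ** LBA_BITS) * sector_bytes


def needs_gpt(disk_bytes, sector_bytes=512):
    """True when the disk is larger than an MBR layout can address."""
    return disk_bytes > mbr_max_bytes(sector_bytes)
```

    With the usual 512-byte sectors, `mbr_max_bytes()` comes to 2 TiB, so anything smaller can stay on MBR.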

    1. Anonymous Coward

      Re: whew

      WHY!!!

      What the *** does the disc partition layout have to do with telling a process to flush everything to disc and wait a bit?

      1. stiine Silver badge

        Re: whew

        I would guess that a different part of the code handles MBR disks because they contain....you guessed it... the master boot record.

  12. Anonymous Coward

    Snapshots shouldn't be used as only backup for db workloads

    You shouldn't rely on 'crash consistent' snapshots for critical db backups... Use native tools along with the snaps...

    1. rmason Silver badge

      Re: Snapshots shouldn't be used as only backup for db workloads

      snapshots in VMware aren't backups at all, not in the real sense of the word.

      You lose your VMDK, your snapshot is worthless. They're short to mid term rollback options, or little lifesavers for easily restoring something relatively simple. They aren't a backup though. You damage or lose your vmdk file and if you don't have an actual backup of *that* it's toast, snapshots or no.

      https://kb.vmware.com/s/article/1025279

      Line one is "do not use snapshots as backups"

      Remove them from your thinking completely when you think "do I have backups?".

      1. Anomalous Custard
        Pint

        Re: Snapshots shouldn't be used as only backup for db workloads

        As I can't upvote you more than once, have a drink of whatever you're having.

    2. rmason Silver badge

      Re: Snapshots shouldn't be used as only backup for db workloads

      Snapshots are not backups. Snapshots should not be used as the only backup for *anything*.

      1. Mark 110

        Re: Snapshots shouldn't be used as only backup for db workloads

        Snaps good for app servers. Bad for business data.

  13. Anonymous Coward

    Snapshot issue? Use an Agent based backup instead.

    You can use agent-based backups in situations like this. For example, with Unitrends you could install their client agent into Windows 2019 and back it up at the Guest OS level. They have a way of taking that agent-based backup and then converting it back to a VM, using proactive (pre-recovered ready-2-go) or reactive (instant or regular) restore options.

  14. MrBoring

    Works ok on esxi 6.0.

  15. fredesmite Bronze badge
    Mushroom

    If you use any Windows server crap

    you get what you deserve.

Biting the hand that feeds IT © 1998–2019