We suck at backups. So let's not have a single point of failure any more

The more that I move away from front line systems administration and towards data centre architecture, the more I am absolutely convinced that systems administration can cause PTSD. With the exception of a handful of clients, I am no longer the individual responsible for making the computers go. Despite this, I still wake up at least …

  1. Anonymous Coward

    Absolutely. A good backup product should be configurable so that old backups cannot be deleted. Of course, space is limited, so they should be able to expire at a set age, or when available capacity is depleted. On a snapshot-based system, redirect-on-write and dedup are unavoidable, so it should be easy to see if malware is busy encrypting everything, as there should be an unusually high rate of change of capacity. This assumes you're actually monitoring it.
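A rough sketch of the rate-of-change monitoring mentioned above, in Python. It assumes you can already export how much data changed per day (or per snapshot) from your backup system; the function name, the three-sigma threshold and the sample figures are all illustrative assumptions, not taken from any particular product.

```python
from statistics import mean, stdev


def unusual_change(daily_changed_gb, threshold_sigmas=3.0, min_history=7):
    """Return the indices of days whose changed-data volume is far above the baseline.

    Each day is compared against the mean and standard deviation of all earlier days.
    """
    alerts = []
    for i in range(max(min_history, 2), len(daily_changed_gb)):
        history = daily_changed_gb[:i]
        baseline = mean(history)
        # Use a floor of 5% of the baseline so a perfectly flat history
        # does not turn every tiny wobble into an alert.
        spread = max(stdev(history), 0.05 * baseline)
        if daily_changed_gb[i] > baseline + threshold_sigmas * spread:
            alerts.append(i)
    return alerts


# Hypothetical daily change volumes in GB; the last day is the suspicious one.
changes = [12, 15, 11, 14, 13, 12, 16, 14, 13, 240]
print(unusual_change(changes))  # prints [9]
```

A real deployment would want something less naive than a running mean (month-end jobs and migrations also cause spikes), but even a crude alert like this beats not looking at all.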

    Fundamentally, there are so many bad designs out there. I'm not going to name any names, for reasons that should hopefully be obvious. A certain customer, responsible for a large piece of transportation infrastructure, was using a well-known virtualisation product. Many hosts (100+) were using a high availability solution which used ssh to manage failover on this virtualisation product. And each was using the same SSH key, which gave full admin access. So, given access to any of these hosts, you get access to everything. A rogue user could cause $billions in lost business.

    Of course, I pointed this out. Was anything done?
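For anyone wanting to check their own estate for the shared-key problem described above, here is a minimal Python sketch. It assumes you have already collected each host's authorized_keys lines into a dictionary, and that the lines are in the plain "type blob comment" form with no leading options; the host names and key blobs are placeholders.

```python
from collections import defaultdict


def shared_keys(authorized_keys_by_host, max_hosts=1):
    """Map each public key to the hosts that trust it.

    Only keys trusted on more than `max_hosts` hosts are returned.
    """
    hosts_by_key = defaultdict(set)
    for host, lines in authorized_keys_by_host.items():
        for line in lines:
            fields = line.split()
            # Skip comments and anything too short to be "type blob [comment]".
            if line.lstrip().startswith("#") or len(fields) < 2:
                continue
            hosts_by_key[(fields[0], fields[1])].add(host)
    return {
        key: sorted(hosts)
        for key, hosts in hosts_by_key.items()
        if len(hosts) > max_hosts
    }


# Placeholder inventory: the key blobs are made up, not real keys.
inventory = {
    "host-a": ["ssh-ed25519 PLACEHOLDERKEY1 failover@cluster"],
    "host-b": [
        "ssh-ed25519 PLACEHOLDERKEY1 failover@cluster",
        "ssh-ed25519 PLACEHOLDERKEY2 alice@laptop",
    ],
    "host-c": ["# managed by config tool", "ssh-ed25519 PLACEHOLDERKEY1 failover@cluster"],
}
for (key_type, blob), hosts in shared_keys(inventory).items():
    print(f"{key_type} {blob} is trusted on {len(hosts)} hosts: {', '.join(hosts)}")
```

This only shows where the same public key is trusted. It says nothing about where the matching private key lives, which is the other half of the problem.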

    1. Dr Who

      This hits the nail on the head. You need to spot the moment the encryption happens by seeing the change between two incremental backups. Even if all your backups are WORM and even if they're on tape, there's still the scenario where the encryption is done but the malware keeps serving up data normally with a software shim for say a month before cutting everything off and demanding its ransom. This means you've got a month's worth of useless backups. Even if your archive goes back more than a month, the data will be completely obsolete. There is malware out there that does precisely this.

      As AC says, the only solution is to make log checking your religion and spot the problem as it's happening.

      PS this one wakes me up in the middle of the night too!

      1. Tony Haines

        Regarding the scenario of encrypting malware with a lag phase - perhaps one solution would be to write out all memory along with disk.

        Then you'd (theoretically) be able to find the decryption key somewhere in that. (This has to be available somewhere, since the shim has to be able to transparently decrypt during the lag phase.)

        Although - I suppose it would itself get encrypted. Hmm, might be an awkward workaround. Either something somewhere is written unencrypted, for boot-up to work, or you could get an early warning of the issue by attempting to boot a cloned copy.

      2. Anonymous Coward

        > As AC says, the only solution is to make log checking your religion and spot the problem as it's happening.

        Or: frequent test restores (like every night). Fully automated. With functional verification.
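A minimal sketch of the checking half of an automated test restore, in Python. It assumes a manifest of SHA-256 hashes is recorded at backup time and compared against whatever the restore job produces; the function names are made up for illustration, and the copy in the demo merely stands in for a real restore. Hash comparison is only the first step, since real functional verification would also start the restored application or database and query it.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record relative path -> SHA-256 for every file under `root`."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest, restored_root):
    """Return a list of (path, problem) pairs; an empty list means every file matched."""
    restored_root = Path(restored_root)
    problems = []
    for rel, expected in manifest.items():
        target = restored_root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel, "content differs"))
    return problems


with tempfile.TemporaryDirectory() as work:
    source, restored = Path(work, "source"), Path(work, "restored")
    source.mkdir()
    (source / "ledger.csv").write_text("id,amount\n1,100\n")
    manifest = build_manifest(source)      # recorded at backup time
    shutil.copytree(source, restored)      # stand-in for the real restore job
    print(verify_restore(manifest, restored))
    (restored / "ledger.csv").write_text("garbage")
    print(verify_restore(manifest, restored))
```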

    2. Loud Speaker

      > Of course, space is limited,

      You might need to read "Hitch-hiker's guide to the galaxy".

      Alternatively, if you don't have space to store the LTO6 tapes, you probably don't need the data as much as you need a bigger office. (Or, you might want to move your data out of central London.)

      I have found, by experiment, you can store a lot of LTO6 tapes in the space that accommodates a single PHB.

  2. Anonymous Coward

    Independent backups

    The only safe option is to have multiple independent backup solutions. If I get a restore job now, the first thing to look at is shadow copies. 90%+ of restores can be done in minutes this way.

    If this isn't an option for whatever reason (malware wipes shadow copies, required data is too old, etc) my second option is from my Unitrends virtual backup appliance which writes to a NAS. This is almost as quick as shadow copies but needs a little more effort. The appliance uses a system account with the minimum VMware permissions it needs to work.

    Finally, I have backups to tape using Arcserve. This provides a third level of recovery if I can't find what I need in shadow copies or Unitrends, and is the off-site disaster recovery solution. The absolute worst case, if malware wipes everything including the NAS or the building burns down, would be the previous evening's tape backup. Some data would be lost, but it would be survivable.

    Although I could have used the same solution (eg Veeam) to do the backup to both NAS and tape, I prefer to have independent backup solutions. That way if one fails for some reason, I should still have a backup for that day.

    1. Bc1609

      Re: Independent backups

      Exactly. This is one of those problems where the solution is simple but unpalatable: regular, frequent off-site backups (no, cloud doesn't count - you have to have an airgap or some physical measure that cannot be overwritten by software). Ransomware doesn't have to be a problem: the real problem is that the solution to ransomware takes time and money.

      1. Danny 14 Silver badge

        Re: Independent backups

        Just use a Linux hypervisor with no access to the storage outside of it. Get the Linux hypervisor to perform a backup independent of the normal regime. You have to be fairly mad to get a locked down Linux hypervisor infected.

        1. Paul Crawford Silver badge

          Re: Independent backups

          And if the Linux admin's password or SSH key is leaked?

          This problem is not OS-specific, though most victims so far have been Windows users. The solution is, equally, not an OS choice (even if it helps the odds) but having some arrangement that when the admin's key is leaked it is not enough to trash everything.

          This means probably multiple keys for different areas of a system, but more importantly (in my humble view) that you have something else, something physical or fundamental to a bit of hardware design, that prevents trashing of all backups along with the primary data.

          Having separate roles/accounts for backing up, distinct from "root/admin", is a start. But you have to start with the assumption that someone has got complete control of the victim machines and so can undo any permissions on those machines.

    2. DavidRa

      Re: Independent backups

      Allow me to scare you then.

      What if the next version of $EncryptMalware has functionality to set and change the encryption password for your backup?

      So now all your offsite tapes are encrypted with a password you don't know. Want that data back, do you?

      Jesus that even scares me. No, no forget I said it.

      1. Anonymous Coward

        Re: Independent backups

        Interesting idea. However, this would mean that the malware would need to get from the client machine to the backup server on the server network. While I won't say this is impossible, it should at least be very difficult on a properly secured infrastructure.

        Even if this did happen, hopefully server logs or intrusion detection would show you that something has happened on the server. If not, you would pick this up when you do your test restores (you do test your backups I hope).

  3. Adam Inistrator

    "I have seen the enemy

    ... and he is us."

    By which I mean we need to view ourselves, as admins, as the true enemy, somewhat as Apple purportedly protects its users from itself, or as in the idea of "who will protect us from the protectors?", so to speak.

  4. theOtherJT

    "Your backup software doesn't need to have write privileges to your production storage or hypervisors"

    Damn straight it doesn't. Why on earth would it ever be allowed to do that? Why do people insist on making things complicated that really don't need to be?

    1. Androgynous Cupboard Silver badge

      Hmm

      While much of this conversation is beyond my pay grade, in my experience things are given write access precisely because configuring them to run with read-only access is the complicated bit. The presumption being that the system is secure, so run it as root and save yourself the bother.

      1. Nelbert Noggins

        Re: Hmm

        If that is really the excuse provided by the sysadmins, then your first step should be to fire them and hire good ones on appropriate salaries.

        Permissions and security being difficult should never be a get out clause for the sysadmins

        1. Anonymous Coward

          Re: Hmm

          > Permissions and security being difficult should never be a get out clause for the sysadmins

          There is no time to think about this. THE DIRECTOR'S EMAILS ARE NOT BEING PROPERLY FORWARDED TO HIS HOME ADDRESS! What are you doing about that?!

          1. Trevor_Pott Gold badge

            Re: Hmm

            > There is no time to think about this. THE DIRECTOR'S EMAILS ARE NOT BEING PROPERLY FORWARDED TO HIS HOME ADDRESS! What are you doing about that?!


  5. DougS Silver badge

    The best way to defend against this

    Is to ask the question "if one sysadmin gets really pissed - or his family is taken hostage - could he destroy everything from production data to all backup copies?" If the answer is yes, you need to separate roles and admin access so that can't happen[*]. If no amount of role separation can accomplish that, because of dumb stuff like backup servers with write access to production data, then you have a single point of vulnerability in your architecture that needs to be resolved before you worry about human factor attacks.

    [*] Obviously this doesn't apply in small shops where the backup guy, storage guy and server guy are all the same person or such a small team that they need to back each other up over vacations etc.

    One additional thought - hyperconverged infrastructure is a great thing, but don't collapse the backup solution into it. Then it would become almost impossible to separate admin roles such that the same guy doesn't have the ability to destroy production data and backups.
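The question above can be turned into a mechanical check. A minimal Python sketch, assuming you can list which systems each admin account is able to wipe; the account and system names are hypothetical.

```python
def single_points_of_failure(destructive_access, production, backup_copies):
    """Return the admins who could destroy production data AND every backup copy.

    destructive_access: dict of admin name -> set of systems they can wipe
    production:         set of systems holding live data
    backup_copies:      set of systems holding backup copies
    """
    risky = []
    for admin, systems in destructive_access.items():
        hits_production = bool(systems & production)
        # With no backup copies at all this is true for everyone, which
        # correctly flags every admin who has production access.
        hits_all_backups = backup_copies <= systems
        if hits_production and hits_all_backups:
            risky.append(admin)
    return sorted(risky)


# Hypothetical access map for a three-person team.
access = {
    "alice": {"vm-cluster", "backup-nas"},
    "bob": {"backup-nas", "tape-library"},
    "carol": {"vm-cluster", "backup-nas", "tape-library"},
}
print(single_points_of_failure(
    access,
    production={"vm-cluster"},
    backup_copies={"backup-nas", "tape-library"},
))  # prints ['carol']
```

The same check applies to service accounts: a backup server with write access to production storage shows up here just like an over-privileged human would.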

    1. Alistair Silver badge

      Re: The best way to defend against this

      Is to ask the question "if one sysadmin gets really pissed - or his family is taken hostage - could he destroy everything from production data to all backup copies, or if he gets hit by a bus can we get at all of it?"

      Thus my first rule to all the managers I've had to train. "There is always a bus out there with your name on it."

      1. Doctor Syntax Silver badge

        Re: The best way to defend against this

        "There is always a bus out there with your name on it."

        Clapham Junction, is that you?

  6. All names Taken

    Tsk? Humans?

    Or should that be:

    Tsk! Humans?

    (the wise will appreciate the granularity issues touched in the tiny yet mega (?) thought example above? (or is that !)?!

    The problem with backups is: proprietary-ness, destructive unique-ness, easy ability to be there but unaccessible and, sometimes, when accessed just a lot of used disk space with nought/nowt,zero,zilch accessible info.

    Far better is an export of data accessible by well almost anything using any platform (ok, mostly accessible?)! with all of the oS stuff mirrored somewhere else?

    EDIT: principle: separate data from drosss making sure the data are always accessible from machines that are usually and not toast?

    1. Anonymous Coward

      Re: Tsk? Humans?

      When did amanfrommars get a sock puppet?

  7. Alan Brown Silver badge


    "if your only backups are on the same device as your production data..."

    Then they're not backups.

    Backups are two or more copies of the data held physically separately from the data being backed up (and from each other).

    A good backup system keeps a database of file hashes, allowing you to pinpoint exactly when the files changed (it can also do double duty as an IDS) and also allows you to restore _just_ the required file without hours of faffing around running through tapes to get to it.
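To illustrate the file-hash idea in general terms, here is a small Python sketch. It is not how Bacula or any other product stores its catalogue; it simply assumes each backup run leaves behind a dictionary of path -> hash, and the labels and hash values are made up.

```python
def change_points(history, path):
    """Return the labels of the backups in which `path` differs from the previous backup.

    history: list of (label, {path: hash}) pairs in chronological order.
    A file that appears or disappears counts as a change.
    """
    changes = []
    previous = None
    seen_first = False
    for label, manifest in history:
        current = manifest.get(path)
        if seen_first and current != previous:
            changes.append(label)
        previous, seen_first = current, True
    return changes


def changed_fraction(older, newer):
    """Fraction of files present in both backups whose hash changed (a crude IDS signal)."""
    common = older.keys() & newer.keys()
    if not common:
        return 0.0
    return sum(1 for p in common if older[p] != newer[p]) / len(common)


# Made-up nightly manifests; on Wednesday everything changes at once.
history = [
    ("Mon", {"report.doc": "aa11", "plan.xls": "bb22"}),
    ("Tue", {"report.doc": "aa11", "plan.xls": "bb23"}),
    ("Wed", {"report.doc": "ff99", "plan.xls": "ee88"}),
]
print(change_points(history, "report.doc"))                # prints ['Wed']
print(changed_fraction(history[1][1], history[2][1]))      # prints 1.0
```

Knowing the last backup in which a file was still unchanged tells you which tape or volume to restore from, and a night where nearly every hash changes is worth an alert.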

    Bacula is free (or not - you can buy enterprise support) and is worth the effort to learn. It beats the pants off every other free product and also off virtually everything in the sub £30,000/year market. Once running it's _far_ less hassle to maintain than Amanda or tar backups.

    I've been happily using it for a decade. The biggest hassle it gives is having to change tapes between the autoloader and the data safe for our multi-hundred-TB workplace backups. At home it's just as happy with single tape drives or portable disks.

    The _real_ problem with backups is that people only think about them after important data has been trashed. The number of times I've been presented with "XYZ disk has crashed. We need this data back" from people who refuse to keep their files in the backed-up storage is beyond belief (we back up desktops too, but these are the same twits who refuse to let us do that too, because there are administrative charges for the service).

  8. stevej.cbr

    Nightmare Scenario happened in 2011 at Oz service provider / DNS registrar

    Distribute.IT, June, 2011

    Backups were on-line on RAID. Hacker with Admin privs (presumed to be insider, no logs left), turned off backups and waited for the cycle to delete all data. Then zero'd live disks. Nobody ever charged.
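One guard against this "switch off the backups and let rotation do the rest" attack is an expiry rule that refuses to age out the last few good copies. A minimal Python sketch of such a rule follows; the function name, the 30-day window and the keep-three default are assumptions for illustration, and a rule like this only helps if the attacker cannot simply edit it.

```python
from datetime import datetime, timedelta


def backups_to_expire(backups, now, max_age=timedelta(days=30), keep_at_least=3):
    """Return the timestamps of backups that may be deleted.

    backups: list of (taken_at, succeeded) tuples.
    A backup is expirable once it is older than `max_age`, except that the
    newest `keep_at_least` successful backups are never expired, however old.
    """
    good = sorted((taken for taken, ok in backups if ok), reverse=True)
    protected = set(good[:keep_at_least])
    return sorted(
        taken
        for taken, _ok in backups
        if now - taken > max_age and taken not in protected
    )


# Hypothetical case: backups silently stopped 40 days ago.
now = datetime(2019, 6, 30)
stalled = [(now - timedelta(days=d), True) for d in (60, 50, 45, 40)]
print(backups_to_expire(stalled, now))  # only the 60-day-old copy may go
```

With a plain age-based rule all four copies in the example would be deleted. Here the three newest good ones survive until fresh backups replace them.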


Biting the hand that feeds IT © 1998–2019