Back up all you like - but can you resuscitate your data after a flood?

When it comes to backups, two sayings are worth keeping in mind: "if your data doesn't exist in at least two places, it doesn't exist" and "a backup whose restore process has not been tested is no backup at all". There is nothing like a natural disaster affecting one of your live locations to test your procedures. I have just …

COMMENTS

This topic is closed for new posts.

  1. Daniel B.
    Boffin

    I feel your pain

    Ever-changing defaults in config files have been a headache precisely because they hit me when I migrate stuff to new boxes (a before-and-after diff of the settings, like the sketch at the end of this comment, is what catches them). Incidentally, the first time I got hit with something like this was with PHP, so I see they have marched on with the never-ending changing of default settings.

    'Tis been three years since I last experienced a test switchover to the DR system, and that was at a former employer. At least the systems I managed worked fine, though the DR site was heavily underpowered. Hopefully they'll never need to use it: everything does run, just much, much slower.

    By the way, I wouldn't quite spend the budget on cloudy backups; what that particular employer did was to have the DR stuff in a DR-specialized facility. They even had an Ops Center that could be used by the operational team for both testing and actual work if the DR plan had to be executed. So while the company didn't own the DR facilities, they were there for the using. Much better than relying on 'the cloud'...
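
    A minimal sketch of the kind of before-and-after check that catches drifting defaults, assuming "php -i" output has been captured from each box (the file names below are illustrative, e.g. "php -i > old_box_php_info.txt" on each machine):

        # Compare two "php -i" dumps so changed defaults show up before a
        # migration goes live. File names are illustrative placeholders.
        def load_settings(path):
            """Parse "key => value" lines from a "php -i" dump into a dict."""
            settings = {}
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    if "=>" in line:
                        key, _, value = line.partition("=>")
                        settings[key.strip()] = value.strip()
            return settings

        old = load_settings("old_box_php_info.txt")
        new = load_settings("new_box_php_info.txt")

        for key in sorted(set(old) | set(new)):
            if old.get(key) != new.get(key):
                print(f"{key}: {old.get(key, '<absent>')} -> {new.get(key, '<absent>')}")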

  2. John Smith 19 Gold badge
    Thumb Up

    Unfortunate. Something to add to the new customer checklist.

    But otherwise the system got back up. Lesson learned.

    That said, as the checklist gets longer it can be tough to keep track of it all.

    But I agree: regular backups without regular restore tests are just voodoo IT. Common sense when you think about it, but surprisingly uncommon IRL.

  3. Trevor_Pott Gold badge

    DR plans

    Seriously, it isn't just testing the DR plans...it's testing them with some regularity. One bloke up thataway made mention that even a minor change can invalidate a DR plan.

    Like "yum update", perhaps?

    Security says update every month, at a minimum. Do you have time/money/etc to test your DR plans for every single change every month? If so...I want to work where you work.
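
    For what it's worth, a cheap middle ground is to record what each monthly update actually changed, so that even without a full DR re-test you can see how far the environment has drifted from the last verified state. A minimal sketch, assuming an RPM-based box (as "yum update" implies); the watched config list and output directory are illustrative:

        # After each update run, snapshot installed package versions and the
        # checksums of a few DR-critical configs, so the next DR test (or the
        # next incident) can show exactly what drifted since the last known-good
        # state. Paths and the config list are illustrative, not prescriptive.
        import hashlib
        import json
        import subprocess
        from datetime import date

        WATCHED_CONFIGS = ["/etc/httpd/conf/httpd.conf", "/etc/php.ini"]

        def sha256(path):
            with open(path, "rb") as fh:
                return hashlib.sha256(fh.read()).hexdigest()

        snapshot = {
            "date": date.today().isoformat(),
            "packages": subprocess.run(
                ["rpm", "-qa"], capture_output=True, text=True, check=True
            ).stdout.splitlines(),
            "configs": {p: sha256(p) for p in WATCHED_CONFIGS},
        }

        with open("/var/lib/dr-snapshots/" + snapshot["date"] + ".json", "w") as fh:
            json.dump(snapshot, fh, indent=2, sort_keys=True)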

    1. Anonymous Coward
      Anonymous Coward

      Re: DR plans

      "Do you have time/money/etc to test your DR plans for every single change every month? "

      Do your clients have the financial reserves to survive the immediate and ongoing impact of a DR plan that ends up not quite working as quickly as was hoped, because it hasn't been tested frequently enough?

      If they don't have the financial reserves, they're almost certainly toast once Bad Things happen.

      If they don't test frequently enough, they may well be toast if Bad Things happen.

      Shit happens. Sometimes it's important to understand which bits matter, and require investment upfront and on an ongoing basis.

      1. Trevor_Pott Gold badge

        Re: DR plans

        You are absolutely correct. In fact, I think I've written the exact same thing in about a dozen different ways on this very site. Unfortunately, nerds don't control the business.

        Or fortunately? It depends on your outlook. Nerds would spend a virtually unlimited amount of money on things, restrict changes to rigid procedures that had long time horizons and generally play things incredibly paranoid and "safe." This would result in an unbeatable network, but a massive money sink and virtually zero agility. At large enough scale you could provide agility - sort of - but certainly not in the SME space. So the owners of the business make choices and they take risks. "Continue operating today" versus "prevent a risk that may not happen." There isn't always money for both.

        What really gets me is the armchair quarterbacks who seem to think that any systems administrator or contractor on the planet has the ability to force their clients/employers/etc to spend money and make the choices that the armchair quarterback would make.

        Of course, when the Anonymous Coward knows only 10% of the story, that isn't a problem, because it's obvious that everyone should do everything according to the most paranoid possible design costing the maximum amount of money using the best possible equipment and all of the relevant whitepapers. The part where doing that would bankrupt most SMEs is irrelevant. Nerds believe in IT over all things.

        Forget the people, forget cashflow; the money is always (magically) there, it is just that business owners are withholding it to fund their massage chair. Salaries of staff don't need to be paid; you need to hire more IT guys. The ability of sales, marketing etc to generate revenue is irrelevant, all that matters is that they cannot possibly affect the system stability and that the data (generated by what? Why?) is secure.

        So yeah; shit happens, and in a perfect world you'd get an up front investment from them to prevent issues and solve potential issues. In the real world, however, things get messy. Oftentimes they simply don't have the money, can't obtain it and/or aren't willing to do things like mortgage their own house to cover a remote possibility event.

        Other times, they are unwilling to make the investment and there's nothing you can do. It's your job as a sysadmin to do the best you can with what you have. You make your recommendations, you accept the choices the client makes and you help them as best you can.

  4. Pete 2 Silver badge

    Never forget the personnel angle

    > two sayings are worth keeping in mind:

    There's a third: restores are useless unless you have the staff available to apply them.

    All the talk about backups, restores, DR and high availability focuses on the technical aspect and never seems to address the issue regarding people. There's little point having a fully tested recovery plan, or backups that you *know* you [ well: someone ] will restore if needed, if that someone is either unavailable, indisposed, sacked or chooses not to do it (yeah, go ahead: fire me. How will that get your system back up and running?).

    It might be something as mundane as the staff canteen serving up a dodgy lunch that lays the whole IT staff low, a particularly good party that does the same but more pleasantly, a scheduling "hundred year wave" where all the players are simultaneously on holiday, off sick, on strike, on maternity leave and freshly redundant, or any other unforeseen circumstance that means nobody answers the phone when the call goes out.

    Possibly the worst of all is when the encrypted personal data was backed up and can be successfully restored, but for one tiny detail: no-one can remember the key.

    So yes: make sure your tech is all fired up and ready to rock. But don't take for granted the person who has to make it all happen.
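
    On the forgotten-key point, one cheap safeguard is to make the periodic restore drill prove that the documented passphrase still decrypts a real sample, ideally run by someone other than whoever set the backups up. A minimal sketch, assuming GnuPG-encrypted backup files and a passphrase held in escrow; the paths are illustrative:

        # Decrypt one sample file from the latest backup using only what the
        # runbook documents. If this fails, the backup is effectively useless
        # no matter how cleanly it restores. Paths are illustrative.
        import subprocess

        SAMPLE = "/backups/samples/customers.sql.gpg"
        PASSPHRASE_FILE = "/run/secrets/backup_passphrase"  # wherever escrow keeps it

        result = subprocess.run(
            ["gpg", "--batch", "--pinentry-mode", "loopback",
             "--passphrase-file", PASSPHRASE_FILE,
             "--output", "/tmp/restore_check.sql", "--decrypt", SAMPLE],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            raise SystemExit("Restore drill FAILED - cannot decrypt sample: " + result.stderr)
        print("Sample decrypted; the documented key/passphrase actually works.")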

  5. Anonymous Coward
    Anonymous Coward

    Relying on capped data links

    The problem I see with this is that while the ISP can promise you that uncapping the link to 100 Mb will work any time you care to do it, if the disaster that takes out your primary site affects enough other customers with similar agreements they may not be able to give everyone the promised uncapped speed.

    That's something I'd look at VERY carefully before making it part of a DR plan. Perhaps that was done in this case, but seeing as the ISP would probably consider any penalties for failure to provide the full bandwidth a "cost of doing business", I'd want some pretty stiff penalties before this arrangement let me sleep at night.

    1. Trevor_Pott Gold badge

      Re: Relying on capped data links

      If wishes were horses we'd all ride.

      What I would like from an ISP arrangement, or amounts of available bandwidth, or budget, time, storage, development cycles, applications, operating systems, coffee vendors, dispensaries of bagels and whatever else it is that runs my life has very little to do with what I get. You get what's available. Your job is to make things work as well as possible within those boundaries.

      As it is, the cost of bandwidth is mind-numbingly prohibitive. Canada: lots of cheap, shitty-quality downstream bandwidth, but you'll have to toss virgins into a very rare Elbrus-class stratovolcano to get upstream that isn't utter pants.

  6. OzBob

    Kudos to Trevor

    for sharing his experience and exposing himself to the self-righteous and indignant of the world.

  7. Anonymous Coward
    Anonymous Coward

    Tiger Team

    Maybe what was missing was a "Tiger Team"; one or preferably more people, chosen for their cynical and destructive temperament, whose entire job would be to find points of failure. Certainly an expensive investment, but maybe not a luxury.

    One of the oldest and soundest rules of testing is that the people who made something are never the best people to find mistakes in it. Mother love?

    1. Trevor_Pott Gold badge

      Re: Tiger Team

      Agree entirely. And it's a fantastic argument for external audits, too. :)

  8. SirDigalot

    100 Mbps? How quaint...

    We vrep all our VMs to the DR site; databases are log-shipped.

    Email uses a DAG and is basically instant.

    The vrep machines need a little manual jiggery-pokery and then they're live.

    We can bring our entire SaaS product(s), with over a TB of data across 4 separate systems, online and open for business after a total obliteration of our main datacenter in less than 30 mins, if there is anyone alive to do it... (probably me, since everyone else is in the office next to the datacenter... I have often considered some radical promotion prospects over the years...)

    Under 2 hours and we have the entire company working from our DR location.

    Unfortunately our dev dept has not yet got to grips with truly geographically diverse databases, so we have to do it the old-fashioned way until they work out how to keep all databases online and updated in more than one place.

    It could be done on a 100 Mbps line (we only had 200 when I started here), but we opt for a little more bandwidth for safety, somewhere near 1,000 Mbps. We do the same for our Florida office too, and that has basically nothing in the way of server infrastructure, though it could.
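
    For scale, a back-of-the-envelope look at what those link speeds buy (the data sizes are illustrative, and real throughput will be lower once protocol overhead and contention are counted):

        # Rough transfer times at various link speeds, with an 80% efficiency
        # fudge factor. The 1 TB seed and 20 GB delta figures are made up for
        # illustration, not taken from the poster's environment.
        def transfer_hours(gigabytes, megabits_per_sec, efficiency=0.8):
            bits = gigabytes * 8 * 1000**3  # decimal GB -> bits
            return bits / (megabits_per_sec * 1e6 * efficiency) / 3600

        for link in (100, 200, 1000):
            print(f"1 TB initial seed at {link} Mbps: ~{transfer_hours(1000, link):.1f} h")
            print(f"20 GB daily delta at {link} Mbps: ~{transfer_hours(20, link) * 60:.0f} min")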

  9. Long John Brass

    spherical cows in a vacuum

    Some clichés:

    No battle plan survives contact with the enemy.

    Cheap, fast, reliable; Pick any two

    My favourite solution to the DR problem is to use the DR systems as part of the QA or integration testing cycle.

    That way you know that the DR rig actually works; DR then involves switching out the QA/Int DBs for the Prod copy & running a few scripts that swap any configs that need to be changed.

    Actually managed to convince one client to implement the above & they even agreed to a live DR test once a year: Prod would migrate over to DR, run there for a week or two, then migrate back. First time was hairy & scary... got easier after that :)

    Old prod kit would move to DR (QA/Int test), then to the dev/test racks. I believe the dev systems should be slow (keeps the devs honest).
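
    A minimal sketch of the "swap any configs" step described above, assuming each app reads its database host from a small JSON file; the file name, key and host names are illustrative rather than taken from the poster's setup:

        # Flip an app between its day job (QA/integration against the QA DB) and
        # DR duty (pointing at the restored prod copy), keeping a backup of the
        # config so the switch is easy to reverse after the annual live test.
        import json
        import shutil

        MODES = {
            "qa": "db-qa.internal",        # normal duty: QA / integration testing
            "dr": "db-prod-replica.dr",    # disaster: the restored prod copy
        }

        def switch_mode(mode, config_path="/etc/myapp/database.json"):
            shutil.copy(config_path, config_path + ".bak")  # keep a way back
            with open(config_path) as fh:
                cfg = json.load(fh)
            cfg["db_host"] = MODES[mode]
            with open(config_path, "w") as fh:
                json.dump(cfg, fh, indent=2)
            print(config_path + ": db_host -> " + cfg["db_host"] + " (" + mode + " mode)")

        # switch_mode("dr")  # run as part of the scripted failover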
