Got it taped: The business of tape-based disaster recovery

For many SMEs, tape disappeared from their landscape as a data storage choice ten or more years ago. Domestically, it exists, if at all, as a legacy item with perhaps a car stereo chewing its way through a selection of fondly regarded C-90s. Still, this lack of public visibility by no means indicates that tape has come to the …

COMMENTS

This topic is closed for new posts.
  1. WraithCadmus

    Good article

    Makes you think about your backup plans when 'Explosion' is something people recover from.

  2. Steven Jones

    The definition of "synchronous" is not just "low latency"...

    "But if you’ve got a dedicated link, transferring data from one SAN to another, the latency could be as low as a few tens of milliseconds. That would be typically referred to as a synchronous link."

    Synchronous replication is not defined by how long the latency is - that's a matter of the configuration of the replication software. Many replication protocols, whether based on SAN (including SRDF), database, logical volume etc., support both synchronous and asynchronous replication. Asynchronous protocols often support techniques which respect the time order in which writes are performed (so the target is at least consistent).

    What the latency does determine, along with application requirements, is whether synchronous replication is viable at all. As fibre-based comms travels at about 2/3rds the speed of light (around 1ms of round-trip time per 100km), the delays can be substantial. If you have an application that requires low-latency writes (typical of many transaction systems), then your write latency might have to be measured in low single-digit milliseconds. That's easily achievable using local enterprise arrays with non-volatile write-back cache (or flash of course), but it's not going to be possible if your replication target is several hundred kilometres away, let alone thousands. Once the effective write latency goes up into the tens of milliseconds typical of replication to a remote DR site, it can kill application performance, especially if the granularity of committed transactions is very fine.
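
    As a rough back-of-the-envelope sketch (my own illustration, assuming ~2/3 c propagation in fibre and ignoring switch, protocol and array overheads), the distance penalty on each synchronous write works out something like this:

    # Rough illustration only: propagation delay added to a synchronous write,
    # assuming light travels at ~2/3 c in fibre and ignoring equipment,
    # protocol and queuing overheads.
    FIBRE_KM_PER_MS = 200.0  # ~200,000 km/s, i.e. roughly 1 ms round trip per 100 km

    def sync_write_latency_ms(distance_km: float, local_write_ms: float = 0.5) -> float:
        """Local array write plus one round trip to the remote array."""
        round_trip_ms = 2 * distance_km / FIBRE_KM_PER_MS
        return local_write_ms + round_trip_ms

    for km in (10, 100, 500, 2000):
        print(f"{km:>5} km -> ~{sync_write_latency_ms(km):.1f} ms per committed write")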

    There are ways of dealing with this by having "relay" systems which have a more local short-term replication target (outside the so-called "blast radius") and then asynchronously replicating to the disaster site, but that gets hideously expensive.

    There's also another downside of synchronous replication, in that it can make your production system vulnerable to failures in the replication process. If you absolutely must guarantee writes are performed to the DR site, then the main production system will fail if anything goes wrong with the replication. It is usually possible to configure replication so that processing will continue should replication fail, but then you've lost your guarantee of a remote replica.
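
    To make that trade-off concrete, here's a toy sketch (purely illustrative, not any vendor's actual behaviour) of the two ways a write path can react when the replica link breaks: fail the write to preserve the guarantee, or carry on locally and accept a stale DR copy:

    # Toy model of the fail-closed vs fail-open choice for synchronous replication.
    class ReplicatedVolume:
        def __init__(self, fail_on_replica_error: bool = True):
            self.fail_on_replica_error = fail_on_replica_error
            self.local_blocks = []
            self.remote_blocks = []
            self.replica_in_sync = True

        def write(self, block: bytes, replica_link_up: bool = True) -> None:
            self.local_blocks.append(block)
            if replica_link_up:
                self.remote_blocks.append(block)   # remote copy acknowledged
            elif self.fail_on_replica_error:
                self.local_blocks.pop()            # reject the write entirely
                raise IOError("write rejected: replica unreachable, production stalls")
            else:
                self.replica_in_sync = False       # keep running, DR copy is now behind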

    Having dealt with many dozens of DR systems, the most cost-effective option is usually to design applications and recovery processes so that they can tolerate a certain amount of data loss in the event of a DR failover, even if that eventually involves some manual correction. It's vastly easier (and cheaper) to design systems where a small amount of data loss is tolerated than to rely on synchronous replication.

    It's also worth noting that synchronous replication (especially at array level) will also synchronously replicate data corruptions caused by software errors. It's often desirable to replicate at logical/transactional levels rather than array level for this reason.

    Producing highly available systems with built-in disaster recovery and zero data loss is non-trivial and very expensive. Done wrong, it can make things much worse.

  3. Hairy Airey

    Ironic choice of picture at http://www.theregister.co.uk/data_centre/

    You've used a picture from the Buncefield explosion. Northgate Information Systems weren't allowed into their building (the red brick one) to retrieve their Friday tapes.

    Early Tuesday morning is a much better choice for a full backup. The reason why is left as an exercise to the reader.

    1. Anonymous Coward

      Re: Ironic choice of picture at http://www.theregister.co.uk/data_centre/

      Yeah, because we've all got time on a Tuesday to run full backups which are generally scheduled to run at the weekend for good reason.

      Also, you know this is a forum for discussion, not willy waving, so could you please tell us why you think Tuesday is a better time for full backups than the weekend? I'd genuinely like to know because I've worked in data protection for about 17 years and I can't see an advantage.

      1. Hairy Airey

        Re: Ironic choice of picture at http://www.theregister.co.uk/data_centre/

        First of all - if your backups are taking all weekend, then you need a faster backup solution. If you are depending on incremental backups, you're one broken tape away from being unable to recover data. If you can take a full backup every day and have it stored off-site, that's much better, but few places can.

        The problem with the Friday backup is that it assumes your tapes will be safe until collection on Monday morning. Northgate Information Solutions got this wrong. You will lose over a week's worth of data this way (and just in salary cost that could be a huge amount).

        Early Tuesday morning means that the tape or backup device can be put in on Monday (which may mean going in on a Bank Holiday, of course, or putting it in on Friday - OK, that negates the reason for doing it on a Tuesday, but we only have four Bank Holiday Mondays per year most years).

        If a catastrophe hits your building on Monday night you will, of course, lose a week's worth of backups - but that's a risk spread over 16 hours, not 64 hours. Since, with only one or two possible exceptions, Tuesday will be a working day, someone will be in to take the tape off-site. If something happens to your building the following weekend then you've lost only 4 days' worth instead of potentially 8. You might find Wednesday or Thursday to be a better idea, of course - but Friday is risky.
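
        For what it's worth, the exposure windows being argued over work out roughly like this (a crude sketch with illustrative times, assuming the tape only leaves the building when staff are next in):

        # Crude sketch: hours a freshly written full backup sits on site before
        # it can go off-site (times are illustrative, not from the article).
        from datetime import datetime

        def hours_on_site(finished: datetime, collected: datetime) -> float:
            return (collected - finished).total_seconds() / 3600

        # Friday-night full backup, collected Monday morning.
        friday = hours_on_site(datetime(2024, 6, 7, 20, 0), datetime(2024, 6, 10, 12, 0))
        # Early-Tuesday full backup, carried off-site the same working day.
        tuesday = hours_on_site(datetime(2024, 6, 11, 4, 0), datetime(2024, 6, 11, 20, 0))

        print(f"Friday full backup:  ~{friday:.0f} hours on site")   # ~64 hours
        print(f"Tuesday full backup: ~{tuesday:.0f} hours on site")  # ~16 hours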

  4. Anonymous Coward

    Arcserve?

    Blimey. I haven't heard the expletive 'Arcserve' mentioned in years.

    When people used to use Novell NetWare servers, Arcserve was the weapon of choice, usually along with an inbuilt QIC or Travan drive which used to back up the local machine. None of this backup over the network gubbins.

    When people moved to Windows, Backup Exec became the backup software of choice, along with DDS-3, D8, DLT or Ultrium/LTO-2, as people used to have so many failures on Arcserve when trying to restore. (I'm talking V6 here - restoring Exchange mailboxes on Arcserve with brick-level backups never worked. Ever.)

    Nowadays, many corporates and hosting companies with a decent sized library (or more than one decent sized library in our case - have a look at an IBM TS3500 for example) tend to use IBM's Tivoli Storage Manager or Commvault Simpana and do it all centrally.

    As for synchronous replication - you only have to read the story about the high street bank that chose not to buy the correct SAN management software licence (£500) and wrote their own to save cash. Extending a LUN then caused a corruption and an issue with the currency system, which handily replicated to the DR location and ended up (at the end of the day) as a £2bn charge hidden away in their annual report. Ouch.

    1. Captain Scarlet

      Re: Arcserve?

      More than likely stuck on it.

      We used to use it and, when it worked, it was actually fairly good software. Although after running on its inbuilt database for a while it drove me insane when it randomly decided to remove all our scheduled jobs; it was fine when on a SQL backend.

  5. Ellis Birt 1

    Northgate's DR did not go without its hiccups. The biggest problem was getting the communications links (leased lines from customer premises) transferred from Hemel to SunGard in Heathrow.

    The communications link suppliers decided they would continue to work at their usual snail's pace, taking days to act.

  6. Pascal Monett

    Very impressive article, on an indeed impressive subject

    Much less impressed by the fact that there are administrators in India. Actually, I can't believe that a company certified to handle backup data for government organizations and high-level private companies has half of its tape management service halfway across the world, in the hands of people it hardly knows.

  7. Anonymous Coward

    Now, Minus the "Security" Theatre

    If they always encrypted the tapes at the client side using standard crypto such as AES and Blowfish, they could simply drop the tapes into the nearest post office box. And no, don't tell me the "NSA-GCHQ can break everything" crap. That's irrelevant, as they will covertly access (by wire or by female eyes) your data center anyway, if they want. And it probably is untrue that they can break AES.

    Or they could use any crap parcel service from UPS to DHL. Or any taxi driver from Pakistan. And save tons of money.

    Equally, the high-security stuff at the storage site could be massively relaxed if two tapes were used instead of one. If one site has indeed been compromised by a nastyboy, there will always be a copy at the other site. And if nastyboy takes a few tapes home with him, he will see only the gibberish of AES ciphertext.
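
    A minimal sketch of the client-side encryption idea (my own example using AES-256-GCM from Python's 'cryptography' package; the key handling is hand-waved and nothing here comes from the article):

    # Encrypt the backup image before the tape (or the courier, or the taxi
    # driver) ever sees it; whoever walks off with the cartridge gets ciphertext.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def encrypt_backup(plaintext: bytes, key: bytes) -> bytes:
        nonce = os.urandom(12)                 # unique per tape image
        return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

    def decrypt_backup(blob: bytes, key: bytes) -> bytes:
        nonce, ciphertext = blob[:12], blob[12:]
        return AESGCM(key).decrypt(nonce, ciphertext, None)

    key = AESGCM.generate_key(bit_length=256)  # stays with the client, never travels with the tape
    tape_image = encrypt_backup(b"payroll, ledgers, the lot", key)
    assert decrypt_backup(tape_image, key) == b"payroll, ledgers, the lot"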

    But yeah, that would be cold-steel, boring rationality of the Teutonic sort.

