The lowdown on storage and data protection

El Reg has teamed up with the Storage Networking Industry Association (SNIA) for a series of deep dive articles. Each month, the SNIA will deliver a comprehensive introduction to basic storage networking concepts. The first article explores data protection. Part 1: Fundamental Concepts in Data Protection. Data protection is …

COMMENTS

This topic is closed for new posts.
  1. Steven Jones

    Restoring to middle of transaction?

    " For instance: if we restore from a point in the middle of a transaction we might get into big trouble. Therefore, for backup applications you will need to first ensure the consistency of data and then backup. Obviously this reduces the number of points in time that are available for a restore."

    Uhm - how can this be introduced without reference to Oracle's transactional integrity or archive logging? Assuming that a backup has been taken properly, it's impossible to recover a database in the middle of a transaction without doing some bizarre things. As part of the recovery process, any database worthy of the name will back out any updates that have not been committed. Also, there ought to be an archive log of transactions so that the database can be recovered to the desired point (indeed, unless you are lucky enough to be able to quiesce a database for backup purposes, the logs are an essential part of virtually any database backup regime).
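
    To make the log-replay point concrete, here's a rough sketch in Python (the log layout and helper names are made up for illustration; this is not Oracle's actual recovery interface). You restore the last good backup and then apply only those transactions from the archive log that committed before your chosen point in time:

    from datetime import datetime

    # Hypothetical archive log: (commit_time, transaction_id, changes)
    ARCHIVED_LOG = [
        (datetime(2011, 3, 1, 9, 0), "T1", {"acct_42": +100}),
        (datetime(2011, 3, 1, 9, 5), "T2", {"acct_42": -30}),
        (datetime(2011, 3, 1, 9, 9), "T3", {"acct_7": +500}),  # the change you want to stop before
    ]

    def restore_to_point_in_time(backup_image, archived_log, target_time):
        """Restore the last good backup, then replay only the
        transactions that committed before the chosen target time."""
        db = dict(backup_image)                      # restore the backup as-is
        for commit_time, txid, changes in archived_log:
            if commit_time > target_time:            # stop short of the unwanted transaction
                break
            for key, delta in changes.items():       # apply each committed change
                db[key] = db.get(key, 0) + delta
        return db

    # Backup taken at 08:00; recover to just before the 09:09 transaction.
    backup = {"acct_42": 1000, "acct_7": 200}
    print(restore_to_point_in_time(backup, ARCHIVED_LOG, datetime(2011, 3, 1, 9, 8)))
    # -> {'acct_42': 1070, 'acct_7': 200}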

    Also, I would suggest that a complete recovery of a database following a corruption is a sledgehammer technique which is only used in extreme circumstances. What is often necessary is to gain a view of a database at a given point in time and fix up the DB (or files) manually. Indeed, some databases have facilities to do exactly that (e.g. Oracle Flashback) in order to avoid full restores. Something rather cruder can be achieved with file system snapshots in order to recover lost or old versions of files short of a full restore. For many companies, rolling a transactional database back to a couple of hours before a logical corruption is simply not feasible, as it will lose a whole lot of important transactions on the way.

    Maintaining data integrity on large IT systems is a complex exercise, and the old backup/restore regime is just the process of last resort.

  2. Pahhh
    WTF?

    Belts and braces

    @steven jones - yes, I agree that most self-respecting databases can clean themselves up. I have some sympathy with your comment on the sledgehammer approach, but in reality most admins aren't DBAs: they don't understand the contents of the application, and a recovery for them is all or nothing. In all honesty, if you were looking after a 3rd party application that had a database, you would probably do the same, as the schemas aren't often published.

    "The easiest way to create a reliable, consistent backup of an application is to shut the application down and then backup its data. The shutdown will create a consistent state"

    Good grief, this was the type of statement I would have expected in the 90s. Nowadays there are lots of perfectly acceptable ways of getting a consistent backup, from using an application's backup API to using a snapshot system that is application-aware (such as VSS on Windows).
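
    For the quiesce-rather-than-shutdown approach, the pattern is roughly freeze/snapshot/thaw. A minimal sketch (the appctl and snapctl commands are invented placeholders for whatever backup API or VSS writer you actually have):

    import subprocess
    from contextlib import contextmanager

    @contextmanager
    def quiesced(db_name):
        """Hold the application in a flushed, consistent state only
        for the instant the snapshot is taken, then let it carry on."""
        # Hypothetical commands - stand-ins for a real backup API or VSS writer.
        subprocess.run(["appctl", "freeze", db_name], check=True)    # flush and pause writes
        try:
            yield
        finally:
            subprocess.run(["appctl", "thaw", db_name], check=True)  # resume normal work

    def consistent_snapshot(db_name, volume):
        with quiesced(db_name):
            # The snapshot itself is near-instant; the app only pauses for this call.
            subprocess.run(["snapctl", "create", volume,
                            "--tag", db_name + "-consistent"], check=True)

    # consistent_snapshot("orders_db", "/dev/vg0/data")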

    Even products like Symantec's Backup Exec will do a fair job. Or if you want a belt-and-braces approach, a product like Cofio's AIMstor will give you CDP combined with snapshot instances, so you've got frequent crash-consistent images (i.e. every minute) combined with less frequent recovery points which are fully consistent (four times a day).

    "that is why an RTO of zero is science fiction."

    Really? Isn't this what mirroring is for? Or there are products that do real-time replication. In the event of a disaster they can achieve near-instant uptime by restoring from the replicated image in a way that prioritises the restore according to what is being accessed (a bit like an HSM product). Asempra's product was an example of this.

    1. Ammaross Danan
      Boffin

      On Backups

      "In all honesty, if you were looking after a 3rd party application that had a database, you would probably do the same as the schemas arent often published."

      100% agree with you there. There are many applications that use some obscure DB system or, worse yet, use a common DB but don't tell you the password(s) to manage it. All in the name of database integrity and proprietary systems, of course. This will prevent Backup Exec and the like from sinking their hooks in.

      "lots of perfectly acceptable ways of getting a consistent backup, from using an applications backup API to using a snapshot system that is application aware (such as VSS on Windows)."

      I doubt VSS is Sybase-aware. Sorry. And as for backup APIs in the software, you neglected to read the author's next sentence. He mentions using the software's internal backup functions as an option.

      "AIMstor will give CDP combined with snapshot instances so you've got the frequent crash consistent images (ie every minutes) combined with less frequent recovery point which are fully consistent"

      Such CDP systems would only benefit DBs if they grab the transaction log. Then you can have the DB roll the log back to a consistent state. The CDP system isn't, in itself, magical enough to do this on its own.

      ""that is why an RTO of zero is science fiction."

      Really? Isn't this what mirroring is for? Or there are products that do real-time replication."

      Such a fail comment. In the event of a database foobar, that corruption is automagically replicated to your mirror. Same goes for corrupting (modifying/deleting) files. The only time mirroring to a hotsite is useful is for system failure. Server bursts into flames? No problem, we have a hotsite. Directory tree got deleted? You lose your zero RTO due to having to restore files.

      1. Anonymous Coward
        Anonymous Coward

        Zero RTO

        I work in banking and, for obvious reasons, we have to have a zero RTO on our online financial transaction systems (you can't lose transactions, that would be really bad!). The way you do it is by having the database and logs separated on different spindles (or groups of spindles) and replicating to a remote site, with at least daily snapshots and online agent-driven tape backups as appropriate.

        If the logs fail, you have the database.

        If the database fails, you have the database snapshot and live logs to play back.

        If the database corrupts, you restore the database from the snapshot and play the logs back until the point of corruption. (This requires a skilled DBA.)

        In the case of a proper disaster (ie site failure) you'll probably be able to mount both database and logs at the remote site.
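
        Sketched as a decision tree (the steps below are just placeholder descriptions, not calls into any particular product):

        def recovery_plan(failure):
            """Rough decision tree for the scenarios above."""
            plans = {
                "logs_lost":          ["carry on using the intact database",
                                       "start a fresh log chain immediately"],
                "database_lost":      ["restore the last database snapshot",
                                       "replay the live logs in full"],
                "database_corrupted": ["restore the last database snapshot",
                                       "replay logs only up to the point of corruption (a DBA picks it)"],
                "site_lost":          ["mount replicated database and logs at the remote site",
                                       "replay any logs newer than the replicated database image"],
            }
            return plans[failure]

        for scenario in ("logs_lost", "database_lost", "database_corrupted", "site_lost"):
            print(scenario, "->", recovery_plan(scenario))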

        1. JL 1

          Pedant alert: not spindles but LUNs

          You are right on the architecture and most financials do the same, particularly for messaging apps. However, these are not on separate spindles but rather separate LUNs.

          Motherhood & apple pie explanation: the enterprise storage device your app runs on 'virtualises' the physical spindles by breaking them into thousands of small extents and then grouping these extents into LUNs which the server uses. These LUNs may well have blocks of data on the same spindles. The same principle applies if you run on NFS from an enterprise filer.
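
          A toy illustration of why 'separate LUNs' does not mean 'separate spindles' (the numbers are made up; real arrays use far more extents):

          import random

          SPINDLES = ["disk%d" % i for i in range(8)]   # physical drives in the array
          EXTENT_MB = 64                                # small fixed-size slices

          def carve_lun(size_mb, rng):
              """A LUN is just a list of extents, each landing on whichever
              spindle the array picks - the server never sees the spindles."""
              return [rng.choice(SPINDLES) for _ in range(size_mb // EXTENT_MB)]

          rng = random.Random(1)
          db_lun  = carve_lun(1024, rng)   # the "database" LUN
          log_lun = carve_lun(1024, rng)   # the "logs" LUN

          shared = sorted(set(db_lun) & set(log_lun))
          print("spindles shared by the two 'separate' LUNs:", shared)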

          We've been trying to educate DBAs to stop talking about spindles for 15 years...

        2. Anonymous Coward
          Anonymous Coward

          Oh dear ...

          >> I work in banking and, for obvious reasons, we have to have a zero RTO on our online financial transaction systems (you can't lose transactions, that would be really bad!)

          Who didn't read the article and still doesn't understand what they are commenting on, then?

          You do NOT have a zero RTO; that (as pointed out) is completely science fiction except for certain classes of problem. What you are after is a zero RPO, which is something completely different. I.e. it may take you a few minutes to a few hours (depending on what happened) to recover systems to a working state, but if you are able to recover all transactions up to the moment of failure then you can have a zero RPO.
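
          A trivial worked example of the difference (timestamps invented):

          from datetime import datetime

          failure          = datetime(2011, 3, 1, 10, 30)  # the moment things break
          last_recoverable = datetime(2011, 3, 1, 10, 30)  # log replay recovers every committed transaction
          service_restored = datetime(2011, 3, 1, 12, 15)  # restore + replay + checks take a while

          rpo_achieved = failure - last_recoverable        # how much data you lost
          rto_achieved = service_restored - failure        # how long you were down

          print("RPO achieved:", rpo_achieved)   # 0:00:00 - zero RPO is achievable
          print("RTO achieved:", rto_achieved)   # 1:45:00 - but the outage was still real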

          @ Steven Jones

          >> Uhm - how can this be introduced without reference to Oracle's transactional integrity or archive logging? Assuming that a backup has been taken properly, it's impossible to recover a database in the middle of a transaction without doing some bizarre things.

          But not all systems use Oracle, or even an "online RDB". When designing the technical elements of your business continuity (BC) strategy (which is what this is about), it would be nice to be able to say all your apps use a well-behaved RDB with full transaction and journalling capabilities (which they actually use), but back on planet earth that is not always the case.

          OK, in general, the bigger the system the more likely that is the case, but I've worked with plenty of systems where data is in various forms - one in particular used an Informix C-ISAM engine. Yes, you may turn your nose up and suggest something along the lines of "well, if you use software like that then you deserve all you get", but at the time it was put in it was the best of the products on offer, both in terms of fitting the business requirements and being affordable.

          The downside, of course, is that neither the software nor the underlying database (which was actually just a load of files on disk) supported transactions - which also caused issues in areas other than BC. The software also had no way to tell a module to quit and roll back whatever it was up to - so we did need to be careful over who was able to do what during our backup window, and we couldn't automatically terminate any sessions still in use.

          But this wasn't the end of the world - it just meant that our RPO could never be anything other than "last night" (other than for special cases where we'd kick everyone out and do a data copy before some process, e.g. financial month end). Our technical RTO would be "as long as the restore takes", and the business RTO would be that plus whatever time it took for people to redo their work for the day.

          And yes, it did get tested for real a couple of times :-( Management would make use of the down time to get people tidying their desks and filing cabinets !

          There is one aspect I think the article really missed, though - how to have a meaningful discussion with management about all this, and the place of the technical stuff in the overall BC plan. Naturally management want a zero RPO, zero RTO, and zero cost. They also don't want to consider other vital elements of BC - such as "when the 'server room' burns down and you've recovered to some spare hardware, where are the staff going to sit, since the office probably went up with the server room?"

          At my last job (no names, no pack drill - hence the AC posting) I tried to raise such issues, but management refused to accept that DR was anything other than an IT issue. One day we had a false fire alarm in the factory, and when everyone was stood outside in the cold and rain, soaked to the skin (literally for many) ... I casually mentioned to one of the directors that if this was a real fire (so we weren't going to be going back in the building today), then the BC plan would tell us where we were going to put the staff, how they were going to get home (car keys or money for bus fare in desk drawers), etc.

          I think you can imagine how far that got me !

  3. Pahhh
    Flame

    @ammaross

    "Such CDP systems would only benefit DBs if they grab the transaction log. Then you can have the DB roll back the log to a consistant state. The CDP system isn't, in itself, magical enough to do this on its own."

    -- Yes, the CDP system can and will capture the logs too. The rollback, though, will be done through the database. It's the only way I know of, at least...

    "Such a fail comment. In the event of a database foobar, that corruption is automagically replicated to your mirror. Same goes for corrupting (modifying/deleting) files. The only time mirroring to a hotsite is useful is for system failure. Server bursts into flames? No problem, we have a hotsite. Directory tree got deleted? You lose your zero RTO due to having to restore files."

    --- Such a fail response... that's what snapshots are for. Run snapshots on your mirrors (heck, on your source too).

