Oh dear ...
>> I work in banking and for obvious reasons, we have to have a zero RTO on our online financial transaction systems (you can't lose transactions, that would be really bad!)
Who didn't read the article and still doesn't understand what they are commenting on then ?
You do NOT have a zero RTO (recovery time objective); that (as pointed out) is complete science fiction except for certain classes of problem. What you are after is a zero RPO (recovery point objective), which is something completely different. I.e., it may take you a few minutes to a few hours (depending on what happened) to recover systems to a working state - but if you are able to recover all transactions up to the moment of failure then you have a zero RPO.
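To put numbers on the difference, here's a rough sketch in Python. The function name and the timestamps are made up purely for illustration; the only point is that the two figures are measured from different ends of the failure.

```python
from datetime import datetime, timedelta

def recovery_metrics(last_recoverable, failure, service_restored):
    """Return (data_loss_window, downtime) for a single incident.

    Hypothetical helper, for illustration only:
    - data_loss_window is what an RPO puts a limit on
    - downtime is what an RTO puts a limit on
    """
    data_loss_window = failure - last_recoverable
    downtime = service_restored - failure
    return data_loss_window, downtime

# Illustrative incident: every committed transaction is recovered,
# but it takes two hours to get the systems back to a working state.
failure = datetime(2024, 1, 10, 14, 30)
loss, down = recovery_metrics(
    last_recoverable=failure,                      # nothing lost
    failure=failure,
    service_restored=failure + timedelta(hours=2),
)
print("data lost:", loss)   # zero data loss window, i.e. a zero RPO was met
print("downtime:", down)    # two hours of downtime, i.e. NOT a zero RTO
```

Zero data loss and two hours of downtime is exactly the banking case quoted above: zero RPO, non-zero RTO.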
@ Steven Jones
>> Uhm - how can this be introduced without reference to Oracle's transactional integrity or archive logging? Assuming that a backup has been taken properly, it's impossible to recover a database in the middle of a transaction without doing some bizarre things.
But not all systems use Oracle, or even an "online RDB". When designing the technical elements of your business continuity (BC) strategy (which is what this is about) it would be nice to be able to say that all your apps sit on a well-behaved RDB with full transaction and journalling capabilities (and that they actually use them), but back on planet Earth that is not always the case.
OK, in general, the bigger the system the more likely that is the case, but I've worked with plenty of systems where data is in various forms - one in particular used an Informix C-ISAM engine. Yes, you may turn your nose up and suggest something along the lines of "well if you use software like that then you deserve all you get", but at the time it was put in it was the best of the products on offer, both in terms of fitting the business requirements and being affordable.
The downside of course is that neither the software nor the underlying database (which was actually just a load of files on disk) supported transactions - which also caused issues in areas other than BC. The software also had no way to tell a module to quit and roll back whatever it was in the middle of - so we did need to be careful over who was able to do what during our backup window, and we couldn't automatically terminate any sessions still in use.
But this wasn't the end of the world - it just meant that our RPO could never be anything other than "last night". The only exceptions were special cases where we'd kick everyone out and do a data copy before some process (eg financial month end). Our technical RTO would be "as long as the restore takes", and the business RTO would be that plus whatever time it took for people to redo their work for the day.
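For anyone who wants the sums spelled out, the nightly-backup case looks something like this. It's a sketch only: the function name and every figure in it are invented for illustration, not taken from the real system.

```python
from datetime import datetime, timedelta

def nightly_backup_recovery(last_backup, failure, restore_time, rework_time):
    """Hypothetical helper: work out the three figures for a
    'restore from last night's backup' recovery."""
    data_lost = failure - last_backup          # everything since the backup is gone
    technical_rto = restore_time               # as long as the restore takes
    business_rto = restore_time + rework_time  # plus re-keying the lost work
    return data_lost, technical_rto, business_rto

# Illustrative figures only.
data_lost, technical_rto, business_rto = nightly_backup_recovery(
    last_backup=datetime(2024, 1, 9, 22, 0),   # last night's backup
    failure=datetime(2024, 1, 10, 15, 0),      # mid-afternoon failure
    restore_time=timedelta(hours=4),
    rework_time=timedelta(hours=6),
)
print("work lost since backup:", data_lost)
print("technical RTO:", technical_rto)
print("business RTO:", business_rto)
```

The later in the day the failure, the bigger the first figure gets, and the rework time grows with it - which is why the business RTO is the one management actually feels.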
And yes, it did get tested for real a couple of times :-( Management would make use of the down time to get people tidying their desks and filing cabinets !
There is one aspect I think the article really missed though - and that's how to have a meaningful discussion with management about all this, and where the technical stuff sits in the overall BC plan. Naturally management want a zero RPO, zero RTO, and zero cost. They also don't want to consider other vital elements of BC - such as "when the 'server room' burns down and you've recovered to some spare hardware, where are the staff going to sit since the office probably went up with the server room ?"
At my last job (no names, no pack drill - hence the AC posting) I tried to raise such issues, but management refused to accept that DR was anything other than an IT issue. One day we had a false fire alarm in the factory, and when everyone was stood outside in the cold and rain, soaked to the skin (literally for many) ... I casually mentioned to one of the directors that if this was a real fire (so we weren't going to be going back in the building today), then the BC plan would tell us where we were going to put the staff, how they were going to get home (car keys or money for bus fare in desk drawers), etc.
I think you can imagine how far that got me !