Backup V. Archive
Or has that distinction vanished into the wind?
Backup is like software superglue with three sticking points: the backup software installation; its operation; and the longevity of the stored backup data vault, which can be needed for years after the last piece of data has been written to it. Let's take an imaginary product, BackNetVaultExec Pro. It is a typical enterprise …
Well played, well played.
Spawn of Satan because, well, it was a good deception.
The tool used and the concept are two entirely different propositions.
A "BACKUP" is a secondary copy maintained to permit recovery of the system/service/data in the event of loss or failure in the primary copy. An "ARCHIVE" is part of the data life cycle, part of the primary copy.
So if your BACKUP is only there to allow recovery of a failed system, you really should never need more than 2-4 weeks of backup.
Now there are people screaming at me in horror. However, it does come down to your data life cycle. If you think you may need to go back 6 months, that is part of the primary life cycle, and you may need an archive if you don't want to keep the primary copy on your primary service.
The ultra-cautious might want to keep a backup for more than 4 weeks "in case we get a virus or corruption or something and we need to go back to how the server was 3 months ago". But seriously, if you need to roll back more than four weeks the backup is the least of your worries - you have got a "backup" of all the transactions/changes/business operations that took place since then, don't you?
The moral of the story is - get your backup, your archive and your data life cycle concepts right, and the choice and management of the tool becomes so much easier.
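To make the split above concrete, here is a minimal sketch of how a retention rule might be written down. The four-week window is only an assumption taken from the upper end of the 2-4 weeks suggested above, and the names `BACKUP_WINDOW`, `expired_backups` and `needs_archive` are made up for illustration, not taken from any product.

```python
from datetime import date, timedelta

# Assumption for illustration: a four-week recovery window, the upper end
# of the 2-4 weeks suggested above. Tune this to your own data life cycle.
BACKUP_WINDOW = timedelta(weeks=4)


def expired_backups(backup_dates, today):
    """Return the backup dates that fall outside the recovery window."""
    cutoff = today - BACKUP_WINDOW
    return [d for d in backup_dates if d < cutoff]


def needs_archive(required_lookback):
    """A look-back need longer than the recovery window is an archive requirement."""
    return required_lookback > BACKUP_WINDOW


if __name__ == "__main__":
    today = date(2024, 3, 1)
    backups = [date(2024, 1, 1), date(2024, 2, 15), date(2024, 2, 29)]
    print(expired_backups(backups, today))
    print(needs_archive(timedelta(days=180)))
```

Anything `needs_archive` flags belongs in the data life cycle discussion, not in a longer backup retention setting.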
Unless you are storing all of your data on NAS (and running the backup from there), how would you perform an agentless backup that doesn't use host CPU?
As you stated, the big concern is long-term retention. It is not uncommon to implement new software, let's call it Tripoli Storage Manager, and then allow all the short-term data to age off from the old NerdBackup software. Using a partitioned tape library, you could then shrink the NBU partition until it only consists of a single head and just the relevant cart slots. You could also then just maintain a license for a single client and server. Not that big of a deal.
The other option is to use a conversion utility which utilizes the API of both titles and transcodes the data.
Finally, agent-based backups are becoming more, not less, popular. Many titles now perform (host) CPU-intensive local deduplication to reduce LAN load and storage pool use.
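As a rough sketch of why local deduplication trades host CPU for LAN traffic: the agent hashes every chunk itself and only ships chunks the server has not already seen. The fixed 4 KiB chunk size and the `chunks_to_send` helper are assumptions for illustration; real products typically use their own (often variable-size) chunking and wire protocols.

```python
import hashlib

# Hypothetical fixed chunk size; real products often use variable-size chunking.
CHUNK_SIZE = 4096


def chunks_to_send(data, known_hashes):
    """Hash each chunk on the client and return only chunks the server lacks.

    Returns (recipe, new_chunks): recipe is the ordered list of chunk hashes
    needed to rebuild the data; new_chunks maps hash -> bytes to transmit.
    """
    recipe, new_chunks = [], {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()  # this is the host CPU cost
        recipe.append(digest)
        if digest not in known_hashes and digest not in new_chunks:
            new_chunks[digest] = chunk
    return recipe, new_chunks


if __name__ == "__main__":
    data = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
    recipe, new_chunks = chunks_to_send(data, set())
    print(len(recipe), "chunks in file,", len(new_chunks), "sent over the LAN")
```

On a second run with the server's hash set populated, nothing needs to cross the network, but the client still pays for hashing every chunk.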
Reading your article I get the impression your familiarity is with non-enterprise software like Backup Exec. The real kids play with both software and hardware that is more sophisticated.
I think I'll chime in on this one, given some practical experience with this. Posting anon for obvious reasons. :)
BackupExec, for what it's worth, appears to be able to read tapes that earlier versions have written. I used a copy of BackupExec 2010 to pull indices from tapes cut back in 2003 during an audit of media that we had stored off-site for long-term storage. (Long story short: regulatory requirements force us to keep at least a yearly backup indefinitely; they say nothing regarding the technical ability to _read_ said data.)
In theory, I could have built a virtual server named the same as the servers that were backed up to those tapes, and restored data from them. In practice? yeah, the data was like a month-old dead fish- unfit for anything at all*, and rather smelly.
I can certainly see why one would need to classify what data needs to be backed up- We have what amounts to spool copies of our databases because the data does grow old and expire after a certain amount of time. (We still send copies off site for other reasons) Full system backups are generally only good for disaster recovery, and only for file salvage in my personal opinion.
*except to satisfy some faceless bureaucrat for regulatory reasons.
First off, disclosure: I work for one of the largest backup vendors very closely with their software design. Also disclosure, it's not Microsoft (possibly, obviously!).
As mentioned above, many people are missing the distinction between backup and archive. This is not helped by the backup companies rolling archive into their products. My personal rule of thumb is that if you need to retain above a year, it's archive.
Now for the "it's not Microsoft" part of my disclosure: Microsoft have pretty much nailed their backup product, with a good UI and an excellent CLI. It basically does changed block tracking, and the clients, while not quite agentless, are fairly close to being so, sending the tracked blocks back to the backup server at various intervals. The backup server allows file-level recovery or mounting of virtual hard disks. This is excellent, and in my mind they could at least kill Backup Exec, and possibly ArcServe and CommVault as well, if they just expanded their support from Windows/NTFS out to include Linux. This isn't as ridiculous an idea as it sounds, although I accept that it wouldn't be easy. System Centre Hyper-V supports Linux clients; in my mind System Centre Data Protection Manager should back them up as well. They are missing a trick here.
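For anyone unfamiliar with changed block tracking, a toy model of the idea looks something like this. It is only a sketch of the general technique, not how Data Protection Manager is implemented; the `TrackedVolume` class, its method names and the 4 KiB block size are all assumptions for illustration.

```python
# Hypothetical block size for the toy model.
BLOCK_SIZE = 4096


class TrackedVolume:
    """In-memory stand-in for a volume that records which blocks were written."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.dirty = set()  # indices of blocks changed since the last backup

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside volume")
        self.data[offset:offset + len(payload)] = payload
        if payload:
            first = offset // BLOCK_SIZE
            last = (offset + len(payload) - 1) // BLOCK_SIZE
            self.dirty.update(range(first, last + 1))

    def take_incremental(self):
        """Return only the changed blocks, then reset the tracking set."""
        changed = {
            i: bytes(self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            for i in sorted(self.dirty)
        }
        self.dirty.clear()
        return changed


if __name__ == "__main__":
    vol = TrackedVolume(4 * BLOCK_SIZE)
    vol.write(4090, b"x" * 10)  # straddles the boundary of blocks 0 and 1
    print(sorted(vol.take_incremental()))
```

The point is that the client only has to remember which blocks changed and ship those at each interval, which is why it can stay so lightweight.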
It seems the fundamental flaw that you are talking about is using software that stores your data in a proprietary format. Are there not tools that stick with open formats where you can recover data without problems using another vendor's software?