Is a copy of a file a backup? A group of storage bloggers from EMC, Nirvanix, Ocarina and other corners of the storage blogosphere have been debating this topic and have generally agreed that it is, so long as it meets certain criteria. The background is that a backup file is traditionally a container file, holding encoded …
I'm currently 'backing up' a copy of a DVD.
Now, it's someone else's DVD but it's still a back-up to me!
Simple question. Simple answer.
Is a copy of a file a backup?
Yes. I have my entire photo collection of camera raw files backed up to Blu-Ray discs.
Not so difficult, was it.
Does a bear sh*t in the woods...?
The gist of this article is that eventually we will be forbidden to copy files, but allowed to back them up with relevant backup software. IME of backup software, most of it is rubbish, and I have always found my own backup solutions to be more reliable, share-aware and effective. What is the software world coming to?
Bzzzt! Wrong answer
A backup is defined by intent, not format or medium.
The command copy boot.ini e:\boot.ini may be creating a backup, or it may be restoring a backup to its original location, or it may be part of a config dump to aid tech support troubleshooting.
Backups are strange things; consider how they were done in the '80s: "How ya gonna do it if you really don't wanna dance? Get yer backup off the wall!"
That'll give the industry experts something to ponder.
"Its creation, ageing and disposition is managed by an application, and it is in a different format from the source files"
Why? Surely having the data in a different format makes restoring the data harder?
Unless you're making the novice mistake of confusing backup with archive, whereby you will want the (archived) data in an open, very plain, easy to read (i.e. non-proprietary) format.
Is a backup a backup if it is in the same location as the primary copy? That's probably a whole lot more important. Your backup (or copy) isn't worth anything if it's sitting on top of your computer, because it's highly unlikely that the fire or the burglars will leave the backup behind when they destroy your house.
What's all this 'different format' rubbish
I use rsync for backups to USB drives, so are these bozos telling me that, just because rsync makes identical copies of files in the backup media, that they aren't backup copies?
Function and Risk
These things always get complicated when you make a definition based on multiple levels of requirement.
A backup is something you can recover from: a copy or otherwise, including a snapshot, as long as the data is in a consistent state and tied to a point in time (PIT).
An effective backup is one that you can recover from quickly.
An efficient backup is one that consumes few resources: little effort to create or restore, and media that are cost-effective, i.e. space-efficient on disk or low cost per GB on tape.
A backup is prone to risk, so to reduce risk you go down the path of copying it to other media, removing it from site, hiding it under the bed, etc.
Many modern solutions are very efficient at replicating PIT copies to offsite media.
Best case is you want both - fast recovery without copying etc - so work out how to achieve the best mix you can afford.
Just because I did not buy backup software does not mean I cannot produce a backup, as backups existed before backup software.
P Lee is right. Very right
"..A backup is defined by intent, not format or medium..."
If I am going to run a new sort program on my database, I will take a copy first. That copy will stay on the same machine, but it will be a good defence against my sort program screwing my data.
Or I might copy the database over the company network to our branch office in New Zealand. That's as far away as I can get physically, and no use at all against a network virus that corrupts MySQL files.
A Backup is simply a copy that you use to recover from disaster. What that copy is, and where it's stored, depends entirely on what disaster you expect. And for any format or medium, I bet I can think of a disaster which would render your particular choice of medium useless.
So backup media and formats are entirely driven by what you think might happen to you....
What kind of nonsensical argument is this?
Is a copy the same as a backup? Of course it is! That's what a "backup" IS; a copy. Whether you move that copy off site, ASCII dump it to a line printer, or stick it up your nose, it makes no difference.
Whether it is recoverable should it need to be, well, that's up to you, but it doesn't change the meaning of the word.
What a load of bollox. Some people have way too much time on their hands!!!
Are these people perhaps taking themselves too seriously?
Surely, the only thing necessary for anything to be a "backup" is that it consists of some means to ensure that if your computer's (or server's) drive(s) crash, your house (or datacentre) burns down, or possibly even if your entire city gets nuked from orbit, it's still possible to regenerate whatever actual data you care about. So yeah, I agree that it's got to involve moving it offsite at some point. But attempting to argue that JWZ's method:
somehow isn't a "backup" because it's merely a "copy" sounds like pretension at best and idiocy at worst.
Are these people actually getting paid to have this pointless debate?
Well said. I have worked with backups that break every single definition mentioned in the article, e.g. properties-file backups that are saved to the same location, in the same file format, primarily intended for use as a backup but very useful for support staff trying to replicate problems.
I have (and I'm sure many people have) used a copy of a file as a backup in an emergency. So it only becomes a backup when it's used as one ... even if that wasn't the intention at the time the copy was made.
Who does backups? Much less offsite, geographically diverse backups?
I mean, *I* do, and most Fortune 500s do ... but anyone else?
THAT said, what's the point of the discussion? I'm guessing that one or more of the big software houses is griping at the SAN folks and so-called "cloud services providers" for allowing complete system backup ... which would copy a complete, working copy of the software in question to other media (oh-no!). Never mind the fact that it wouldn't actually run from the other media ... They will probably push for legislation making it illegal to back up anything other than user-generated data.
Why must data be in a different format? Why must it be in a different place? Geez, are these guys out to convolute any/all existing practices just because it is going to make them sound 'smarter' because they (in their own little worlds) found three quasi-arguments to substantiate their place in the computing cosmos?
Any method the user chooses to preserve their data, despite the 'expert opinions' of how good or bad, will still be a backup to that user.
Mine is the one with the holographic disk in the pocket
RPO & RTO
It all boils down to RPO & RTO - how much data can you afford to lose and how much downtime can you have.
Depending on those two factors a "backup" can be anything from sync replication to local snapshots for quick recovery to a copy on local disk to a tape in a library to a tape shipped offsite to a secure location 20 miles away and a variety of options between the two extremes.
There is no one way of defining what is and what is not a backup as it varies hugely by company, by system, by application, by the actual piece of data etc etc
I'd be looking for something that can automatically verify the "backup" (how do you know your original is not corrupted and you're merely copying that, removing the old 'backup' at the same time).
Not to mention, backup to me implies that you'd be restoring it EXACTLY, e.g. with the original created/modified timestamps, which generally get messed about with in a copy, not to mention in the copy back. That's why 90% of my files have the same creation date: it's when I last restored them from a copy/backup after a failure. A proper backup solution wouldn't have done that.
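The two wishes above, verifying that the backup really matches and getting timestamps back exactly, can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular backup product works: it copies one file with the standard library's shutil.copy2, which also copies the modification time where the platform supports it (but generally not the creation time), then compares SHA-256 hashes of source and copy. The file names and the copy_and_verify and demo helpers are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> bool:
    """Copy src to dst, then check that contents and modification time match.

    shutil.copy2 copies the data plus metadata such as the modification time
    where the platform allows; it does not generally preserve creation time.
    """
    shutil.copy2(src, dst)
    same_content = sha256_of(src) == sha256_of(dst)
    same_mtime = int(src.stat().st_mtime) == int(dst.stat().st_mtime)
    return same_content and same_mtime


def demo() -> tuple[bool, int]:
    """Back up one hypothetical file inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "photo.raw"  # hypothetical file name
        src.write_bytes(b"camera raw data")
        os.utime(src, (1_000_000_000, 1_000_000_000))  # fixed mtime for the demo
        dst = Path(tmp) / "photo.raw.bak"
        ok = copy_and_verify(src, dst)
        return ok, int(dst.stat().st_mtime)


if __name__ == "__main__":
    print(demo())
```

A matching hash only proves the copy equals the source as it is right now. Catching a source that was already corrupted, as the comment worries, would need a hash recorded while the file was known to be good and re-checked before each new backup overwrites the old one.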
There are backups and backups
The discussion seems to ignore the fact that there are lots of different levels of backup for different purposes. For example, if I am working on a document I may keep backup copies of it on my hard disc in case I accidentally corrupt or delete the work file, or don't like the changes I've made and want to go back.
I may then back up the disc to an external hard drive. That protects against disc failure but not theft of the whole system or other disaster. I could also hide CD/DVD copies on site so that they would be unlikely to be found by a burglar.
Finally I may store backup media offsite, or back-up to a remote server, as protection against major disaster.
It really makes no difference, apart from processing time, whether I choose to store backup copies in original format or a compressed format such as a zip archive.
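A minimal Python sketch of that point, assuming a hypothetical "documents" folder: it backs the same folder up once as a plain directory copy (shutil.copytree) and once as a zip archive (shutil.make_archive), restores the zip with shutil.unpack_archive, and checks that both give back identical files. The helper names are illustrative only; the practical difference it shows is the extra unpack step on restore for the zip.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, bytes]:
    """Map each file's path relative to root onto its contents, for comparison."""
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup_plain(src: Path, dest: Path) -> None:
    """Backup in original format: a straight directory copy."""
    shutil.copytree(src, dest)


def backup_zip(src: Path, archive_base: Path) -> Path:
    """Backup in compressed format; make_archive appends the .zip suffix."""
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=str(src)))


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        src = base / "documents"  # hypothetical working folder
        (src / "drafts").mkdir(parents=True)
        (src / "report.txt").write_text("final version\n")
        (src / "drafts" / "report_v1.txt").write_text("first draft\n")

        backup_plain(src, base / "copy")
        archive = backup_zip(src, base / "documents_backup")

        # The plain copy is usable as-is; the zip must be unpacked first.
        restored = base / "restored"
        shutil.unpack_archive(archive, restored)

        original = snapshot(src)
        return snapshot(base / "copy") == original, snapshot(restored) == original


if __name__ == "__main__":
    print(demo())
```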
Ah glasshoppa it all depends
See Einstein regarding relativity.
Does one wish to back up a file (ok, I'll admit that "file" is not a strict definition), a bundle of files, a loose selection of files, an ordered selection of files, all of the user data for that particular user, all of the data (user, users and application software) on a particular computer or loose/ordered collection of computers including or excluding servers, ...
A backup is whatever you want it to be, and seems to hinge loosely on a collection of something having been created on removable or remote media, not merely stored (although it might also be) on the device where the source data exist.
Also, proprietary backups usually, if not always, depend on the original backup application being present in order to recover data from them; otherwise the notional backup is just dead media.
A matter of perspective.
I don't use the word backup any more; when people do they get my lecture on the Four Rs of data copy usage. Because there isn't ONE reason why we take copies, or replicas, or backups, or whatever you wish to call them, there are four!
As all the smart people have recognised already, what matters isn't that a copy is made. What matters is what you want to do with it afterwards. And it ain't so simple once you get beyond protecting your email and photo albums.
So the Four Rs of data copy usage are:
* Restore. Pull back one or more historical files, tables, volumes or what have you because the current working set was deleted or corrupted.
* Recover. Restart data processing using an alternate facility due to loss of a primary.
* Repurpose. Use a primary data set for production support, report generation, system test, or system development.
* Retain. Meet regulatory and legal requirements - and broader expectations - for data archival and discovery.
The interest in each outcome should guide you as to which storage technology capabilities to buy or build. Some are surprised to learn that traditional tape backup is not the best path to achieving all four at efficient cost in one package.
I've heard the phrase "Information Lifecycle Management" (ILM) used when introducing this topic, but I avoid doing so because it's an industry term and everyone else glazes over. Even your IT colleagues will look at you funny, suspecting a dose of Gartner kool-aid. To be fair, ILM "done right" is a bigger topic, encompassing people and process as well as technology: it is comparable to ITSM, PM and the SDLC as a complete subdiscipline of ICT. It is well worth treating as a separate competency once you have some scale.
Conflict of interest and Catch 22
They just want you to buy their software.
However, if you need that software to restore your backed up files, what happens if the vendor has gone bust?
Obviously, you will need to have backed up the back-up software as you can't now buy a new copy to restore with [even assuming the current v7.3 will restore the files created with your old copy of v1.1].
That backup MUST be in its native form, or you won't be able to restore your backup software to facilitate the restoration of the backed-up files!
So you might just as well keep everything in native form [or not bother to back anything up!]
re: Conflict of Interest
Very true, and well said.
The storage argument and backup "definition" as presented can only be about money. Otherwise it makes no sense. If you can persuade everyone that a "backup" must be accomplished with software that saves your files in a proprietary format - and that simple copies are not a backup (what a surreally stupid assertion that is) - you are locked into a company's product essentially forever.
Me, I'm sticking to mirroring my files on an external drive and weekly cloning. Why take 10 minutes to run some application's recovery routine when I can do a 10 second drag and drop?
BTW, anyone know which companies are subsidizing the storage bloggers listed?
@Adriaan Serfontein and others
Nah, the important bit is that you never use the word Backup on its own: there are 'Good Backups' and there are 'Offsite Backups' and there are 'Quick Backups' and there are 'Disaster Recovery Backups'
(And probably a few more, but you probably get the picture now)
They are all different, and they all have different criteria as a result.
So do restores: sometimes you need to restore within an hour, sometimes you have a week or two, and that is the primary definition of a 'Good Backup': one that gives you your data back when you need to, within a reasonable time frame, whatever that means to you.
So, what was the point of this article? If it was meant to be edumacational, then there was more info in the comments; if it was intended as a sales pitch, then I've just lowered my score for these vendors by a point or two; if it was intended to be entertainment, then the humour missed me completely.
a Backup is just a Copy for sure.
Back up or Copy?
Wankers the lot of them.
One backup to rule them all
If I'm working on a file containing a large amount of data which I am manipulating, I will keep a copy of this file in case I accidentally screw it up and want to go back a step. This is quite likely to happen, and I don't want my sysadmin to go into the recovery tapes. In this situation the 'site' hosting the data is Spreadsheet 1, and so the data in Spreadsheet 2 (the backup file) is 'off site' in the context of what I'm doing with the file.
If I have a collection of freeware that I want to keep in a backup data set in case my HDD dies, the main site is my computer HDD and a DVD in the same room counts as being 'off site'. For my family photographs, I want to protect them against fire in my house. In this case, the main site is my house, and so 'off site' means stored on-line or a copy of the disk kept at work. I may want to encrypt this data to protect it from prying cleaners.
If you are a spy, you want to put your data somewhere where it cannot be destroyed if the government you're spying on come to suspect you. In this sense, the live documents are in several different physical 'sites' and the off-site backup can only be a location that no-one else in the enemy country knows about. You definitely want to encrypt it... but you want it to be recoverable by your fellow spies in case you end up in a gulag.
If you're a military organisation, you need to keep your backup in a different city in case of war breaking out. Storing your military data on a backup server housed in a server farm one mile up the road isn't going to do you any good if your city gets nuked.
Finally, if you're a civilisation that wants to protect its creative output against the collapse of your civilisation, the 'live' data is that which is held in a location that can be easily destroyed in times of social breakdown. In this context a backup made off-site means one in a secret underground bunker protected by a dedicated team of soldiers and techs.
So, depending on the importance of the data and the relative risks different scenarios present to a specific file, set of files, or system setup configuration, a 'backup' can mean either a copy of an active file stored on the same hard drive you're working on... or it can mean an encrypted file somewhere under the Rockies that is effectively unrecoverable by the person who created the file and even by the person or persons who use the files in everyday life. The backup process can be anything from making a copy manually by saving the file under a different name to using a data harvesting system trawling the web and collating the data into a giant partial image of the World's digital information.
Really, defining what counts as a 'backup' is the same as defining what counts as 'security'. Stopping your kid's sports kit getting lost involves sewing their name into the back of it. However, writing your name on your laptop isn't going to count as adequate security when you're on a train and need to visit the loo.
If you really want a definition of backup in the sense of digital information, I'd go with:
Backup (n) - a copy of a discrete set of digital information created in case the information held in the original copy is corrupted, deleted or lost through software/hardware failure or malicious interference.
Everything else is contextual.