Does tape have a role in cloud computing? Ask cloud evangelists that question and they sit back, purse their lips, and say, "No, of course not ... but ..." The thing is they tend to come from disk storage-biased suppliers or consultancies and are in love with virtualisation, the placing of abstraction layers between server apps …
Email was down last night in the UK
I was getting this at about 11pm.
"Sorry, there seems to be a problem. The service you are looking for is temporarily unavailable. We are working hard to restore your access as soon as possible. Please try again in a few hours. Thanks for your patience."
It was back when I got up this morning, so I don't know how long it lasted.
16 minutes: 22:48 - 23:04
Cloud is good as long as you have non-cloud for backup?
It is the most common DR mistake people make.
There are two distinct cases of DR - hardware clusterf*** and wetware (and software) clusterf***.
You can replace the first with a resilience strategy and Cloud is quite good at that.
The second _REQUIRES_ a different technology in order to be reliable. It may be tape, it may be squirrels stashing data nuts, whatever. In Google's case it was tape. It could have been a disk library as well. Not that much of a difference, provided that it is not using the same software/hardware as the day-to-day data resilience.
So cloud actually _NEEDS_ a non-cloud backup :)
"Tape is the archive backstop for lost or duff data on disk, with a 30-year lifespan"
Ah yes, and in 25 years' time will you be able to buy a compatible tape drive? What about having drivers for your ancient tapes for whatever new OS/computer you are using?
Not that the arguments for HDD are perfect either, but I can't recall any tape technology being readily available/supported for more than about 5 years. Please correct me if this is wrong?
Ah, now CDs of course...
At work we've got tape drives that are from 1996 (DLT7000). We don't really use them much, if at all; they're just there on the off chance that something turns up and needs to be read. If drivers were a problem we'd be able to use VMs with the original OSes that they used to be installed on. Furthermore, it's much easier to migrate data from old tape to new tape than it is from old disk to new disk - just set up a clone/dupe job on a few pairs of tape drives (one old, one new) and leave it to play out. It can all be done within the same library and is automated. Disk migrations, however, tend to require different interfaces, hardware hosts etc. etc.
At my previous company they had two IBM spool tape drives for recovering very old tapes from their mainframe. In practice they were never used, but just there to give a sporting chance to recover from old tape.
On the specifics you quote, my hunch would be most probably yes. Try buying a new DDS2 drive now (~20-year-old tech): they're not that hard to get hold of, you'd be able to get one attached quite happily to a current PCIe SCSI card, and it'll work out of the box using generic drivers in the kernel.
But if you're serious about archiving data for the long term, you'll have to verify and transfer it or else you will be bitten as you describe, just maybe a little further down the line. But you /might/ get away with only having to do this once every couple of decades.
DLT is not just a DJ.
As a physical format, DLT tape tech has been around for at least 15 years, and I doubt it will go away any time soon. I believe the underlying storage format/compression is pretty unchanged too, and that tapes from the '90s would be quite recoverable.
Now; whether the data that is recovered has any value or has any remaining applications that can interpret it is a rather more relevant matter these days.
Google's tape environment/config turned out to be totally inadequate: they lost what, 0.03% of their email accounts, and it took several days to get it back. Yet as far as I can tell they got everything back.
This is slightly at odds with their assertion that their disk snapshots were corrupt, because you'd backup offline from your disk snapshots, so they'd be backing up the corruption. So, assuming they haven't told porkies in any way, they may have recovered from tape and had to play back transaction logs. This presumes that they don't clear down transaction logs at the point they backup, but keep them hanging round for just this eventuality. Playback of logs wouldn't have taken that long, akin to restoring from a disk snapshot, so this basically returns us to: Google's tape environment is massively undersized or frighteningly badly configured.
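The recovery path implied here (restore the last clean offline backup, then roll forward through retained transaction logs up to the point the corruption started) can be sketched in a few lines of Python. All names and data shapes are illustrative, not anything Google has described:

```python
# Sketch of point-in-time recovery: restore the newest offline backup
# taken before the corruption, then replay retained transaction logs
# up to (but not past) the moment the bug started trashing data.
# Purely illustrative; the real systems are far more involved.

def recover(backups, logs, corruption_time):
    # Newest offline (tape) backup taken before the corruption.
    clean = max((b for b in backups if b["taken_at"] < corruption_time),
                key=lambda b: b["taken_at"])
    state = dict(clean["data"])  # restore from the offline media

    # Replay logged writes made after the backup but before the bug hit.
    for entry in sorted(logs, key=lambda e: e["time"]):
        if clean["taken_at"] <= entry["time"] < corruption_time:
            state[entry["key"]] = entry["value"]
    return state
```

This only works if the transaction logs are kept around past the backup point, which is exactly the presumption made above.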
This article repeatedly says tape is cheaper than disk for backup.. but I'm not convinced.
With the right infrastructure I guess so; but we are presumably talking about very specialised tape systems here; datacenter type equipment; about as useful to us mortals as a chocolate mousepad.
On Amazon, 40/80GB DLT tapes are in the $40 price range and need an expensive dedicated tape drive to be usable. A terabyte HDD can be had for twice that money, has 10x the capacity, and your machine already has the required interfaces (USB/SATA/FireWire).
Frankly I'm not convinced that tape is cheaper in any situation..
Try that again with a sensible modern format, LTO4: £20 for 800GB native. Then fit them into a 48-tape loader, so you can pile >35TB of tapes into this 4U box in your rack and quickly swap them out with new tapes. What are you suggesting you do with disks again?
If you really do want to archive a load of this stuff offsite, or even into fire proof safes onsite, and you're doing enough that the cost of the drive isn't a big problem, ~£25/Tb doesn't sound too bad.
Interface can be SAS, which your modern server almost certainly already has.
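For what it's worth, the arithmetic behind those figures checks out. A quick sanity check, using only the prices quoted above (media only, drive and loader excluded):

```python
# Back-of-envelope maths for the 48-slot LTO4 loader described above,
# using the per-tape price quoted in the comment.
tape_price_gbp = 20        # per LTO4 cartridge, as quoted
native_capacity_gb = 800   # LTO4 native (uncompressed)
slots = 48

library_capacity_tb = slots * native_capacity_gb / 1000   # TB native
cost_per_tb_gbp = tape_price_gbp * 1000 / native_capacity_gb

print(f"{library_capacity_tb} TB native at £{cost_per_tb_gbp:.0f}/TB media cost")
```

That is 38.4TB native in the 4U box, at £25/TB for the media, matching the ">35Tb" and "~£25/Tb" claims above.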
HDDs x Tape...
I'm suddenly glad we don't need a 48 HDD loader, do we?
How much is this 48-tape loader again? Because the HDD heads and logic board are included in the drive price, not just the magnetic disk. The fair comparison becomes 48 drives adding up to 35TB versus a fully loaded tape interface. Now tape wins.
Yeah, I'm just trolling. I know tapes are cheaper in the long run, since they don't include all that extra stuff HDDs need, like sealed cases and needles calibrated to fly micrometers away from highly sensitive magnetic discs.
And they are bound to be faster, since they can POTENTIALLY be read at 48 x 280Mb/s. Right?
And I agree to the other commenter, unplug the HDDs and they will cost just as much as the tapes in that climate-controlled shelf.
"And I agree to the other commenter, unplug the HDDs and they will cost just as much as the tapes in that climate-controlled shelf."
Exactly. The WHOLE ARTICLE assumes your disk-based archive is live. As a matter of fact, from the article:
"trashed data was "snapshotted", replicated etc, propagating the original fault. Fortunately for users: "To protect your information from these unusual bugs, we also back it up to tape. Since the tapes are offline, they’re protected from such software bugs."
I would argue that hard disks can be taken offline, and thus be protected as well. The difficulty here is that Google does redundancy to protect from failure (think RAID). This is NOT a "backup" or "archive." Google seems to carry a few days' worth of snapshots on live disks, as alluded to by their saying the corruption was spread over a few days of snapshots... But they had to fall back to their offline storage to find the good data. Since most restores are needed within the first 24hrs of the data loss (some can stretch that out to 3 days), Google is actually doing a good job here. They protect the most common window with live snapshots: easy to restore. The less-likely-to-be-needed old data is stuffed onto tapes.
As for more in the HDD vs Tape comments:
"How much is this 48-tape loader again? Because the HDD headers and logic board are included in the drive price, not just the magnetic disk. The fair comparison becomes 48 drives adding up to 35Tb versus a fully loaded tape interface. Now tape wins."
1TB HDDs run in the range of $55USD each. Bump that to 2TB and you're looking at about $70USD each. With an 800GB LTO4 tape running $35USD ($70 for two, totalling 1.6TB), it becomes a bit clearer. Now we factor in the drive: a Quantum LTO4 drive for $1,600USD. You could likely pick up a 48-slot NEO400S used for just under $10,000. Of course, you could accomplish the same with 24 2TB HDDs as a JBOD (2x 12-drive racks perhaps?). An HP StorageWorks M6412A with a Fibre Channel connection would do. You can get a refurb for $1,121USD (refurb because the 48-tape library was priced as "used").

Now, you can complain that the disks need to be stuffed into hot-swap caddies to slot into the array, and yes, you're right. So much more hassle than just stuffing a tape in. But what do you get in return? A Fibre Channel connection to your JBOD (two of them, actually), which would be faster than SCSI Ultra320, I reckon. HDDs in a JBOD would be equivalent to tapes in an archive (since you'll need to sequentially number both if you use some software to write data across them). Of course, with HDDs, you could do any number of backup options. Tapes, you only have one: use a program to write data to them sequentially. HDDs can just be NTFS with raw files, whole-disk encrypted, or a 1:1 partition dupe of your server's disk drive.
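Pulling those figures into one calculation gives a rough all-in (media plus hardware) cost per TB for each option. A sketch, using only the prices quoted in this comment; they are dated, so treat the output as illustrative:

```python
# Rough all-in cost per TB for the two setups described above.
# All prices are the commenter's quotes, not current market rates.

def cost_per_tb(media_unit_price, media_unit_tb, units, fixed_hardware):
    """Total spend divided by total capacity in TB."""
    total = media_unit_price * units + fixed_hardware
    return total / (media_unit_tb * units)

# 48 x 800GB LTO4 tapes, plus an LTO4 drive and a used 48-slot library.
tape = cost_per_tb(35, 0.8, 48, 1600 + 10000)

# 24 x 2TB HDDs as JBOD, plus two refurb 12-bay M6412A arrays.
disk = cost_per_tb(70, 2.0, 24, 1121 * 2)

print(f"tape: ${tape:.0f}/TB, disk: ${disk:.0f}/TB at this scale")
```

At this modest scale the fixed cost of the drive and library dominates, which is the commenter's point; tape's media-cost advantage only pays off once the tape count dwarfs the hardware spend.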
What does tape have going for it? The best thing I can come up with is "industry support." Most backup programs are designed with tape being the eventual end-point. Only recently have D2D2T or D2D2D been popular options. Oh, and a note about readability and just "finding" an old tape drive to read your data: you still need the tape drive itself to actually work. You need the connectors to use it (Ultra2 SCSI anyone?), and to be able to actually install your backup software. Depending on your solution, many of those things can be eliminated with disk. I have no problem finding an ATA66 (very old hard drive)-capable connector. You can get one that converts an ATA disk into a USB drive for <$20USD. I doubt even a dusty used tape drive can be had for that cheap. I'm sure SATA will be in the same boat come another 10-20yrs.
Ever tried backing up 3TB a night to disk? Watch how quickly those disks fail if you repeatedly muller them with that amount of data. Also, it's kinda more involved to pop disks in and out of machines to take them offsite (not much point backing data up if you don't take it offsite) than it is to eject tapes from an autoloader. I think you're confusing 'mere mortal' with 'bloke in his basement'.
"1TB HDDs run in the range of $55USD each."..."A FiberChannel connection to your JBOD (two of them actually),"
Wow. Please share the link to those $55USD 1TB FC disks. All the FC disks I have seen run $400USD+ for only a 600GB drive. They also suck power, as they are 10k/15k RPM drives.
I think the previous poster means that you can get low cost SATA drives and stick them into an array which has FC connections at the front end. There are various arrays that take SATA, from weedy little desktop devices to very serious enterprise arrays, such as the EMC Symmetrix VMAX.
The thing is, and most people who don't deal with the larger stuff don't realise this, the SATA drives that you get for professional arrays aren't the same SATA drives that you would use in your desktop. They're top-end SATA drives, which cost rather more than $55 per TB, though still a lot less than FC drives.
Think about it the other way
Why should I buy a 15k RPM disk for backup? I know they are faster than 10k, and WAY faster than 7200 RPM. But... a desktop 7200 RPM 1TB hard disk (I use it as a demonstration of the speed of a "low grade" disk) can do about 100MB/s sustained. A 5900 RPM goes at about 80MB/s.
Yes, yes. Linear read/write. But... this is backup! By definition it IS a linear read/write! At least ALMOST linear.
No, we don't get up to 280MB/s from a single disk. Who cares? Build a RAID5, use five disks for data and one for redundancy, and it's done! You could call this six-disk entity a "HardDiskTape" and treat it like a single tape.
No, it is not the ideal (nor the right) solution for everything, but it IS a solution that could be not only good enough but cost-effective too for a lot of businesses.
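The streaming rate of such a "HardDiskTape" is easy to estimate. A quick sketch, assuming the ~100MB/s per-disk figure quoted above and ignoring parity computation and controller overhead:

```python
# Sequential throughput estimate for a 5+1 RAID5 set treated as one
# big "tape". Assumes ~100 MB/s sustained per desktop disk, as quoted
# above, and ignores parity and controller overhead.
data_disks = 5
per_disk_mb_s = 100

array_mb_s = data_disks * per_disk_mb_s          # aggregate streaming rate
hours_per_tb = (1_000_000 / array_mb_s) / 3600   # 1 TB = 1,000,000 MB here

print(f"~{array_mb_s} MB/s streaming, ~{hours_per_tb:.2f} h per TB")
```

At roughly 500MB/s the set actually out-streams the 280MB/s single-tape figure mentioned elsewhere in this thread, provided the workload really is near-linear.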
6 cents per Gbyte?
So that's $120 for 2TB, I can buy a 2TB SATA drive for less than that. Sure, the power and cooling requirements are higher... until you unplug the drive, when it also uses precisely 0W. OK, that's a desktop drive, but do you need the high spin speed and high MTBF when it is seldom used?
Tape still wins on robustness, but the costs seem very similar. Tape drives aren't cheap.
You're surely not going to run a massive cloud environment on consumer SATA drives?
Why not? Google does....
Use disks as tapes
Rather than having constantly spinning disk as a backup medium, simply have a series of hotswap disks you can write to and then take off and archive.
They use no power when not plugged in, and are often smaller than equivalent capacity tape cartridges. Although streaming rates might not be quite as good, the random access capability would make most data recovery far faster. On top of that, you don't have to buy the tape drive.
I suspect I have had more tape failures over the years than catastrophic disk failures, so it might even be more reliable.
All you really need to do now is to adapt tape library technology to be able to rotate HDDs instead of tapes.
You can't move disk drives at the kind of speeds that the larger IBM TSxxxx or StorageTek Lxxxx libraries move tapes at; they'll just break. The whole library would need to be slowed down, which may not be as simple an engineering task as you'd imagine.
Personally I can only remember having five or six tapes snap on me and a couple just corrupt, in fourteen years as an enterprise storage guy, whereas I don't even bother to count failed disks.
"tape archives got cloud evangelist Google out of a hole"
That is very disingenuous - what got Google out of trouble was the presence of an old backup from before the corruption occurred, which has nothing whatsoever to do with the format of the backup media.
You're right - the media the backup was on doesn't matter. However, you don't keep the amounts of data that Google hosts on more disk snapshots than you absolutely have to. Tape is still the only realistic medium for long-retention, high-capacity backups.
Tape isn't for backup, it's for archiving...
..you pay for the drives once and buy lots and lots of tapes (and maybe don't even cycle them). Then, if data corruption goes unnoticed for a while, you've something to go back to - cos your backup disks are probably hosed just like the source.
Mine's the one with the grandfather, father, son QIC 250s in the (fireproof) pocket.
Bad experience with tape
At my previous startup, circa 2000, we had purchased an expensive StorageTek 9730 Timberwolf automated tape library with 4 DLT4 drives. The drives had an MTBF of roughly one month, and we chewed through them at an alarming rate. When the first year's support contract ended, we wrote it off and bought a pair of Apple XServe RAIDs for the same price as the support contract renewal. Those units lasted us for 10 years with no troubles whatsoever, and no wonky tape software to deal with. I don't know if this was just bad luck, but I have been leery of tape ever since.
Wasn't the 97xx series rather long in the tooth by 2000? I'm pretty sure that at my work we were moving on to L series at the time (although it is a long time ago, so I may be a bit hazy.) So, was the library new or 2nd user?
By DLT4, I assume you mean DLT7000 or DLT8000 drives which take DLT4 media? If so, these weren't the most reliable of drives, but you should have been seeing way higher MTBF. I would say you were very unlucky, or there was some sort of problem with your library's robotics that was causing your problems.
A dodgy tape or three
Wasn't the 97xx series rather long in the tooth by 2000?
It definitely was, but more to the point, the MTBF (mean time between failures) of tape drives is directly proportional to their duty cycle. DLT, SDLT, DAT and LTO gen 1, 2, 3, 4 and even LTO5 drives* had a duty cycle of 8 to 12 hrs in every 24 hrs. What that means is that if a tape drive is used for more than its duty cycle, it will fail more often. What people forget when they size a tape solution is that there is a lot more to worry about than just the backup window and MB/sec. Most companies do a backup in, let's say, an 8hr window (usually after hours), and that's what they size the solution for. But then they may clone that backup once or even twice (clones are usually sent off-site and/or kept outside the library), so all of a sudden the tape drives are being used for a lot more than 8 hours, making them more prone to failures and considerably reducing their MTBF. That's where either more tape drives are needed, or alternative solutions are required, such as enterprise-class drives with a much higher duty cycle. Both IBM (TS1130 - 1TB native) and StorageTek (T10000C - 5TB native) produce these drives...
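To make the sizing point concrete, here is a sketch of the arithmetic with purely illustrative numbers (the drive rate, duty cycle, and dataset size are assumptions, not vendor specs):

```python
import math

# Illustrative tape-drive sizing that accounts for duty cycle and
# cloning, not just the backup window. All numbers are examples.
backup_tb = 10
drive_mb_s = 120          # assumed native streaming rate per drive
duty_cycle_hours = 10     # usable drive-hours per day (8-12 hr range)
clone_passes = 1          # one offsite clone of each night's backup

hours_per_pass = backup_tb * 1_000_000 / drive_mb_s / 3600
total_drive_hours = hours_per_pass * (1 + clone_passes)
drives_needed = math.ceil(total_drive_hours / duty_cycle_hours)

print(f"{hours_per_pass:.1f} h per pass, {drives_needed} drives needed")
```

Size for the backup window alone and you would buy roughly half as many drives, then run each one well past its duty cycle once the clone jobs kick in, which is exactly the failure pattern described above.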
And it is fair to say that short-term backups to disk are a great idea, but for long-term data retention and as a last line of defence, tape is still king.
*LTO5s have a much better duty cycle than their predecessors, but it is still below what you will get with enterprise drives. They do serve a purpose, and they are a very successful and viable technology.
just my two cents.
Has saved my sorry ass more than once.
Tape is not just for big outfits like Google. If you have not done so already, back your system onto tape and store it off-premises in a fireproof safe.
The thing about last-ditch backups is that you don't need them until you really, really need them. Back up, folks. Do it now.
Make sure that fire safes are rated for tape, not paper, if you want your data back that is.
Disk storage vs disk backup
Disk storage typically is high performance and random access. This means that disks need to spin faster and therefore consume more power. If you only do backup, you only have sequential accesses. You can use cheap desktop SATA drives connected to Atom or even Via C7 boards for that.
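Since the workload is purely sequential, the backup window on such a box is bounded by the drive's streaming rate. A rough estimate, with an assumed 80MB/s sustained rate for a cheap low-power desktop drive:

```python
# Backup window for a sequential-only backup to a cheap SATA drive
# on a low-power board. The 80 MB/s sustained rate is an assumption
# for a green/desktop-class drive, not a measured figure.
dataset_gb = 500
sustained_mb_s = 80

window_hours = dataset_gb * 1000 / sustained_mb_s / 3600
print(f"~{window_hours:.1f} h to stream {dataset_gb} GB")
```

Under two hours for half a terabyte, so even a weedy Atom-class box keeps a nightly backup comfortably inside an overnight window.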
And why is that unique to tape
So what got them out of a hole was that the tape backup was from a few days/weeks/months earlier, before the fault.
Why not have a cloud backup combined with a very inefficient internal bureaucracy, so that snapshots of the live data to another cloud service take a month - then you always have an online, instantly accessible copy of last month's data.
Another tape fan here
I was using DEC TK50 and TK70 tapes in the late 1980s, and although these were superseded by larger capacity DLTs, many of the drives could still read the earlier formats. Your old archives didn't die through lack of backwards compatibility.
They were very reliable, easily transportable offsite, and would survive a short drop, so you had no excuse for not keeping backups off site. The number of unreadable DLT III or DLT IV tapes I encountered over a span of more than a decade could probably be counted on the fingers of one hand, and I speak as someone who managed the backups for a lot of systems, often with three or more tape drives per server.
On the grandfather, father, son comment: No, because silent data corruptions can easily overwrite all generations. I prefer daily, weekly, monthly and yearly cycles. This is where tape really does win on the cost side.
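The difference between the two schemes is mostly media count, which is where cheap tape helps. A sketch with assumed retention depths (the counts are illustrative, not a standard):

```python
# Tape counts for two rotation schemes. Retention depths are
# illustrative assumptions, not a prescribed standard.

# Basic grandfather-father-son: one tape per generation, so a single
# silent corruption cycling through can overwrite all three.
gfs_tapes = 3

# Daily/weekly/monthly/yearly: keep 6 dailies, 4 weeklies,
# 12 monthlies and 7 yearlies, giving years of reach-back.
dwmy = {"daily": 6, "weekly": 4, "monthly": 12, "yearly": 7}
dwmy_tapes = sum(dwmy.values())

print(f"GFS: {gfs_tapes} tapes, DWMY: {dwmy_tapes} tapes")
```

A tenfold increase in media is trivial at ~£20 a cartridge, but painful if every retained generation has to live on spinning disk.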
It is slightly untrue that DLT tapes cost nothing to store: all tapes will deteriorate faster unless stored within certain limits of temperature and humidity, so aircon is required.
DAT drives, on the other hand, were to be avoided like the plague...
Looks like an apples to oranges comparison
I agree tape has its use cases, but many of the points made in this article are disingenuous to say the least. I had the same preconceptions about tape when I approached a backup project last year, but found it to be much less compelling when faced with the business requirements.
1) If LTO tape costs $0.06 per GB, that equates to ~$60 per TB - which is what you can pick up a cheap 'green' 1TB disk for.
2) Disks don't need an autoloader mechanism to scale.
3) The 'energy cost' of disk arrays vs tape isn't relevant because it discounts the fact that you can turn them off/spin them down when not in use. If you use disks as you use tape, the energy costs aren't really any different.
4) The number of tapes you can write to at once is limited by the number of tape drives you have switched on. With disks, they're always accessible. If you need to snapshot your data quickly, tape becomes vastly more expensive because of this issue. Tape drives and their autoloader mechanisms are also rather bulky.
5) Tape deteriorates much faster than disk when used more; disks don't, at least not by the same factor. There's also no such thing (that I know of) that gives RAID-10-like protection to tape libraries, or metrics that show tape deterioration so you can act ahead of an integrity failure.
6) It doesn't matter if tape is 'offline' - if you back up to it just as regularly as you do to disk, it's going to be prone to the same long term corruption. If you write to tape once and then store it in a vault, you can do the same with hot swappable disks.
7) It's much much easier for a user to restore from a backup disk array (read only filesystem, appropriate access controls set from the original filesystem, connectable via network share) than a tape library which usually requires an administrator to intervene. Unless you go incredibly high end, of course...
1) That means an LTO tape will take up less space and take less energy to make, for a comparable price.
2) They do if you need multiple thousands of them, for two reasons: you won't be able to cheaply and efficiently run a bus to that many drives, and it's going to be difficult to find any specific drive when it needs to be moved or replaced due to failure.
3) Energy costs are relevant - a spun-down disk has much more 'embodied energy' from the production process. Also, tape drives need no specific cooling when running; an array of disks needs to be thought about.
4) Snapshot would be handled in the online disk array, not the 'tape replacement' disk array. You are right that you can only write to as many tape drives as you have switched on, but the real limit is bandwidth from your SAN and ports on your SAN. You may be able to write to a lot of disks at a time, but this isn't really any good if you are crippling your SAN to do so. Also, see my comment about running buses to disks.
5) Most major backup packages now offer something along the lines of "inline dupe", where two tape drives write the same data at the same time. It's also common for backups to be duped/cloned to another tape in an offsite library or to be taken offsite. Tape deterioration isn't really an issue; you can write a tape about 20,000 times before you hit problems.
6) A disk in an offline vault needs to be spun up and checked far more often than you need to "exercise" a tape (i.e. run it from end to end). I think a tape is once every 10 years, I can't really remember though - I certainly wouldn't expect a drive that's been stationary for 10 years to spin up.
7) In my opinion, an end user should never be able to see the backup system. If end users are able to directly access the backup systems, they will tend to start using them as their own personal data store to get round quotas. This will then render the backup at least partially useless. The backup software works the same either way though - you'd still be using the same software to back up / recover.
All that said, I'm not against disk as backup - its use just has to be really well thought out. I think TSM and Avamar with tape out have the right idea.