Plane or train? Tape or disk?

Where is the dividing line, one side of which says take the train to Paris from London and the other says fly? Where is the dividing line, one side of which says store your saved data on disk and the other points you towards tape? Planes are faster than trains, like disks are faster than tape, and budget fares are cheap, but that …

COMMENTS

This topic is closed for new posts.
  1. Diskcrash

    Incorrect comparison really

    The comparison doesn't really work. With a train versus a plane, the question is which is faster and/or cheaper at doing effectively the same thing, whereas disk and tape really are not the same thing.

    I would suggest that people really need both. With disk based backups the primary concern is and should be for quick and easy restoration of data or even individual files. With tape backup the purpose is no longer so much to restore data but to have multiple archival copies of data to choose from and to also comply with various data retention laws around the world. Archival with disks can be done but it really is not the best use for them for numerous reasons including cost and long term reliability.

    A more appropriate comparison would be to say a disk drive is a lorry (truck) and that tape is a warehouse. They both can store things but while a lorry is mobile the cost per square foot of storage is prohibitive when compared with the costs of a fixed structure such as a warehouse. A warehouse is inconvenient in that it can't move and is generally not where you want it to be when you need it but it does store lots of stuff efficiently.

    The real underlying question again is why choose only one?

    1. alwarming

      Awesome!

      "A more appropriate comparison would be to say a disk drive is a lorry (truck) and that tape is a warehouse."

    2. Ammaross Danan

      Incorrect Comparison

      "The real underlying question again is why choose only one?"

      Exactly. The best bet to be fully protected would be to use a combination of the two. After all, how much is your data really worth?

      However, I do agree the comparison is highly unfit for purpose:

      "You have to get to the airport two hours before the flight, undergo a lengthy security check, face cabin baggage restrictions or wait at the hold baggage carousel the other end, and then get from your arrival airport to your actual destination, adding time and cost."

      Unless the "two hours before the flight" part is disk formatting, not much else applies here. Backup Exec (as much as I dislike parts of it) will do D2D2D without much hassle, do compression on the fly, and encrypt the results as well. Should handle the "baggage carousel" and "security check" parts just dandy. With a bit of proper setup, you can eliminate the "arrive two hours early" bit by scripting a format on first use.

      Now, why use tape over disk? One can suggest cost. However, I just picked up some 1.5TB HDDs (external, USB) for $70. 1.5TB of tape would be ~$55 as two LTO4 800GB tapes, or ~$74 as a single LTO5 tape. So no, price isn't really a comparison point, especially when you factor in a 24/48 tape rack unit or stand-alone tape drive. Even a JBOD 24 disk rack costs less than a 24 tape rack.

      The key point is, the disk drives are quite reliable when not in use. Every time I hear discussion of "hard disk backups," everyone automatically assumes "live and spinning" drives. This may be a good idea for your standard backup target. However, for archiving, the disks do not need to be continually plugged in. My "archived" drives have a grand total of 10 "on" hours. Just 10. Granted I'm only in charge of ~500GB of actual data (minus the VM images, ISOs, etc).

      The pleasant thing about my drives (being of the 1TB or 1.5TB variety) is that I store more than just a "month end" on my "archived" drives. They actually have a backup from early in the month (perhaps the 8th) as well as mid-prior month (about the 17th). Therefore, if my previous month-end drive is unreadable for ANY reason (including getting lost, damaged, eaten, or otherwise destroyed), I can snag data from the next-best point in time (likely the 8th of the following month) from the next month's drive. Tape can do this by appending data to the end of the backup set, so it's not a "big" selling point. However, doing this EASILY is the advantage of disks. You can't simply delete a backup set from tape by deleting a folder and writing new data; you'd have to purge the whole tape if your data set started before an appended dataset.
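
      To put the media prices quoted earlier in this comment on the same footing, here is a minimal Python sketch that works out dollars per TB. The prices and capacities are the commenter's figures (with LTO5 taken at its 1.5TB native capacity); the function and variable names are made up for illustration, and drive, rack, and library costs are deliberately left out.

```python
# Media-only cost comparison using the figures quoted in the comment.
# Names are illustrative; drive/rack/library costs are NOT included.

def cost_per_tb(price_usd: float, capacity_tb: float) -> float:
    """Return the media cost in US dollars per terabyte."""
    return price_usd / capacity_tb


# (price in USD, native capacity in TB)
media = {
    "1.5TB USB HDD": (70.0, 1.5),
    "2 x LTO4 800GB": (55.0, 1.6),
    "1 x LTO5 1.5TB": (74.0, 1.5),
}

costs = {name: cost_per_tb(price, tb) for name, (price, tb) in media.items()}

# Print cheapest first.
for name, usd in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${usd:.2f}/TB")
```

      On these numbers the three options sit within roughly $15/TB of each other, which is the commenter's point: media price alone does not settle the argument.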

      Lifespan. It's a moot point. Companies should be doing D2D or T2T data refreshes at least every 3-5yrs. Depending on storage conditions, the readability of either disk or tape in this case should be the same: just fine. The hard disk platters aren't going to be geomagnetically zeroed, nor is the tape. Granted, bringing tapes or drives from storage for their "refresh" cycle does incur a risk of accident. Which would fare better sitting on the shelf for a full 10 years? Well, tape has made strides in the resiliency of the magnetic ribbon the data is stored on, but I personally wouldn't trust it past 10 years. I've read articles written about a company's "data viability" checks, which have found that their tapes aren't very readable (20% in one case) after 8 years or so in "ideal" storage. But that was 4 years ago or more, meaning at least 12-yr-old tape tech.

      Honestly, the best reason I can think of for using disks is rapid recovery. You don't have to read in an archive manifest and find your way to the data you need, then spend the tape seek-time recovering it. Depending on your disk-drive backup method, you may even be able to use OS tools (such as full-text search) to help you find the data you need. Very rarely when I'm asked to recover something does the person know the date it was deleted or the name of the file. They tell me "a few months ago" and "it has this in the text" (for Word docs for instance). A very simple "file contains" search in Windows will find it on a "raw files" backup (what you'd see with a copy/paste of a folder tree). You simply can't do this with a tape. You'd have to restore your whole dataset and then search through it.
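
      As a rough illustration of the "file contains" search described above, here is a minimal Python sketch that walks a raw-files backup tree and lists the files containing a given string. The function name and the paths are hypothetical, and it only matches plain bytes: compressed formats such as .docx would need a document parser or the OS indexer the commenter mentions.

```python
import os


def find_files_containing(root: str, needle: str) -> list:
    """Return sorted paths under root whose raw bytes contain needle."""
    needle_bytes = needle.encode("utf-8")
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Reads the whole file into memory; fine for a sketch,
                # but large files would call for chunked reading.
                with open(path, "rb") as handle:
                    if needle_bytes in handle.read():
                        matches.append(path)
            except OSError:
                # Unreadable file (permissions, I/O error): skip it.
                continue
    return sorted(matches)
```

      Usage would look like `find_files_containing("/mnt/archive-2010-06", "mortgage")`, where the mount point is an assumed example of a plugged-in archive drive.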

      For those still insistent on tapes, you likely have a tape library and other robotic or automated tools, hence your investment keeps you tied to the technology. D2D2T is always an option, and a rather good one in my opinion. However, depending on your know-how, you can set up a D2D2D system that can be worlds better than simply using that middle "D" as a landing space to be promptly shuffled off to tape oblivion. My middle "D" solution holds a 6-month rotating archive on-site, with about 120 days of rotating daily snapshots, all while still spawning off month-ends to the final "D" for offsite storage. I've only once had to grab our off-site storage backups due to this. It was to fetch a file nearly 2 years old, and the best guess they had was "it was in my files over a year ago." Took us hooking up 3 USB hard drives (we tried at 1.5yrs and second was at 2yrs, third was for the "most recent" copy at about 1.75yrs), and about a 30min full-text search for the first drive. File was recovered after about an hour of returning with the offsite backup disks.
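
      The rotation described above could be sketched roughly as follows. This is one reading of the scheme, not the commenter's actual setup: it assumes the 6-month on-site archive consists of month-end snapshots, treats "6 months" as 183 days, and uses invented function and parameter names.

```python
import calendar
from datetime import date


def is_month_end(day: date) -> bool:
    """True if day is the last calendar day of its month."""
    return day.day == calendar.monthrange(day.year, day.month)[1]


def snapshots_to_keep(snapshots, today, daily_days=120, monthly_days=183):
    """Return the set of snapshot dates the on-site disk should retain.

    Assumed policy: every snapshot from the last `daily_days` days,
    plus month-end snapshots up to `monthly_days` days old.
    """
    keep = set()
    for day in snapshots:
        age = (today - day).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age <= daily_days or (is_month_end(day) and age <= monthly_days):
            keep.add(day)
    return keep
```

      Anything not returned would be a candidate for pruning on-site, with month-ends already copied to the offsite disks before that happens.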

      1. willowtoo

        500 GB ?

        The scale and access demand of the data is going to make all the difference here. We generate 2 TB or more of new video type data a week with very low retrieval rates.

        LTO 4 tape costs a few tens of pounds per TB and next to nothing to store. Even MAID type disc starts to get expensive in that space.

        On the other hand we have a hundred TB of (slow) spinning disc and a few hundred GB of fast storage for transactional processing. Backup of that is disk snapshot -> disk -> tape.

        At reasonable size there is no single answer.

    3. Guus Leeuw

      Re: Incorrect comparison really

      Sir,

      you state that "With tape backup the purpose is no longer so much to restore data but to have multiple archival copies of data to choose from and to also comply with various data retention laws around the world", and further you add "Archival with disks can be done but it really is not the best use for them for numerous reasons including cost and long term reliability.".

      May I object to your sentiment in the following ways:

      1) Have you ever tried to restore from tape a piece of data that was stored on that same tape in 1985? (1985, being 25 years ago, is not too ridiculous: mortgage documents oughta be kept for the timespan of the mortgage)

      2) How do you propose to show that disks are less reliable than tape?

      3) How do you propose to retrieve, in a somewhat timely fashion when using tape, the documents related to a court investigation that has so far lasted 3 years, and, more importantly, how do you guarantee that the retrieved document is actually the document that they are after (unaltered, the last published version)?

      While tape is good for disaster recovery purposes, tape is useless when it comes to regulatory requirements.

      Neither tape nor disk is any good when it comes to the retrieval process. You need software for that, and that software should be agnostic in terms of storage media. Indexing multiple tape catalogues for that purpose is a nightmare.

      Almost last, but not least, objection: Say your data was taped up well before NDMP came about: How do you propose to retrieve (restore) the data? Do you always make sure that the new backup software you just purchased and started to operate can read all of your archival tapes?

      And last: what do you propose as the reason many "tape" libraries are now virtualised in a way that machines pretend to be tape-bound but actually use disks to store data?

      My clear and present favourite for archiving is disk, accessed through a structure-agnostic and storage-media-agnostic service such as SOAP or REST. For those access protocols, the data is just a BLOB w/o structure. The BLOB can and will be replicated to another location, with a retention period before the expiration of which the BLOB cannot be deleted and during which a minimum number of copies (replicas) can and will be enforceably maintained by the management software. The tape, OTOH, is a single medium that is most likely not replicated, and surely not enforced-replicated to a number of tapes located in different geographical areas.
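
      A toy sketch of the rules described above (opaque BLOBs, a retention period that blocks deletion, and an enforced minimum replica count) might look like this in Python. Every class, method, and location name here is hypothetical, and it is an in-memory stand-in for what a real archive service would expose over REST or SOAP. Keying each BLOB by its SHA-256 digest is an added assumption, one simple way to check that a retrieved document is unaltered.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BlobRecord:
    retain_until: date
    locations: set = field(default_factory=set)


class ArchiveCatalog:
    """Hypothetical in-memory catalogue of archived BLOBs."""

    def __init__(self, min_replicas: int = 2):
        self.min_replicas = min_replicas
        self.records = {}

    def put(self, data: bytes, retain_until: date, locations: set) -> str:
        """Register a BLOB; refuse if too few replica locations are given."""
        if len(locations) < self.min_replicas:
            raise ValueError("not enough replica locations")
        digest = hashlib.sha256(data).hexdigest()
        self.records[digest] = BlobRecord(retain_until, set(locations))
        return digest

    def verify(self, digest: str, data: bytes) -> bool:
        """True if data still hashes to the digest it was stored under."""
        return hashlib.sha256(data).hexdigest() == digest

    def drop_replica(self, digest: str, location: str) -> None:
        """Remove one replica unless that would break the minimum count."""
        record = self.records[digest]
        if location in record.locations and len(record.locations) <= self.min_replicas:
            raise PermissionError("would fall below the minimum replica count")
        record.locations.discard(location)

    def delete(self, digest: str, today: date) -> None:
        """Delete a BLOB only once its retention period has expired."""
        if today < self.records[digest].retain_until:
            raise PermissionError("retention period has not expired")
        del self.records[digest]
```

      The storage medium never appears in the interface, which is the point being argued: the policy lives in the management software, not on the tape or disk.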

      The IT field of storage is not something just about everyone can fiddle with, much as not everybody in IT is a good system administrator, or a good software developer.

      Regards,

      Guus Leeuw

  2. The BigYin

    Umm...

    ...can this entire article be summarised as "There is no one answer, pick what suits your particular needs"?

  3. andy gibson

    tapes for me

    I don't think the disks would last long with my clumsy nature.

  4. Yet Another Anonymous coward

    Thought this was a government security briefing!

    Now pay attention - you leave USB sticks in cabs, disks on trains and tapes on planes.

  5. John Thorn

    This reminds me

    How should we update the old adage "the fastest data rate is a magnetic tape on a motor bike" for the modern age?

  6. rch

    Choose both

    First, I don't follow this "disk is always faster" thing. For large sequential files, i.e. your big database, tape gives predictable and high performance for backup and, most important, your restore. Disk with deduplication is massively overhyped. What happens to your restore performance when your large database is deduplicated into numerous little pieces scattered around a few spindles of slow-running SATA?

    For any large setup use both disk and tape. Tape for copies. Mix of tape and disk for your primary backup.

    And most important: Choose backup software that is flexible and which can leverage the advantages of both disk and tape. Do not buy software that forces you into one storage technology. Do not listen to EMC and Symantec.
