Drilling into Amazon's tape-killing Glacier cloud archive

Amazon Glacier is a series of cloud vaults holding customer archive data that isn't based on tape libraries. Instead it appears to use object storage and is set to be the largest object storage implementation in history in a very short time. Amazon Web Services team member James Hamilton blogged about the new product, writing …

COMMENTS

This topic is closed for new posts.
  1. Phil Endecott

    Takes hours to retrieve

    The blurb says it takes 3 to 4 hours to retrieve an object from the "vault". That sounds more like a tape library than a disk system. Or at least, it sounds like it's spec'd so that they could implement it on tape, even if the initial deployment (to test the market) is on top of their existing disk storage system.

  2. Nate Amsden

    devastating - not

    You're forgetting about the cost of bandwidth, and the amount of time it takes to upload data to such a facility. My own facility, which is within 17ms of Amazon's east coast facility, still gets only a paltry 3-5MB/second of throughput on a gigabit link for a single stream. Tape throughput is frequently measured in dozens, or at the high end hundreds, of megabytes a second (in my experience the source media is often the bottleneck rather than the tape). Most users probably will have neither a high speed link nor a low latency connection to the remote facility (see the back-of-envelope figures after this comment).

    I wrote a blog post recently, "Freakish performance on Site to Site VPN", where I was able to sustain 10MB/sec+ between a pair of SonicWall VPNs on a 95ms link with a single stream (highly compressed file, encrypted with ssh). I've never come across anywhere near that level of throughput on a VPN, even with generic WAN optimization - SonicWall must be doing something really nice (regular internet speeds outside the VPN were in the 700kB/s range). Now if I could get such performance to a cloud provider that would be nice, but unlike good cloud providers that allow you to have a hybrid of physical and virtual resources, Amazon doesn't play that game.

    Add to that, tape can't be easily deleted when it is off site. That is, unless this Amazon service is significantly different from S3, it is trivially easy to wipe out all of your backups with a couple of commands. Storing tapes totally offline adds significantly more security and protection against that.

    There was one facility I almost hosted my gear at a while ago that had a significant Amazon presence, and there was the option to have a direct gigabit link into their network from mine. In that case it would have been sub-millisecond access, and I can imagine it would make a lot more sense then.

    For small data sets, it can work, and there are already tons of providers out there that provide the service; most of them seem to advertise "unlimited" storage for a low yearly rate. These sorts of folks, I think, don't really care whether their data is stored in multiple data centers - it's a backup, after all.
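
    A rough sense of what those throughput numbers mean in practice - a minimal back-of-envelope sketch; the 4 MB/s single-stream figure is taken from the range quoted above, while the 140 MB/s tape rate is an assumption (roughly LTO-5 native speed), not something stated in the comment:

    ```python
    # Back-of-envelope transfer times, not a benchmark.
    # 4 MB/s: single-stream WAN throughput from the comment above.
    # 140 MB/s: assumed native tape drive rate (roughly LTO-5); purely illustrative.

    def transfer_time_hours(size_gb, rate_mb_per_s):
        """Hours needed to move size_gb at a sustained rate_mb_per_s."""
        return size_gb * 1024 / rate_mb_per_s / 3600

    for size_gb in (100, 1_000, 10_000):  # 100 GB, 1 TB, 10 TB
        wan = transfer_time_hours(size_gb, 4)
        tape = transfer_time_hours(size_gb, 140)
        print(f"{size_gb:>6} GB: ~{wan:7.1f} h over the WAN, ~{tape:5.1f} h to local tape")
    ```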

    1. Tank boy
      Happy

      Re: devastating - not

      "You're forgetting about the cost of bandwidth". Exactly.

    2. Skoorb

      Re: devastating - not

      You can quite easily get a direct link to them via VLANs at a few major internet exchanges. It's only $1620 per month for a 10Gbps port (excluding all data transfer).

      http://aws.amazon.com/directconnect/

  3. Anonymous Coward
    Anonymous Coward

    Not as cheap as it sounds

    Translated as $120 per TB per year, it doesn't sound quite so cheap. Need it kept for 10 years for compliance? That will be $1,200, thank you. And that's for less storage than a single LTO-5 tape.
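
    For reference, the arithmetic behind those figures - a trivial sketch assuming the $0.01/GB/month rate quoted in the thread and 1 TB = 1,000 GB:

    ```python
    # Storage cost arithmetic from the comment above: $0.01/GB/month, 1 TB kept for 10 years.
    price_per_gb_month = 0.01
    gb_per_tb = 1_000  # decimal terabytes, as the comment uses

    per_tb_per_year = price_per_gb_month * gb_per_tb * 12   # $120 per TB per year
    per_tb_decade = per_tb_per_year * 10                     # $1,200 per TB over 10 years
    print(f"${per_tb_per_year:.0f}/TB/year -> ${per_tb_decade:,.0f}/TB over 10 years")
    ```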

    1. Paul Crawford Silver badge

      Re: Not as cheap as it sounds

      For a small user at least you save the cost and maintenance of the tape drive, and the off-site storage of tapes in case of major local damage, etc, which makes it attractive.

      But the lack of any obvious way to control the encryption yourself (unless I missed something) is not good.

      1. Paul Crawford Silver badge

        Re: Not as cheap as it sounds

        Just read the blurb:

        "Secure – Amazon Glacier supports secure transfer of your data over Secure Sockets Layer (SSL) and automatically stores data encrypted at rest using Advanced Encryption Standard (AES) 256, a secure symmetric-key encryption standard using 256-bit encryption keys. You can also control access to your data using AWS Identity and Access Management (IAM). IAM enables organizations with multiple employees to create and manage multiple users under a single AWS account and to set resource-based access policies."

        So basically they encrypt the "tapes" (we presume they use tape ultimately) but they still have access to your data, i.e. it is not encrypted at your side using a key that only your company has.

        Bend over Blackadder, it's PATRIOT time!

      2. Frederic Bloggs
        Facepalm

        Re: Not as cheap as it sounds

        You mean you *don't* encrypt the data before you store it in the "cloud"?

        Really???

        1. Anonymous Coward
          Anonymous Coward

          Re: Not as cheap as it sounds

          Trusting other people's encryption. *Noises indicating derision*. Extremely dubious practice.

  4. Jim McDonald
    Thumb Up

    For SMEs with "modest" amounts of data, that is a great price-point (I'm thinking of those companies with 25-250 users and perhaps 5-10 servers... so on the smaller end of the scale).

    At $0.01/GB/month they can use a service like this as a redundant off-site backup to complement what they do in-house. In that scenario any issues about retrieval/backup speed aren't so much of a concern.

  5. MrHorizontal

    Would be interesting to see how other cloud providers like Backblaze respond to this. Hopefully they will introduce a slightly more flexible service than their proprietary client. Glacier is not quite as good a deal as Backblaze (at $5 or £2.50/mth), but Glacier will store anything.

  6. spencer

    Hmm, where do I put all my sensitive data?

    Of course! The cloud!!

    1. Mr Young

      Re: Hmm, where do I put all my sensitive data?

      Exactly! If the info belonged to a company and it was data mined (even for a really, really small ad), would that be insider trading? I do wonder about the privacy and security, as well as the claimed reliability.

    2. Paul Crawford Silver badge

      Re: Hmm, where do I put all my sensitive data?

      The only sensible option is to encrypt the data with *your* key before it gets to them. Of course that usually buggers up de-dupe and always buggers storage-side compression, so they won't like that being the norm.

      Considering the other problem, that of up/down link bandwidth, you would really want to compress/de-dupe your data before considering backing it up, which would help them as well. Not quite so simple to use properly then.
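
      A minimal sketch of the "encrypt with *your* key before it leaves your site" approach. Everything here is illustrative: Fernet (AES-128-CBC plus HMAC, from the Python cryptography package) stands in for whatever cipher you prefer, the vault and file names are hypothetical, and it assumes boto3 is configured with AWS credentials and the vault already exists:

      ```python
      # Compress first, then encrypt locally, then upload; the provider only ever sees ciphertext.
      import boto3
      from cryptography.fernet import Fernet

      key = Fernet.generate_key()        # keep this on *your* side; losing it means losing the backup
      cipher = Fernet(key)

      with open("backup-2012-08.tar.gz", "rb") as f:      # hypothetical, already-compressed archive
          ciphertext = cipher.encrypt(f.read())           # encrypt after compression; ciphertext won't compress

      glacier = boto3.client("glacier")
      resp = glacier.upload_archive(
          vaultName="offsite-backup",                     # hypothetical vault name
          archiveDescription="2012-08 full backup, client-side encrypted",
          body=ciphertext,
      )
      print(resp["archiveId"])           # record this ID yourself; Glacier returns opaque IDs, not filenames
      ```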

  7. KendoNagasaki

    Clarification?

    What does "The annual average data item durability is 99.999999999 per cent – eleven nines" actually mean?

    1. BlueGreen

      Re: Clarification?

      ta, you saved me the effort of asking so I'll just add that

      "...series of cloud vaults holding customer archive data that isn't based on tape libraries. Instead it appears to use object storage"

      is one of the daftest bits of writing I've seen for a while.

    2. Jonski
      Mushroom

      Re: Clarification?

      99.999999999% annual data durability - in other words, in a year you should expect to lose about ten bytes of data for every terabyte stored, if you haven't annoyed the BOFH. (YMMV, E&OE, IANAL)

      Oh, and megathrust earthquake followed by tsunami followed by diesel generator inundation followed by nuclear meltdown at more than two sites simultaneously notwithstanding.
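
      A back-of-envelope reading of the "eleven nines" figure - treating durability as a per-byte annual survival probability is an assumption (Amazon quotes it per stored object), but it gives a feel for the scale:

      ```python
      # Naive interpretation of 99.999999999% annual durability, applied per byte.
      durability = 0.99999999999
      annual_loss_fraction = 1 - durability     # ~1e-11

      stored_bytes = 10**12                     # 1 TB
      expected_loss = stored_bytes * annual_loss_fraction
      print(f"expected loss: ~{expected_loss:.0f} bytes per TB per year")   # ~10 bytes
      ```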

  8. Anonymous Coward
    Anonymous Coward

    It would be interesting to see

    How well this scales up when there is a wide area disaster (like 9/11) and multiple companies are trying to pull tons of data at once.

    1. Arctic fox
      Unhappy

      Re: It would be interesting to see

      "How well this scales up when there is a wide area disaster (like 9/11) and multiple companies are trying to pull tons of data at once."

      I think we are inventing a whole new context and definition for the word "disaster" here. Particularly if the physical catastrophe also leads to one of those "chaotic"* multiple collapse events in the cloud systems being accessed. I just have a gut feeling that we are creating a whole new experience in vulnerable infrastructure with, potentially, global ramifications that are not yet properly appreciated.

      *"Chaotic" in the "butterfly over the rainforest" sense of the word.

  9. Anonymous Coward
    Anonymous Coward

    What arse in their right mind ...

    ... would back up their sensitive data to a remote location where all and sundry (except possibly themselves) could access it?

    That, and the hosting company's T&Cs would probably give them the right to mine the data and sell it on for profit to their advertisers and the Stasi.

  10. Legs

    Not very green either

    Bear in mind that all this wonderful object storage disc is continually taking power (= CO2), plus it has a big real estate footprint, plus it has to be cooled/heated. Plus all the other stuff like speed of access, security (is it really secure? and hey, it'll probably be in the US, so if you store something that's nasty about them they'll trump up some charge in Sweden and have you extradited :-P), SLAs, etc., etc.

    No, I don't think it's the way to go, not yet anyway.

    Cheers

  11. isomorphic
    Alert

    Retrieval Fees

    Amazon's pricing is complex, and they no doubt like it that way.

    It's not clear from their site whether the "Retrieval Fees" are in addition to or in lieu of the bandwidth charges.

    If you're using Amazon Glacier for backup purposes, prepare to be smacked when you need to retrieve everything in a hurry. Even for archive purposes there are probably surprises in the pricing.

    1. Martin Saunders
      Boffin

      Re: Retrieval Fees

      http://aws.amazon.com/glacier/faqs/#How_will_I_be_charged_when_retrieving_large_amounts_of_data_from_Amazon_Glacier

      Try and get your head around that statement. What's odd is that the retrieval costs appear to get cheaper the more aggressively (in terms of speed) you retrieve. It's far better to hammer the network for 1 hour and stop, wait 24 hours and hammer it again, than to spread the retrieval smoothly over the full 24 hours (see their example and the 4% number). That said, the retrieval costs aren't too bad so long as you don't retrieve very often (i.e. this is very much archiving, not backup).

      Martin Saunders

      Product Director

      Claranet UK

      1. Skoorb

        Re: Retrieval Fees

        Well, make sure you read the 'peak' bit of the calculation. That's the rate that is multiplied over the number of hours in the month... I worked it out as getting on for $400 to download 1TB if you have 1TB stored, assuming 80Mbps. You also have the fun of data only being available for download for 24 hours (from 3-5 hours after you ask for it), and of not getting any filenames back, just random strings.

        Though, as another commenter pointed out, you can always pay them to post you some discs ($80 per disc plus $2.50 per hour of data writing time); it actually seems to work out cheaper if they don't charge the 'retrieval fee'. If they do charge the 'retrieval fee' (which isn't made clear), then the charge is dependent on the speed of the network within their datacentre!

        I would love some real world examples of how to actually go about getting large chunks of data out of this thing and how much the various methods will cost. At the moment it's just too ridiculous for words.
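
        For what it's worth, here is one simplified reading of the fee structure described in that FAQ, sketched in Python. All of it is assumption: a 5%-of-stored-data-per-month free allowance pro-rated daily and credited over a 4-hour retrieval job, and a charge of $0.01/GB applied to the peak hourly retrieval rate multiplied by the hours in the month. The real formula has further subtleties, and under this particular reading spreading a retrieval over more hours works out cheaper - which only underlines how hard the pricing is to pin down:

        ```python
        # Simplified model of the 2012-era Glacier retrieval fee, reconstructed from
        # the FAQ wording discussed above. Illustrative only; a real bill may differ.
        HOURS_IN_MONTH = 720
        RETRIEVAL_PRICE_PER_GB = 0.01
        FREE_FRACTION_PER_MONTH = 0.05   # 5% of stored data free per month, pro-rated daily
        ASSUMED_JOB_HOURS = 4            # free allowance credited over a 4-hour job (assumption)

        def retrieval_fee(stored_gb, peak_hourly_gb):
            free_hourly_gb = stored_gb * FREE_FRACTION_PER_MONTH / 30 / ASSUMED_JOB_HOURS
            billable_rate = max(peak_hourly_gb - free_hourly_gb, 0)
            return billable_rate * RETRIEVAL_PRICE_PER_GB * HOURS_IN_MONTH

        stored_gb = 1_000                # 1 TB stored, as in the scenario above
        for hours in (4, 24, 72):        # pull the whole terabyte back over 4, 24 or 72 hours
            peak = stored_gb / hours
            fee = retrieval_fee(stored_gb, peak)
            print(f"1 TB retrieved over {hours:>2} h -> peak {peak:6.1f} GB/h, fee ~${fee:,.0f}")
        ```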

  12. Snooze bar

    Playing catch up

    Pretty much like what Nirvanix is doing today, battle tested and proven... Amazon's pie in the sky, skip the hype and go with tangible results...

  13. Molly

    Cheaper to move the compute to the data - than the data to the compute

    I moderated a panel at SC11 (the high performance computing conference) last fall, involving research, national lab and engineering development customers. The topic was whether cloud archive is practical for HPC data (read: hundreds of TB to PB or even EB of data). The entire room concluded, after 2.5 hours of active discussion, that cloud storage for large data sets is not practical (economically or for speed). Cost of bandwidth, availability of sufficient bandwidth, restrictions on data access and proximity of the compute to the data and metadata were all strong reasons for keeping archive data local to the compute. These are companies that keep and reference large quantities of data in their archives. Nothing beats large on-premise tape systems for large file archives. For smaller data sets measured in GB or TB and not actively referenced on a regular basis, cloud archive seems very compelling.

  14. Henry Wertz 1 Gold badge

    Yeah indeed...

    Indeed, the bandwidth issue is major, as Nate Amsden covers so well. The other part of that: STORING the data is $0.01/GB/month. This does not cover the transfer fees to get your data into and out of that storage, and those fees may or may not turn out to be high.

    Also, as Paul Crawford begins to allude to, I could see significant regulatory issues with this for a lot of people who are using tape.

  15. Jom
    Thumb Down

    Tape Killer?

    You can buy 60TB of tape storage for about $18K (US). That's $0.30 per gigabyte as a one-time purchase. If you expect your tape library to last 7 years, then you would pay $0.84 per gigabyte for the Amazon solution. And you have no retrieval costs with tape storage. How is 3X the cost a killer?
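
    The arithmetic behind that comparison, spelled out - a sketch that takes the commenter's $18K / 60TB / 7-year figures at face value and ignores drive maintenance, media handling and Glacier's transfer and retrieval fees:

    ```python
    # Tape library capex vs. Glacier storage fees over a 7-year life, per gigabyte.
    tape_capex_usd = 18_000
    tape_capacity_gb = 60_000
    glacier_per_gb_month = 0.01
    years = 7

    tape_per_gb = tape_capex_usd / tape_capacity_gb           # ~$0.30/GB, one-off
    glacier_per_gb = glacier_per_gb_month * 12 * years        # ~$0.84/GB over 7 years
    print(f"tape ~${tape_per_gb:.2f}/GB vs Glacier ~${glacier_per_gb:.2f}/GB "
          f"(~{glacier_per_gb / tape_per_gb:.1f}x)")
    ```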

This topic is closed for new posts.
