Drilling into Amazon's tape-killing Glacier cloud archive

Amazon Glacier is a series of cloud vaults holding customer archive data that isn't based on tape libraries. Instead it appears to use object storage and is set to be the largest object storage implementation in history in a very short time. Amazon Web Services team member James Hamilton blogged about the new product, writing …

COMMENTS

This topic is closed for new posts.
  1. Phil Endecott

    Takes hours to retrieve

    The blurb says it takes 3 to 4 hours to retrieve an object from the "vault". That sounds more like a tape library than a disk system. Or at least, it sounds like it's spec'd so that they could implement it on tape, even if the initial deployment (to test the market) is on top of their existing disk storage system.

  2. Nate Amsden

    devastating - not

    You're forgetting about the cost of bandwidth, and the amount of time it takes to upload data to such a facility. My own facility, which is within 17ms of Amazon's east coast facility, still gets only a paltry 3-5MB/second of throughput on a gigabit link for a single stream. Tape throughput is frequently measured in dozens, or at the high end hundreds, of megabytes a second (in my experience the source media is often the bottleneck rather than the tape). Most users probably will have neither a high speed link nor a low latency connection to the remote facility (see the back-of-envelope figures after this comment).

    I wrote a blog post recently, "Freakish performance on Site to Site VPN", where I was able to sustain 10MB/sec+ between a pair of SonicWall VPNs on a 95ms link with a single stream (highly compressed file, encrypted with ssh). I've never come across anywhere near that level of throughput on a VPN, even with generic WAN optimization - SonicWall must be doing something really nice (regular internet speeds outside the VPN were in the 700kB/s range). Now if I could get such performance to a cloud provider that would be nice, but unlike good cloud providers that allow you to have a hybrid of physical and virtual resources, Amazon doesn't play that game.

    Add to that, tape can't be easily deleted when it is off site. That is, unless this Amazon service is significantly different from S3, it is trivially easy to wipe out all of your backups with a couple of commands. Storing tapes totally offline adds significantly more security and protection against that.

    There was one facility I almost hosted my gear at a while ago that had a significant Amazon presence, and there was the option to have a direct gigabit link into their network from mine. In that case it would have been sub-millisecond access, and I can imagine it would make a lot more sense then.

    For small data sets, it can work, and there are already tons of providers out there that provide the service; most of them seem to advertise "unlimited" storage for a low yearly rate. These sorts of folks, I think, don't really care whether their data is stored in multiple data centers - it's a backup, after all.
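
    A rough sense of what those throughput numbers mean in practice - a minimal back-of-envelope sketch; the 4 MB/s single-stream figure is taken from the range quoted above, while the 140 MB/s tape rate is an assumption (roughly LTO-5 native speed), not something stated in the comment:

    ```python
    # Back-of-envelope transfer times, not a benchmark.
    # 4 MB/s: single-stream WAN throughput from the comment above.
    # 140 MB/s: assumed native tape drive rate (roughly LTO-5); purely illustrative.

    def transfer_time_hours(size_gb, rate_mb_per_s):
        """Hours needed to move size_gb at a sustained rate_mb_per_s."""
        return size_gb * 1024 / rate_mb_per_s / 3600

    for size_gb in (100, 1_000, 10_000):  # 100 GB, 1 TB, 10 TB
        wan = transfer_time_hours(size_gb, 4)
        tape = transfer_time_hours(size_gb, 140)
        print(f"{size_gb:>6} GB: ~{wan:7.1f} h over the WAN, ~{tape:5.1f} h to local tape")
    ```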

    1. Tank boy
      Happy

      Re: devastating - not

      "You're forgetting about the cost of bandwidth". Exactly.

    2. Skoorb

      Re: devastating - not

      You can quite easily get a direct link to them via VLANs at a few major internet exchanges. It's only $1620 per month for a 10Gbps port (excluding all data transfer).

      http://aws.amazon.com/directconnect/

  3. Anonymous Coward
    Anonymous Coward

    Not as cheap as it sounds

    Translated as $120 per TB per year, it doesn't sound quite so cheap. Need it kept for 10 years for compliance? That will be $1,200, thank you. And that's for less storage than a single LTO-5 tape.
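
    For reference, the arithmetic behind those figures - a trivial sketch assuming the $0.01/GB/month rate quoted in the thread and 1 TB = 1,000 GB:

    ```python
    # Storage cost arithmetic from the comment above: $0.01/GB/month, 1 TB kept for 10 years.
    price_per_gb_month = 0.01
    gb_per_tb = 1_000  # decimal terabytes, as the comment uses

    per_tb_per_year = price_per_gb_month * gb_per_tb * 12   # $120 per TB per year
    per_tb_decade = per_tb_per_year * 10                     # $1,200 per TB over 10 years
    print(f"${per_tb_per_year:.0f}/TB/year -> ${per_tb_decade:,.0f}/TB over 10 years")
    ```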

    1. Paul Crawford Silver badge

      Re: Not as cheap as it sounds

      For a small user at least you save the cost and maintenance of the tape drive, and the off-site storage of tapes in case of major local damage, etc, which makes it attractive.

      But the lack of any obvious way to control the encryption yourself (unless I missed something) is not good.

      1. Paul Crawford Silver badge

        Re: Not as cheap as it sounds

        Just read the blurb:

        "Secure – Amazon Glacier supports secure transfer of your data over Secure Sockets Layer (SSL) and automatically stores data encrypted at rest using Advanced Encryption Standard (AES) 256, a secure symmetric-key encryption standard using 256-bit encryption keys. You can also control access to your data using AWS Identity and Access Management (IAM). IAM enables organizations with multiple employees to create and manage multiple users under a single AWS account and to set resource-based access policies."

        So basically they encrypt the "tapes" (we presume they use tape ultimately) but they still have access to your data, i.e. it is not encrypted at your side using a key that only your company has.

        Bend over Blackadder, it's PATRIOT time!

      2. Frederic Bloggs
        Facepalm

        Re: Not as cheap as it sounds

        You mean you *don't* encrypt the data before you store it in the "cloud"?

        Really???

        1. Anonymous Coward
          Anonymous Coward

          Re: Not as cheap as it sounds

          Trusting other people's encryption. *Noises indicating derision*. Extremely dubious practice.

  4. Jim McDonald
    Thumb Up

    For SMEs with "modest" amounts of data, that is a great price-point (I'm thinking of those companies with 25-250 users and perhaps 5-10 servers... so on the smaller end of the scale).

    At $0.01/GB/month they can use a service like this as a redundant off-site backup to complement what they do in-house. In that scenario any issues about retrieval/backup speed aren't so much of a concern.

  5. MrHorizontal

    Would be interesting to see how other cloud providers like Backblaze respond to this. Hopefully they will introduce a slightly more flexible service than their proprietary client. Glacier is not quite as good a deal as Backblaze (at $5 or £2.50/mth), but Glacier will store anything.

  6. spencer

    Hmm, where do I put all my sensitive data?

    Of course! The cloud!!

    1. Mr Young

      Re: Hmm, where do I put all my sensitive data?

      Exactly! If the info belonged to a company and it was data mined (even for a really, really small ad), would that be insider trading? I do wonder about the privacy and security, as well as the claimed reliability.

    2. Paul Crawford Silver badge

      Re: Hmm, where do I put all my sensitive data?

      The only sensible option is to encrypt the data with *your* key before it gets to them. Of course that usually buggers up de-dupe and always buggers storage-side compression, so they won't like that being the norm.

      Considering the other problem, that of up/down link bandwidth, you would really want to compress/de-dupe your data before considering backing it up, which would help them as well. Not quite so simple to use properly then.
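
      A minimal sketch of the "encrypt with *your* key before it leaves your site" approach. Everything here is illustrative: Fernet (AES-128-CBC plus HMAC, from the Python cryptography package) stands in for whatever cipher you prefer, the vault and file names are hypothetical, and it assumes boto3 is configured with AWS credentials and the vault already exists:

      ```python
      # Compress first, then encrypt locally, then upload; the provider only ever sees ciphertext.
      import boto3
      from cryptography.fernet import Fernet

      key = Fernet.generate_key()        # keep this on *your* side; losing it means losing the backup
      cipher = Fernet(key)

      with open("backup-2012-08.tar.gz", "rb") as f:      # hypothetical, already-compressed archive
          ciphertext = cipher.encrypt(f.read())           # encrypt after compression; ciphertext won't compress

      glacier = boto3.client("glacier")
      resp = glacier.upload_archive(
          vaultName="offsite-backup",                     # hypothetical vault name
          archiveDescription="2012-08 full backup, client-side encrypted",
          body=ciphertext,
      )
      print(resp["archiveId"])           # record this ID yourself; Glacier returns opaque IDs, not filenames
      ```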

  7. KendoNagasaki

    Clarification?

    What does "The annual average data item durability is 99.999999999 per cent – eleven nines" actually mean?

    1. BlueGreen

      Re: Clarification?

      ta, you saved me the effort of asking so I'll just add that

      "...series of cloud vaults holding customer archive data that isn't based on tape libraries. Instead it appears to use object storage"

      is one of the daftest bits of writing I've seen for a while.

    2. Jonski
      Mushroom

      Re: Clarification?

      99.999999999% annual data durability - in other words, in a year you should expect to lose about ten bytes of data for every terabyte stored, if you haven't annoyed the BOFH. (YMMV, E&OE, IANAL)

      Oh, and megathrust earthquake followed by tsunami followed by diesel generator inundation followed by nuclear meltdown at more than two sites simultaneously notwithstanding.
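
      A back-of-envelope reading of the "eleven nines" figure - treating durability as a per-byte annual survival probability is an assumption (Amazon quotes it per stored object), but it gives a feel for the scale:

      ```python
      # Naive interpretation of 99.999999999% annual durability, applied per byte.
      durability = 0.99999999999
      annual_loss_fraction = 1 - durability     # ~1e-11

      stored_bytes = 10**12                     # 1 TB
      expected_loss = stored_bytes * annual_loss_fraction
      print(f"expected loss: ~{expected_loss:.0f} bytes per TB per year")   # ~10 bytes
      ```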

  8. Anonymous Coward
    Anonymous Coward

    It would be interesting to see

    How well this scales up when there is a wide area disaster (like 9/11) and multiple companies are trying to pull tons of data at once.

    1. Arctic fox
      Unhappy

      Re: It would be interesting to see

      "How well this scales up when there is a wide area disaster (like 9/11) and multiple companies are trying to pull tons of data at once."

      I think we are inventing a whole new context and definition for the word "disaster" here. Particularly if the physical catastrophe also leads to one of those "chaotic"* multiple collapse events in the cloud systems being accessed. I just have a gut feeling that we are creating a whole new experience in vulnerable infrastructure with, potentially, global ramifications that are not yet properly appreciated.

      *"Chaotic" in the "butterfly over the rainforest" sense of the word.

  9. Anonymous Coward
    Anonymous Coward

    What arse in their right mind ...

    ... would back up their sensitive data to a remote location where all and sundry (except possibly themselves) could access it?

    That, and the hosting company's T&Cs would probably give them the right to mine the data and sell it on for profit to their advertisers and the Stasi.

  10. Legs

    Not very green either

    Bear in mind that all this wonderful object storage disc is continually taking power (= CO2), plus it has a big real estate footprint, plus it has to be cooled/heated. Plus all the other stuff like speed of access, security (is it really secure? and hey, it'll probably be in the US, so if you store something that's nasty about them they'll trump up some charge in Sweden and have you extradited :-P), SLAs, etc., etc.

    No, I don't think it's the way to go, not yet anyway.

    Cheers

  11. isomorphic
    Alert

    Retrieval Fees

    Amazon's pricing is complex, and they no doubt like it that way.

    It's not clear from their site whether the "Retrieval Fees" are in addition to or in lieu of the bandwidth charges.

    If you're using Amazon Glacier for backup purposes, prepare to be smacked when you need to retrieve everything in a hurry. Even for archive purposes there are probably surprises in the pricing.

    1. Martin Saunders
      Boffin

      Re: Retrieval Fees

      http://aws.amazon.com/glacier/faqs/#How_will_I_be_charged_when_retrieving_large_amounts_of_data_from_Amazon_Glacier

      Try and get your head around that statement. What's odd is that the retrieval costs appear to get cheaper the more aggressively (in terms of speed) you retrieve. It's far better to hammer the network for 1 hour and stop, wait 24 hours and hammer it again, than to spread the retrieval smoothly over the full 24 hours (see their example and the 4% number). That said, the retrieval costs aren't too bad so long as you don't retrieve very often (i.e. this is very much archiving, not backup).

      Martin Saunders

      Product Director

      Claranet UK

      1. Skoorb

        Re: Retrieval Fees

        Well, make sure you read the 'peak' bit of the calculation. That's the rate that is multiplied over the number of hours in the month... I worked it out as getting on for $400 to download 1TB if you have 1TB stored, assuming 80Mbps. You also have the fun of data only being available for download for 24 hours (from 3-5 hours after you ask for it), and of not getting any filenames back, just random strings.

        Though, as another commenter pointed out, you can always pay them to post you some discs ($80 per disc plus $2.50 per hour of data writing time); it actually seems to work out cheaper if they don't charge the 'retrieval fee'. If they do charge the 'retrieval fee' (which isn't made clear), then the charge is dependent on the speed of the network within their datacentre!

        I would love some real world examples of how to actually go about getting large chunks of data out of this thing and how much the various methods will cost. At the moment it's just too ridiculous for words.
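
        For what it's worth, here is one simplified reading of the fee structure described in that FAQ, sketched in Python. All of it is assumption: a 5%-of-stored-data-per-month free allowance pro-rated daily and credited over a 4-hour retrieval job, and a charge of $0.01/GB applied to the peak hourly retrieval rate multiplied by the hours in the month. The real formula has further subtleties, and under this particular reading spreading a retrieval over more hours works out cheaper - which only underlines how hard the pricing is to pin down:

        ```python
        # Simplified model of the 2012-era Glacier retrieval fee, reconstructed from
        # the FAQ wording discussed above. Illustrative only; a real bill may differ.
        HOURS_IN_MONTH = 720
        RETRIEVAL_PRICE_PER_GB = 0.01
        FREE_FRACTION_PER_MONTH = 0.05   # 5% of stored data free per month, pro-rated daily
        ASSUMED_JOB_HOURS = 4            # free allowance credited over a 4-hour job (assumption)

        def retrieval_fee(stored_gb, peak_hourly_gb):
            free_hourly_gb = stored_gb * FREE_FRACTION_PER_MONTH / 30 / ASSUMED_JOB_HOURS
            billable_rate = max(peak_hourly_gb - free_hourly_gb, 0)
            return billable_rate * RETRIEVAL_PRICE_PER_GB * HOURS_IN_MONTH

        stored_gb = 1_000                # 1 TB stored, as in the scenario above
        for hours in (4, 24, 72):        # pull the whole terabyte back over 4, 24 or 72 hours
            peak = stored_gb / hours
            fee = retrieval_fee(stored_gb, peak)
            print(f"1 TB retrieved over {hours:>2} h -> peak {peak:6.1f} GB/h, fee ~${fee:,.0f}")
        ```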

  12. Snooze bar

    Playing catch up

    Pretty much like what Nirvanix is doing today, battle tested and proven... Amazon's pie in the sky, skip the hype and go with tangible results...

  13. Molly

    Cheaper to move the compute to the data - than the data to the compute

    I moderated a panel at SC11 (the high performance computing conference) last fall, involving research, national lab and engineering development customers. The topic was whether cloud archive is practical for HPC data (read: hundreds of TB to PB or even EB of data). The entire room concluded, after 2.5 hours of active discussion, that cloud storage for large data sets is not practical (economically or for speed). Cost of bandwidth, availability of sufficient bandwidth, restrictions on data access and proximity of the compute to the data and metadata were all strong reasons for keeping archive data local to the compute. These are companies that keep and reference large quantities of data in their archives. Nothing beats large on-premise tape systems for large file archives. For smaller data sets measured in GB or TB and not actively referenced on a regular basis, cloud archive seems very compelling.

  14. Henry Wertz 1 Gold badge

    Yeah indeed...

    Indeed, the bandwidth issue is major, as Nate Amsden covers so well. The other part of that: STORING the data is $0.01/GB/month. This does not cover the transfer fees to get your data into and out of that storage, and those fees may or may not turn out to be high.

    Also, as Paul Crawford begins to allude to, I could see significant regulatory issues with this for a lot of people who are using tape.

  15. Jom
    Thumb Down

    Tape Killer?

    You can buy 60TB of tape storage for about $18K (US). That's $0.30 per gigabyte as a one-time purchase. If you expect your tape library to last 7 years, then you would pay $0.84 per gigabyte for the Amazon solution. And you have no retrieval costs with tape storage. How is 3X the cost a killer?
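
    The arithmetic behind that comparison, spelled out - a sketch that takes the commenter's $18K / 60TB / 7-year figures at face value and ignores drive maintenance, media handling and Glacier's transfer and retrieval fees:

    ```python
    # Tape library capex vs. Glacier storage fees over a 7-year life, per gigabyte.
    tape_capex_usd = 18_000
    tape_capacity_gb = 60_000
    glacier_per_gb_month = 0.01
    years = 7

    tape_per_gb = tape_capex_usd / tape_capacity_gb           # ~$0.30/GB, one-off
    glacier_per_gb = glacier_per_gb_month * 12 * years        # ~$0.84/GB over 7 years
    print(f"tape ~${tape_per_gb:.2f}/GB vs Glacier ~${glacier_per_gb:.2f}/GB "
          f"(~{glacier_per_gb / tape_per_gb:.1f}x)")
    ```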

This topic is closed for new posts.
