Drilling into Amazon's tape-killing Glacier cloud archive

Amazon Glacier is a series of cloud vaults holding customer archive data that isn't based on tape libraries. Instead it appears to use object storage and is set to be the largest object storage implementation in history in a very short time. Amazon Web Services team member James Hamilton blogged about the new product, writing: " …

COMMENTS

This topic is closed for new posts.

Takes hours to retrieve

The blurb says it takes 3 to 4 hours to retrieve an object from the "vault". That sounds more like a tape library than a disk system. Or at least, it sounds like it's spec'd so that they could implement it on tape, even if the initial deployment (to test the market) is on top of their existing disk storage system.

5
0
Bronze badge

devastating - not

You're forgetting about the cost of bandwidth, and the amount of time it takes to upload data to such a facility. My own facility, which is within 17ms of Amazon's east coast facility, still gets only a paltry 3-5MB/second of throughput on a gigabit link for a single stream. Tape throughput is frequently measured in dozens or, at the high end, hundreds of megabytes a second (in my own experience the source media is often the bottleneck rather than the tape). Most users will probably have neither a high-speed link nor a low-latency connection to the remote facility.

I wrote a blog post recently, "Freakish performance on Site to Site VPN", where I was able to sustain 10MB/sec+ between a pair of SonicWall VPNs on a 95ms link with a single stream (a highly compressed file, encrypted with ssh). I've never come across anywhere near that level of throughput on a VPN even with generic WAN optimization - SonicWall must be doing something really nice (regular internet speeds outside the VPN were in the 700kB/s range). Now if I could get such performance to a cloud provider that would be nice, but unlike good cloud providers that allow you to have a hybrid of physical and virtual resources, Amazon doesn't play that game.

Add to that, tape can't be easily deleted when it is off-site. Unless this Amazon service is significantly different from S3, it is trivially easy to wipe out all of your backups with a couple of commands. Storing tapes totally offline adds significantly more security and protection from that.

There was one facility I almost hosted my gear at a while ago that had a significant Amazon presence, with the option of a direct gigabit link from my network into theirs. In that case it would have been sub-millisecond access, and I can imagine it would make a lot more sense then.

For small data sets it can work, and there are already tons of providers out there offering the service; most of them seem to advertise "unlimited" storage for a low yearly rate. These sorts of folks, I think, don't really care whether their data is stored in multiple data centers - it's a backup, after all.
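To put rough numbers on the bandwidth point, a back-of-the-envelope sketch in Python (the 3-5MB/sec single-stream figure is the commenter's own measurement; 140MB/sec is LTO-5's rated native speed):

    # Back-of-the-envelope: time to move a 1TB archive at various speeds.
    TB = 1024 ** 4  # bytes

    for label, mib_per_s in [("single WAN stream (observed)", 4),
                             ("LTO-5 drive (rated native)", 140)]:
        hours = TB / (mib_per_s * 1024 ** 2) / 3600
        print(f"{label}: {hours:.1f} hours")

    # single WAN stream (observed): 72.8 hours
    # LTO-5 drive (rated native): 2.1 hours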

7
0
Happy

Re: devastating - not

"You're forgetting about the cost of bandwidth". Exactly.

0
0

Re: devastating - not

You can quite easily get a direct link to them via VLANs at a few major internet exchanges. It's only $1620 per month for a 10Gbps port (excluding all data transfer).

http://aws.amazon.com/directconnect/

0
0
Anonymous Coward

Not as cheap as it sounds

Translated as $120 per TB per year, it doesn't sound quite so cheap. Need it kept for 10 years for compliance? That will be $1,200, thank you. And that's for less storage than a single LTO-5 tape.
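The arithmetic behind those numbers, for anyone checking (assuming 1TB = 1024GB at Glacier's advertised $0.01/GB/month):

    # Glacier storage cost per TB at the advertised rate.
    rate = 0.01                       # $/GB/month
    per_tb_year = rate * 1024 * 12    # 122.88 -- roughly $120/TB/year
    per_tb_decade = per_tb_year * 10  # 1228.80 -- roughly $1,200 for 10 years

    print(per_tb_year, per_tb_decade)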

3
0
Silver badge

Re: Not as cheap as it sounds

For a small user at least you save the cost and maintenance of the tape drive, and the off-site storage of tapes in case of major local damage, etc, which makes it attractive.

But the lack of any obvious way to control the encryption yourself (unless I missed something) is not good.

0
0
Silver badge

Re: Not as cheap as it sounds

Just read the blurb:

"Secure – Amazon Glacier supports secure transfer of your data over Secure Sockets Layer (SSL) and automatically stores data encrypted at rest using Advanced Encryption Standard (AES) 256, a secure symmetric-key encryption standard using 256-bit encryption keys. You can also control access to your data using AWS Identity and Access Management (IAM). IAM enables organizations with multiple employees to create and manage multiple users under a single AWS account and to set resource-based access policies."

So basically they encrypt the "tapes" (we presume they use tape ultimately) but they still have access to your data, i.e. it is not encrypted on your side using a key that only your company has.

Bend over Blackadder, it's PATRIOT time!

0
0
Facepalm

Re: Not as cheap as it sounds

You mean you *don't* encrypt the data before you store it in the "cloud"?

Really???

1
0
Silver badge

Re: Not as cheap as it sounds

Trusting other people's encryption. *Noises indicating derision*. Extremely dubious practice.

1
0
Thumb Up

For SMEs with "modest" amounts of data that is a great price point (I'm thinking of companies with 25-250 users and perhaps 5-10 servers... so at the smaller end of the scale).

At $0.01/GB/month they can use a service like this as a redundant off-site backup to complement what they do in-house. In that scenario any issues with retrieval/backup speed aren't so much of a concern.

1
0

It would be interesting to see how other cloud providers like Backblaze respond to this. Hopefully they will introduce a slightly more flexible service than their proprietary client. Glacier is not quite as good a deal as Backblaze (at $5 or £2.50/month), but Glacier will store anything.

0
0

Clarification?

What does "The annual average data item durability is 99.999999999 per cent – eleven nines" actually mean?

2
0
Silver badge

Re: Clarification?

Ta, you saved me the effort of asking, so I'll just add that

"...series of cloud vaults holding customer archive data that isn't based on tape libraries. Instead it appears to use object storage"

is one of the daftest bits of writing I've seen for a while.

1
0
Mushroom

Re: Clarification?

99.999999999% annual data durability - in other words, in a year you should expect to lose about 10 bytes of data for every terabyte stored, if you haven't annoyed the BOFH. (YMMV, E&OE, IANAL)

Oh, and megathrust earthquake followed by tsunami followed by diesel generator inundation followed by nuclear meltdown at more than two sites simultaneously notwithstanding.
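For what it's worth, the arithmetic behind that estimate, reading the eleven-nines figure as a per-byte annual loss probability (a loose reading - Amazon quotes durability per data item, not per byte):

    # Expected annual loss per TB, treating eleven nines as a per-byte rate.
    durability = 0.99999999999     # eleven nines, annual
    loss_rate = 1 - durability     # ~1e-11

    tb = 1024 ** 4                 # bytes in a TB
    print(loss_rate * tb)          # ~11 bytes lost per TB per year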

0
0

Hmm, where do I put all my sensitive data?

Of course! The cloud!!

1
0

Re: Hmm, where do I put all my sensitive data?

Exactly! If the info belonged to a company and it was data-mined (even for a really, really small ad), would that be insider trading? I do wonder about the privacy and security, as well as the claimed reliability.

0
0
Silver badge

Re: Hmm, where do I put all my sensitive data?

The only sensible option is to encrypt the data with *your* key before it gets to them. Of course that usually buggers up de-dupe and always buggers storage-side compression, so they won't like that being the norm.

Considering the other problem, that of up/down link bandwidth, you would really want to compress/de-dupe your data before considering backing it up, which would help them as well. Not quite so simple to use properly then.
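A minimal sketch of that compress-then-encrypt-locally idea, assuming the modern boto3 and cryptography Python packages (neither existed when this thread was written) and a made-up vault name; the key never leaves your side:

    # pip install boto3 cryptography
    import gzip
    import os

    import boto3
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)   # *your* key; store it offline
    nonce = os.urandom(12)

    with open("backup.tar", "rb") as f:
        data = f.read()

    # Compress first: encrypted output is incompressible, so order matters.
    blob = AESGCM(key).encrypt(nonce, gzip.compress(data), None)

    glacier = boto3.client("glacier")
    glacier.upload_archive(accountId="-",
                           vaultName="my-vault",   # hypothetical vault
                           body=nonce + blob)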

0
0
Anonymous Coward

It would be interesting to see

How well this scales up when there is a wide area disaster (like 9/11) and multiple companies are trying to pull tons of data at once.

1
0
Silver badge
Unhappy

Re: It would be interesting to see

"How well this scales up when there is a wide area disaster (like 9/11) and multiple companies are trying to pull tons of data at once."

I think we are inventing a whole new context and definition for the word "disaster" here. Particularly if the physical catastrophe also leads to one of those "chaotic"* multiple collapse events in the cloud systems being accessed. I just have a gut feeling that we are creating a whole new experience in vulnerable infrastructure with, potentially, global ramifications that are not yet properly appreciated.

*"Chaotic" in the "butterfly over the rainforest" sense of the word.

0
0
Anonymous Coward

What arse in their right mind ...

... would back up their sensitive data to a remote location where all and sundry (except possibly themselves) could access it?

That, and the hosting company's T&Cs would probably give them the right to mine the data and sell it on for profit to their advertisers and the Stasi.

0
0

Not very green either

Bear in mind that all this wonderful object-oriented disc is continually drawing power (= CO2), plus it has a big real-estate footprint, plus it has to be cooled/heated. Then there's all the other stuff like speed of access, security (is it really secure? And hey, it'll probably be in the US, so if you store something that's nasty about them they'll trump up some charge in Sweden and have you extradited :-P), SLAs, etc.

No, I don't think it's the way to go, not yet anyway.

Cheers

0
2
Alert

Retrieval Fees

Amazon's pricing is complex, and they no doubt like it that way.

It's not clear from their site whether the "Retrieval Fees" are in addition to or in lieu of the bandwidth charges.

If you're using Amazon Glacier for backup purposes, prepare to be smacked when you need to retrieve everything in a hurry. Even for archive purposes there are probably surprises in the pricing.

0
0
Boffin

Re: Retrieval Fees

http://aws.amazon.com/glacier/faqs/#How_will_I_be_charged_when_retrieving_large_amounts_of_data_from_Amazon_Glacier

Try and get your head around that statement. What's odd is the retrieval costs appear to get cheaper the more aggressively (in terms of speed) you retrieve. It's far better to hammer the network for 1 hour and stop, wait 24 hours and hammer it again, than spread the retrieval smoothly over the full 24 hours (see their example and the 4% number). That said, the retrieval costs aren't too bad so long as you don't retrieve very often (i.e. this is very much archiving, not backup).

Martin Saunders

Product Director

Claranet UK


0
0

Re: Retrieval Fees

Well, make sure you read the 'peak' bit of the calculation. That's the rate that gets multiplied by the number of hours in the month... I worked it out as getting on for $400 to download 1TB if you have 1TB stored, assuming 80Mbps. You also have the fun of the data only being available for download for 24 hours (from 3-5 hours after you ask for it), and not getting any filenames back, just random strings.
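A hedged reconstruction of that ~$400 figure, using the 2012 formula (peak hourly retrieval x $0.01 x hours in the month) plus the ~$0.12/GB transfer-out rate of the time, and ignoring the small free allowance:

    # Pulling 1TB at a sustained 80Mbps under the original peak-rate pricing.
    peak_gb_per_hour = 80 / 8 / 1000 * 3600        # 36 GB/hour
    retrieval_fee = peak_gb_per_hour * 0.01 * 720  # peak x $0.01 x hours/month
    transfer_out = 1000 * 0.12                     # egress at ~$0.12/GB (2012)

    print(retrieval_fee)                 # 259.20
    print(retrieval_fee + transfer_out)  # 379.20 -- "getting on for $400"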

Though as another commenter pointed out, you can always pay them to post you some discs ($80 per disc plus $2.50 per hour of data-writing time); it actually seems to work out cheaper if they don't charge the 'retrieval fee'. If they do charge the 'retrieval fee' (which isn't made clear), then the charge is dependent on the speed of the network within their datacentre!

I would love some real world examples of how to actually go about getting large chunks of data out of this thing and how much the various methods will cost. At the moment it's just too ridiculous for words.
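In that spirit, a minimal sketch of the retrieval dance using the modern boto3 SDK (which postdates this thread); the vault name and archive ID are placeholders. You ask for the archive, wait the 3-5 hours, then download within the 24-hour window:

    import time
    import boto3

    glacier = boto3.client("glacier")
    vault = "my-vault"                  # placeholder

    # Step 1: ask Glacier to stage the archive (completes in 3-5 hours).
    job = glacier.initiate_job(
        accountId="-", vaultName=vault,
        jobParameters={"Type": "archive-retrieval",
                       "ArchiveId": "EXAMPLE-ARCHIVE-ID"})  # placeholder

    # Step 2: poll until done (an SNS notification is the saner option).
    while not glacier.describe_job(accountId="-", vaultName=vault,
                                   jobId=job["jobId"])["Completed"]:
        time.sleep(15 * 60)

    # Step 3: download within the ~24-hour availability window.
    out = glacier.get_job_output(accountId="-", vaultName=vault,
                                 jobId=job["jobId"])
    with open("restored.bin", "wb") as f:
        f.write(out["body"].read())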

0
0

Playing catch up

Pretty much what Nirvanix is doing today - battle-tested and proven. Amazon's is pie in the sky; skip the hype and go with tangible results...

0
0

Cheaper to move the compute to the data - than the data to the compute

I moderated a panel at SC11 (the high performance computing conference) last fall, with research, national lab and engineering development customers involved. The topic was whether cloud archive is practical for HPC data (read: hundreds of TB to PB or even EB of data). The entire room concluded after 2.5 hours of active discussion that cloud storage for large data sets is not practical (economically or for speed). Cost of bandwidth, availability of sufficient bandwidth, restrictions on data access, and proximity of the compute to the data and metadata were all strong reasons for keeping archive data local to the compute. These are companies that keep and reference large quantities of data in their archives. Nothing can beat large on-premise tape systems for large file archives. For smaller data sets, measured in GB or TB and not actively referenced on a regular basis, cloud archive seems very compelling.

1
0
Gold badge

Yeah indeed...

Indeed, the bandwidth issue is major, as Nate Amsden covers so well. The other part of that: STORING the data is $0.01/GB/month, but that does not cover the transfer fees to get your data in and out of that storage, and those fees can be quite high.

Also, as Paul Crawford alludes to, I could see significant regulatory issues with this for a lot of the people who are using tape.

0
0
Jom
Thumb Down

Tape Killer?

You can buy 60TB of tape storage for about $18K (US). That's $0.30 per gigabyte as a one-time purchase. If you expect your tape library to last 7 years, then over the same period you would pay $0.84 per gigabyte for the Amazon solution. And you have no retrieval costs with tape storage. How is 3x the cost a killer?
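The arithmetic checks out, as a quick sketch (taking the commenter's $18K/60TB tape figure against Glacier's $0.01/GB/month over the same 7-year life, and ignoring Glacier's retrieval and bandwidth fees):

    # Tape vs Glacier cost per GB over a 7-year library life.
    tape = 18000 / (60 * 1024)   # ~$0.29/GB, one-time purchase
    glacier = 0.01 * 12 * 7      # $0.84/GB over 7 years of monthly fees

    print(tape, glacier, glacier / tape)   # ~0.29, 0.84, ~2.9x -- roughly 3x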

3
0