FCC. That’s the way storage could be going in data centres in the next few years. FCC: flash, cache and the cloud. Amazon’s physical cloud storage gateway development – well, the supposed development – caused little ripples, echoes in my mind, and the many confusing pieces in the constantly developing storage technology jigsaw …
Very much horses for courses. If you are only likely to want resilience, then it's fine. If you want to be sure that you can go back a couple of days to back out a modification (deliberate or otherwise), then it's fine. If the regulatory body that oversees your business sector says it wants you to provide an accurate audit trail of selected data items from cradle to grave (or up to 7 years, anyway), then no, just no.
People are forever predicting the death of tape but they always neglect to explain what will replace it for very long-term storage.
What a nonsensical title. Just what do you think "the cloud" sits on?
Is it some mystical "other stuff" which we mere mortals know not of?
Or perhaps it's ALL flash-based, hmm?
NO, it's spinning rust, and probably tape for backup as well.
Out of sight, out of mind.
The name "cloud" refers to the huge mass of exhaust gases pumped out by all the vast remote data centres needed to keep it going. Fortunately they are all safely out of sight, so we can forget they exist. And their vaporous residues, and the rivers they warm to keep from melting.
...just rename your SAN(s) to "Private Cloud" and the bean counters and CIOs are happy again.
Re: Easy fix....
They already do.
In the consumer market, I've already seen external HDDs, NAS boxes, etc. advertised as "personal Cloud storage". :/
Re: Out of sight, out of mind.
Out of sight, out of mind ... and very likely out of your control.
Too bad if the cloud supplier goes belly-up. Too bad if a serious problem means the data is lost and there's no backup. But hey, it's probably cheap!
"Cloud"="Someone else's Computer"...
Graham Cluley was interviewed recently for Computing about file security and said this...
"Replacing all instances of the word “cloud” with “somebody else’s computer” might make organisations stop and think about the security implications of cloud computing."
Try that in every business proposal ever floated across the C-tier boardroom and see if the response is different...
I agree with the often-overlooked point in the AWS messaging, and in the cloud story in general: clouds still need physical storage, and that won't be SSD. It will be spinning rust and tape. The majority of data in 10 years will still be on spinning rust and tape, regardless of the marketing name we have for the solution.
I think you'd be lucky if many of them are doing actual backups at all. Replication is more prevalent and easier to manage but is not a substitute for a good off-line DR copy in case of emergency.
For any large amount of data, doing it in-house is hugely, and I mean hugely, cheaper. Over a five-year period, 500TB in the cloud will cost over two million USD. For that sort of money you can buy the storage, build a data centre and pay the staff, and still have a significant amount of money left over. Then for the next five years the data centre comes free.
Then there is the simple fact that there is insufficient manufacturing capacity for flash to change the equation radically in the near future. It takes at least a couple of years to bring a fab online, so capacity is known for some time ahead. Flash manufacturing capacity is only a couple of percent of hard disk capacity, let alone tape.
Also where are the cloud providers going to store this data? Magic pixie dust or something?
In short, neither disk nor tape is going anywhere fast.
Whilst I agree with you that doing it in-house is always going to be cheaper if you have scale, the numbers you've given miss something very important.
If you were going to store 500TB in this kind of system, chances are some of it would have made it to the archival stage by now. Archiving costs 1/9 of what regular S3 storage does, so 500TB of archival space over 5 years is less than $340,000 (about $68,000 per year). Obviously different datasets will require different mixes of hot and cold storage, but that can bring the $2m down quite quickly.
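To make the hot/cold mix concrete, here is a small Python sketch of a blended five-year cost. It takes the two figures from the comments above as given (roughly $2m for 500TB kept entirely in regular storage, and archive storage at 1/9 of that price); `blended_cost` is a hypothetical helper and the cold fractions are illustrative, not real AWS pricing.

```python
def blended_cost(hot_total: float, archive_ratio: float, cold_fraction: float) -> float:
    """Hypothetical helper: total cost when cold_fraction of the data is archived.

    hot_total: cost of keeping ALL the data in regular (hot) storage for the period.
    archive_ratio: archive price as a fraction of the hot price (e.g. 1/9).
    cold_fraction: share of the data (0..1) that has moved to the archive tier.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    hot_part = hot_total * (1.0 - cold_fraction)
    cold_part = hot_total * cold_fraction * archive_ratio
    return hot_part + cold_part


# Illustrative figures only: $2m five-year all-hot estimate, archive at 1/9 of the price.
HOT_FIVE_YEAR = 2_000_000
for cold in (0.0, 0.5, 0.9):
    print(f"{cold:.0%} cold -> ${blended_cost(HOT_FIVE_YEAR, 1/9, cold):,.0f}")
```

Under these assumptions, a 90% cold mix comes out at about $400,000 over five years, which is the commenter's point: the mix matters more than the headline rate.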
And what about for 20TB?
Is it worth employing a dedicated support team for that amount of data, or should you pool support resources with other people, i.e. put it into a cloud?
"Spinning rust and tape are [not] DEAD"
They're just [not] in _your_ server.
Re: "Spinning rust and tape are [not] DEAD"
Nah, they just send the data scurrying along very long fibre links constantly on the move…
Re: "They're just [not] in _your_ server."
They bloody are, mate! No way am I swapping a SATA bus for BT's wet string.
And in the cloud, the storage is?
Well, I guess spinning rust and tape.
So not so clear cut at all.
I'm still dubious about the longevity of data stored on flash memory, especially if the flash is stored 'cold', i.e. without power. Until this is proved, I would be wary of using it for information that legally has to be kept for years, which is the traditional domain of archive and long-term backup.
And that's not to mention the security implications of having the data ultimately stored out of your control ('binding' contracts are only as good as the people who wrote them, and are nothing like having physical security surrounding your data). If a cloud provider goes bust, or is taken over by another company whose modus operandi is not acceptable to you, how do you extract and export the terabytes of information they've been holding for you to move it somewhere else, and ensure that they've destroyed all copies of the data?
Re: And in the cloud, the storage is?
/dev/null in the worst case.
How are those files stored on Megaupload doing these days?
The gateway dedupes and compresses all traffic going to the cloud, and strongly encrypts it as well, keeping the NSA busybodies at bay.
I think whoever has been showing you this 'future' really hasn't understood much about the revelations Mr Snowden has made.
In fact if you look at the revelations he has made, and apply that to the storage architecture you've written up here, it would seem the NSA are going to be needing another few acres... in fact make that a few hundred acres... of compute power to scan through all the extra data your proposed future architecture is going to make available to them.
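The gateway behaviour quoted at the start of this comment (dedupe, compress, then encrypt) can be sketched in Python as fixed-size chunk deduplication keyed on SHA-256, followed by zlib compression of each new chunk. This is a hypothetical illustration, not Amazon's implementation: the 4 KiB chunk size and the `DedupeStore` class are assumptions, and encryption is left out because the Python standard library has no authenticated cipher.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real gateways may use variable-size chunking


class DedupeStore:
    """Hypothetical store: keeps each unique chunk once, compressed, keyed by SHA-256."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put(self, data: bytes) -> list[str]:
        """Split data into chunks, store unseen ones compressed, return the chunk 'recipe'."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only new content gets compressed and "sent"
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Reassemble the original bytes from a recipe of chunk digests."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)

    def stored_bytes(self) -> int:
        """Total compressed bytes actually held."""
        return sum(len(c) for c in self.chunks.values())


store = DedupeStore()
payload = b"A" * CHUNK_SIZE * 10  # ten identical chunks
recipe = store.put(payload)
assert store.get(recipe) == payload
print(len(recipe), "chunks referenced,", len(store.chunks), "stored,", store.stored_bytes(), "bytes")
```

The ordering matters: deduplication has to happen before encryption, because once chunks are encrypted with random nonces, identical plaintext no longer produces identical stored bytes.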
This idea of a 3-layer storage cake seems quite a compelling outline picture. It’s tidy, at any rate. Is it sensible?
I have to admit, it doesn't 'compel' me towards a future which looks like that.
It's amusing to remember that the 'Cloud' as a quasi-technology was the word thought up by academics in the late 90s and early 2000s to mean international research groups using cooperative computing via networked resource pools.
Then the marketing boys noticed the word, stepped in and a few players realised they could make a few bob by appealing to corporate boards. So much easier and cheaper, just like outsourcing was for a bit...
SAM-QFS has been doing this for years, but it's proprietary and Oracle want a fortune for it.
I've just had 3 different vendors pitch more or less the same ideas at me (I've been looking for a decent hierarchical storage system for years).
All 3 of them have gaping implementation flaws and are almost as pricey as SAM-QFS.
First rule for data integrity: everything goes to tape at least twice; the higher levels are simply there for caching (reads, or writes in the case of delayed commits to tape). If you "move" or "migrate" data between levels, you're putting it at risk.
Second rule: proprietary filesystem formats need not apply. BTDT, cleaned up the mess.
Alan, what flaws have you found in DMF?
It certainly satisfies the requirement for everything going to tape at least twice: you can specify that easily, and also look for any files which have somehow ended up on only one set of tapes. You can easily use disk as a cache layer (which I do).
Indeed, when the AWS announcement of virtual tape libraries on their storage gateway came out, it set me thinking about a configuration where you have a local tape library with the primary copy, and use Amazon Glacier for the second copy. Cost aside, you have disaster recovery.
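The "at least twice" rule, and the check for files that have ended up on only one set of tapes, amount to a simple audit over a catalogue of where each file's copies live. Here is a minimal Python sketch; the catalogue format and the copy-set names (`tape-A`, `tape-B`, `glacier`) are hypothetical and do not reflect DMF's or SAM-QFS's actual metadata.

```python
from collections import defaultdict

MIN_COPIES = 2  # the "everything goes to tape at least twice" rule


def under_protected(catalogue: list[tuple[str, str]],
                    min_copies: int = MIN_COPIES) -> dict[str, set[str]]:
    """Return files held in fewer than min_copies distinct copy sets.

    catalogue: hypothetical (file_path, copy_set) pairs, e.g. ("/proj/a.dat", "tape-A").
    Repeated entries for the same file in the same copy set count only once,
    since two tapes in one set do not protect against losing that set.
    """
    copies: dict[str, set[str]] = defaultdict(set)
    for path, copy_set in catalogue:
        copies[path].add(copy_set)
    return {path: sets for path, sets in copies.items() if len(sets) < min_copies}


catalogue = [
    ("/proj/a.dat", "tape-A"),
    ("/proj/a.dat", "glacier"),  # local tape primary plus an off-site second copy
    ("/proj/b.dat", "tape-A"),
    ("/proj/b.dat", "tape-A"),   # same set twice: still only one copy set
    ("/proj/c.dat", "tape-B"),
]
for path, sets in sorted(under_protected(catalogue).items()):
    print(f"{path}: only in {sorted(sets)}")
```

A file with no copies at all never appears in a copy catalogue, so a real audit would also compare the catalogue against the filesystem's own file list.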
The thing is, what's the SLA you have to meet in the event of a DR? Glacier is, er, somewhat glacial. Folks used to complain about tape access speeds; now LTO et al. are PDQ, so having made that gain, some folks may buy into the hype and shoot their DR in the foot while they're at it. While the cloud (and low-bandwidth deduped replication, for that matter) may be OK for trickling backups into over time, it utterly sucks in the event of a DR, when you need all that data back faster than the boss can dial your phone.
Cloud DR is a recovery process with a prostate problem.
I think the cost of the bandwidth needed to punt your 100 GB per day (or whatever) to the cloud isn't insignificant, and when you need to retrieve a couple of terabytes to search for that email your salesdroid sent confirming some contract clause was OK, then the speed could be an issue. OK, so tape's not super fast, but its read speed isn't subject to external network contention caused by person+dog downloading the latest cute kitten video.
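The bandwidth worry is easy to put rough numbers on. This Python sketch computes transfer time for the two cases mentioned (a 100 GB daily upload and a 2 TB retrieval); the link speeds and the 80% efficiency allowance are assumptions for illustration, not measurements.

```python
def transfer_hours(data_bytes: float, link_bits_per_sec: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_bytes over a link running at a fraction of line rate.

    efficiency is an assumed allowance for protocol overhead and contention.
    """
    if link_bits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 3600


TB = 10**12
GB = 10**9
# Assumed example links; substitute your own.
for label, bps in (("100 Mbit/s WAN", 100e6), ("1 Gbit/s WAN", 1e9), ("10 Gbit/s LAN", 10e9)):
    print(f"{label}: 2 TB restore ~{transfer_hours(2 * TB, bps):.1f} h, "
          f"100 GB daily upload ~{transfer_hours(100 * GB, bps):.1f} h")
```

At these assumptions, pulling 2 TB back over a 100 Mbit/s link takes about 56 hours, which is the "faster than the boss can dial your phone" problem in numbers.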
Completely agree. Then when the backups fail or you can't retrieve ImportantFile.omg, it's the network bod that gets the blame. You cannot have data centre speeds across a WAN, which means you can't have data-centre-level services across a WAN.
While I agree with your general point, I think you missed a trick:
...when you need to retrieve a couple of terabytes to search for that email your salesdroid sent confirming some contract clause was ok...
implies a very poor architecture. A sensible archive format with an inexpensive compute node in the cloud would allow you to find that email without retrieving any other data: run the search in the cloud, filter in the cloud, download the final result.
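The "search in the cloud, download only the result" point can be shown by counting bytes moved. In this Python sketch, the in-memory list stands in for an archive held at the provider and `search_remote` stands in for a job on a cheap compute node next to it; both, and the sample messages, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    subject: str
    body: str

    def size(self) -> int:
        """Rough size in characters, used as a proxy for bytes transferred."""
        return len(self.sender) + len(self.subject) + len(self.body)


def search_remote(archive: list[Message], term: str) -> list[Message]:
    """Hypothetical job running next to the data: filter there, return only matches."""
    return [m for m in archive if term.lower() in (m.subject + " " + m.body).lower()]


def bytes_moved(messages: list[Message]) -> int:
    return sum(m.size() for m in messages)


# Hypothetical archive: many bulky messages, one that mentions the clause we need.
archive = [Message("sales@example.com", f"Weekly report {i}", "x" * 10_000) for i in range(1_000)]
archive.append(Message("sales@example.com", "Re: contract", "Yes, clause 7.2 is fine by us."))

download_everything = bytes_moved(archive)  # pull the lot, then search locally
filter_in_place = bytes_moved(search_remote(archive, "clause 7.2"))  # only the hit comes back
print(f"download all: {download_everything:,} bytes; filter in cloud: {filter_in_place:,} bytes")
```

The design choice is simply to move the query to the data rather than the data to the query; it only works if the archive format is searchable where it sits.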
Tiered storage is nothing new
High-end storage systems have been employing it for years; what is new is the RAM or SSD tier 0 or 1, and the fact that tapes or WORM discs are no longer so mandatory. A few years ago you had fast spinning SCSI disks on tier 0, slower ATA disks on tier 1, and tapes or WORM discs on tier 2. Now you may have RAM at tier 0, SSD at tier 1, fast SAS/FC at tier 2, slower but larger SATA/FATA at tier 3, and maybe tape for backups. If you have sufficiently capable links, you can consolidate some archival storage to a couple of remote sites (for redundancy), either your own or rented (cloud) ones.
Actually, besides data security, the issue is the link. With LANs going toward 10Gb and 40Gb Ethernet, and with larger and larger files to move, the link between you and the cloud could be too slow.
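The tier ladder described in that comment can be sketched as a placement policy that demotes data as it cools. The age thresholds below are invented for illustration, and real hierarchical storage products use richer policies (file size, access counts, pinning); treat `choose_tier` as hypothetical.

```python
from datetime import datetime, timedelta

# Tier ladder from the comment, hottest first. Age thresholds are assumptions.
TIERS = [
    (timedelta(hours=1), "tier 0: RAM"),
    (timedelta(days=1), "tier 1: SSD"),
    (timedelta(days=30), "tier 2: SAS/FC disk"),
    (timedelta(days=365), "tier 3: SATA/FATA disk"),
]
COLDEST = "archive: tape or remote (cloud) site"


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Hypothetical policy: hottest tier whose age threshold covers time since last access."""
    age = now - last_access
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return COLDEST


now = datetime(2014, 11, 1, 12, 0)
for age in (timedelta(minutes=5), timedelta(days=3), timedelta(days=400)):
    print(f"last touched {age} ago -> {choose_tier(now - age, now)}")
```

This only decides where the working copy lives; under the "everything goes to tape at least twice" rule argued elsewhere in the thread, the tape copies would exist regardless of tier.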
Next week you'll no doubt do a contradictory article about how tape isn't dead. Because it isn't, and that's what always happens when the Reg posts an article about tape.
Jurisdiction and security
Cloud and such probably rocks if your business is cat videos. However, for mission-critical business purposes, especially in a competitive marketplace, you would be crazy to stash said data overseas, especially in the US. And where is your clear audit trail? You are relying upon people you have not met, in a data centre you have not seen, who don't have to follow your laws at all (in fact, your data falls under their laws).
Article appears to be a troll for comments. Must resist the commentard urge...
When I started my higher education some 20 years ago they were predicting the death of tape.
Tape is cheap, plentiful and for offline backups there's nothing that can really beat it.
I fully expect they'll be predicting the death of tape on the day I pick up my gold watch.
The NSA thanks you for this article.
You are essentially correct
The architecture would be more complex, but you are correct in the notion that as a practical matter storage will soon be lifted entirely off of specific media. People saying otherwise are confusing architecture with implementation.
When, at a high level, you make a call to retrieve something in software, it is only the API that matters. How it is implemented under the covers is irrelevant. A SELECT statement in SQL, for instance, neither knows nor cares if the data is physically in RAM, on disk, tape, CD, floppy disk, coming from a middle tier, distributed and assembled by a middle tier or whatever. We are moving to similar abstractions for all data. It is high time, too.
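The claim that callers should see only the API can be sketched with a Python `typing.Protocol`: application code writes to and reads from a named object without knowing which backend holds it. The `BlobStore` interface, both backends and the latency figures are hypothetical; the latency attribute is there because the objection in the reply (that response time still matters to whoever runs the system) applies even when the interface hides the medium.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical storage interface: callers see names and bytes, not media."""

    typical_latency_s: float  # for operators; application code never needs it

    def write(self, name: str, data: bytes) -> None: ...
    def read(self, name: str) -> bytes: ...


class MemoryStore:
    typical_latency_s = 1e-7  # rough assumption for RAM

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        self._blobs[name] = data

    def read(self, name: str) -> bytes:
        return self._blobs[name]


class DirectoryStore:
    typical_latency_s = 1e-2  # rough assumption for spinning disk

    def __init__(self, root: Path) -> None:
        self._root = root

    def write(self, name: str, data: bytes) -> None:
        (self._root / name).write_bytes(data)

    def read(self, name: str) -> bytes:
        return (self._root / name).read_bytes()


def archive_report(store: BlobStore, name: str, text: str) -> str:
    """Application code: identical whatever the backend is."""
    store.write(name, text.encode("utf-8"))
    return store.read(name).decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    for backend in (MemoryStore(), DirectoryStore(Path(tmp))):
        result = archive_report(backend, "q3.txt", "Q3 figures")
        print(type(backend).__name__, result, f"~{backend.typical_latency_s}s per access")
```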
Re: You are essentially correct
When, at a high level, you make a call to retrieve something in software, it is only the API that matters. How it is implemented under the covers is irrelevant.
After all, who cares if that time-critical transaction takes 10 minutes whilst the data is retrieved from an inappropriate data store?
The person running the SQL server cares... he cares a great deal. If his server fails to deliver appropriately (within spec for the application), it could cost him his position, especially when it keeps happening because he has no idea what storage medium the data is on.
Sometimes I read the comments and think "meh"; other times I just have to point out the bleeding obvious.
Re: You are essentially correct
Again, architecture vs implementation. Somebody indeed has to look at the particulars of sending signals down fibre, disk caches, etc. However, it is the bleed between levels of abstraction that has gotten us into the mess we are in.
When writing or reading a stream I should neither know nor care where the bits reside, if anywhere. Eventually, just as we already have with the World Wide Web, we will specify the names of abstract things and read from and write to them the same way, whether they are a middle tier that throws the bytes away, a local USB drive or an entire distributed network of server farms acting as a single storage entity.
Sometimes we have to make some pretty horrible compromises to overcome limitations. However, we have to recognize them for what they are. They are hacks to work around a deficiency. The course to take is to cure the deficiency, not canonize the hack.
It may take some time to shake out, but references to specific device details in application software are not the future.
Flash is still underused in the data center.
Those Backblaze boxes are looking sexy though.
This is sooooo very annoying
Tape is dead... arrggg, my head is exploding. The mainframe is dead too.
Anyhow, how about a real-world counter-point?
Now maybe some Kool-Aid-drinking tape-is-dead fanboi floated the idea at the Chocolate Factory to replace tape? And then someone else said "Why? And how are we going to justify that? More importantly, how are we going to justify the greatly increased cost?"
But back to the puzzled tech author of the link above... tape? Seriously? And he prattles on about how long it takes to restore and how many tapes it needs (LTO2? Ummm... how about at least LTO4, but I digress). He misses the main point: how much cheaper it is. He writes as if it were more expensive. How silly... as if The Chocolate Factory hasn't figured out the cost.
Excuse me if I don't send flowers.
I work for a tape manufacturer and found sitting in the AWS sessions at re:Invent a bit bemusing. For such a huge company with so much technical capability, the marketing message was way off. They were targeting their tape obsolescence message at small backup users, presumably with backup applications that don’t know how to write to disk or file systems: a very old, very small, obsolete market. What they didn’t address was the storage their own services run on, nor the competition from other providers or from end users building their own at lower cost points than can be achieved with AWS pricing. I think AWS will take over the consumer market. But big data sets, the enterprise, government? No way.