Flash is the new disk and disk is the new archive: so said an AOL presenter at Brocade's Techday summit. What was he smoking? Is this just spin? I believe not. I believe we are seeing the dawning of a new storage age, one in which flash is used to store primary, tier one data; capacity disk is used to hold secondary and backup …
Flash-based Array Price/Performance
"The SPC-1 rankings are going to be dominated by flash systems, both in a performance sense and in a price/performance sense with outrageously impressive advances in price/performance by flash systems"
I'm unclear on where these outrageously impressive advances in price/performance will come from with flash systems. There are two reasons for this. First, flash as a technology doesn't have much further to go. There are other technologies coming down the line, but don't hold your breath for flash to get significantly bigger or cheaper any time soon, and certainly not fast enough to overtake disk.
Second, many arrays are unable to handle the sheer number of I/Os that SSDs can generate, so there will be diminishing returns with these products. Scale-out products will help on the performance side, but unless there is a move away from the "big controller and lots of disk shelves" model, the costs will be prohibitive thanks to the "big controller and a few SSDs" model that will result.
SSDs don't generate I/Os
A common misconception - fill an array with SSDs and you'll overpower it with I/Os.
Fact: the maximum number of I/Os an array can handle is unrelated to the disk or solid state drive technology. With solid state you simply reach that limit with fewer drives. A well-designed array won't "fall over" if the drives support more I/Os than the array does... and indeed, well-designed SSDs will respond to every I/O significantly faster than disk drives will.
Also, I believe the term "flash" is being used here as a synonym for solid state drives... as you note, there are many other technologies coming down the pike... phase-change memory (PCM) looks to be next, with cost and performance advantages over NAND that will undoubtedly accelerate adoption...
"Tape's cost/GB stored blows disk away". Oh no it doesn't!
Prices just Googled: LTO5 cartridges about £50 (1.6TB). 1TB SATA disks about £33. Cost per TB is about the same. And then there's the cost of a very expensive tape drive to factor in. Disk just needs a hot-swap enclosure and a few caddies, or a raw-drive docking station.
Of course it's comparing different sorts of fruit. Tape wins for offline off-site storage, because a tape cartridge is mechanically rather more robust than a SATA disk. On the other hand, rsync or a similar algorithm across a network to a remote disk mirror is far more accessible for things like recovering accidentally deleted or corrupted files, or doing post-mortems after TS hits TF.
Look, we're talking big systems here, not a couple of disks in an enclosure. Of course a single SATA disk, or a small number of them, at a small company (or at home, as I do) will be less expensive than tape. When you're managing storage for even a medium-sized company, tape will win out for many reasons.
The company I work for is rather large; we back up between 800TB and 1PB a night, depending on the night of the week. We manage tens of thousands of tapes which we don't need to power or cool and which are loaded automatically by robotics, and we retain data for anything from one month to seven years. This simply cannot be done with disk.
Yes, an online disk will be able to recover an individual file faster; indeed, most large systems I back up are backed up from a disk snapshot, to maintain system uptime and give a single fast recovery point before going to tape. However, most of the time spent recovering a deleted file is not the get-tape/load-tape/find-file operation; it's calling the helpdesk and getting them to recover the file, which has to happen first. I would add that most NAS or file servers offer an option to take snapshots and let end users recover files from them. In my experience you need to stop end users doing this and get the helpdesk to do it, because otherwise they just use the snapshots as 'above quota' disk space.
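The comparison above is just price divided by native capacity. Here is a minimal Python sketch using the commenter's own figures (not current prices). It compares media cost only and leaves out the tape drive, enclosure and caddies mentioned in the comment; the function name is illustrative.

```python
def cost_per_tb(price: float, capacity_tb: float) -> float:
    """Media cost per terabyte of native (uncompressed) capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    return price / capacity_tb


if __name__ == "__main__":
    # Figures as quoted in the comment (GBP), not current market prices.
    tape = cost_per_tb(50.0, 1.6)  # LTO5 cartridge
    disk = cost_per_tb(33.0, 1.0)  # 1TB SATA disk
    print(f"tape: GBP {tape:.2f}/TB, disk: GBP {disk:.2f}/TB")
    # → tape: GBP 31.25/TB, disk: GBP 33.00/TB
```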
Was doing so well
This started out as a great, well-balanced article. Even had comments and clarifications such as: "But disk, constantly online disk, is always at risk....", note the "constantly online disk" clarification.
Then, disregarding all this preamble, Chris writes his conclusion thus:
"Tape's cost/GB stored blows disk away. Tape's reliability, with today's media and pre-emptive media integrity-checking library software is far higher than disk. Tape cartridges don't crash. Tape cartridges aren't spinning all the time, drawing electricity constantly, vibrating themselves slowly to death, generating heat that has to be chilled, and – most importantly – are not always online, always susceptible to lightning-quick data over-writing by dud data or file deletion."
He again compares to tape and states "and – most importantly – are not always online" (roughly half of that paragraph is about the effects of disks being always online). He also assumes that pre-emptive media integrity-checking is a common feature, rather than a new (old?) idea that has only just been implemented in a single tape library from a specific vendor....
Also, once again, he assumes "Tape's cost/GB stored blows disk away." which, in fact, it doesn't. A "FUJIFILM LTO Ultrium G5 - LTO Ultrium 5 - 1.5 TB / 3 TB" (from Amazon) lands at $67USD, being one of the cheaper options, but you can get a "Seagate Barracuda 7200 1.5 TB 7200RPM SATA 3Gb/s" (from Amazon) for $69.99USD (regardless of how you feel about Seagate. You want WD? It's only $64.99 atm). That's bit for bit the same size (the 3TB is assuming a 2:1 compression, which can be done using HDDs as well. The effective bits are the same, 1.5TB). Sorry Chris, cost evaluations are required before claiming a cost difference that "blows disk away."
As for longevity, I'm willing to bet disk has a higher in-use lifespan too. Have a tape that has as many on-hours as an HDD, and tape will lose. Granted, this has no relevance since most backup storage is used perhaps 30 times before being permanently archived/retired.
When you're looking at solutions such as "Overland Storage REO 9100C VTL" or the equivalent Tape variety, you're definitely going to do better with the traditional tape option, due to the always-on disks and the like, but use a JBOD disk-spanning option, and you could take your backup targets offline (the disks) once your backup job is complete (referring to the last D in a D2D2D option).
Disks have pros and cons compared to tapes, but currently, it's more of a user-preference than any technical or "better than" mythos that delineates the use of either.
Power down those disk archives
Another common misconception is that disk drives *must* always be spinning, and that is used here as an argument against tape.
So... why not just power down those drives that you've copied your archive to? Spin them up from time to time to verify that they still work and the data hasn't become corrupted, and voilà! Cheaper than tape to buy AND cheaper than tape to maintain.
FWIW, the EMC Data Domain Archiver announced earlier this year does just that...
Want your archive off-site? Easy - bury the Archiver in a mountain, and replicate your archives to it...again, much safer than shipping tapes, and much faster, too!
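The periodic spin-up-and-verify step suggested in this comment could be as simple as re-hashing every archived file and comparing the result against checksums recorded when the archive was written. This Python sketch assumes a hypothetical manifest mapping relative paths to SHA-256 hex digests; the file names and functions are illustrative, and it does not describe how the Data Domain Archiver actually verifies data.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archive files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(manifest: dict[str, str], root: Path) -> list[str]:
    """Return manifest entries that are missing or whose SHA-256 no longer matches.

    `manifest` is a hypothetical {relative_path: hex_digest} map recorded at archive time.
    """
    failures = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        archive = root / "2011-q2.tar"  # hypothetical archive file
        archive.write_bytes(b"archived data")
        manifest = {"2011-q2.tar": sha256_of(archive)}
        print(verify_archive(manifest, root))  # → []
        archive.write_bytes(b"bit rot")  # simulate silent corruption
        print(verify_archive(manifest, root))  # → ['2011-q2.tar']
```

A drive that fails this check (or fails to spin up at all) is a candidate for re-copying from another replica before it is powered down again.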
RE: Power down those disk archives
"....why not just power down those drives that you've copied your archive to?...." Because powering drives on and off is more likely to cause them to fail than leaving them spinning idly. You also get a nasty current surge when you power up a disk, and the more disks you have to power up, the bigger the surge, which stresses other parts of the array such as PSUs and UPSs. A compromise would be to slow the disks down, so they draw less current and generate less heat, and then ramp them up to operating speed only when needed.
Not quite so
1. Any self-respecting disk enclosure does not spin-up the disks until they are accessed. Otherwise no power supply would be good enough. Even things like a $70 5x SATA enclosure from IcyDock do that.
2. Most SATA 3" drives are rated at least 10000 start/stops. 2" drives often have even higher ratings. So no real problem there.
3. Even if you have the disks spun down in an array, they do not all start up at once. Linux spins them up sequentially, so does windows, because the spin-up causes a blocked read, and only _ANOTHER_ disk access to _ANOTHER_ chunk of data on the same array will spin up the next disk.
4. As far as archiving "lots of stuff" goes, at 2-3TB per run the disk back end for an archive solution is trivial to implement using off-the-shelf hardware, NIS, autofs and automounted NFS, at around 5-7W idle consumption. The trick is not to use RAID and to put the server into suspend-to-RAM. A couple of linux servers sleeping in ACPI S3 waiting for WOL can easily do that. Just fit them with LOTS of disks (there are plenty of cases out there that can fit 12+ drives).
Going above 2TB _PER_ _RUN_ ... now that is a different story... Still doable though... You can still use a similar set-up with more head-ends and more back-ends.
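The comment doesn't say how the head end wakes the sleeping back-end servers. One common approach is a Wake-on-LAN "magic packet": six 0xFF bytes followed by the target NIC's MAC address repeated 16 times, usually sent as a UDP broadcast (port 9 by convention). Below is a minimal Python sketch, assuming the NIC and firmware are configured to wake from S3 on a magic packet; the MAC address is a made-up placeholder.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the 6-byte MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP to wake the sleeping server."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Hypothetical MAC of a sleeping archive back end; we only build the packet here.
    pkt = build_magic_packet("00:11:22:33:44:55")
    print(len(pkt))  # → 102
```

In practice the head end would call `send_magic_packet` before automounting the back end's NFS export, then wait for the server to resume.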
RE: Not quite so
"Any self-respecting disk enclosure does not spin-up the disks until they are accessed....." Really? So all that disk-levelling that goes on in RAID sets, that just happens by magic? How about background cloning/mirroring? Or, if they are filesystem disks, what about integrity checking? Don't even try and think about what happens if your array does dedupe. Quite a lot happens on modern arrays in between host access calls.
"......Most SATA 3" drives are rated at least 10000 start/stops...." Yeah, you keep on reading the manual, I'd rather go by real experience. Any admin with half a clue will tell you that the MTBF goes out the window with the stress of starting and stopping systems. Just go look at this article, it explains the shock a company got relying on a cold failover solution:
Whilst TPM (predictably) tries to spin it as an hp-only problem, I've seen all types of kit fail during start-up (yes, TPM, even IBM's!), and well short of the advertised MTBF.
".....Linux spins them up sequentially, so does windows...." I think you'll find that's to stop current spikes, not some uber-clever OS mechanism. EMC used to tell us it saw a 2A current draw for each disk in a shelf that spun up from a stop. In an array that could have hundreds of disks, you really don't want all the disks kicking off or you will overload the PSUs or the circuit breaker. When we power up arrays in our datacenters we do a shelf at a time, we don't switch on a whole rack/cabinet at once.
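Using the 2A-per-disk spin-up figure quoted above (an assumption; real draw depends on the drive model and supply rail), the difference between switching everything on at once and powering up a shelf at a time is simple arithmetic. In this Python sketch the array and shelf sizes are hypothetical, each batch is assumed to reach speed before the next starts, and the lower running draw of disks already spinning is ignored.

```python
import math

# Spin-up draw per disk as quoted in the comment (attributed to EMC); an assumption.
AMPS_PER_DISK = 2.0


def peak_spinup_amps(total_disks: int, batch_size: int, amps_per_disk: float = AMPS_PER_DISK) -> float:
    """Peak spin-up current when disks are started batch_size at a time.

    Simplification: each batch finishes spinning up before the next starts,
    and the running draw of disks already at speed is ignored.
    """
    if total_disks <= 0 or batch_size <= 0:
        raise ValueError("disk counts must be positive")
    return min(batch_size, total_disks) * amps_per_disk


def batches_needed(total_disks: int, batch_size: int) -> int:
    """Number of power-up steps needed to bring every disk online."""
    return math.ceil(total_disks / batch_size)


if __name__ == "__main__":
    disks = 240  # hypothetical array size
    per_shelf = 15  # hypothetical disks per shelf
    print(peak_spinup_amps(disks, disks))  # all at once → 480.0
    print(peak_spinup_amps(disks, per_shelf))  # shelf at a time → 30.0
    print(batches_needed(disks, per_shelf))  # → 16
```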
"......A couple of linux servers sleeping in ACPI S3 waiting for WOL can easily do that...." Well, if you want a response time measured in minutes that's fine, but even archive systems are expected to work a bit faster nowadays.
Whatever medium you use...
...no matter its speed, capacity or ease of use...
...prepare for the day when it will no longer be accessible.
So I shouldn't have backed up my 1990's vintage pr0n collection onto those 20Mb SyQuest drives then?
Properly managed data will always be recoverable. If your old tape technology is going out of support, clone the tapes onto your new tape tech (it's automatic; just tell your backup software what to clone/dupe). If your array is going out of support, migrate onto a new array; there are various companies that produce software that can even make this invisible to the servers the disk is presented to.
fair enough but I doubt the trifecta part..
Your points are fine but actually I think that disk will die because it satisfies neither requirement properly leaving flash and tape to do all the jobs.
What's the point of spinning disk if your entire infrastructure is built around the speed of flash?
You can't do anything productive with disk if it's 100 times too slow... so you just end up with more flash instead.
I appreciate that many people won't agree!
<flame suit on>
Flash is awe-inspiringly expensive per TB for primary storage of mass data, and tape has the speed issue (and I question the article's price point for it). The reality is you use flash as a cache for the subset of your data you are actually using, the "slow" hard drives for bulk storage of near-line but not active data, and I suppose tape for offline backups.
I don't see drives dying in the enterprise for a while.
Agreed, I think that "slow-but-good-enough-and-cheaper" will win in 90% of cases. As an example of this, just look at the arrays available today - you have top-end arrays with massive controllers, loads of cache and bandwidth, but they are also very expensive. Then you have medium tier arrays, which have less cache and bandwidth, but still sell very well. If all we wanted was speed we'd all be buying the top-end controllers. In reality, we buy a mix, right down to NAS and DAS, to suit requirements and budgets. Whilst many arrays may get banks of SSDs to act as caching areas for peak demand, I suspect they will still be mainly disk.
Whence the SAN?
I've been wondering for a while: is the drive to the insane IOPS that Flash can theoretically deliver just shifting the bottleneck?
Are we going to have to go back to DAS to actually get the advantage of flash speed? Small to medium shops like ours have only reaped the benefits of being able to use significant amounts of shared storage relatively cheaply since the advent and rise of iSCSI: not having to invest in Fibre Channel infrastructure, while getting good, speedy access up to the limits of what the disks can deliver within their array.
At this rate it will be the disks that are bored while our network cables are glowing red hot... I suppose you have to go to 10Gbit Ethernet, then trunk it until 40Gbit and 100Gbit are available. But the investment for that is what makes your CIO very unhappy with you, as the messenger. :(
Why is everybody comparing 2TB HDDs from Amazon???
While off-the-shelf consumer-grade HDDs may be usable for small enterprises and provide a good price advantage, for larger enterprises (those with storage arrays) these drives do not cut it.
The cheapest 2TB SATA drive I can find is from HP's MSA60 Direct Attached Storage system, at a retail price of $539. You can be sure the drives that connect to storage arrays (EMC Clariion, HDS AMS2000, HP EVAs) cost more than that! These drives normally come with next-business-day (NBD) support for 3-5 years (depending on vendor).
Mine's with the enterprise-level HDD, not the consumer-level HDD.
Agree with James
A relevant example is comparing a large TS3500 holding an appropriate number of LTO4 drives with the same capacity of SATA from mid-range disk systems (HDS AMS or EMC VNX).
Based on quotes the cost/TB for tape is less than 20% of the cost of disk. That is excluding any compression, the space savings with tape and the huge savings in power/cooling.
Of course this calculation does not hold for small setups because the initial cost of tape is quite high. But as the environment grows I have seen customers stuck in a corner with their disk-only backup regretting that they did not plan for a tape solution from day one. And the way to plan for that is choosing the right type of backup software. The type which lets you in an optimal way exploit the strengths of both disk and tape.
So we are moving from this:
1. Primary: fast disk
2. Secondary: slower disk
3. Auxiliary: tape

to this:
1. Primary: flash "disk"
2. Secondary: slower disk
3. Auxiliary: tape
Not such a "revolution" then.
At work, I'm migrating from an old EMC DMX-3 to a VMAX at the moment. We're using fully automated storage tiering, which automatically migrates blocks between the storage tiers so that blocks which are being hammered move onto SSDs, and back down to SATA when they're not really needed. Here is what we have:
Tier 1: SSD (very small amount, 10%ish)
Tier 2: 10K RPM Fibre Channel (30%ish)
Tier 3: SATA (60%ish)
We're yet to finalise the exact amounts of disk and may add in some 15K FC, but it's a really nice system, albeit rather expensive.
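The tiering behaviour described above boils down to ranking blocks by activity and filling the fastest tier first. The Python sketch below is a hypothetical heat-based placement using the approximate 10/30/60 split from the comment; it is not EMC's actual FAST algorithm, which works on larger extents and tracks activity over time windows, both of which this ignores.

```python
# Approximate tier shares from the comment; a hypothetical policy, not EMC FAST.
TIERS = [("SSD", 0.10), ("FC 10K", 0.30), ("SATA", 0.60)]


def place_blocks(access_counts: dict[int, int], tiers=TIERS) -> dict[int, str]:
    """Map block IDs to tiers: the hottest blocks fill the fastest tier's share first."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    n = len(ranked)
    placement: dict[int, str] = {}
    start = 0
    for i, (tier, share) in enumerate(tiers):
        # The last tier absorbs any rounding remainder so no block is left unplaced.
        end = n if i == len(tiers) - 1 else min(n, start + round(share * n))
        for block in ranked[start:end]:
            placement[block] = tier
        start = end
    return placement


if __name__ == "__main__":
    # Hypothetical per-block access counts over one sampling period.
    counts = {0: 5, 1: 120, 2: 3, 3: 40, 4: 0, 5: 75, 6: 2, 7: 9, 8: 60, 9: 1}
    placement = place_blocks(counts)
    print(placement[1], "|", placement[5], "|", placement[4])  # → SSD | FC 10K | SATA
```

Re-running the placement each period and moving only the blocks whose tier changed is what lets hot blocks drift up to SSD and cool ones drift back down to SATA.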
Not so good conclusion there Chris....
As others have pointed out, your "Tape's cost/GB stored blows disk away" is way off the mark.
Apart from pure media cost, which is getting close, you have to factor in that you seldom get 100% utilisation from tapes. Also, disks, being random-access devices, let you do some clever stuff to get utilisation way, way up compared with tape. Tapes are not indestructible either. If you drop a tape, you are very likely to damage it, and guess what: you will never know until you try to recover from it. It's easier to refresh disk-based storage than tape. It's not just the tape drive you need to factor into the cost of tape, but also the tape library and the effort of human intervention.
The bottom line, though, is that there is a place for tape, and it certainly is the final piece in a DR strategy, but don't sell it as if it's a cheap option. For customers that have multiple interlinked sites and can get geographical redundancy, all-disk systems are looking more attractive than D2D2T.
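The utilisation point matters because media you pay for but don't fill raises the effective cost of every terabyte you actually store. A minimal Python sketch: the cartridge price and capacity are the LTO5 figures quoted earlier in the thread, while the 70% utilisation is a purely hypothetical value, not a measured one.

```python
def effective_cost_per_tb(media_price: float, native_tb: float, utilisation: float) -> float:
    """Cost per TB of data actually stored when media are only partly filled."""
    if native_tb <= 0:
        raise ValueError("capacity must be positive")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return media_price / (native_tb * utilisation)


if __name__ == "__main__":
    # $67 for a 1.5TB LTO5 cartridge as quoted in the thread; 70% is hypothetical.
    full = effective_cost_per_tb(67.0, 1.5, 1.0)
    partial = effective_cost_per_tb(67.0, 1.5, 0.7)
    print(f"${full:.2f}/TB at 100%, ${partial:.2f}/TB at 70%")
    # → $44.67/TB at 100%, $63.81/TB at 70%
```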
"Its easier to refresh disk based storage than tape. Its not just the tape drive you need to factor in the cost of tape but also the tape library and the effort in human intervention."
We have a tape library more than 10 years old that has gone through 4 generations of tape technology. It is simply a matter of replacing drives and tapes and letting the backup software do the rest. Robotics and library slots work across generations.
Now try to do the same with your 10-year-old disk system. Does your new 2TB SAS drive fit right in there? No, it doesn't. Most of the time a mere generation change in disk systems means a forklift upgrade.