El Reg’s futurologists have looked into the data centre storage area and foresee the banishment of disk drive spindles starting within five years. Put bluntly, disks suck. They are slow, they fail, they suck up electricity and floorspace. CIOs of companies with performance-focussed data centres don’t want disks. They look at all …
The disks may go, but the blocks will remain
Strange how things stick around.
Ever since spinning storage came into being, it's been based on blocks of data. Blocks make up files and directories. Block sizes change, the error correction associated with them also changes, but the concept has been remarkably resilient for 50+ years.
Given that almost everything else in the computing world, including memory word size, has changed during that time, shouldn't there be more suitable formats for storing and retrieving data than a mechanism devised for technology over half a century old?
Re: The disks may go, but the blocks will remain
I would disagree on that. Blocks are a convenient way of arranging sections of data manageably.
That's why they are used in disks, tapes, memory layouts, MMUs, video displays and so on.
After all (simplifying), a computer is basically a state machine which fetches data from memory, performs actions on it, then stores it back. These days it does it a lot faster, it can run multiple instructions at once, a lot of IO devices have their own computers inside, and the machines are immensely compact, but if you explained the basics of how a modern PC worked to someone designing a digital computer in the 1950s, they would recognise most of how it works.
Re: The disks may go, but the blocks will remain
Blocks are indeed a convenient abstraction, but inside some SSDs they're already getting de-duped and compressed, so there are still possibilities for shifting the division of responsibility between the filesystem and the hardware. TRIM, wear levelling, read disturbance tracking, a raft of alignment hacks to deal with FAT32/MBR legacy brain damage - up until it bit MS when 4K sector HDDs arrived, and they finally abandoned the stupid lie that every disk has 255 heads, 63 sectors and partitions simply must start on a cylinder boundary - all of these are symptoms of the mismatch between a block storage model that tries to cope with any write pattern to arbitrary 512-byte blocks, and the physical realities of easy-to-kill larger programming pages arranged within erase blocks.
Some of this stuff can be handled just by exposing some basic geometry - what alignments and write sizes make sense for the underlying flash, for instance - but a copy-on-write filesystem like ZFS or Btrfs but more specifically aimed at flash, which controlled programming/erase policy, could go around the standard block model. For instance, filesystem defragmentation preening to free up contiguous space on HDDs could turn into a way of freeing erase blocks, and wear levelling falls out as a consequence of the copy-on-write nature of the filesystem.
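To make the alignment point concrete, here is a minimal sketch of the check a partitioning tool could do if the device exposed its geometry. The 512-byte logical sector and the legacy start at sector 63 come from the discussion above; the 4096-byte page, the 512 KiB erase block and the `is_aligned` helper are illustrative assumptions, since real devices vary and often do not report these values.

```python
SECTOR_BYTES = 512  # legacy logical sector size discussed above


def is_aligned(start_lba: int, unit_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """Return True if a partition starting at start_lba begins on a unit_bytes boundary."""
    return (start_lba * sector_bytes) % unit_bytes == 0


# Legacy MBR habit: first partition starts one 63-sector "track" in.
LEGACY_START_LBA = 63
# Common modern convention: start at LBA 2048, i.e. 1 MiB into the device.
MODERN_START_LBA = 2048

PAGE_BYTES = 4096                # 4K sector / assumed flash programming page
ERASE_BLOCK_BYTES = 512 * 1024   # hypothetical erase block size

if __name__ == "__main__":
    for name, lba in (("legacy", LEGACY_START_LBA), ("modern", MODERN_START_LBA)):
        print(
            f"{name}: start byte {lba * SECTOR_BYTES}, "
            f"page-aligned={is_aligned(lba, PAGE_BYTES)}, "
            f"erase-block-aligned={is_aligned(lba, ERASE_BLOCK_BYTES)}"
        )
```

A start at sector 63 lands 32,256 bytes in, which is not a multiple of 4096, so every 4K filesystem block straddles two physical pages; a 1 MiB start avoids that for any power-of-two unit up to 1 MiB.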
A machine I worked on, http://en.wikipedia.org/wiki/Vayu_(computer_cluster), had 1500 blade servers each with 24GB of SLC flash SSD, as developed by Sun. The SSD write bandwidths would drop considerably over time, even with aligned 4KB write workloads for scratch storage and swap; there was no TRIM or secure erase support on these SSDs, but we worked out that every month or so we could do a whole-of-SSD blat with large, aligned writes to return each SSD to near its original write speed. Granted, this speaks to the maturity of the SSD firmware that was delivered in 2009 with this machine, but it seems to me that better documentation of the SSD and a better understanding of how the filesystems were hitting that block device could have helped us avoid that performance degradation in the first place.
So, yeah, the block abstraction is a useful one, but it's not without its warts.
If I raise a request for this at work now, IT might have managed to get round to provisioning one by the time they're available....
Re: Forward Planning
Can you order a new keyboard for me while you are at it?
SSDs won't mean the death of HDDs any more than HDDs signalled the death of tape.
Re: Doubt it
...but tapes and disks aren't the same thing.
SSDs will replace rotating disks because they *are* the same thing.
Re: Doubt it
SSDs are the same as HDDs in the same way that HDDs are the same as tape. They all store blocks of data. You can write data to all of them. You can read back the data from all of them.
The only differences between any of them are the underlying technology and performance characteristics of each technology.
Re: Doubt it
Tapes differ from SSDs and HDDs in that tapes are sequential access, HDDs and SSDs are random-access.
To access a given block on a tape, I have to wind the tape to that position, passing every other block in-between.
HDDs, I have to move a head assembly to the right position, skipping many "cylinders" of blocks.
SSDs I just address a different cell.
Re: Doubt it
Before disks became affordable people used tapes like you use disks now. They did random IO, very slowly. Tapes also needed to keep stopping and starting, watch any old SciFi film with rows of tape decks in the background and all the weird stuff they did to reduce the inertia of the bit that was starting and stopping all the time.
There is one other significant difference between tapes and disks. It is easier to take tapes out of the drive and put them in the firesafe.
Re: Doubt it
This is just incorrect. Tapes, HDDs and SSDs all can seek to specific positions. In tapes, the seek is excruciating, in HDDs the seek is painful and in SSDs the seek is almost non-existent. Of the three, only SSDs could be described as random access.
This goes back to my original point: they are the same, and the differences are just technical (transfer rate, seek time and so on). You might think that tape is dead, but for storing mass data that you read back sequentially, it's the fucking dogs.
Even today, there are applications where tape is supremely better than HDD, and it is the height of foolishness to think that because SSDs are superior to HDD in terms of seek speed and transfer rate that there will be no application where HDDs are the superior solution (and therefore that HDDs are doooomed).
Re: Doubt it
>HDDs, I have to move a head assembly to the right position, skipping many "cylinders" of blocks.
You've just contradicted yourself there. Having to move the heads around makes them sequential. Yes, so it's not as slow as tapes, but it's still not completely random access.
If you want random access you need to be able to randomly access any data on the device in the same time, regardless of physical location.
HDDs aren't going to die. They have reached one limitation: rotational speed. However, as long as they continue to increase in capacity, and continue to fall in price, they will always be there.
CFOs don't choose the best technology. They choose the most cost-effective.
Re: Doubt it
That's just plain wrong. If I'm at, say, block 100 on a disk and I need block 47881, I don't have to whizz past blocks 101, 102, 103... etc. to get to what I need. I just traverse to the destination track and wait for the required block to come round. There's nothing sequential about that, and your argument about being able to read anywhere in the same timeframe isn't relevant.
Re: Doubt it
>>> That's just plain wrong. If I'm at, say, block, 100 on a disk and I need block 47881, I don't have to whizz past blocks 101, 102, 103... etc. to get to what I need - I just traverse to the destination track and wait for the required block to come round - <<<
You are still seeking past blocks on the right track. That is sequential access, as you wait for the right block on the right track to come around. Never mind the blocks that you passed when your head had to seek from track A to track B.
Yes, disk is better than tape for random I/O but it does in the end access data sequentially. Otherwise your random throughput would be higher in comparison to sequential.
Even SSD is slower at random I/O, though much less dramatically.
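The "traverse to the track, then wait for the block to come round" argument can be put into rough numbers. The sketch below assumes the usual first-order model in which the average rotational wait is half a revolution; the 8.5 ms average seek time is a hypothetical figure for illustration, not a measurement of any particular drive.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def avg_random_access_ms(rpm: float, avg_seek_ms: float) -> float:
    """First-order random access time: head seek plus rotational wait."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


def approx_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough upper bound on small random reads per second for one spindle."""
    return 1000.0 / avg_random_access_ms(rpm, avg_seek_ms)


if __name__ == "__main__":
    HYPOTHETICAL_SEEK_MS = 8.5  # assumed average seek, for illustration only
    for rpm in (7200, 10_000, 15_000):
        print(
            f"{rpm} RPM: rotational wait {avg_rotational_latency_ms(rpm):.2f} ms, "
            f"~{approx_random_iops(rpm, HYPOTHETICAL_SEEK_MS):.0f} random IOPS"
        )
```

Under this model a disk never reads the intervening blocks, but it still pays a mechanical penalty of several milliseconds per random request, which is why both sides of the argument above have a point.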
Re: Doubt it
Tapes are SERIAL devices, not sequential (although they can be that too).
Data on tape is processed serially not sequentially because it is ordered and retrieved according to time of writing and not some kind of logical identifier like an account number. Data can, of course, be pre-ordered, sorted, and then written to tape. That data will then be read both serially and sequentially.
Of course, SSDs will replace rotating memory. As John Wayne famously said, "in 12 or 15 years time".
all down to $/GB
Rotating hard disks will last as long as their advantage in price and density does.
It will be interesting to see what the traditional storage manufacturers do in response to cheaper, denser SSDs. Have they been holding anything back?
Re: all down to $/GB
The base price of a disk drive is about £50 (they can make them cheaper, but these sacrifice some long-term reliability). It's been that price since the days it bought you just a few Gbytes.
They have technologies that will permit 50TB disk drives working in the labs. I expect they'll be on the market within a decade. I doubt that 50TB of SSD will ever be competitive on price.
One other thing: are we sure that SSD really is more reliable? The technology has not been in use for very long. One thing we all know, an SSD is likely to fail "just like that" with no advance warning. HDDs frequently (though not always) give warning of pending failure and permit pre-emptive replacement. My instincts say wait a few years more before betting the farm on SSD storage. Also that memristor tech will supplant flash SSDs - true random access instead of large-block addressing is a huge plus.
Re: all down to $/GB
"Rotating hard disks will last as long as their advantage in price and density does."
There is no density advantage that I can see over flash, in fact the opposite? Certainly there's a big price differential, but that's factored into an equation that includes higher energy costs of disks (including cooling, not an insignificant cost), the unknown of flash longevity, and the speed advantages of flash.
Flash longevity is far better understood and managed than hitherto, so that's computable as a cost and risk, and probably no different to the risk and costs of HDD failures. Which means the key question is focusing ever more closely on whether the (declining) TCO advantage of disks can justify their sluggish performance. With flash costs falling faster than HDD, there will come a point where disks will still be cheaper than flash, but most buyers will prefer flash for the performance, and once that tipping point is reached HDD volumes will start to fall, causing HDD prices to increase. Anybody buying RAM for an older machine will already know what happens when production volumes fall.
Re: all down to $/GB
Absolutely correct, especially for consumers who need to store all their rich content. But for the enterprise, where the big $$ are, I can easily see flash being used for 20 to 50% or even more of most corporations' data in five years. Might even be sooner.
HDS already has their own home grown FMD (Flash Module Drives) in 1.6TB, 3.2TB with 6.4TB coming next year and these are half the cost per TB of MLC SSD. Yes, 6.4TB in the same footprint as a standard HD and much faster than MLC.
Re: all down to $/GB
No reason why you can't have a flash RAID array for redundancy, and a little red/green that shows status, that can be hot-plugged just like HDD
Re: all down to $/GB
Always comes down to money.. funny how that works. The tiers are considerably cheaper as you go down on $/GB, if they weren't ... there would be no need for them. I've been in massive shops with VTL and there is always tape in there somewhere. Financially, you'd go broke with multi-PB and keeping all those backups on de-duped HDD. I don't know how many times I've argued, if a tier is a factor of 5 or more in cost cheaper, that tier will be around for quite some time. Especially with database archiving solutions. The stale data resides on a cheaper tier. Makes no sense to keep that on a more expensive tier.
Re: all down to $/GB
data recovery might have something to do with it too.
Recovery from HDDs is possible and relatively "easy". Recovery from SSDs is not so.
I've found SSD to be more unreliable than disk. Anyone have MTBF comparison figures for decent SSD vs decent disk?
I hear one flash and they're ash.
Not long-term figures. SSD tech has been developing extremely fast, disk tech less so, but in both cases by the time any device can be pronounced long-term reliable, it's also obsolete. The manufacturers do "accelerated ageing" tests but don't have a TARDIS. They have to substitute heat, humidity, vibration, extreme usage patterns ... and time axis scaling laws of very doubtful validity.
To put it bluntly, the fact that only 1/1000 of your test sample has failed after 12 months of torture testing doesn't mean that 90% of them won't have failed after five realtime years of gentle usage. And in fact we see the unexpectedly bad ageing problem every time a manufacturer buys a bad batch of components. Then there's a flood of "WD / Seagate / Hitachi are cr*p" messages, really meaning "model xxxxxxx with serial numbers between xxxxxxxx and xxxxyyyy are likely to fail prematurely because the ZZ corp supplied some bad widgets". It would be nice if the manufacturers put out recall notices like car manufacturers do, but of course for a £50 (or even £150) disk drive they cannot afford to do so.
"I've found SSD to be more unreliable that disk. Anyone have MTBF comparison figures for decent SSD vs decent disk?"
Or more importantly, real-life figures on failure curves, now that we know there is no direct correlation between MTBF and real-life failures.
Disks may use energy when you operate them, but SSDs use energy to make them. Lots of energy.
I used to be head of storage design for a very large UK FI, I asked EMC, NetApp, HDS, HP, IBM and a bunch of others if they could supply whole lifecycle energy used for their disks, no-one could get any figures. There is no point in saving energy (from a green point of view) in a datacentre if that energy is being used elsewhere.
One can deduce an upper limit on the energy input from the sale price of the chips and current industrial electricity costs.
A 10 Watt disk drive running 24 x 365 x 5 years uses 438 kWh; at 10p/unit that's £43.80. Depending on size, an SSD may not cost a lot more than that, and it's the cost of the chips it contains you need to use for your upper energy-input limit, not the cost of the completed, tested, packaged and warrantied assembly.
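The arithmetic above is easy to check and to rerun with other wattages or tariffs. This is a minimal sketch; the function names are made up for illustration, and the only inputs are the 10 W, five years and 10p per unit quoted in the comment.

```python
def lifetime_energy_kwh(watts: float, years: float,
                        hours_per_day: float = 24, days_per_year: float = 365) -> float:
    """Energy drawn by a device running continuously, in kilowatt-hours."""
    return watts * hours_per_day * days_per_year * years / 1000.0


def lifetime_energy_cost(watts: float, years: float, price_per_kwh: float) -> float:
    """Running cost over the device's life, in the same currency as price_per_kwh."""
    return lifetime_energy_kwh(watts, years) * price_per_kwh


if __name__ == "__main__":
    kwh = lifetime_energy_kwh(watts=10, years=5)
    cost_gbp = lifetime_energy_cost(watts=10, years=5, price_per_kwh=0.10)
    print(f"{kwh:.0f} kWh over five years, about £{cost_gbp:.2f} at 10p/unit")
```

Note that this only covers the drive's own draw; cooling overhead in a data centre would add to it.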
Thing is, manufacturing energy costs are one-time whereas operating energy costs are continual as long as the drive is running, so there's always the likelihood the cumulative operating costs exceed the one-time manufacturing costs.
I agree, but you also have to factor in the costs of power in other countries. I daresay dirty coal power in China is going to cost a damn sight less to a big company than cleaner coal/nuke/renewable in the UK.
It is only likely that cumulative operating power will exceed, it is by no means certain, indeed there are many products we use from day to day which cost far more in terms of energy to make than will ever be used in their lifetime.
Take the humble plastic bag, if you were to replace it with a cotton bag you'd need to use the cotton bag a couple of hundred times (can't remember the number, but it's that ballpark) before you exceeded the amount of energy it cost to make the bag in terms of standard plastic bags. Likewise, a bag for life needs to be used about five times to be more efficient than a bog standard plastic bag, but people still use cotton bags and bags for life.
But I don't think they've built (gigabyte state-of-the-art) Flash fabs in China yet. The moment Intel or Samsung or whoever do that, they've given all their know-how to the Chinese. So it's the power cost in whatever country the Flash fab is in that you need to look up.
OK, sitting here in TN electricity costs about $0.08/kWh, so let's compare....
Cost of disk electric $35.04. Cost to replace the disks in my RAID 6 box, 8*1TB SSD: 8*$600=$4800. (Source for the 1TB SSD: a Samsung from NewEgg, $599.)
So until the price comes down, the domestic scene in the US is going to be largely disk...
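One way to frame that comparison is as a payback period: how long the electricity saving would take to cover the SSD purchase. The sketch below uses the $0.08/kWh tariff and the 8 x $600 replacement cost from the comment, borrows the 10 W per disk figure from the earlier post, and makes the deliberately generous assumption that the SSDs draw no power at all. The function names are illustrative.

```python
def annual_power_cost(drive_count: int, watts_per_drive: float, price_per_kwh: float) -> float:
    """Yearly electricity cost of drives running 24 hours a day, 365 days a year."""
    kwh_per_year = drive_count * watts_per_drive * 24 * 365 / 1000.0
    return kwh_per_year * price_per_kwh


def payback_years(replacement_cost: float, annual_saving: float) -> float:
    """Years of saving needed to recover the replacement cost."""
    return replacement_cost / annual_saving


if __name__ == "__main__":
    hdd_power_per_year = annual_power_cost(drive_count=8, watts_per_drive=10, price_per_kwh=0.08)
    ssd_outlay = 8 * 600  # eight 1TB SSDs at roughly $600 each
    # Best case for flash: assume the SSDs use zero power, so the whole HDD bill is saved.
    years = payback_years(ssd_outlay, hdd_power_per_year)
    print(f"HDD power: ${hdd_power_per_year:.2f}/year, payback: {years:.1f} years")
```

Even with that best-case assumption the payback runs to many decades, which supports the conclusion that power savings alone will not move home users to flash.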
PS to be honest SSD are amazing, but I get the feeling that people think physical resistance somehow outweighs the "sudden failure" mode of ALL technology...
Hence RAID6, even if I did have SSD...
Re: thought expt...
It's not that simple since we're talking the enterprise here. We're talking environments where they actually do not have what would be called "stale" data. If you must have all the data all the time, then slower media costs you time, and time for a high-speed business like the Internet is literally worth money (think the difference between handling 10K transactions a second and doing 100K).
Re: thought expt...
Indeed. Too many people think about storage in terms of capacity. With most enterprise environments, speed is everything. Disk drives have always been the weakest link and even with flash they still are. It would be great to have a database running entirely in RAM, but it has to be stored somewhere, in case you lose power or something fails.
In the same vein, people are constantly saying, why not put some flash directly into the server? Great idea, until that server goes down and won't start back up again, for whatever reason.
So then you ask, how much does it cost, per second, for that application to be offline. And for many organisations that figure has a lot of zeroes on the end.
It all depends on your needs, but for those who have those critical applications, they need something as fast as possible, with at least one other copy of data somewhere else. The cost of the hardware is a drop in the ocean.
Today I can buy 4TB of spinning rust for £138, and 4 x 1TB SSDs for £2040.
Spinning rust likely has more potential for increased storage density than FLASH so I don't see that price ratio leaning much towards FLASH in the future.
Power, cooling and physical size favour SSD, but not enough to turn round a 15:1 price ratio. So I guess in 5 years' time the choice or split between rust and FLASH will be based on how much you value performance, much the same as it is today.
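The 15:1 figure follows directly from the two prices quoted, and the same arithmetic shows how sensitive the outlook is to relative price trends. In the sketch below the £138 and £2040 prices are from the comment; the annual price-decline rates are purely hypothetical placeholders, not forecasts.

```python
import math


def price_ratio(ssd_price: float, hdd_price: float) -> float:
    """How many times more the SSD option costs for the same capacity."""
    return ssd_price / hdd_price


def years_to_parity(ratio: float, ssd_annual_decline: float, hdd_annual_decline: float) -> float:
    """Years until the ratio reaches 1:1 if both prices fall at constant yearly rates.

    Returns math.inf when SSD prices do not fall faster than HDD prices.
    """
    if ssd_annual_decline <= hdd_annual_decline:
        return math.inf
    yearly_factor = (1.0 - ssd_annual_decline) / (1.0 - hdd_annual_decline)
    return math.log(ratio) / -math.log(yearly_factor)


if __name__ == "__main__":
    ratio = price_ratio(ssd_price=2040, hdd_price=138)  # 4TB of each, as quoted
    print(f"SSD costs {ratio:.1f}x as much per TB")
    # Hypothetical trend: SSD $/TB falls 30% a year, HDD $/TB falls 10% a year.
    print(f"Years to parity under that assumption: {years_to_parity(ratio, 0.30, 0.10):.1f}")
```

If disk density keeps improving as fast as flash does, as the comment suggests it might, the function returns infinity and the ratio never closes.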
I thought in 5 years time we were all supposed to be using memristors anyway?
Memristors have many advantages over flash, but the areal density advantage is not a large one. My money is on the nonvolatile RAM (memristor) over the large-block-addressed and limited-rewrite Flash, but I also expect that multi-Terabyte disk drives will be around for the foreseeable future. A lot depends on whether the HDD manufacturers have built the bit-patterned media plants and the two-digit-Tb drives before SSD eats their bread and breakfast market. They might decide to stop investing in future bigger HDDs because the HDD business doesn't have a future (which would be a strongly self-fulfilling prophecy).
The price of N Terabytes of SSD will always be N times the price of one terabyte (until they can make a 1Tb nonvolatile storage chip, if ever). The price of one disk drive will be £50 plus whatever they can get for it being bigger than cheaper ones. If a 10Tb or a 50Tb drive is ever marketed, it's a fair bet that 5 years later it will be available for £50 of today's money.
Wafer-scale SSD integration might one day put a terminal spanner in the HD works, but wafer-scale integration is something that's been coming for almost as long as nuclear fusion, and like fusion we still don't have it.
I don't know about...
...all-flash datacenter in 5 years, but there is a good possibility of mostly-or-all-flash primary storage and greater use of in-host caching. I think you will see that become more and more common as flash costs come down and capacities go up. XtremIO primary plus Data Domain backed by Spectra Logic tape, for example. One for hot data, one for archive and hot backup, and one for long-term cold backup. It seems as though most vendors are developing similar strategies.
We are doing an RFP for a greenfield site and every single vendor is bringing some flash capabilities to the party (Dell, in fact, did go all-flash on primary storage). Workload-wise we don't need it but everyone is building it in to their platforms so when our workload patterns change in the future, which can be difficult to predict over 5 years, we will be positioned to deal with those changes and expand the use of flash as needed. Yeah, I'd love an all-flash storage array, but that's not always the most affordable thing for SME. Bring on the hybrid arrays with disk-to-tape backup, I say.
More than 5 years is hard to predict, a lot can change between now and then. There is already a lot of change in the storage industry right now and even worse, a lot of FUD in the marketplace. It's good times and sometimes the drama plays out like a soap opera, this is a great industry!
Re: I don't know about...
Disruptive innovation in the consumer space is accelerating this. Smartphones all use flash, not disk drives. The competition in the consumer market is intense and drives prices down a lot more than the disk drive competition in the enterprise, which is now concentrated in 2 companies. As a result flash is now in a position to replace disk drives. Another observation is that smartphones do not seem too bothered about RAM speed. As a result, are we going to be stuck with DDR4 for the next ten years alongside fast block IO? I wonder whether in 5 years memory will become the bottleneck and IO will be fine.
Another thought is that we know flash will replace disk drives (the latency benefit is massive for databases). The massive improvement in latency for random IO is going to have a major impact on databases. RDBMSs will gain from improved random IO with their ability to implement indexes dynamically. While I'm at it, when are database vendors and Intel going to get together to use all the silicon a bit better? Maybe this is something one of those UK fabless chip designers may do.
Re: I don't know about...
Thing is, the consumer sphere will still have a valid use for spinning rust: bulk storage of low-priority data (think music and movies for a media center or a home backup--tape is impractically priced for the home). Time is NOT of the essence here, but space IS. So hopefully WD and Seagate will keep the spinning rust going for a while longer at least.
With both the EMC and HDS announcements this week, it's clear the flash party has really started. Regardless of whether the other vendors are headed down a brand new track or just switching trains from disk, they need to hurry up or risk being unfashionably late with flash-based solutions! And let's not forget that cost is still an inhibitor, particularly when you're looking at the systems that don't yet implement storage efficiency features like deduplication. Licensing technologies like Permabit Albireo is the only strategy that's going to work for many of these vendors, simply because it accelerates the delivery of these must-have features.
Flash is not yet reliable enough, so it is still going to be a majority of rotating HDs, with any SSD used only to improve hot zones.
Flash has hit the density wall already. The latest Sandisk offering didn't even go through a process shrink but simply increased die area and efficiency. Flash fabs are hideously expensive. Think Billion$ expensive, and their annual storage output is a pittance compared to the HDD business, and that's not going to change any time soon. 3D NAND? Seriously? Do you even realize what 3D NAND is? The bottleneck in any fab is tool throughput. Please explain how running the same wafer/die through the same photo/etch/plating tools for each layer somehow results in any net gain in output. Oh, you say you can use old fabs and older nodes. Great. You've made it slightly cheaper but still haven't solved the problem of being able to ship a sufficient number of exa/peta/zettabytes to replace the annual storage capacity produced by the HDD makers. To do that means building more fabs. LOTS more fabs, which means lots more $Billion$. And you're going to do that all so someone pays you $0.03/GB and bitches about how that's stealing food right out of the mouths of their children? Yeah, right.
Finally, you want to know why flash is never going to beat HDD's on cost? The answer is simple. It costs a lot of money when you have to pattern every single bit of your storage medium. HDD's have to pattern exactly one bit per surface: the head. That means that HDD's have less process content per surface and will always be cheaper unless you can suddenly figure out a way to get all those NAND cells to pattern themselves. Good luck with that.
Oh, and HDD's haven't used spinning "rust" for about 20 years now. Do try to keep up.
Flash may have hit the density wall, but hard drives are hitting the SPEED wall, and right now enterprises need SPEED more than anything. Internet commerce runs at breakneck speeds; if you don't keep up, you get passed. So enterprises with that need for speed CAN and WILL pay the premium for whatever flash is available (some are even willing to shell out bookoo bucks for SLC flash--think of THAT). The figure is that, for that outlay, they improve their transaction rate which raises their returns, allowing them to amortize the premium AND keep up with the competition.
And before you say, "prioritize your data," many businesses are in a situation where they don't (and perhaps CAN'T) have "stale" data that would be the candidates for offloading to hard drives. They need ALL the data ALL the time at a moment's notice (IOW, since you never know what your clients need, ALL the data becomes priority one).
Speed (bandwidth)? Or acceleration (latency)?
Guess Amazon Glacier has no reason for existing then. After all, who would ever want to wait more than a few seconds to get their data back, even if it then arrives at a decent *speed*.
If your Internet commerce business model really does involve never knowing what (large) pieces of data your clients will instantly need from anywhere in your single-tier all-flash storage setup, I hope that they're paying well for the service...
Flash has a place, but don't retire rotating rust just yet...
Exactly. However fast flash may be, and however suitable it is as storage, to bring more flash capacity online (at current densities), you first have to build a multi-billion dollar fab - and that fab has capacity limits that keep the cost of flash up, no matter what you do to try to bring it down.
Rotating rust (and yes, I know it hasn't been rust for a long time, but I used to be in the drive industry and that phrase has fond memories for me), on the other hand, only takes a couple hundred million to incrementally add another line to an existing factory, and incrementally add capacity (and adding a new factory isn't that much more expensive). So the drive manufacturers can ramp up capacity much more incrementally than can the flash folk, and thus keep $/GB costs lower.
Add to this that data-at-rest (think Google, Facebook, Twitter) is the fastest growing storage category, and you see why hard drive manufacturers sell all the drives they make, and will continue to do so for many years to come. "Cloud" (you call it what you want, it's data that people want to keep but not necessarily access very often, aka write-once, read seldom) is where the market is. For these guys, capacity and density at the lowest possible cost are what matter, and there are new drive technologies coming online that will increase both incrementally (at little or no cost increase) over the next 10 years (helium filled, SMR, HAMR, 3D, and so on). A 10TB single spindle drive isn't around the corner, but it is probably only 5 years away (never mind how long it'll take to format or fill one of these monsters). Flash won't be able to scale down in cost and up in capacity fast enough to deal with this market in the next 5 years (it may, ultimately, but that isn't the current trend).
For hot data (AKA enterprise data, on-line, etc.), flash is emphatically taking over (if it hasn't already done so, at least in the Request-For-Proposal space...). In fact, at least one drive manufacturer is no longer designing enterprise-type (2.5in, 10K+ RPM) drives any more, because flash is clearly going to own that market in the not-too-distant future. Building a 2.5in 10K+ RPM drive is lots harder (thus more expensive) than building a high-capacity 3.5in 7200-or-less RPM drive, so from the drive manufacturer standpoint, 3.5in capacity drives are where to focus the effort.
From that enterprise storage (online storage) viewpoint, the article is mostly correct - but that market is NOT where the vast growth in the storage is these days. It's where a lot of the cash is going, but not where the majority of the bytes are going.
Re: Speed (bandwidth)? Or acceleration (latency)?
"If your Internet commerce business model really does involve never knowing what (large) pieces of data your clients will instantly need from anywhere in your single-tier all-flash storage setup, I hope that they're paying well for the service..."
As I recall, Google (one of those businesses who DOES have a "no stale data" issue) rolls their own.
Flash is already obsolete. R-RAM is the future....try to keep up..
When we can see a 256GB R-RAM module in a 2.5" form factor for no more than twice the cost of a comparable flash drive (or the equivalent on a PCI-E card), THEN I'll say it's probably the future.
Thing is, all these post-Flash techs have been "a stone's throw away" for years. At best, some of these have seen limited rollouts. Call us when one of them hits the mass market.
That's it then. Flash (aaah-ahhh) is not savior of the universe.