10 posts • joined Friday 12th October 2012 14:06 GMT
Keep in mind that Backblaze uses those drives a heck of a lot more than typical consumer applications do, where the drives spend most of their time just track following and stay much cooler. Seriously, heat is the enemy of disk drives, and for a typical consumer application consumer grade is fine. If you want continuous operation you need something with a better thermal profile, and those are enterprise drives.
Re: I have a 6 month old Nexus 4...
I have a Nexus 4 and just got a Nexus 5 for my wife. We're on T-Mobile USA so we're prime targets for this kind of thing: no subsidy plan, GSM, etc.
The Nexus 5 is smoother, thinner, and has noticeably more screen area. It's a damn nice phone. But I'm not upgrading until my Nexus 4 breaks. It's nice and fast, but it's not that much nicer or faster to make me want to drop $350 more on the phone. 4.4 (KitKat) hasn't won me over, either. It's not bad, but I don't like how Google has borged the messaging app into hangouts, for example. I suppose I'll get used to it, but fully entering The Google Collective with all the software tweaks is kind of unsettling. Still, KitKat is better than the software on my daughter's S4.
Re: Error correction isn't good enough nowadays.
Nope, you don't need to do that, or at least it's very, very rare to have problems with that.
What we do these days is that we can detect errors and weak sectors using various intermediate code output stages to estimate the SNR of the read (think SOVA systems and the like). If we detect a bad or weak readback sector while reading we map out the offending block and use a spare one in its place. It's completely transparent to the user and it keeps us from having to wear out NAND any more than is absolutely necessary. (Something similar is done for HDDs.) You have to have a complete failure before something like this causes a problem that's visible to the user.
But think about what this means to end users. It means that if you ever start getting bad sector warnings, we've used up all our spares and can't safely remap bad sectors without OS level help. That means that your storage device is on its last legs and you'd best be getting anything valuable off the drive ASAP since the aging doesn't ever stop.
Re: Error correction isn't good enough nowadays.
Bwahahahaha! You don't know the *half* of it.
Let's take the example of a typical disk drive. In the bad old days, we had "peak detector" systems where bit densities were less than 1.0 CBD - 1 bit in a half pulse width. With enough work and some really simple coding you could reach about 1e-9 BER.
Then IBM came up with the idea of applying signal processing to the disk drive, introducing partial response/maximum likelihood systems (Viterbi detectors) where you started to get more than 1 bit in a pulse width and the raw error rate off the disk started to get worse. Now they're putting about 3 bits in a single pulse width because they're putting LDPC codes and their 6M+ gate decoders behind the PRML. The raw BER coming off the disk is typically around 1e-5, but with the coding behind it the corrected BER is typically well below 1e-15.
You want scary? Look at MLC NAND flash drives. After a few hundred erasure cycles the raw BER of those things can be 1e-4 or worse. Why? Feature sizes are getting so small that leakage and wear (threshold voltage shifts, etc) are causing those ideal voltage levels to get pretty whacked out. It's getting bad enough that you're starting to see those massively complicated LDPC codes in flash drives, too. Those fancy codes are needed, as are wear leveling, compression, and all those other tricks to make NAND drives last as long as they do.
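To put those BER figures in per-sector terms, here's a rough back-of-envelope sketch. The 4096 byte sector size and the function name are my own assumptions for illustration, and it treats bit errors as independent, which real channels don't quite obey.

```python
# Expected bit errors per sector for the BER figures quoted above.
# SECTOR_BYTES is an assumed size; bit errors are assumed independent.
SECTOR_BYTES = 4096


def expected_bit_errors(sector_bytes, ber):
    """Average number of wrong bits in one sector at a given bit error rate."""
    return sector_bytes * 8 * ber


hdd_raw = expected_bit_errors(SECTOR_BYTES, 1e-5)     # raw HDD channel
hdd_coded = expected_bit_errors(SECTOR_BYTES, 1e-15)  # after the LDPC decoder
nand_raw = expected_bit_errors(SECTOR_BYTES, 1e-4)    # worn MLC NAND, raw

print(f"HDD raw:    {hdd_raw:.3g} bit errors per sector")
print(f"HDD coded:  {hdd_coded:.3g} bit errors per sector")
print(f"NAND raw:   {nand_raw:.3g} bit errors per sector")
```

Under those assumptions, roughly one raw HDD sector in three comes off the platter with a bit error, and a worn MLC sector carries about three of them, which is why nothing gets handed to the host without the decoder in the path.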
HDD systems typically fail from mechanical failures but the underlying data is maintained and you can usually get someone to haul the data off the platters for enough money. NAND flash systems, though, die a horrible death from aging and if you have a "crash" on one of those it's not likely that any amount of money will get your data off it because of all the massaging of the data we do to keep those drives alive.
Re: In the fantasy relm of Corporate __________
You're also forgetting that NAND will soon stop scaling. It simply can't scale past about 20nm since it has to be a planar process and the shrink keeps decreasing the number of electrons stored. In 22nm you're talking about trying to store 200 or so electrons over PVT, and the half life of the cell storage is getting into the realm of months, to say nothing of the decreased wear lifetime. 20nm flash is _hard_ to get working well.
The technology of NAND just doesn't scale well. It's likely that other technologies will come to replace it, but they're not available yet. So predicting the end of "spinning rust" due to NAND just by past performance ignores the technology roadmap and physics. Spinning rust is losing its share of the market, but there's still some doubt about what can replace it.
Re: SSDs and HDDs both require backup...
"The same report concluded that development of even a single bad sector is a pretty good sign that the drive is getting ready to check out."
There's a reason for that. SSDs copied the HDD redundancy scheme. In both cases manufacturers keep a fair bit of "spare space": for SSDs that's unused pages and for HDDs it's spare tracks. When you hit a problem reading a sector, where you have to try to read it more than once, you map it to a spare sector and mark the old one as bad. At no point does the user know you've done that; it's all done under the covers seamlessly.
Now that you understand that, you can see the "why" behind Google's result: by the time a user sees a sector failing the drive has run out of spares, which means that a pretty fair fraction of the drive area has failed for some reason. Those reasons are usually cascade failures (heat related wear in an SSD, TA contamination for HDDs, etc). It's your hint to go out and replace the drive, folks.
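A toy sketch of that spare-sector scheme, for anyone who wants to see the logic laid out. This is not real firmware: the class, method names, and spare count are all made up for illustration, and a real controller decides what counts as a "weak" read in far more elaborate ways.

```python
class RemappingDrive:
    """Hypothetical model of transparent bad-sector remapping."""

    def __init__(self, spare_count):
        self.spares = list(range(spare_count))  # unused spare locations
        self.remap = {}                # logical sector -> spare location
        self.user_visible_errors = []  # sectors the host actually hears about

    def report_weak_read(self, sector):
        """Handle a sector that needed retries or read back marginal."""
        if self.spares:
            # Quietly move the sector to a spare; the host never notices.
            self.remap[sector] = self.spares.pop(0)
            return "remapped"
        # No spares left: this is the first failure the user gets to see.
        self.user_visible_errors.append(sector)
        return "bad sector reported"


drive = RemappingDrive(spare_count=2)
results = [drive.report_weak_read(s) for s in (10, 20, 30)]
print(results)
print(drive.user_visible_errors)
```

The first two weak sectors vanish into the spare pool without a peep. The third is the first one the host ever sees, and by then the pool is already empty.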
Re: SSD failure and HDD failure are very different
As drives fill up you get a few different things going on. First, the wear leveling starts running out of blank pages and has to start doing garbage collection to try to compact the data. Second, the more fragmented the file system, the more writes you have to do. Third, your overprovisioning starts to run out and gets less efficient. If you want the more detailed version, look up write amplification on Wikipedia. It's a tolerable introduction to the problem.
Re: Not easily replaceable? Plus, longevity
Nice assumptions, but not real. Write amplification is a real problem, especially with a drive that's even somewhat packed. Even the best SSD controllers can't keep the write amplification below 1 at ~30% capacity. By the time you hit 80% capacity you're talking monstrous amplification factors for even relatively sequential writes.
Example: I write a 512 byte sector. In a HDD, I write the sector. Done. In an SSD I have to read/erase/write the whole page (~64K or more). That's not including any remapping that has to take place for moving the other sectors on the page.
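The arithmetic behind that example, as a quick sketch. The 64K figure is the lower bound quoted above, and the function name is just for illustration. It ignores remapping and garbage collection, which only make things worse.

```python
SECTOR_BYTES = 512       # one host sector write
PAGE_BYTES = 64 * 1024   # assumed rewrite unit, the "~64K or more" above


def write_amplification(host_bytes, flash_bytes):
    """Bytes physically written to flash per byte the host asked to write."""
    return flash_bytes / host_bytes


# One isolated sector write forces the whole unit to be rewritten.
single = write_amplification(SECTOR_BYTES, PAGE_BYTES)

# If the controller can coalesce 128 sectors into one unit, the
# amplification from this effect alone drops back to 1.
batched = write_amplification(128 * SECTOR_BYTES, PAGE_BYTES)

print(single, batched)
```

So a lone 512 byte write costs 128 times the flash wear of the same data written as part of a full unit, which is why lots of small scattered writes are so hard on these drives.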
Then there are problems with longevity: cells need to be refreshed periodically, since flash really isn't a permanent storage mechanism and cell contents degrade over time. There's also garbage collection and all the other junk that has to go on in the background on an SSD that doesn't go on in an HDD.
All told, flash isn't a technology to make a long lived drive. It's fast, and it's useful in some applications, but you have to be even more paranoid about it failing and give it a lot more margin than you'd give a HDD.
SSD failure and HDD failure are very different
I design controllers for both SSDs and HDDs. Failure mechanisms are typically very different.
For SSDs what kills you is the NAND wearing out, and that's a big function of how much data you have on your SSD. The problem for SSDs is that sector oriented writes in HDDs are still 512-4K bytes, while SSDs require different sized writes that are typically much bigger, although the exact size depends on NAND configuration. Since SSDs require full page erase-write cycles that means that a lot of small writes can cause page wear far beyond what you'd expect since even with wear leveling controllers you'll be writing tons and tons of new pages if you're not careful.
That same effect, where small writes force big blocks to be rewritten and worn out, gets exponentially worse as your SSD fills up. While you can push an HDD to 80+% capacity without significant penalty (usually just seek time), pushing an SSD past 50% capacity causes the controller to have write amplification factors well above 1.0 and your SSD will wear out significantly faster. This is a real issue in SSDs that use MLC NAND because of the lower lifetime.
I tend to agree with richard7 above: a smallish SSD for OS/apps backed by an HDD for data storage and redundancy is the right way to go. I hate trusting the Cloud as it's pitifully slow if you have a lot of data to recover, and flash tends to have too many catastrophic failure mechanisms that arise without warning. I've been doing this stuff since the start of the PC era and I've only had one HDD fail without warning, but I've seen lots of SSDs fail without warning.
Re: SSDs and HDDs both require backup...
No, toasty warm is actually pretty deadly to NAND. When you're storing 200 electrons per cell in a 125 C environment you're lucky to keep good data for a month or so in the latest NAND technologies. There's a reason we've got strong ECC schemes to make flash more reliable, and the next step up will be LDPC codes, which are coming soon.