23 posts • joined Tuesday 29th April 2008 21:46 GMT
RAID and triple parity etc
A few thoughts from the history of RAID, plus on triple parity etc:
RAID 0 was used for speed by striping data across multiple disks. Whilst faster than writing to a single disk, this increased the risk of losing all your data proportionately to the number of disks used.
E.g. the chance of losing all your data due to a failed drive in a 4-drive RAID 0 configuration is four times that of a single drive.
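To put rough numbers on that, here is a minimal sketch. It assumes drives fail independently, and the 3% annual failure probability is a made-up illustrative figure, not a measured one. It shows that "four times" is a close approximation when the per-drive probability is small; the exact figure is slightly lower.

```python
def raid0_loss_probability(p_drive: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives fails.

    In RAID 0 a single drive failure loses the whole array, so this is
    also the probability of losing all the data.
    """
    return 1.0 - (1.0 - p_drive) ** n_drives


# Assumed, illustrative annual failure probability for one drive.
P_DRIVE = 0.03

for n in (1, 2, 4, 8):
    exact = raid0_loss_probability(P_DRIVE, n)
    approx = n * P_DRIVE  # the "n times a single drive" rule of thumb
    print(f"{n} drive(s): exact={exact:.4f}  rule of thumb={approx:.4f}")
```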
So, in order to give some protection against an increased chance of losing all your data, mirroring (RAID 1) was combined with RAID 0 to give us RAID 0+1 or RAID 10. Now we had speed, plus another copy of the data on the mirror for redundancy, in case of drive failure/loss.
In the past, due to processors being fairly weak, RAID 5 and RAID 6 were slow when the parity calculations were done in software, so the only practical alternative was to put the RAID processing onto a separate RAID controller card (Host bus adapter, or HBA).
However, due to the problem of what happened when the power was lost between writing the data stripe and writing the parity data (called the 'RAID-5 write hole'), the solution/kludge was to add NVRAM so that upon power being restored, the RAID card could then complete the write operation for data/parity still not written. But NVRAM was expensive and so the 'I' (inexpensive) in RAID was lost.
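For anyone who hasn't seen how the parity and the write hole fit together, here is a toy sketch using simple XOR parity, the scheme RAID 5 is built on. The block contents and the three-data-drive layout are invented for illustration; a real controller works on much larger stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks on three drives, parity on a fourth (toy values).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Normal case: the drive holding data[1] dies; rebuild it from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
print("rebuild correct:", rebuilt == data[1])

# Write hole: data[0] is rewritten, then power is lost before the
# matching parity update reaches disk, so the parity is now stale.
data[0] = b"ZZZZ"
stale_rebuild = xor_blocks([data[0], data[2], parity])
print("rebuild after interrupted write correct:", stale_rebuild == data[1])
```

The second rebuild silently returns garbage, which is why the cards needed NVRAM to finish the interrupted write.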
Roll on a few years and processors had become dramatically more powerful and sat mostly idle, so it became practical to do heavy RAID calculations in software, with the advantage of no card to buy, plus no data being held to ransom by a proprietary hardware RAID controller (especially if the software is open source).
Also, disk capacities became bigger and bigger, but the error rates remained fairly constant. But due to the increased amount of data passing through these storage systems, the number of data errors occurring was starting to become a problem.
See CERN report: http://indico.cern.ch/getFile.py/access?contribId=3&sessionId=0&resId=1&materialId=paper&confId=13797
and an analysis: http://blogs.zdnet.com/storage/?p=191
"Based on the findings - a PB data storage statistically will have 2,500 corrupt files that you won't even know about - and that is with non-compressed files - the number goes up with compressed files."
ZFS was designed with many goals in mind such as how to deal with ever growing amounts of data, larger file sizes, increasing numbers of errors, latent data failures (scrubbing to fix these) etc.
ZFS employs 256-bit checksums per block to allow easy detection of errors in the data and, where the pool has redundancy, fixes the errors on-the-fly when the data is read, either as a result of a direct read access or via a scrub of the storage pool.
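The detect-and-repair idea can be sketched in a few lines. This is only a toy model of a two-way mirror: SHA-256 stands in for whichever 256-bit checksum the pool is configured with, and the function names are my own, not anything from ZFS.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """256-bit checksum of a block (SHA-256 used here as a stand-in)."""
    return hashlib.sha256(block).digest()


def read_with_self_heal(copies: list, expected: bytes) -> bytes:
    """Return a copy whose checksum matches, repairing any bad copies.

    `copies` models the same block stored on each side of a mirror;
    `expected` is the checksum stored separately from the block itself.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("no copy matches the stored checksum")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # overwrite the damaged copy with good data
    return good


stored = checksum(b"hello")
mirror = [b"hellp", b"hello"]  # first side has suffered bit rot
print(read_with_self_heal(mirror, stored), mirror)
```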
Due to other features like delayed, aggregated writes, ZFS can now block-up many writes into one big write operation so that I/O is much more efficient, and this feature, combined with much more powerful modern processors, now allows modern file systems like ZFS to make RAID 5/6/'7' calculations in software reasonably quickly.
So now we have the best of all worlds: high levels of redundancy & other protective measures, plus reasonable speed too.
Due to drives often being bought in one batch, when one drive fails there is often a short time to rebuild the failed drive before the next drive fails too, due to similar design/build/material characteristics.
So, as Adam Leventhal says, due to larger drive sizes, rebuild time is increasing, so it's better to have two further drives in reserve than just one to protect your data during this vulnerable time, hence triple parity. So triple parity is not such a bad idea at all, IMHO.
And there's another advantage with RAID-Z3 over the RAID 10 example above:
The RAID 10 configuration above, consisting of 4 drives in the RAID 0 set plus a further 4 drives in the RAID 1 mirror, uses 8 drives to give 4 TB of usable capacity (with 1 TB drives), but it is more fragile: when a drive fails in the RAID 0 set, if a second drive fails in the RAID 1 set you're toast.
In contrast, with an 8-drive ZFS RAID-Z3 configuration you can have ANY 3 drives fail before losing any data, and you have 5TB of usable capacity instead of the RAID 10 set's 4TB.
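The comparison can be checked by brute force. The sketch below assumes 1 TB drives and models the mirrored layout exactly as described above (one 4-drive stripe mirrored by a second 4-drive stripe, so any failure takes out that whole stripe). A layout built from mirrored pairs would give different survival figures, and the drive numbering is arbitrary.

```python
from itertools import combinations

DRIVES = range(8)
STRIPE_A = {0, 1, 2, 3}  # the RAID 0 set
STRIPE_B = {4, 5, 6, 7}  # its mirror


def mirrored_stripes_survive(failed) -> bool:
    """Data survives while at least one whole stripe has no failed drive."""
    failed = set(failed)
    return not (failed & STRIPE_A) or not (failed & STRIPE_B)


def raidz3_survives(failed) -> bool:
    """Triple parity survives any three or fewer drive failures."""
    return len(set(failed)) <= 3


def survival_fraction(survives, k: int) -> float:
    """Fraction of all k-drive failure combinations that lose no data."""
    combos = list(combinations(DRIVES, k))
    return sum(survives(c) for c in combos) / len(combos)


def usable_tb(n_drives: int, redundancy_drives: int, drive_tb: float = 1.0) -> float:
    """Usable capacity once the redundancy drives are subtracted."""
    return (n_drives - redundancy_drives) * drive_tb


for k in (1, 2, 3):
    print(k, "failed:",
          f"mirrored stripes {survival_fraction(mirrored_stripes_survive, k):.2f},",
          f"RAID-Z3 {survival_fraction(raidz3_survives, k):.2f}")
print("usable TB:", usable_tb(8, 4), "vs", usable_tb(8, 3))
```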
Hopefully I haven't made any typos or mistakes, but please feel free to correct any mistakes I may have made.
Re: How do folk back these up?
You're right that RAID is not the same as a backup, it just provides protection against data loss due to built-in redundancy.
In some advanced file systems like Sun's ZFS or NetApp's offerings, there *is* protection against data being unintentionally corrupted or deleted by the user, application or OS: they are called snapshots. Files and directories referenced by snapshots cannot be deleted until the referring snapshot is deleted -- so there is your protection against loss. Also files/dirs referenced by snapshots cannot be modified -- these file systems employ a method called 'copy-on-write' which means that if a file referenced by a snapshot is modified, the file system creates a copy. If I remember correctly, the common blocks of the two files are not duplicated, to save space, but don't quote me on that :)
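A toy model of the copy-on-write idea, for the curious. It is nothing like the real on-disk format: the class and method names are invented, and a "file" here is just a list of blocks. It does show the point about space, though: a snapshot only holds references, and unchanged blocks stay shared between the snapshot and the live file.

```python
class ToyCowFile:
    """A file as a list of references to immutable blocks."""

    def __init__(self, blocks):
        self.live = list(blocks)
        self.snapshots = {}

    def snapshot(self, name: str) -> None:
        # Copies the list of references, not the block data itself.
        self.snapshots[name] = list(self.live)

    def write(self, index: int, data: bytes) -> None:
        # The new data goes into a new block; the old block is left
        # untouched, so any snapshot still pointing at it is unaffected.
        self.live[index] = data


f = ToyCowFile([b"alpha", b"beta"])
f.snapshot("before")
f.write(0, b"ALPHA")

print("live:    ", f.live)
print("snapshot:", f.snapshots["before"])
print("block 1 shared:", f.live[1] is f.snapshots["before"][1])
```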
A traditional 2-drive mirror is fine until you need, in this case, 2.1 TB. Also traditional mirrors often don't repair files that can't be read -- the file system/RAID controller often just returns the data on the good half of the mirror. However, ZFS also repairs the faulty file on the bad side of the mirror, as indeed it does in any redundant setup: mirror, RAID-Z1 (like RAID 5), or RAID-Z2 (like RAID 6).
For backup/syncing of files that you mention, again, ZFS offers a way of doing this -- with one important difference and huge benefit: you will never lose any data that gets deleted if you use (1) automatic snapshotting (via cron every 30 minutes or whatever), and (2) use 'zfs send/receive' to send an initial full backup and subsequent incremental backups, in which ZFS detects diffs between snapshots and sends only the differences, to another storage pool, which might be on the same machine or a different machine on the LAN/WAN.
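The "send only the differences" part works roughly like the sketch below. This is a simplified model, assuming each snapshot is a dictionary from block number to contents; the function names are hypothetical, and the real thing works on blocks inside the pool rather than comparing dictionaries.

```python
def incremental_stream(old_snap: dict, new_snap: dict) -> dict:
    """Describe what changed between two snapshots."""
    changed = {k: v for k, v in new_snap.items() if old_snap.get(k) != v}
    deleted = [k for k in old_snap if k not in new_snap]
    return {"changed": changed, "deleted": deleted}


def receive(target: dict, stream: dict) -> dict:
    """Apply an incremental stream to a copy of the backup's last snapshot."""
    result = dict(target)
    result.update(stream["changed"])
    for key in stream["deleted"]:
        result.pop(key, None)
    return result


monday = {0: b"a", 1: b"b", 2: b"c"}
tuesday = {0: b"a", 1: b"B", 3: b"d"}  # block 1 changed, 2 deleted, 3 added

stream = incremental_stream(monday, tuesday)
backup = receive(monday, stream)  # the backup pool already holds "monday"
print(stream)
print("backup matches source:", backup == tuesday)
```

Block 0 never crosses the wire, which is the whole point when the backup pool is at the far end of a WAN link.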
These links might be of further interest:
I'll get me coat -- I'm off to the pub... :)
Don't lose it
With this much (video) data, you most certainly don't want to risk losing it all !
Luckily, that's what ZFS is built to do -- protect it from loss:
Pity you had probs -- maybe try a newer version as your bugs are most likely fixed now.
Would love to rip apart your reply but have better stuff to do. (I took a look at all your previous trolling and attempts to get the last word in on every comments slanging match).
You stick with Linux RAID and I'll stick with ZFS -- that way everyone wins!
> Because you have more money than sense? Or is it just the Sunshiner blindfold leading you to such nonsense statements?
Well, I didn't use components out of a skip, if that's what you mean. Dual-core 64-bit processors are cheap now and so is 4GB RAM -- did you take a look recently?
If you're personally happy with Linux RAID, why not use it? Can you explain how it tries to match ZFS' end-to-end data integrity, scrubbing, detection and repairing of latent errors? Did you actually use ZFS? If so, explain in detail your usage of it and what you didn't like. If not, why do you hate it so much -- just because it's not your beloved Linux?
> As I have commented previously on your $1000+ build...
The $1000 box used decent components, more RAM than was necessary, and besides, much of the cost was the disks themselves. At the bottom of my hardware page, I gave tips to build something costing only around 300 euros, excluding disks. But I expect that magical 4 figure sum of 1000 euros will stick in your mind anyway... never mind.
> You have to use that powerful a build because you use Solaris with ZFS, anything less would grind to a halt under the load.
What are you talking about? 64-bit dual-core processor: $50, 4GB RAM: $100 or so (when I built). This is not big bucks. If you want me to find these components in a skip, like you boasted in a post here, just show me the right skip somewhere... :)
> There are a number of Linux NAS solutions with or without Linux RAID which will work on hardware costing a fraction of $1000.
Already answered above. And I don't want to use Linux as it doesn't have ZFS yet (except in slow FUSE version). If you still don't know the ZFS advantages take a look at the Sun ZFS site -- you might learn to love it one day -- really ;-)
> An even more sensible effort would be to use hardware RAID which has virtually zero impact on the CPU load, and gives access to any number of pre-approved and pre-tested PC or server solutions such as the Adaptec range of cards, without needing anything other than a driver added into the kernel.
CPU cycles are abundant and cheap these days, unless we're running on your skip-retrieved Pentium II from the '90s. Mostly the CPU cycles are idle so why not use some? Also, that way, you avoid using proprietary RAID cards which must surely gain your favour as it involves spending less too.
> Of course, Linux support for these is a lot wider than Slowaris since Linux has much greater market share and has been around a lot longer. Of course, for fanbois there are Mac-supported RAID cards anyway, so still no need to involve the pain of Slowaris x86.
Irrelevant, as ZFS uses software RAID.
> But let's leave the valid comparison of your toy with a Linux hobbyist NAS and instead look at your hilarious comparison with the Active Storage XRAID. The latter is a proper, rackable commercial solution with sixteen 1TB disk slots, with a warranty and support service, whereas your desktop-only toy has three disks of 750GB and comes with nothing more than your misplaced enthusiasm to back it up - you are waaaaaaaaay out of your league!
I did already point that out: 'much lesser system'. Now have you heard of the Sun Fire X4500 aka 'Thumper' -- that is a 48TB box which is a 'real' system like the one in this article, only more 'real' -- a real man's server :) They do make smaller, cheaper systems too. But these systems I mention are protected by ZFS, and this 'proper, rackable commercial solution' here isn't.
And you really need to do more research -- here ya go:
BTW, I don't own Sun stock or work for them, I just like good engineering.
Mine's the one with a state-of-the-art Pentium II and 128MB DIMM found in a nearby skip in the pocket :)
Orders are fantastic... then sell it quick...
...before Snow Leopard is released, because OS X 10.6 will have ZFS, and that's expected sometime 1Q09. But why wait, when ZFS is already available within Solaris / OpenSolaris?
This box says it uses 'Linux RAID kernel', which is clearly not ZFS, so I wouldn't trust it with anything valuable. If you know about ZFS you'll know why.
Here's a much lesser system I built, but one that uses ZFS, and it works with my Mac Pro:
Thanks, and yes you're right. I would also add that IMHO ZFS *is* much better than other currently available filesystems in the marketplace today (discounting things like NetApp which cost mucho $$$), as you get several huge advantages like:
1. no need for an expensive and, more to the point, proprietary RAID card, as ZFS uses software instead of hardware (spare CPU cycles are abundant and cheap these days) to control storage access.
2. transactions to guarantee consistent disk state -- you can only have either (1) the state before the file was written/updated OR (2) the state after the file was written/updated, and never an inconsistent state.
3. end-to-end data integrity: what was in ECC RAM is what is guaranteed to be written to, and read back from disk later -- old school RAID solutions don't give you that. This is the #1 selling point of ZFS -- you can easily build setups that won't lose data, and the more redundancy you build in, like double parity and hot spares, the more unlikely it becomes that you will ever experience data loss. And large capacity SATA disks are dirt cheap: 95 euros per TB now.
4. a one line command (zpool scrub pool_name) which will go through all the file systems in your ZFS storage pool and will automatically detect *and correct* any latent failures (caused by bit rot, or dropped bits)
5. admin is extremely simple, and sharing via NFS, SMB/CIFS, and Samba is a one line command on the ZFS server.
6. it can manage virtually infinite amounts of storage -- the only limits you'll encounter will be determined by your wallet.
I could go on but I won't :)
ZFS is the future...
...and SMEs can get themselves a box with 4TB of RAID-Z2 storage (RAID 6 on steroids) running on OpenSolaris for around 1000 euros these days. The future of storage has never looked better, even in these credit crunchy days :)
Here's something similar I prepared earlier in the year, for home use, but is suitable for a small business too:
And if you were NetApp...
...and apparently feeling confident in your legal victory, why would you feel the need to register the legal case in a Texas court well-known to be sympathetic to patent trolls, despite both companies having their headquarters in Silicon Valley, California?
Mine's the one with a RAID-Z2 array and a hot spare in the pocket.
Here's one I prepared earlier...
I also had an attempt at this in June, and it was a PITA to get working, but I was successful in the end.
However, 400 euros for a 4GB RAM Hackintosh is not too bad. The main problem I had was finding an ethernet driver that worked reliably -- had to use the 'cpus=1' hack as a workaround for the concurrency problem related to the ethernet driver. At least my USB worked (for digital photo transfer). The machine was in fact my ZFS fileserver's backup machine, repurposed for the duration of the experiment.
http://breden.org.uk/2008/05/18/mac-on-pc/ (brief description and hardware used / prices)
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ (if ZFS interests you)
All this capacity...
...will require some serious file system like ZFS to ensure that data is not lost. Sufficient redundancy, block checksums, regular data scrubbing to detect and correct latent errors like bit rot etc. Luckily ZFS will do all of that, and it's already here, and it's free and open source too. Here's a gizmo I made earlier:
Nice, but it should use ZFS
The box looks nice, but I wouldn't trust a proprietary implementation of something like RAID 5.
No, I much prefer to have something like ZFS looking after my data -- here are a few great features of ZFS:
1. Simple administration
2. Ability to create large, redundant data storage pools with one command
3. Built-in data scrubbing to enable ZFS to self-heal ‘latent failures’ (bit rot etc)
4. Built-in 256-bit checksumming used for every block
5. For redundant data pools you choose from mirror, single-parity RAIDZ1 (a la RAID level 5) or double-parity RAIDZ2 (a la RAID level 6)
6. Transactional file system to guarantee consistent state of data even when catastrophic failures like power loss occur
7. High availability: data scrubbing can occur without taking the storage offline — unlike ‘fsck’ in Linux
8. Is designed upon the assumption that disk hardware should never be trusted, so solid checksumming and transactions are used
9. Designed to use cheap, commodity SATA disks, not expensive SAS disks
10. RAIDZ1 can survive 1 drive failure, RAIDZ2 can survive 2 drive failures
11. Hot spares can be specified when the data pool is created, or added to the data pool later
12. Hot spares are used automatically if drive failure is detected
13. Data pools can be sent and received, to allow easy replication/migration of data when upgrading disks
14. Failed disks can be replaced and substituted with one command (if no hot spares are available)
15. Regular snapshots can be made to allow easy file system state rollbacks, or retention of deleted/changed files - they are cheap in storage and fast to perform (they rely on copy-on-write, so unchanged blocks are shared rather than copied)
16. ZFS data pools can be shared via NFS, Samba/CIFS and iSCSI
17. For super valuable data, you can create a ZFS filesystem within a data pool that keeps multiple copies of the data, spread widely apart on the disk, known as ditto blocks: 2 or 3 copies instead of just one
18. Sun Solaris OS and ZFS are free and open source
You can expand your ZFS storage pools by adding additional vdevs: mirrors, RAIDZ1, RAIDZ2 etc.
What you can't currently do is expand an existing RAIDZ vdev, although I believe they plan to implement this at some point.
However, assuming you have sufficient alternative capacity available (add more disks and make a new pool temporarily, or in another pool on the network), you can always recreate your existing vdev with more disks, and therefore you achieve the goal of 'expanding' your vdev.
The alien because ZFS is out of this world :)
A Home Fileserver using ZFS
And one of the best things about OpenSolaris is ZFS -- the revolutionary file system that allows you to build virtually limitless data storage pools cheaply and simply, and gives end-to-end data integrity to guard against data loss:
Re: Re: ZFS should ensure these don't lose data
You're right -- I used Western Digitals.
I wouldn't dream of using Deathstars in a RAID system -- that would be certain suicide.
I had some IBM Deathstars / Hitachi drives go clickety-click and that was that. I refuse to buy HGST drives any more, after these previous bad experiences. I use Western Digitals now and so far they have run flawlessly.
ZFS should ensure these don't lose data
It will be great to be able to stick 5 of these 2.5" 1TB disks into a small quiet file server box in a ZFS RAIDZ2 formation (like RAID 6 on steroids). That should give great fault tolerance, so that any 2 disks can fail without losing any data. As they're small, spin at only 5400 RPM, they should be quiet, relatively vibration-free and run cool.
The ZFS file system will give end-to-end data integrity to ensure no data is lost.
The show must go on
It will be fascinating to see the final conclusion of this fracas between NetApp and Sun.
However, whatever the outcome, ZFS is superb, free, and is here to stay.
"BT maintains its statement that the advice it took ahead of the trials said they would be legal."
So why did BT have a problem telling their customers what they were doing?
Good to see some more info being published
When I tried to research this area a few months back, I relied on some Sun bloggers' posts to help me learn enough about ZFS to set up a NAS, so it's good to see Sun publishing something like an "all in one" guide today.
My "from the trenches" write-up on this fascinating area can be perused here:
Mine's a Schlitz, Ashlee :)
Oh, where's the beer icon -- can I request a beer glass icon be added?