Re: Is the world ready for a 30TB Failure Domain?
@AC the issue with RAID isn't whole-device failure, it's the probability of recovery. After one drive failure on RAID 5, a single non-recoverable error (NRE) on another drive during the rebuild means you can't recover the set. You lose all data in this scenario and need to go to backups. Given current NRE rates, a 10TB volume is more likely to be lost outright than to rebuild cleanly with no further read errors.
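To put a rough number on that, here is a back-of-envelope sketch in Python. It assumes a spec-sheet rate of one non-recoverable error per 1e14 bits read (a common consumer-drive figure, not one quoted above), treats errors as independent per bit, and counts decimal terabytes. The function name and defaults are mine, purely for illustration.

```python
import math

def rebuild_success_probability(data_read_tb: float, bits_per_error: float = 1e14) -> float:
    """Chance of reading data_read_tb terabytes without a single non-recoverable error.

    Assumption: each bit read fails independently with probability 1/bits_per_error.
    """
    bits_read = data_read_tb * 1e12 * 8  # decimal TB -> bits
    # (1 - p)^n, computed via log1p so the tiny per-bit probability keeps its precision
    return math.exp(bits_read * math.log1p(-1.0 / bits_per_error))

for tb in (1, 10, 25):
    print(f"{tb:>3} TB read: {rebuild_success_probability(tb):.1%} chance of a clean rebuild")
```

Under those assumptions, reading 10TB cleanly comes out at roughly 45%, so worse than a coin flip. Real drives often beat their spec-sheet rate and errors are not truly independent, so treat this as an illustration of the shape of the problem rather than a prediction.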
RAID 6 is less risky because it has more parity and usually uses different striping patterns as well. It's still true, though, that after two drive failures a single dead block can kill the whole volume. Current error rates give good odds up to about 25TB for RAID 6, but why risk it?
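How far "good odds" stretches depends heavily on which error rate you plug in. The sketch below turns the question around: for a target chance of an error-free read, how much data can you read? The 90% target and the three rates are my assumptions, and it models only the no-redundancy-left case with independent per-bit errors, so it is not an attempt to reproduce the 25TB figure.

```python
import math

def max_clean_read_tb(target_probability: float, bits_per_error: float) -> float:
    """Largest read (in decimal TB) that stays error-free with the given probability.

    Assumption: each bit read fails independently with probability 1/bits_per_error.
    """
    # Solve (1 - p)^n = target for n, the number of bits read
    bits = math.log(target_probability) / math.log1p(-1.0 / bits_per_error)
    return bits / 8 / 1e12  # bits -> decimal TB

for rate in (1e14, 1e15, 1e16):
    tb = max_clean_read_tb(0.9, rate)
    print(f"1 error per {rate:.0e} bits: about {tb:.1f} TB at 90% odds")
```

Each tenfold improvement in the quoted error rate buys a tenfold larger safe read, which is why the same array can look fine or reckless depending on whose spec sheet you believe.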
RAID 10 will silently copy the error blocks, so corruption spreads as more errors are cloned when disks fail. Yes, I am overstating this issue, but it is an issue. The OS can often recover or fail at the file level, though, so RAID 10 will rarely lose a volume.
Erasure coding (EC) will fail only the parts of the data related to the error block, meaning you only need to recover one file from backup, not a multi-TB file system. Because the error is flagged, you could even identify the affected file and proactively repair the file system. Yes, the whole system would need to rebuild after a drive failure, but the probability is high that you'll get most of your data back.