"Now these really would be big fat data tubs as EMC calls them, but best used for desktop systems rather than in RAIDed storage arrays, where the RAID rebuild time would be diabolically long."
That's utter BS!
You use RAID if you want to be sure that when a drive fails, you still have your data both safe and on-line. If you have 20TB of such data, then a 5- or 6-drive RAID of 5TB drives would be ideal. The only alternative is a larger number of smaller drives, also RAIDed.
Personally I'd call for RAID-6, not RAID-5, because the rebuild time will indeed be long, and the chance of another drive error during that rebuild is therefore higher - perhaps 10x what it would be for 500GB drives, since you have to read 10x as much data to rebuild.
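To put a rough number on that "10x" claim, here's a back-of-envelope sketch. The unrecoverable-read-error (URE) rate of 1e-14 per bit is an assumption (a common consumer-drive spec-sheet figure); real drives vary, so treat this as illustration, not prediction.

```python
# Rough sketch: probability of hitting at least one unrecoverable read
# error (URE) while reading an entire drive during a RAID rebuild.
# URE_PER_BIT is an assumed spec-sheet figure, not a measured value.
import math

URE_PER_BIT = 1e-14  # assumed: one URE per 10^14 bits read

def p_ure(drive_bytes):
    bits = drive_bytes * 8
    # P(at least one URE) = 1 - (1 - rate)^bits ~= 1 - exp(-rate * bits)
    return 1 - math.exp(-URE_PER_BIT * bits)

for size_gb in (500, 5000):
    p = p_ure(size_gb * 10**9)
    print(f"{size_gb} GB drive: P(URE during full read) ~ {p:.1%}")
```

For small probabilities the risk scales almost linearly with capacity (hence roughly 10x), though at 5TB the numbers are no longer small, which is exactly why a second parity drive starts to look attractive.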
Apart from that, just let the system get on with it. With Linux software RAID, user access to the data on the array is prioritized over rebuilding, so performance isn't noticeably worse during a rebuild, and if the rebuild takes several days, so be it. (That's a RAID-6 rebuild, so you can survive losing another drive during that window ... I might have trouble sleeping if it were RAID-5, imagining that the first drive failure might signal a common-mode fault with the others.)
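For anyone wanting to watch or tune that behaviour, the Linux md driver exposes its rebuild throttling via sysctls. The array name `md0` below is just an example; the sysctl defaults quoted are typical but distribution-dependent.

```shell
# Watch rebuild progress (the array name, md0, is just an example).
cat /proc/mdstat
mdadm --detail /dev/md0

# The md driver throttles resync when the array is busy with user I/O.
# These sysctls bound the resync rate in KB/s per device; lowering the
# minimum further favours user traffic over the rebuild.
sysctl dev.raid.speed_limit_min   # typically 1000 (KB/s)
sysctl dev.raid.speed_limit_max   # typically 200000 (KB/s)
sysctl -w dev.raid.speed_limit_min=500
```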
With the alternative, using (say) 50 or 60 500GB drives, you'll be replacing them maybe 10x more frequently, so the total time per annum spent rebuilding may well be much the same, and the human time spent drive-swapping 10x greater. True, with more spindles you may get more performance ... but in the applications I look after, performance isn't an issue. Our researchers just need many TB of data on-line, and safer than relying on single drives not to fail.
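The "total rebuild time per annum is much the same" point falls out of simple arithmetic: 10x as many drives means roughly 10x as many failures per year, but each rebuild covers a tenth of the data. A sketch with assumed figures (3% annual failure rate per drive, 100 MB/s sustained rebuild speed; both illustrative only):

```python
# Back-of-envelope comparison of annual rebuild exposure for a few big
# drives vs many small ones. AFR and rebuild speed are assumptions.
AFR = 0.03            # assumed annual failure rate per drive
REBUILD_MBPS = 100    # assumed sustained rebuild throughput, MB/s

def annual_rebuild_hours(n_drives, drive_gb):
    failures_per_year = n_drives * AFR
    hours_per_rebuild = drive_gb * 1000 / REBUILD_MBPS / 3600
    return failures_per_year, failures_per_year * hours_per_rebuild

for n, gb in ((6, 5000), (60, 500)):
    fails, hours = annual_rebuild_hours(n, gb)
    print(f"{n} x {gb} GB: ~{fails:.2f} failures/yr, ~{hours:.1f} h/yr rebuilding")
```

Both configurations come out with the same total rebuild hours per year; what differs is the number of swap events a human has to handle.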