I'm one of those terrible people who "learn best by doing" and have always had a difficult time wrapping my head around exactly how high availability using "JBOD" external disk chassis systems was supposed to work. But my initial ignorance can work for both of us as we learn together. As luck would have it, AIC was interested in …
Nice one Trevor!
As just a home tinkerer (and a bit of a rusty one at that) it was a very interesting article about the sort of thing I will never afford and probably shouldn't be let near.
Re: Nice one Trevor!
It doesn't have to be that pricey, I picked up two Rackable enclosures for $100 each from ebay, each is 3U and has 16 hotswap bays. They are hooked up over mini SAS to a (Dell) LSI HBA, £70 on ebay. They sit in an Ikea LACK side table (£25), which is a very cheap 6U rack on castors. I also had to replace the PSUs and exhaust fans for silence.
Downsides are SATA-2, so I can "only" pull 1.2 GB/s out of each enclosure - does me fine, the disks I buy can't push that anyway and, as Trevor said, since this is for MAID, who gives a fuck - 300MB/s would be fine.
The other cheap option is a DIY with a Norco chassis, which isn't actually that cheap once you've bought chassis, backplane, psu and expander card.
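For anyone wondering where the 1.2 GB/s and 300MB/s figures above come from, here is a rough back-of-envelope sketch. It assumes (this is not stated in the comment) that each enclosure hangs off one four-lane mini SAS cable running at 3 Gbit/s per lane with 8b/10b encoding, which works out to about 300 MB/s per lane and 1200 MB/s per cable. The function names are made up for illustration.

```python
# Rough payload bandwidth of a SAS/SATA link that uses 8b/10b line encoding.
# Assumption (not from the comment): one mini SAS cable = 4 lanes at 3 Gbit/s.


def lane_payload_mb_s(line_rate_gbit: float) -> float:
    """Usable MB/s on one lane: 8b/10b sends 10 line bits per payload byte."""
    return line_rate_gbit * 1000 / 10


def link_payload_mb_s(line_rate_gbit: float, lanes: int = 4) -> float:
    """Usable MB/s across a wide port made of `lanes` lanes."""
    return lane_payload_mb_s(line_rate_gbit) * lanes


if __name__ == "__main__":
    print(lane_payload_mb_s(3.0))  # one 3 Gbit/s lane
    print(link_payload_mb_s(3.0))  # four lanes to one enclosure
```

Real-world throughput will be lower once protocol overhead and the expander are in the picture, so treat these numbers as a ceiling.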
60 disks in 4U!
I admit they are special racks (they are nearer 30" wide and goodness knows how deep), but in the IBM P7 775 supercomputer disk enclosures, you can get 384 2.5" disks in 4Us of vertical space.
On more mainstream systems, and having used dual-connected SAS drives for about the last 5 years, I will say that the biggest problem here is the repair of a failed expander card in the disk drawer. The problem is that although they are redundant, so the loss of a SAS expander does not stop the service, the repair action is not normally concurrent. This means that you have to take an outage in order to restore the full resilience, even if you have them connected to dual servers, unless you have the data moved or mirrored to disk in another unit. The saving grace is that you can plan the outage, but you have to be careful if you are wanting very high availability.
I learned this the hard way when planning for service work in what had been delivered as a totally redundant system. A bit embarrassing when you end up having to stop all of the workload on a top 500 HPC system just to carry out the work for a single expander card (no, I was not responsible for the design, I only help run it, and it could have been mitigated with a bit more thought).
By the way, this dual connectivity is not a new thing. IBM's SSA disk subsystem also had dual connectivity for both disks and servers back in the mid-1990s. Very popular for HA/CMP configurations, and allowed for 48 disks in 4U of space.
Maybe this article should be filed under Instructables rather than Data Centre > Storage.
Check out the Windows Server Cluster HCL before building a frankencluster. Clusters provide high availability, and if it's not on the OS vendor's HCL then you're on your own when it starts spluttering, say, after an OS patching session: no longer available.
One of the best infrastructure nightmare stories I know involved a frankencluster. No vendor in the mix could risk stepping into the fray, and the users' critical systems were down for over a week.
Scott Lowe's article, bad link
The link under 'Scott Lowe' doesn't work: I had to go to the page source and copy the underlying URL.
This may be because my browser doesn't know what to do with 'herf': should that be 'href'? :-)
Re: Scott Lowe's article, bad link
JBOD? Or high availability?
Forgive me, author. JBOD and high availability systems are two different, if related, things, as are SAS and SATA. If you want a description of JBOD, check out Wikipedia. If you really wanted to talk about high availability systems, it's not a bad attempt, but it's interesting that you chose to talk about the 'Rolls-Royce' solution rather than all the steps along the way... Hey ho!
This used to be easy with SCSI
You could have as many initiators as you want within the 8/16 available SCSI addresses. Typically the initiator was at 7, so you'd set the second SCSI card to use ID 6 and have two HBAs on the same SCSI chain without needing dual ported drives.
I was using JBODs like this shared between a pair of HPUX servers for lower end clustering needs (this was back in the 90s when SANs were fantastically expensive and we'd only use them for the most critical/large storage clusters)
Re: This used to be easy with SCSI
It's even easier with SAS - you don't need to worry about the SCSI ID and you don't need to worry about termination.
I have two servers plugged into two Promise JBODs. Mirror all data across the JBODs and voila, no single point of failure. You can even use multi-path to increase throughput.
(I'd put some ascii-art here if El Reg would let me select a monospace font).
For people too cheap to buy SAS disks there are even little multiplexors that let two initiators talk to the same SATA disk (but only two, so no multipath if you use SATA).
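To make the "mirror all data across the JBODs" point concrete, here is a minimal sketch of the sanity check it implies: every mirror should have members in at least two enclosures, so losing one whole JBOD never takes out a full mirror. The layout, the enclosure names and the function name are all hypothetical and only there for illustration.

```python
from typing import Dict, List, Tuple

Disk = Tuple[str, int]  # (enclosure name, slot number); names are hypothetical


def mirrors_in_one_enclosure(mirrors: Dict[str, List[Disk]]) -> List[str]:
    """Return the mirrors whose members all sit in a single enclosure.

    Such a mirror is lost completely if that one enclosure fails.
    """
    at_risk = []
    for name, disks in mirrors.items():
        enclosures = {enclosure for enclosure, _slot in disks}
        if len(enclosures) < 2:
            at_risk.append(name)
    return at_risk


if __name__ == "__main__":
    layout = {
        "mirror-0": [("jbod-a", 0), ("jbod-b", 0)],  # spans both enclosures
        "mirror-1": [("jbod-a", 1), ("jbod-a", 2)],  # both disks in jbod-a
    }
    print(mirrors_in_one_enclosure(layout))
```

This only covers the enclosures themselves; the servers, HBAs and cables need the same treatment before "no single point of failure" really holds.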
> individual "system" controls an entire rack's worth of disks. I can't find anyone doing that
AWS/S3 uses 96-disk enclosures in 4U. Glacier 3 enclosures per 'head'. I've seen other setups of 60, 70 (HP MDS600), or SGI's 84-drive enclosures connected 4-up to a single head. The redundancy is not computed within the unit. It is computed ACROSS units using software erasure-coding. So even if a full rack of disks blinks out, you're just peachy. Now if you lose too many racks that have the ~same set of objects on them (across datacenters even) such that you bust the N:K resiliency, then yes, LOTS of objects/files/customers can be affected.
I figured some of the bigger ones had to be doing it, but I couldn't find links to any of it. Though that article about Facebook's cold storage came out after I wrote this...
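As a rough illustration of the N:K point above, here is a minimal sketch of the survivability rule for erasure coding across units. It assumes an object is split into n fragments, each stored on a different unit, and that any k surviving fragments are enough to rebuild it. The 6:4 scheme, the rack names and the function name are hypothetical; nothing here reflects how AWS or anyone else actually places data.

```python
from typing import Iterable, Set


def object_readable(fragment_units: Iterable[str], failed_units: Set[str], k: int) -> bool:
    """True if at least k of the object's fragments sit on units that are still up."""
    surviving = sum(1 for unit in fragment_units if unit not in failed_units)
    return surviving >= k


if __name__ == "__main__":
    # Hypothetical 6:4 scheme: six fragments, one per rack, any four rebuild the object.
    placement = ["rack1", "rack2", "rack3", "rack4", "rack5", "rack6"]
    print(object_readable(placement, {"rack1", "rack2"}, k=4))           # two racks down
    print(object_readable(placement, {"rack1", "rack2", "rack3"}, k=4))  # three racks down
```

With two racks down, four fragments remain and the object can still be rebuilt; with three down, only three remain and the object is unavailable until a rack comes back.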
I was wondering "but it's still a JBOD" until I read the particular implementation involving ZFS. Yes, a JBOD is useful if you're doing ZFS as the resiliency upon losing an HDD is managed by ZFS instead of a RAID controller card. Bonus points for having an array accessible by two servers!