Sick of storage vendors? Me too. Let's build the darn stuff ourselves

Any half-way competent storage administrator or systems administrator should be able to build a storage array themselves these days. It’s never really been easier and building yourself a dual-head filer that does block and network-attached storage should be a doddle for anyone with a bit of knowledge, a bit of time and some …

  1. Anonymous Coward

    Anyone can build something small

    Try building storage at scale with a distributed file system and then bet your business on it. Go ahead Martin, build it for your company and show us how it's done. Wait, let me pop some popcorn.

    1. Anonymous Coward

      Re: Anyone can build something small

      The only reason to build anything these days in tech is so you can get acquiHIRED.

    2. Ian Michael Gumby Silver badge
      Boffin

      Re: Anyone can build something small

      I don't know why you posted Anon, because what you've said is spot on.

      I mean the author is a homebrew hobbyist. Now there's nothing wrong with that. I've been a homebrew type of guy since my youth and I still run a serious home network.

      But at scale... you definitely need some serious $$$ and talent which you get when you buy from a vendor.

      10GbE is so last year. In a large-scale clustered build-out, you need to be much faster. Faster still when you start to look at the use of flash outside of the SATA bottleneck and the emergence of ReRAM. (Though, don't hold your breath.)

      Moore's law may not actually be dead; it just hit a flat spot... Now it's smaller, faster, denser, using less energy and producing less heat.

      1. Solmyr ibn Wali Barad

        Re: Anyone can build something small

        But, but, iSCSI over 10 GbE ought to be enough for anyone?

        /troll.jpg/

      2. Matt Bryant Silver badge
        Meh

        Re: IMG Re: Anyone can build something small

        "....But at scale....." Fair point, but the problem (for the monolithic array manufacturers) is that there aren't enough "scale" customers to go round. It also comes back to the core point of ALL storage considerations - the end user actually wants access to and safe storage of their data and couldn't give a rat's arse if it is on block or file or disk or flash or over Ethernet or FC. Their primary concern is usually cost. If an old PC running Linux will deliver that data with "good enough" performance then the users will accept the old PC. The 10TB of data I used to need a pair of EMC arrays and a 1Gb FC SAN for can now be delivered faster and cheaper by a cluster of two bog-standard x64 servers running something like MS Windows Storage Server over 10Gb Ethernet, and I don't need to pay for an EMC admin either. And x64-based solutions nowadays can provide scale and performance to cover 90% of business solutions, with professional 24x7 support on offer as well. So, yes, scale is still important for the few, but not so important for the majority.

    3. Anonymous Coward

      Re: Anyone can build something small

      ... Also, if you build your own... your storage vendor's SEs and sales reps won't be taking you out for lunch anymore, where they'll tell you about their recent kick-off event in Vegas and the strippers.

      And while you (the customer) chew on your Wagyu steak, you hope you too will be taken to their overseas executive briefing center one day, where you can learn about your vendor's vision - in ways that would just not be possible locally. So you buy their product hoping that one day it'll be your turn.

      You'll have to have some influence on the procurement decision-making process or it'll never happen. If you work for government, you'll never go. They'll have to make it look like training...

      Oh - I forgot - we were talking about data storage?!

    4. John Sanders
      Holmes

      Re: Anyone can build something small

      Although at the SMB level it is perfectly possible.

      I have seen a couple of large GPFS deployments in action, and it can replace a lot of things. If only IBM wasn't that greedy.

    5. dikrek
      Boffin

      Re: Anyone can build something small

      Folks, it's all doable, just remember that something as seemingly simple as automatically managed online drive firmware updates can be of paramount importance.

      Especially in this age of SSDs, drive firmware is updated rapidly - lots of corruption issues are resolved.

      Not being aware of the issues is one problem. Not being able to update the firmware live is a different problem.

      Find the release notes for firmware updates for some popular SSDs out there and you'll quickly see what I mean.

      Thx

      D

  2. Paul Crawford Silver badge

    Two reasons for buying

    1) It gives you someone else to blame for any TITSUP events

    2) You (naively) thought you would get professional support with it

    So it kind of comes down to scale, budget and belief in yourself.

    1. naive

      Re: Two reasons for buying

      The FUD/blame reasoning will always seem to be true for everything one does oneself instead of outsourcing it.

      On the other hand, I built three Supermicro Linux storage servers using big tower cases, 3PAR controllers and WD 500GB disks in 2006 for use with AVID video software. Two of them are still in use. The most common issues are fan failures, and two disks had to be replaced. Loss of data never occurred.

      When building such servers, do not use branded systems like HP/Dell etc., but build one yourself using high-quality parts from Asus, MSI or Supermicro and a full-size tower case. Supermicro offers professional-grade service on its products.

      Big tower PC cases offer lots of space to place disks in 3.5-inch cages.

      Everything like disk cages and SATA cables can be bought.

      Also, a separate UPS should be included, with software to automatically shut down the storage server in case of a power failure.

      1. Justin Clift

        Re: Two reasons for buying

        Btw, it's also completely possible to make your own enclosures/cases if you're handy with power tools.

        If you want to go full overkill, you could even do something like this:

        Russian Wall-E Case

        1. Nigel 11

          Re: Two reasons for buying

          Or if you want a half-Petabyte filestore on a budget, you can just head over to those friendly folks at Backblaze, who built a business on build-it-yourself hardware and even tell the rest of us how to buy one and where to get the parts from.

          https://www.backblaze.com/blog/open-source-data-storage-server/

      2. Mark Hahn

        Re: Two reasons for buying

        Buying COTS like Supermicro is a good idea, since it means you can replace/upgrade parts more easily (standard PSU, standard boards, etc). However, this post seems to be advocating bigger chassis being better: that's just not true. You want to move air past your devices and out of the case: bigger is not better. (It's also true that disks still don't dissipate much heat compared to CPUs.)

  3. Anonymous Coward

    It could present LUNs via FC/iSCSI and file-share via SMB and NFS.

    Think ya used enough acronyms there, Butch?

    1. Justin Clift

      Re: It could present LUNs via FC/iSCSI and file-share via SMB and NFS.

      Those specific acronyms are the very basics for anyone doing storage professionally, or even in half-anger at home. They're not off target for the audience. ;)

      1. venneford

        Re: It could present LUNs via FC/iSCSI and file-share via SMB and NFS.

        Honestly, it would have been far more jarring not to have used acronyms there.

  4. Destroy All Monsters Silver badge

    Hold on... did you just get released from Salesforce?

    Any half-way competent storage administrator or systems administrator should be able to build a storage array themselves these days.

    It will be down fast though. And then you will find out the management interface is missing, the documentation is missing, the disks have unspecified interface trouble, or the power supply mysteriously doesn't work and there is no maintenance contract. Aieeee!

    1. Paul Crawford Silver badge
      Trollface

      Re: Hold on... did you just get released from Salesforce?

      Of course you could buy an Oracle storage appliance and pay for a system where the management interface is buggy and locks up during problems (not fixed over 5 years of support), where the documentation is incomplete (and then they move/withdraw Sun blogs that answered some of this), the disks have interface problems (oh dear, yes the SATA ones are like that, no fix provided) and the power supplies and other hardware show phantom faults that are, once again, never really explained or fixed.

      1. Anonymous Coward

        Re: Hold on... did you just get released from Salesforce?

        And curiously enough, those Sun/Oracle appliances were built by people who thought, just like the author, that anyone could throw together a storage box if they had a good OS and a file system.

        1. Paul Crawford Silver badge

          Re: Hold on... did you just get released from Salesforce?

          They had all the bits to make a great and reasonably priced system, but pulled defeat from the jaws of victory by shipping a prototype version and then (largely through the Oracle take-over) losing key staff and failing to invest enough in fixing it, instead adding tick-box features that the sales folk were asking for.

          Now of course Oracle has no interest in the lower priced end of the market, or even of selling storage as an item instead of part of a large profitable database deal. Others have stepped in with the same idea of a ZFS based appliance, but have any of them really sorted out the management and recovery aspects to make it reliable and painless to use?

          Also we are seeing longer and longer rebuild times on bigger and bigger HDDs, which are still your best bet for GB/£, and ZFS has nothing like the Dell "data pools", where in effect your RAID stripes are randomly spread over the disks in a much bigger pool. Then a failed HDD results in a parallel rebuild of all affected RAID stripes onto other HDDs, and you don't have the bottleneck of a single spare/replacement HDD's write speed versus its capacity.

          1. Rainer

            Re: Hold on... did you just get released from Salesforce?

            > Also we are seeing longer and longer rebuild times on bigger and bigger HDD,

            Ah yes. There's a point.

            Though, is that still a problem when you do RAIDZ2?

            I usually only do 6-disk RAID Z2. I've yet to see a failure in the arrays with 6TB disks...

            1. Paul Crawford Silver badge

              Re: RAIDZ2

              Like RAID-6 it gives you an extra degree of redundancy during a rebuild. And for all of you out there who have seen RAID-5 rebuilds cough blood on sector errors only found during the rebuild and with no parity remaining to correct them, that is vital.

              But if you are looking at a week's rebuild time on an 8TB disk under real-life conditions, you still have an uneasy window for something else to go wrong.
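
              For a rough feel of those numbers, here is a back-of-envelope sketch in Python. The 150 MB/s figure is purely an assumed best case for an otherwise idle array, not a measurement, and the function names are made up for illustration.

```python
def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours needed to rewrite a whole disk at a sustained rate.

    Uses decimal units (1 TB = 1,000,000 MB), as drive vendors do.
    """
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    seconds = capacity_tb * 1_000_000 / rate_mb_s
    return seconds / 3600


def rate_for_duration(capacity_tb: float, days: float) -> float:
    """Sustained MB/s implied by rebuilding a disk in the given number of days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return capacity_tb * 1_000_000 / (days * 86_400)


if __name__ == "__main__":
    # Assumed best case: nothing else touching the array during the rebuild.
    print(f"8 TB at 150 MB/s: {rebuild_hours(8, 150):.1f} hours")
    # The effective rate that a one-week rebuild of the same disk implies.
    print(f"8 TB in 7 days: {rate_for_duration(8, 7):.1f} MB/s")
```

              Under those assumptions the idle case is well under a day, so a week-long rebuild means the replacement disk is only seeing a small fraction of its sequential write speed while the array keeps serving users.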

      2. Anonymous Coward

        Re: Hold on... did you just get released from Salesforce?

        That's because all the money went into patent law suits instead of engineering....

      3. Anonymous Coward

        Re: Hold on... did you just get released from Salesforce?

        Into the last few hours of production for our Sun 7310. Been a PITA since we bought it and I for one will be glad to pull it out of the data centre this week.

  5. Nate Amsden

    what he said

    I used openfiler a decade ago with some spare HP JBODs for some dev workloads. As long as it worked there was no issue. Forget about upgrading though (I recall the upgrade path at the time was basically a full data migration to avoid loss).

    I tried nexenta a few years ago as JUST an NFS solution for a small file set (under 1TB). On paper it seemed OK. In reality it sucked hard, made worse by non-existent support (yes, I paid for their support and professional services to certify the configuration). I have since heard nasty insider stories about ZFS solutions in general which make me glad I have never considered that as a viable solution. I use ZFS at home and it works fine. And I'm sure it has its use cases at larger scale as well. Keep it away from my vmware and mysql databases though.

    Not even solutions like pure storage are mature enough for my liking.

    The more I have learned about storage over the past decade that I have been using it more closely, the more conservative I have become in deploying solutions.

    I just need it to work without fuss, and for me and my org's mission-critical data that means 3par on fibre channel (3par customer for 10 years now). My experience with 3par is certainly not flawless by any stretch (no solution is perfect). That just reinforces not being interested in taking risks with any other block storage system. (My feature usage of 3par is quite limited, which probably means I encounter fewer bugs.) I love the core of the platform, which is very, very solid.

    Now if HP only had a decent NFS offering (storeEasy and storeAll don't count). 3PAR NFS is not mature enough and also requires special controller versions that only 1 of my 4 arrays happens to have. Even if all my arrays had them I don't believe it would do the job for what I want.

    If I was at a larger org we would have more flexibility in testing other things. As it is, every piece of storage and server and networking is mission critical (all workloads, whether development or QA or testing or production, are consolidated). I have no lower tier of stuff. Maybe at some point but not yet.

    For a while we were pumping 200 million in revenue through 8 DL385G7s (384GB with vmware) and a single small 3par array. Today we are bigger for sure. Not big enough to justify segmented servers or storage though.

    1. Nate Amsden

      Re: what he said

      Can't edit posts on mobile. But wanted to clarify position on ZFS. I think it is a good file system but don't believe it makes a foundation for a good storage platform (e.g. 3par replacement)

      1. Justin Clift

        Re: what he said

        I've only just recently started using ZFS via FreeBSD, but haven't really done much with it. What were the major scary bits with it for you? Dedupe related - I've read of very bad experiences when it runs out of RAM - or other stuff?

        1. Paul Crawford Silver badge

          Re: ZFS scary bits

          1) Don't use de-dupe unless you have absolutely masses of RAM and something like multiple VMs that share a lot in common.

          2) Fail over - just don't go there.

          So far we have used the Oracle fail-over feature that sucked donkey balls big time. Others have said of other fail-over software that it causes as much down-time as it is supposed to solve. Stopping the "split brain" risk is very hard to do.

          You might be better served by having a small separate arbiter (like a Raspberry Pi, etc.) whose sole job is to spot an unusable system and power it down (ILOM command, or network-controlled power strip) and bring up the 2nd head. Syncing the 2nd head's status is another area of pain; again, it is maybe best if the arbiter acts to configure both machines on boot from a central configuration. Yes, you just got a difficult job to implement and form your own start-up...
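
          As a very rough sketch of that arbiter logic in Python: everything here is hypothetical, including the function names and the idea of passing in health-check results. The real work (the ILOM or power-strip call, bringing up the second head) hides behind the two callables, which you would have to write yourself.

```python
from typing import Callable, Iterable


def arbitrate(
    health_checks: Iterable[bool],
    fence_primary: Callable[[], bool],
    promote_secondary: Callable[[], None],
    max_failures: int = 3,
) -> str:
    """Consume health-check results for the primary head and fail over if needed.

    Returns "healthy", "failed_over" or "fence_failed".
    """
    consecutive = 0
    for ok in health_checks:
        if ok:
            # One good check resets the counter, so a brief blip does not fail over.
            consecutive = 0
            continue
        consecutive += 1
        if consecutive >= max_failures:
            # Power the primary off first; never promote while it might still
            # be writing, or you get exactly the split brain you wanted to avoid.
            if not fence_primary():
                return "fence_failed"
            promote_secondary()
            return "failed_over"
    return "healthy"


if __name__ == "__main__":
    log = []
    result = arbitrate(
        [True, False, False, False],
        fence_primary=lambda: log.append("fence") or True,  # stand-in for a power-off call
        promote_secondary=lambda: log.append("promote"),    # stand-in for starting head 2
    )
    print(result, log)
```

          The only design point the sketch really makes is the ordering: if the power-off cannot be confirmed, the second head stays down and a human gets paged instead.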

          1. Nigel 11

            Re: ZFS scary bits

            Fail over - just don't go there.

            It has always puzzled me that Digital (VAX/VMS) solved this so long ago with the VAXCluster and yet it seems to have been a festering sore for everyone else ever since.

            VMSClustering was (and still is) actually far more advanced than mere failover. And the source code was (briefly) there on microfiche for anyone to learn the finer points of the technology, just before the lawyers and corporate types moved in and Digital entered its long slide into oblivion. Copyright, not open source, but even so ....

            Sigh.

            1. Justin Clift

              Re: ZFS scary bits

              VMS is currently being ported to x64 though, which could be good news:

              State of the Port - March 2016 (note - PDF. Not a booby trapped one though. I really hope :>)

    2. John Sanders
      Linux

      Re: what he said

      Best NFS head ever is a Linux server (either physical or virtual) where the storage is presented from a LUN on a SAN.

      Not because a Linux box is not going to give you a headache every now and then (Linux diehard here); it is because the Linux box gives you flexibility and troubleshooting ability beyond any proprietary solution.

      For the file-system you can mix and match.

  6. Platypus

    Looks like Dunning/Kruger to me

    As with many things, the first level is easy but then things get much harder. Can I build a simple database? Sure I can. Can I build a fully SQL-compliant database with a sophisticated query planner and good benchmark numbers? Not without some help. Can I build an interpreter for a simple language? No problem. Can I build a 99.9% gcc-compatible compiler that spits out correct high-performing code for dozens of CPU architectures? Um, no. Similarly, building a very simple storage system is within reach for a lot of people and is a great learning exercise. Then you add replication/failover, try to make it perform decently, test against a realistic variety of hardware and failure conditions, make the whole thing maintainable by someone besides yourself . . . this is still a simple system, no laundry list of features to match (let alone differentiate from) competitors, but it's a lot harder than a "one time slowly along the happy path" hobby project.

    I'm not saying that the storage vendors deserve every dollar they charge. I'm pretty involved with changing those economics, because the EMCs and the NetApps of the world have been gouging too much for too long. What I'm saying is that "build it yourself" is a bit of an illusion except at the very smallest of scales and most modest of expectations. "Build it with others" is a better answer. Everyone contributes, everyone gets to benefit. If you really want to help speed those dinosaurs toward their extinction, there are any number of open-source projects that are already engaged in doing just that and could benefit from your help.

  7. MityDK

    Nice troll piece fishing for outrage from storage guys. Ridiculous.

  8. theOtherJT

    Well, I agree in theory but...

    Assuming you're using Linux at some point you will come across those words that make even the most hardened systems/storage admin tremble.

    nfs-kernel-server.

    I swear to fucking god they designed that thing just to mess with us. Anyone recognise this?

    You need to delete this directory because $USER doesn't work here any more.

    # umount /pool/home/$USER

    umount: /pool/home/$USER: device is busy.

    (In some cases useful info about processes that use

    the device is found by lsof(8) or fuser(1))

    cannot unmount '/pool/home/$USER': umount failed

    Wait... mounted? No they bloody don't. $USER has had their account deactivated. They've left the building. Their machine has been returned to the pool. No one other than the IT team could have mounted their share anyway - what gives?

    Oh, wait, they turned the machine off at the wall, didn't they. Well, now we're screwed, because nfs-fucking-kernel-server is going to sit there and await an unmount from the client and the only way to stop it is to restart the daemon - kicking everyone who actually IS using it.

    (Honestly tho - if anyone has any ideas how the HELL you kick a user from a Linux NFS server in order to cause the kernel to release its lock on an exported filesystem, this is a bane-of-my-life type problem)

    1. Anonymous Coward

      Re: Well, I agree in theory but...

      nfs-kernel-server

      Yeah, don't do that.

    2. Paul Crawford Silver badge

      Re: Well, I agree in theory but...

      I guess you have tried umount -f already?

      1. theOtherJT

        Re: Well, I agree in theory but...

        I have tried bloody everything :/

        It's a well-defined problem at least. The nfs-kernel-server process owns the lock on the file. Since nfs-kernel-server isn't kind enough to provide a human-parsable entry in /proc to let you know which instance of nfsd is actually holding a given file (and since each nfsd process doesn't map 1:1 to a particular nfs export, that wouldn't help even if it did), there's no way to know which instance you can kill to get your lock back.

        The only thing to do is shut down all of nfs-kernel-server and kick the 99% of your users who aren't causing problems.

        Possibly we'd be better served with a user space NFS server, but they all seem to have their own problems.

        1. Mikel

          Re: Well, I agree in theory but...

          > Since nfs-kernel-server isn't kind enough to proved a human parsable entry in /proc to let you know which instance of nfsd is actually holding a given file (and since each nfsd process doesn't map 1:1 to a particular nfs export that wouldn't help even if they did) there's no way to know which instance you can kill to get your lock back.

          This sounds like something you could fix right quick, if you had the source code.

          1. Alan Brown Silver badge

            Re: Well, I agree in theory but...

            "This sounds like something you could fix right quick, if you had the source code."

            You can get this information out via appropriate systemtap calls, but it would be a better solution to move NFS serving back into userspace where it belongs.

        2. Alan Brown Silver badge

          Re: Well, I agree in theory but...

          "Possibly we'd be better served with a user space NFS server, but they all seem to have their own problems."

          As one of the miscreants partially responsible for the nfs-kernel-server clusterfuck, I agree with the first and second parts of that statement - and it's not helped by the userspace server not having had any substantial work since 1996.

          The original userspace nfs server was - to be blunt - a piece of utterly slow shit. That's why nfs ended up in the kernel.

          The other part about it being in kernel space that you missed is that IT WILL NOT PLAY NICE with _anything_ else accessing the same disk blocks. If you NFS export a filesystem, then the _only_ access to it had better be via that NFS export or you risk trashing the data.

          Putting nfs into the kernel more than 20 years ago was a solution to a problem (painfully slow exports and PCNFS being almost unusably slow) at a time when the people implementing it hadn't even thought of the possibility of something accessing XYZ file via NFS at the same time as something else doing it via SAMBA or something doing it at local level. If we had, then perhaps we'd have been more careful.

    3. Anonymous Coward

      Re: Well, I agree in theory but...

      umount -l, or lsof and a kill of the nfsd holding it, should do it; or go for editing /etc/exports and running nfs-kernel-server reload.

  9. JEF_UK

    No one said it was easy but...

    ZFS depends on how you arrange the disks and what your use case is. A single process writing tiny files will suck, even with a good disk subsystem, good amounts of RAM and a sensible choice of compression (yes/no) and de-dupe (yes/no).

    Give it a different task with multiple processes and large reads/writes and it can shine, as it can then leverage all the spindles, breaking the writes down into segments and spanning them across the disks.

    It's too easy to think "I'll add de-dupe, compression and an L2ARC to make it faster" when in reality you don't have the RAM to store the de-dupe table or the metadata. The result is that the RAM is no longer caching data but just holding the map for the SSD/de-dupe.
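
    To put a number on the de-dupe RAM problem, here is a rough estimate in Python. Treat it as a sketch only: the 320 bytes per table entry is an often-quoted rule of thumb that I am assuming rather than something measured here, the average block size is a guess you would need to check against your own pool, and the function name is made up.

```python
def dedup_table_gib(
    data_tib: float,
    avg_block_kib: float = 64,   # assumed average block size; measure your own pool
    bytes_per_entry: int = 320,  # assumed rule-of-thumb size of one in-memory entry
) -> float:
    """Worst-case size of the de-dupe table in GiB, assuming every block is unique."""
    if data_tib < 0 or avg_block_kib <= 0 or bytes_per_entry <= 0:
        raise ValueError("inputs must be positive")
    blocks = data_tib * 2**40 / (avg_block_kib * 2**10)
    return blocks * bytes_per_entry / 2**30


if __name__ == "__main__":
    for tib in (1, 10, 50):
        print(f"{tib} TiB of data: about {dedup_table_gib(tib):.0f} GiB of table")
```

    With those assumptions a modest home pool already wants more RAM for the table than most home boxes have in total, and smaller average blocks make it worse in direct proportion.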

    Re article:

    About 3 years ago I built a Debian+ZFS+SCST SAN that exports LUNs over fibre channel to my VM host and desktop, and over iSCSI to my living room PVR. All for home.

    I've considered a few HA versions of it for its replacement.

    I would need to set up replication of the file system below SCST and be able to "shoot the other node in the head". I could use Ceph for the replication between nodes, with direct InfiniBand connections.

    Then one node would be the primary and one a slave, using NPIV on the switch to hide this from the clients.

    At home I would probably not duplicate all my disks, so I would use a shelf with two controllers connected to both file system heads and import with the -f (force) option if a node went down.

    As for backup, I have another HP micro server with big disks that runs Bacula, but that is to back up the data on my VMs, not the SAN.

    To do this commercially, ask yourself:

    1. Am I trying to save money?

       To do this well will require good kit, and more than one of it.

    2. How long can recovery of a file system node take, i.e. what is my downtime limit?

       Build your solution around this time limit. Zero downtime can be done, but only with sufficient replicas. Use good resilient hardware (dual PSUs, hot-swap fans). Keep spares. Have a care agreement. All of that will impact 1.

    1. Justin Clift

      Re: No one said it was easy but...

      As a thought, instead of doing the replication below the SCST layer, how about exporting the raw LUNS from each of the storage servers to the VM host, then doing the mirroring there? That should maintain the full io / transfer rate of things, instead of being (potentially) slowed down by storage server side replication.

      1. JEF_UK

        Re: No one said it was easy but...

        So the VM host process can write to two SANs simultaneously? That would be a cool feature and simplify things.

        A quick google finds this for VMware:

        https://www.vmware.com/pdf/esx_san_cfg_technote.pdf

        page 13

        "Mirroring

        Protection against LUN failure allows applications to survive storage access faults. Mirroring can accomplish that protection. Mirroring designates a second non‐addressable LUN that captures all write operations to the primary LUN. Mirroring provides fault tolerance at the LUN level. LUN mirroring can be implemented at the server, SAN switch, or storage array level."

        Every day is a school day.

        1. batfastad

          Re: No one said it was easy but...

          Is the mirror driver actually available as a thing to use for real-time SAN mirroring now? Been a while since I was a VMwarrior. This tells me that it was used internally for svMotion... http://www.yellow-bricks.com/2011/07/14/vsphere-5-0-storage-vmotion-and-the-mirror-driver/

          1. Justin Clift

            Re: No one said it was easy but...

            Unsure about VMware, as I haven't personally used it in ages. Other host platforms (eg Linux + KVM) definitely do this, as Linux provides the mirroring natively. (you just need to configure it)

            Haven't yet tried this with FreeBSD, but it would be kind of surprising if it didn't work.

            1. Justin Clift

              Re: No one said it was easy but...

              And just to point out, if VMware itself doesn't let you mirror directly on the host, you could pass the LUNs through to the VM itself to do the mirroring there.

              The thought makes me kind of nervous around failure scenarios though. Would do decent testing. ;)

    2. John Sanders
      Pint

      Re: No one said it was easy but...

      I do the same as you do at home, but small scale.

      With a single £50 ITX motherboard, single ATX motherboard, 16GB RAM and Debian.

      4 x 4TB Drives, 2 RAID10 using MDADM, LVM ext4 and xfs volumes.

      I have done horrible things to the set-up in the quest for science, while the server runs every single conceivable service under the sun, including VMs.

      It is not very fast, but it is not slow either. I know companies with older kit that have many more problems and way less functionality.

      It is really good for the price.

  10. Infernoz Bronze badge
    Holmes

    Use FreeNAS or TrueNAS (pro. version), and decent hardware.

    Hardware like:

    * RAID or enterprise grade hard disks

    * A server-grade 64-bit motherboard supporting ECC RAM (e.g. ASRock, Supermicro); some cheap mini-ATX ones even have SAS on board!

    * An Intel CPU supporting ECC RAM

    * Lots of ECC RAM, never ever non-ECC unless you like doing ZFS read-only recovery, been there!

    * At least 6-disk RAID arrays, i.e. RAIDZ2.

    * Possibly some SSDs for ZFS read and/or write buffering.

    The tiny OS runs off Flash Sticks (supports mirrored flash sticks), supported OpenZFS properly for ages (unlike Linux), is dead easy to set-up, has a web interface, ZFS makes lots of stuff easier, needs no messing around like Linux, and gets frequent updates.

    FreeNAS 10 sounds like it will be even easier.

    1. Sixtysix

      Re: Use FreeNAS or TrueNAS (pro. version), and decent hardware.

      I'm looking at replacing my storage solution at the moment...

      Currently using a 6-year-old SAN (big ticket) and pro-grade NAS for a total of about 60TB, but would be very interested in talking with ANYONE who is using FreeNAS / TrueNAS or similar in the UK... I can't find examples of critical front-line deployment on this side of the pond outside universities.

      I believe this type of FOSS-based solution has potential, but I can't procure it if I can't find support or a way to demonstrate capability...

      Sigh - I can see a big name "All-flash" vendor in my near future.

      1. Rainer

        Re: Use FreeNAS or TrueNAS (pro. version), and decent hardware.

        > Sigh - I can see a big name "All-flash" vendor in my near future.

        If you have the money - by all means, go EMC.

        Some of their stuff (Isilon) is actually FreeBSD inside...

  11. philipclark

    I think you answered your own question Martin...

    "I built a block-storage array using an old PC, a couple of HBAs and Linux about five years ago; it was an interesting little project..." – for home use

    I notice you didn't say that your whole company depends on this homegrown array.

    You're absolutely right, with a little time and research, you can build anything. But what percentage of a company's development budget goes into building a storage array (the fun exciting part) vs. testing, documenting, fixing, supporting, upgrading and generally "doing the important stuff" ?

  12. Anonymous Coward

    "Then you add replication/failover, try to make it perform decently, test against a realistic variety of hardware and failure conditions, make the whole thing maintainable by someone besides yourself "

    As one who is in the storage industry but not a storage manufacturer, it is the old 80:20 rule on product development. You get 80% functionality in 20% of the development time but that last 20% takes 80% of the time. And that 20% - it's all about what happens when there is an error - that is the really tough stuff to do, as every server / HBA / storage device behaves differently. Standards! Don't make me laugh! Everyone has their own interpretation of a standard and the bigger they are, the worse they are. One SE once told me after I pointed out their non-conformance, "This is HP SCSI, we don't care". The customer had the best answer I have heard in this situation: "Look! I have the money and you have the problem." In a flash the salesman butted in: "It will be fixed shortly" - this was a very big deal.

  13. Anonymous Coward
    Anonymous Coward

    What's your time worth?

    I have worked in pre-sales for NetApp, EMC and Pure. The price you pay is what the rep can sell it to you for. Chances are that even at 85% discount it's at a 90% margin; the kit is cheap, and it's the software reliability you pay for. It's frightening to see the difference in prices that customers pay for the same thing - get a contract amendment, just like the government, that states you're paying the lowest price they've ever sold the same thing for, and make your rep sweat at end of quarter or end of year.
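
    To put the discount and margin figures into numbers, here is a minimal sketch. It assumes that "margin" means gross margin on the discounted sale price, which the comment does not spell out, and the function name is purely illustrative.

```python
def vendor_cost_fraction(discount: float, margin: float) -> float:
    """Return the vendor's implied cost as a fraction of list price.

    discount: fraction taken off list price (0.85 means 85% off).
    margin:   gross margin on the discounted sale price (0.90 means 90%).
              This reading of "margin" is an assumption.
    """
    sale_price = 1.0 - discount         # what the customer pays, per unit of list price
    return sale_price * (1.0 - margin)  # the part of the sale price that is vendor cost


if __name__ == "__main__":
    fraction = vendor_cost_fraction(discount=0.85, margin=0.90)
    print(f"Implied vendor cost: {fraction:.1%} of list price")
```

    Under that reading, the kit would cost the vendor roughly one and a half percent of list price, which is the point being made about the hardware being cheap.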

    How much is your time worth? Sure, you can get a bunch of drives and a couple of servers and glue it all together. Spend a week getting it working, a month tuning it, and the rest of the year wasting your time managing and patching it. That's probably £50k of your time wasted when the equivalent off the shelf will have only cost that. Scale that up to 3 years and you've just wasted £150k of your time, and whatever the cost was of the generic hardware. Budget for £100k for 100TB replicated with a few bells and whistles including 3 years top tier support, and you're suddenly £50k up over 3 years and have time to do your day job...

    At least that's how the cost justification works when the techie is talking with your IT director and the sales guy is buying the drinks at the 19th hole after a "networking" meeting on the golf course...
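
    The three-year sum in the comment above can be written out as a rough sketch. The £50k per year and £100k figures are the commenter's; the generic hardware cost is not stated, so it is left as a parameter defaulting to zero, and the function name is hypothetical.

```python
def three_year_comparison(
    admin_time_per_year: float = 50_000,  # £ of staff time per year (commenter's figure)
    years: int = 3,
    vendor_price: float = 100_000,        # £ for 100TB replicated incl. 3 years' support (commenter's figure)
    diy_hardware: float = 0.0,            # £ for generic hardware; not stated, assumed 0 here
) -> dict:
    """Compare the build-it-yourself total against the vendor quote."""
    diy_total = admin_time_per_year * years + diy_hardware
    return {
        "diy_total": diy_total,
        "vendor_total": vendor_price,
        "vendor_saving": diy_total - vendor_price,
    }


if __name__ == "__main__":
    for name, value in three_year_comparison().items():
        print(f"{name}: £{value:,.0f}")
```

    With the defaults, the vendor option comes out £50k ahead before any hardware spend is counted, matching the figure in the comment. The result is only as good as the assumed cost of staff time.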

  14. RonWheeler

    Someone external to blame

    is important in a work environment. Otherwise the blame-buck stops with you. Homebrew playground - fill yer boots.

  15. Disk0
    Coat

    Don't speak to me...

    ....or my SAN ever again.

    I do apologize, mine's the one with the "Internet Humour For Dummies" book in the left pocket...

  16. Sixtysix
    Headmaster

    Live Fire Exercise

    Currently looking to procure a storage solution to replace what started out as a SAN-only consolidated storage strategy and now includes V-SAN and multiple NAS units to meet throughput needs. I briefly considered a "brew-your-own" (and my browser history covers the examples mentioned) but don't have the staff time to devote to such a critical project.

    As mentioned consistently in comments, I *need* support, Real, 24/7/365, drop-it-all and turn up type support for my front line SAN. I would however be quite happy for my supplier to provide white-box FreeNAS servers/shelves at a reasonable cost if they would then contract for the support (and could prove themselves capable of providing *sufficient* service). I'd even pay to carry whitebox spares given where our boxen are racked.

    More than happy: I'd be *delighted* to find a way to help meet my strategic FOSS commitment (it's harder than people imagine). I really would not grudge a penny of the support costs, because my staff's time is already more than FULLY committed running the infrastructure, and THAT'S my problem - it needs to be plug and play, not an on-going build and test project!

    All (UK based) pointers welcome!

    1. Anonymous Coward
      Anonymous Coward

      Re: Live Fire Exercise

      Sorry for being an anon coward...

      I mucked around in this area a few times; it's impossible to support a FreeNAS etc unless you also manage it 100% and totally lock it down and implement change control, which then becomes burdensome to the user. A user or admin will always cock it up and claim that no one has touched it, then blame the support owner or kit when they did an rm -rf... The support then requires 25% of a full time head (expect to spend a week in four doing this!), which is going to be around £25k inc a couple of trips. Suddenly realistic support costs are up there with the big boys for "just a bloke in a shed googling the answer".. not very Enterprise.

      There are a couple of small shops doing just this who try to play in the SME space, but the support costs are too low for them to do a proper job so they pay staff poorly who move on quickly once they have a bit of experience.

      Good luck!

      1. Sixtysix
        Happy

        Re: Sorry for being anon...

        But if you didn't take the option, I probably would not have had feedback... THANK YOU.

        This is my real fear about using this kind of solution, and a risk I cannot authorise.

        And why, outside universities, FOSS is still too hard for SME.

        1. Justin Clift

          Re: Sorry for being anon...

          Maybe in the UK. There are well supported FOSS options in the US though. Hopefully after they've gone international enough, the UK gets supported properly too. :)

  17. Anonymous Coward
    Anonymous Coward

    Author is on the money

    I agree with some of the criticisms, but time for an anecdote...

    We had the option of paying $20K+ for a fault-tolerant iSCSI array with dual controllers, plus putting our own redundant filers on top, or $60K+ for a NetApp NFS server solution. Now I'll admit the latter would have been great.

    But since at our site everyone is an experienced sysadmin and/or developer, we simply bought commodity hardware, set up CentOS, DRBD, rgmanager, CLVM, XFS and Kerberized NFS, all over multiply redundant 10 Gb ethernet fabric, with proper failover, fencing etc. obviously, no single point of failure, and for a fraction of the TCO of a NetApp, we have a storage system more reliable than any cheap redundant controller array I've ever used (which often have their own cryptic bugs, and would still be more expensive than our system, including implementation time).

    We do routine tests and have experienced several actual hardware and software failures in the last 5 years, and the downtime is measured in seconds. 10 GbE is enough for us at present; if it wasn't we could move up to 40 GbE or simply widen the LACP links. The latency introduced by 10 GbE is nothing compared to the disk seek latency itself, so performance is fantastic. This on a production site serving 200 desktops plus numerous virtual machines and 'BYOD' clients.

    Moreover, we have the fantastically good feeling that *when* something does go wrong, we fully understand the system and can diagnose and correct the problem far more rapidly and efficiently than I have ever known a vendor to do. And trust me, I have *plenty* of experience with doing things that way...

    So it can be done and done well. You just have to know what you're doing and have the in-house expertise to do it right.
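
    The claim above that 10 GbE latency is small next to disk seek latency can be checked with back-of-envelope numbers. This is only a sketch: it counts the time to put one block on the wire and ignores switch, protocol and software overhead, and the 4 KiB block size, 8 ms average seek and 7200 rpm spindle speed are assumed typical values, not figures from the comment.

```python
def serialization_time_s(payload_bytes: int, link_bits_per_s: float) -> float:
    """Time to clock a payload onto the wire, ignoring all protocol overhead."""
    return payload_bytes * 8 / link_bits_per_s


def avg_rotational_latency_s(rpm: float) -> float:
    """Average rotational latency: half of one platter revolution."""
    return 0.5 * 60.0 / rpm


if __name__ == "__main__":
    block = 4096          # bytes, assumed I/O size
    ten_gbe = 10e9        # bits per second
    avg_seek = 0.008      # seconds, assumed average seek for a 7200 rpm disk

    wire = serialization_time_s(block, ten_gbe)
    disk = avg_seek + avg_rotational_latency_s(7200)

    print(f"10 GbE wire time for 4 KiB: {wire * 1e6:.2f} microseconds")
    print(f"Disk seek plus rotation:    {disk * 1e3:.2f} milliseconds")
    print(f"Disk is roughly {disk / wire:,.0f}x slower per random I/O")
```

    On these assumptions the wire time is a few microseconds while the spinning disk needs milliseconds, so the network is not the bottleneck for random I/O. The picture changes with flash, as noted elsewhere in the thread.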

    1. Justin Clift

      Re: Author is on the money

      Good anecdote. :)

      As a thought, is there any chance your company would be ok with having that officially known? eg Some of the software projects you mention could make use of that as good Case Study material, for mutual promo benefit. Can point you to the right people if that's useful. :)

      1. Anonymous Coward
        Anonymous Coward

        Re: Author is on the money

        No objection in principle, would have to get management approval. We are actually somewhat behind the times as rgmanager has now been replaced by Pacemaker in RHEL 7, and DRBD is up to version 9 which is a complete rewrite. So I'm not sure how valuable our case study would be. Still, feel free to get in touch with me privately if you wish (I assume there is some way of doing this on The Register!?)

        I also know others who have made a very successful business of selling cost-effective HA systems based on these technologies and are more up-to-date on it than I am.

      2. Mario Becroft
        Happy

        Re: Author is on the money

        Well I can't find any way to PM you on The Register, and this is not exactly a major trade secret so feel free to email me, Justin.

        My card with my email address on it can be found here: http://www.becroft.co.nz/

        Cheers.

  18. Anonymous Coward
    Anonymous Coward

    I built my own storage a while ago. It was cobbled together with a couple of Supermicro JBODs, some cheapy LSI controllers and a Fibre Channel HBA. Used SCST to present the disks over Fibre Channel and integrated it with a StorNext file system. Was only meant to be a proof of concept but it ended up in production. Worked quite nicely. Might even still be alive....

  19. Anonymous Coward
    Anonymous Coward

    Building it yourself

    Disclaimer, I work as a storage guy, used to be a pure tech, now more like an architect.

    I've done business with all of them: NetApp, HP, EMC, IBM...

    I've managed more than five different platforms.

    I've joked about building it myself for years, and done so for myself and friends.

    The monster under the bed is *when* shit hits the fan and BigBoss™ wants to know exactly when all services will be running normally again, with SLA payback hiding behind the curtains. Then you cannot hide behind the usual "Well, we have a 4-hour fix-time deal with HP".

    Building a ZFS-based storage system is quite simple: decent hardware, LOTS of ECC RAM, simple HBAs and JBODs. Then you soon have a storage system capable of doing decent NFS and possibly FC LUNs. But hardware and service monitoring are lacking, billing systems are lacking, and even if you have got parts from an alright provider there is no autosupport. Yes, you can get HA with the zfs-plugin from RSF-1, but it will cost you. And like all software, it's not flawless.

    Adding SMB to it will complicate things a lot; Samba is IMHO crap. And all vendors who make good converged storage systems have paid *a lot* of money to Microsoft and written a proper SMB server.

    Even if you consider ZFS a good filesystem, which it is, creating a storage environment is a lot of effort, and you will be responsible for services as well as outages. I've had battles with NetApp over quality of releases; I got hit quite hard a few years back with the toasters not freeing up dedup tables, scrubbing spiralling down the drain, panics and so on. Still, it's the best to manage. I've had data loss with 3PAR: mirrors that went bad, tunesys gone haywire and more.

    Ah, did I forget that ZFS has no block pointer rewrite? That means no releveling vdevs, so size it right from the beginning. And you get a *huge* overhead if you're thinking stretched cluster (four-way mirror).

    Do you really want that extra heat?
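
    The overhead of a four-way mirror mentioned above can be sized with a small sketch. It only counts the mirror copies and ignores ZFS metadata, reserved space and the free-space headroom a real pool needs; the 100 TB usable target is an illustrative assumption and the function names are hypothetical.

```python
def raw_capacity_needed(usable_tb: float, mirror_ways: int) -> float:
    """Raw disk capacity needed when every block is stored mirror_ways times."""
    if mirror_ways < 1:
        raise ValueError("mirror_ways must be at least 1")
    return usable_tb * mirror_ways


def redundancy_overhead(mirror_ways: int) -> float:
    """Fraction of raw capacity consumed by the extra mirror copies."""
    if mirror_ways < 1:
        raise ValueError("mirror_ways must be at least 1")
    return 1.0 - 1.0 / mirror_ways


if __name__ == "__main__":
    usable = 100.0  # TB, illustrative target
    for ways in (2, 4):
        raw = raw_capacity_needed(usable, ways)
        overhead = redundancy_overhead(ways)
        print(f"{ways}-way mirror: {raw:.0f} TB raw for {usable:.0f} TB usable "
              f"({overhead:.0%} of raw spent on copies)")
```

    Because vdevs cannot be relevelled later, this calculation has to be done up front, with growth included, rather than corrected after the pool is built.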


Biting the hand that feeds IT © 1998–2019