Open-source storage that doesn't suck? Our man tries to break TrueNAS

Data storage is difficult, and ZFS-based storage doubly so. There's a lot of money to be made if you can do storage right, so it's uncommon to see a storage company with an open-source model deliver storage that doesn't suck. I looked at TrueNAS from iXsystems, which, importantly, targets the SMB and midmarket with something …

  1. Mad Chaz

    CIFS vs SMB

    Not a mistake, but an update in terminology.

    SMB isn't the term used in linux/bsd for the OLD samba protocol. It is now called CIFS as that version of it had MAJOR performance improvements over the old SMB system. SMB3 is used for the latest version (Windows 10 and family)

    1. sgp

      Re: CIFS vs SMB

      Small and Medium Businesses?

      1. MityDK

        Re: CIFS vs SMB

        SMB in this case refers to the application layer network protocol Microsoft uses for their OS to talk to each other, called Server Message Block.

        The first version was called CIFS, common internet file system, and was terrible. Since then, they have released version 2.x and 3.x and are now on 3.1.1 for windows 10.

        So when people say CIFS, they may be referring to SMB in general although technically CIFS was really SMB 1.0. Samba is the linux/unix based implementation of SMB.

        SMB didn't really start working halfway acceptably until 2.1, arguably, but there's really no way to not use SMB if you are using MS in the enterprise.

  2. Paul Crawford Silver badge

    Fail over?

    What are the reasons that will trigger a fail-over, and do the heads have some watchdog to force a reboot/fail-over in case one head gets sick?

    I ask this as someone who has suffered from the Sun Oracle ZFS appliance that would only fail over on a kernel panic of the other head. But the other head would invariably get stuffed in such a manner as to stop serving storage but not so screwed that it stopped the heartbeat links that arbitrated between them. We ended up using our nagios monitoring machine to check for usable NFS mounts and if that went bad for a while it would SSH in to the active head's ILOM to kick it in the NMI button.
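
    Stripped of the nagios plumbing, the kicker was something along these lines (a from-memory sketch: hostnames and paths are made up, and I'm showing ipmitool's diagnostic-interrupt route here rather than the exact ILOM incantation we used):

      #!/usr/bin/env python3
      # If the NFS mount stops answering for several checks in a row,
      # send a diagnostic interrupt (NMI) to the active head via its
      # service processor so the surviving head takes over.
      import subprocess, time

      MOUNT_CHECK_PATH = "/mnt/zfs-share/.healthcheck"  # file on the NFS mount (made up)
      ACTIVE_HEAD_SP   = "head1-sp.example.com"         # active head's service processor (made up)
      FAILS_BEFORE_NMI = 5
      CHECK_INTERVAL_S = 60

      def mount_alive():
          """True if the NFS mount answers a stat within 10 seconds."""
          try:
              subprocess.run(["stat", MOUNT_CHECK_PATH], timeout=10,
                             check=True, capture_output=True)
              return True
          except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
              return False

      def kick_active_head():
          """Pulse an NMI so the sick head panics and fails over."""
          subprocess.run(["ipmitool", "-I", "lanplus", "-H", ACTIVE_HEAD_SP,
                          "-U", "root", "-f", "/etc/ipmi-pass",
                          "chassis", "power", "diag"], check=False)

      fails = 0
      while True:
          fails = 0 if mount_alive() else fails + 1
          if fails >= FAILS_BEFORE_NMI:
              kick_active_head()
              fails = 0
          time.sleep(CHECK_INTERVAL_S)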

    1. Bronek Kozicki

      Re: Fail over?

      As Trevor said, this is not a cluster. Clusters are not natively supported by ZFS; for this you need something like Gluster. Or a proprietary closed solution from Oracle, which we all know is a trustworthy* partner.

      *trustworthy, as in "I trust you will take my arm at some point, and remaining limbs some time later"

      1. Paul Crawford Silver badge

        Re: Fail over?

        You don't need a cluster for fail-over; you only need one if you want no outage at all.

        With two heads you can operate active-active or active-passive depending on the number of shares (1 share = active-passive only). If one goes down, the other takes over that pool of data after a moderate time.

    2. brian 1

      Re: Fail over?

      There is a heartbeat. The failover is pretty quick and seamless. I have been running one for almost 2 years now, and have only failed over intentionally. When I did fail over, no one but me knew it even happened. I'm running around 30 VM's on mine.

      1. Ilsa Loving

        Re: Fail over?

        >There is a heartbeat. The failover is pretty quick and seamless. I have been running one for almost 2 years now, and have only failed over intentionally. When I did fail over, no one but me knew it even happened. I'm running around 30 VM's on mine.

        Would you mind sharing some details and rough costs of your setup? I'm currently investigating upgrade options, and your situation sounds similar to mine.

        1. Alan Brown Silver badge

          Re: Fail over?

          "Would you mind sharing some details and rough costs of your setup? "

          TrueNAS systems are a dual-head active/passive failover setup (2 complete computers in the one box). This functionality isn't in the FreeNAS systems, and having messed around with HA failover for years I'd say that trying to roll your own is not for the faint-hearted.

          Failover takes under 30 seconds on the system here (400TB of rusty storage attached, along with 3TB of pure flash array), which is short enough to not be noticeable even for Linux systems using the fileserver as network /home - which is on the flash array. The rusty side here has just under 1TB of L2ARC, which mitigates head thrash when people start banging on the same files over and over for data processing.

    3. Alan Brown Silver badge

      Re: Fail over?

      "What are the reasons that will trigger a fail-over, and do the heads have some watchdog to force a reboot/fail-over in case one head gets sick?"

      Networking failure for the active head, watchdogs, etc.

      Both heads live in the same case and there's an internal networking setup as well as the externally facing stuff. It works extremely well in my experience.

  3. Nate Amsden

    Not acceptable

    I realize their target market is not the enterprise but it seems you did very little failure testing.

    Regarding your mismatched node upgrade, I'll share my experience with HP 3PAR.

    HP services involved the whole time. First run through validation tests, everything checks out. Next, upgrade first controller. Comes up, no issue. Upgrade 2nd controller. Does not come up. We wait for a while and then they determine a failure occurred (the internal disk drive for the controller failed during reboot).

    Ok, so dispatch on-site support with a replacement drive. Takes a few hrs to get there. Array keeps running in degraded mode. Replace drive. Controller does not come up. Replace entire controller; still does not come up. On-site support having trouble with their USB-to-serial adapter crashing their laptop every 20 mins.

    I get my friends at 3PAR involved at this point, monitoring the situation to make sure the right resources are engaged.

    HP tries the runbook processes to get the controller online a half dozen times. Fails every time. Mismatched software is causing the active cluster node to reject the replaced controller. They try to image the controller a few more times; still rejected.

    I lose patience and tell HP to fix this now; it had been over 16 hrs. Got level 4 engineering involved and the situation was resolved within an hour after that. (No other reboots or outages required.)

    Wasn't happy with how long it took, but I'd rather they take their time and get it right; we couldn't afford to lose the remaining controller.

    They fixed it though without too much impact to production (there was some impact given the high write workload and lack of cache mirroring without a 2nd controller). Obviously new purchases are 4-controller units (with 3PAR persistent cache), something I had been pushing for years already.

    Conversely, I remember a BlueArc upgrade many years ago that caused a 7 hr outage simply because the company lacked an escalation policy and on-site support didn't have contacts to get help. My co-worker, who had the most BlueArc experience at our company, was able to kick people at BlueArc HQ and get help when their own support could not. The CEO later apologized to us and showed they had now implemented an escalation policy.

    So yeah, hard to get shit right. Sometimes the worst failure scenarios are ones that you might never think of, which I learned a long time ago, so do not compromise on quality storage.

    1. Bronek Kozicki

      Re: Not acceptable

      OK, and how is that relevant to TrueNAS? Oh, you mean the importance of testing failure modes? It seems that, in contrast to HP, iXsystems are perhaps doing more testing?

      1. Nate Amsden

        Re: Not acceptable

        Taking an outage to reboot both controllers to solve mismatched software is not acceptable.

        So I was sharing the contrasting situation with an array that is more purpose-built to be HA and never require an outage for any reason.

        There are bugs though which can still cause a full outage on almost any system. The failure case the reviewer tested was quite controlled. And the recovery process was terrible.

        1. Nate Amsden

          Re: Not acceptable

          And the BlueArc bit was to show there is more to storage than having redundant controllers. If you can't get help because there is no escalation process, that is just about as bad.

        2. Alan Brown Silver badge

          Re: Not acceptable

          "Taking an outage to reboot both controllers to solve mismatched software is not acceptable."

          Nor is it necessary. I speak from experience here. The usual culprit for this kind of thing on a TrueNAS system is impatience, as it can take quite a while for a freshly started node to be fully ready - which is why you have 2 nodes.

          However, in a controlled environment you don't have a bearded dragon chaos monkey pulling power mid-upgrade (remember: both nodes in one case means that if the power to the (redundant) PSUs goes off, the whole system stops).

          The system works off a set of static filesystem images in the boot media (usually SATA DOMs). If an upgrade fails partway through (due to things like your chaos monkey pulling the power) then it will simply boot off the existing image (ie, the last good snapshot). After the upgrade is complete and the new image is available on the boot media, a reboot will default into that image, but you can fall back to any old software version you choose.

          The boot media is a ZFS raid1 array. Configuration itself is held in a sqlite database and backed up in a number of places, including a hidden area on the ZFS fileserving arrays. It's remarkably robust, and whilst TrueNAS is not _aimed_ at enterprise users, they handle enterprise loads far better than our previous "enterprise" fileserving setup ever did, at less than 1/4 the cost of something like an EMC solution - and for about 1/10 the cost of the equivalent Oracle ZFS-based solution (I'm including 5-year support in this).

          Just to give an idea about licensing costs: ixSystems sold us hardware and 5-year support on a 400TB system for less than Nexenta wanted to charge in first-year software licensing charges alone (Nexenta don't sell systems; that's a separate cost).

          The problem with most ZFS vendors has been the fact that they've wanted extortionate figures for their product, or they've committed absolute build boners which crippled the system - the most common errors being putting hardware RAID in front of the disks or insufficient memory in the server (you need at least 1GB per TB or performance will suffer badly). ixSystems pricing is _extremely_ reasonable and the quality of their support(*) makes most "enterprise" outfits look quite shoddy.

          (*) If you need it. The odds are pretty good that unless you're doing cutting edge stuff, you won't need it.

  4. Jon Massey

    300GB 10k?

    Eeesh, what year is it?

    1. Bronek Kozicki

      Re: 300GB 10k?

      It's an entry-level price; I guess at this level the actual storage is just a tiny part of the total. Z20 scales up to more respectable 400TB.

      1. Wensleydale Cheese

        Confused

        "Z20 scales up to more respectable 400TB"

        The table there has this footnote: "* Compression rates vary by application. 2.5x compression factor for Hybrid Arrays and 10x for All-Flash Arrays is reflected in the effective capacity. "

        and indeed the All-Flash array at the RHS lists Capacity (RAW): up to 300 TB, and Effective* Capacity: up to 3 PB

        Can someone please explain what the 10x factor for All-Flash Arrays is all about?

        1. Bronek Kozicki

          Re: Confused

          Depending on the amount of RAM installed (at least 20GB of RAM per TB of data, plus more) users may be able to enable deduplication. I can imagine that in some test cases this might yield 10x, although how they fit 6TB of RAM in a single chassis I do not know. Also, LZ4 compression gives a very nice balance between CPU cost (very low - the reduction in IO load alone is worth it) and compression ratios.

        2. Alan Brown Silver badge

          Re: Confused

          "Can someone please explain what the 10x factor for All-Flash Arrays is all about?"

          Deduplication.

          DO NOT DO THIS unless you are absolutely sure what you are doing and know your workload intimately. ZFS memory requirements go from a relatively linear 1GB per TB to a far steeper demand that grows with the amount of unique data stored.
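
          Back-of-envelope, if you want a feel for why (both figures below are assumptions - the commonly quoted ~320 bytes of core per dedup table entry and an average 64KiB block size - check your own pool before trusting any of it):

            # Rough dedup table (DDT) core estimate. Both constants are
            # assumptions: ~320 bytes of RAM per DDT entry and an average
            # block size of 64KiB.
            DDT_ENTRY_BYTES = 320
            AVG_BLOCK_BYTES = 64 * 1024

            def dedup_ram_gib(pool_tib):
                blocks = pool_tib * (1024 ** 4) / AVG_BLOCK_BYTES
                return blocks * DDT_ENTRY_BYTES / (1024 ** 3)

            for tib in (10, 50, 400):
                print(f"{tib} TiB of unique data -> ~{dedup_ram_gib(tib):.0f} GiB of DDT in core")
            # 10 TiB -> ~50 GiB, 50 TiB -> ~250 GiB, 400 TiB -> ~2000 GiB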

  5. Duncan Macdonald

    Why hard drives and a storage server ?

    With only 3TB of disk storage, what is the point of using a storage server with hard drives? Using 4 local 1TB 850 Pro SSDs would provide far better performance (and even better performance if the application server supports PCIe storage). The best case of 150MB/sec is pitiful compared to even the cheapest consumer SSDs.

  6. Alistair

    @TP

    Timing: impeccable

    Review: well done, well documented, with some of your usual class, humour and style.

    Would have been nice if it had been the larger unit --

    I may well be pointing purchasing at these folks for certain systems.

    @DM - we aren't talking about single-server data here. You don't buy a storage array for single-server data; you use a single server. But you don't get to think of that data in terms of 24/7/365 availability in *any* context if it is single-server data.

    1. Trevor_Pott Gold badge

      Re: @TP

      If you need me to connect you to the vendor, you know where to find my e-mail. ;)

      1. Alan Brown Silver badge

        Re: @TP

        The vendor in the UK is currently Storm.

        I'm going to take credit for the fact that there's a UK vendor. We spent a lot of time working with ixSystems and local people (especially Frank Awuah) over the last 5 years to make this happen after having a torridly frustrating time dealing with UK vendors who were either pushing vastly overpriced or incredibly badly specced systems (frequently both).

        The fact that there were no authorised sellers in the EU made obtaining the things difficult in an environment where "lack of local support" was one of the issues raised within the organisation. Frank was one of the people who saw the screaming need in the market and worked extremely hard to ensure that ixSystems products were more easily obtainable on this side of the Atlantic. (The only way to get the things previously was to import them yourself, with all the attendant hassles with customs clearance, etc etc)

    2. Bronek Kozicki

      Re: @TP

      @TP it is not difficult to experiment a little with ZFS, but I reckon the performance issues Trevor found were mostly related to the size of the tested appliance. 50GB ARC, 200GB L2ARC, 8GB ZIL and 3TB of actual storage is about the same range I use for a home data store (using ZFS on Linux). You will want more L2ARC and ARC for a slightly more serious load. And do not even think about enabling compression.

      1. Bronek Kozicki

        Re: @TP

        And do not even think about enabling compression.

        Oops, exactly the wrong word used here. I meant, do not even think about enabling deduplication. On the other hand, you will definitely want to enable LZ4 compression across the whole pool, including for poorly-compressible data.
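
        If you want to see what LZ4 is buying you, the compressratio property per dataset tells the story (a minimal sketch; the pool name "tank" is made up):

          # Print the compression ratio ZFS reports for each dataset.
          # Assumes a pool named "tank" and the zfs CLI on the PATH.
          import subprocess

          out = subprocess.run(
              ["zfs", "get", "-H", "-r", "-o", "name,value", "compressratio", "tank"],
              check=True, capture_output=True, text=True).stdout

          for line in out.splitlines():
              name, ratio = line.split("\t")
              print(f"{name}: {ratio}")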

      2. Chavdar Ivanov

        Re: @TP

        I *think* you meant "deduplication" in the last sentence; LZ4 compression actually improves ZFS throughput. Deduplication, on the other hand, is something imposed on Sun's ZFS engineers by marketing and is usually discouraged except in rare and specific use cases.

    3. Duncan Macdonald

      Re: @TP

      However, the performance of 150MB/sec is so low that it cannot support more than one moderate server.

      Either the disks have crappy performance or the server has horrible software. The network interfaces would allow for 2 GB/sec per controller, so the network efficiency is only 7.5 percent (or, if both controllers can be used together, the network efficiency is only 3.75 percent).

      In fact the performance is so low that the 10GbE connections are unnecessary - a bonded pair of 1GbE links could handle the 150MB/sec throughput.

      As the system has only a single SSD for the L2ARC, it is only moderately highly available, as performance will degrade badly if that SSD fails - for a true high-availability system there should be NO single point of failure.

  7. Alistair

    FS labelling vs network protocols

    SMB - Server Message Block - the communications protocol that backs typical CIFS (Common Internet File System) filesystems.

    Samba: open-source implementation of Windows communication, authentication and presentation protocols and APIs.

    Windows AD. An authentication mechanism designed to confuse Samba users.

    <none of the above is intended as a dictionary definition or an implementation guide. These are purely my perspective. And I have been known to be wrong>

    1. Anonymous Coward

      Re: FS labelling vs network protocols

      Windows AD. An authentication mechanism designed to confuse Samba users.

      Hahaha, that alone deserves an upvote :)

  8. Anonymous Coward

    They also do systems for humbler budgets

    I sometimes wonder if I would have been better off with a FreeNAS Mini (4 Bay) than the ReadyNAS 314 I ended up going for...

    1. CAPS LOCK

      Re: They also do systems for humbler budgets

      No one should ignore Nas4free.

    2. Down not across

      Re: They also do systems for humbler budgets

      I opted for nas4free for my HP Microserver.

      I recently upgraded from the ancient FreeNAS 0.72 (the original FreeNAS, renamed to nas4free when iXsystems took over FreeNAS) and was pleasantly surprised that it happily imported the ancient softraid mirrored volume, allowing me to copy it internally on the box to the newer, larger ZFS pool.

      Whilst similar, they each have their own emphasis on things. FreeNAS has introduced quite an extensive plug-in system which may appeal to some.

      The old 0.72 never missed a beat.

  9. John H Woods Silver badge

    Is ZFS hard - or just a bit different?

    I read up about it the other day and set up a little Ubuntu 16 LTS server with a RaidZ2 4x2TB disk array to have a play with snapshots, deliberate disk destruction, etc. All seems rather straightforward, if a bit novel, and the evidence (frequently inadvertently posted here) would tend to point to me being no kind of genius.

    1. Bronek Kozicki

      Re: Is ZFS hard - or just a bit different?

      ZFS is not hard. But you might want to read up on ZFS administration before embarking on serious use, because "what's the worst that could happen?" is, in the case of a filesystem, pretty bad actually.

    2. Anonymous Coward

      Re: Is ZFS hard - or just a bit different?

      When it was new, ZFS had a lot of tunable settings that you really needed to tweak, especially if you weren't running it on a (then) large server. But most of the configuration has always been insanely easy, to the point where you usually think you've missed 90% of the steps. Especially for home users, you can just throw together a basic array with backup disks using a single command and forget about it until it informs you of a physical disk error. But there are plenty of extra features you can play with, like encryption, compression, deduplication, snapshots and ZILs.

      I suspect that Trevor's pointing out that it is rather more complicated if you're interested in cutting edge storage with zero downtime and maximum throughput.

    3. Alan Brown Silver badge

      Re: Is ZFS hard - or just a bit different?

      "Different"

      The primary design ethos of ZFS is "Disks are crap. Deal with it" - instead of trying to deal with issues by layering lots of expensive redundancy all over the place and taking a big hit if a drive fails, it's designed around the _expectation_ that disks are cheap, will fail, and that performance shouldn't suffer when they do.

      Vendors tend to package enterprise drives with ZFS but the actual design is intended to take advantage of consumer-grade drives. My test system uses a mixture of WD Reds and Seagate NAS drives, but it originally had WD Greens in it (they were crap and all failed, as did every single ST2000-DM series drive, but no data was ever lost).

      The advantages:

      1: It detects and fixes silent ECC fails on the fly (and pushes the fix back to disk) - this is important because the standard error rate for disks will result in 1 bad sector slipping past ECC checking about every 44TB read off a drive.

      2: There's no regular downtime needed for FSCK - scrubbing is done on the fly (a huge advantage when you have hundreds of TB of FSes)

      3: RAIDZ3 - yes, 3 parity disks. With large arrays comes the increased risk of multiple drive failures during a RAID rebuild. We've lost RAID6 arrays in the past and whilst you can restore from tape the downtime is still a pain in the arse. My test 32TB array (2TB drives) has been rebuilt on the fly a large number of times and some of the drives are over 7 years old. It has yet to lose any data.

      4: Simpler hardware requirements. You don't need expensive RAID hardware (in fact it's a nuisance). ZFS talks directly to the drives and expects to be able to do so - this is where a number of "ZFS" vendors have badly cocked up what they're offering, crippling performance and reliability.

      The money saved on HW raid controllers should be put into memory. You'd be surprised how many vendors are flogging multi-hundred TB systems with only 8 or 16GB of RAM. The more memory you can feed into a ZFS system the better it will perform - you need 1GB/TB up to about 50TB and that can be relaxed to about 0.5GB/TB above that, but more is better (there's a rough sizing sketch at the end of this comment). Dedupe will massively increase both CPU and memory requirements and for most uses is not worth enabling (it's great in things like document stores or mail spools/IMAP stores).

      5: Read caching (ARC and L2ARC) - this is relatively intelligent. Metadata (directories) are preferentially cached. Large files and sequentially read ones are usually not - preference is given to small files read often, to minimize headseek.

      L2ARC allocations use ARC memory (pointers). If there is not enough ARC allocated, then L2ARC will not be used or may not be fully used.

      6: Write caching. (ZIL and SLOG) - all pending writes are written to the ZIL first, then to the main array, then erased from the ZIL. The ZIL is part of the ZFS array unless you have a dedicated SLOG disk.

      Important: SLOG and ZIL are NOT write caches. They're only there for crash recovery and are write-only under normal operation. ZFS keeps pending writes in memory and flushes them from there. The advantage of having the SLOG is that writes can be deferred until the disk is "quiet" and streamed out sequentially.

      Caveats:

      ZFS is designed to be autotuning, but some assumptions are wrong for dedicated systems (vs general purpose servers)

      7: Tuning is important - autotuning the ARC will result in only half the memory being allocated. In a dedicated ZFS server like TrueNAS, you can set this up around 80-90% (only about 3-4GB is actually needed for the operating system) - and on top of that there's a tunable for how much metadata is cached (usually only about 20% of ARC). This can be wound up high on systems with lots of little files.

      There are other tweaks you can make. The IOPS reported are pretty poor compared to the rusty arrays here.

      The thing to bear in mind about ZFS is that it is entirely focussed around data integrity. Performance comes second. It will never be "blisteringly fast", but you can guarantee that the bits you put in are the same bits you get out.

      That said, the ZFS arrays here are at least 10 times faster than equivalent hardware using EXT4/GFS/XFS filesystems when serving NFS and the caching (effectively a massive hybrid disk) means that head thrash on repeated hits to the same files is nonexistent. If you have a network /home this is an important feature.

      I'm getting upwards of 400MB/s out of the TrueNAS when backing up, whilst still running fileserving operations and with access latencies staying low enough that people don't notice. When headthrash starts happening on any disk system, latencies go sky high extremely quickly so this is a critical parameter to keep an eye on.

      Watch those graphs. If your ARC or L2ARC aren't growing to maximums then you're doing something wrong.
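
      For what it's worth, the sizing rules of thumb above boil down to something like this (my numbers, not ixSystems guidance; vfs.zfs.arc_max is the FreeBSD tunable referred to in point 7):

        # Rough sizing for a dedicated ZFS filer: ~1GB RAM per TB up to
        # 50TB, ~0.5GB/TB beyond that, and let the ARC have ~85% of RAM.
        # These are rules of thumb, not official figures.
        def min_ram_gb(pool_tb):
            if pool_tb <= 50:
                return pool_tb * 1.0
            return 50 + (pool_tb - 50) * 0.5

        def arc_max_bytes(ram_gb, arc_fraction=0.85):
            # value you would feed to vfs.zfs.arc_max on a dedicated box
            return int(ram_gb * (1024 ** 3) * arc_fraction)

        for tb in (16, 50, 400):
            ram = min_ram_gb(tb)
            print(f"{tb} TB pool: >= {ram:.0f} GB RAM, arc_max ~ {arc_max_bytes(ram):,} bytes")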

  10. Anonymous Coward

    For Home/SMB try OpenMediaVault

    Debian-based and has a decent browser-based GUI.

    Can even be installed on a Raspberry Pi for USB-based storage.

  11. dan1980

    You keep using that word . . .

    "I get to talk about it at parties and have product managers freak out until they realise I broke their baby on purpose, and not in the course of normal operations."

    I know that, as a Canadian, you use the language as I do, but still I am perplexed by your use of the word 'parties' here.

    Admittedly, most parties I end up at these days involve exhausting myself pushing ungrateful children on swings and being hit in the nethers with toy cars. But at least there's usually cake.

    1. Trevor_Pott Gold badge

      Re: You keep using that word . . .

      Trevor go to conference. Vendor throw conference party and guilt Trevor into going. Trevor try to escape. Many drinks, many shake hands. Music too loud. Talk about breaking tech. Trevor finally allowed to escape.

      Parties no have coffee.

      *thud*

      ^ Pretty much like that.

      1. dan1980

        Re: You keep using that word . . .

        Ahh, right.

  12. michaeldexter

    Open-source storage that doesn't suck...

    @TP

    Great review. Thank you!

    CIFS vs SMB: This terminology is corrected in FreeNAS and will be in a future version of TrueNAS. We all agree this was slightly incorrect for far too long.

    The dated interface: Guilty as charged and I encourage you to take a look at the FreeNAS 10 BETA. We are rewriting FreeNAS and TrueNAS from the ground up, just as we did with FreeNAS 8 in 2010.

    All the best,

    Michael Dexter

    iXsystems Senior Analyst

    1. Paul Crawford Silver badge

      Re: The dated interface

      Please, please don't make it into another sucky "modern" style! OK?

      Keep it functional and discoverable for users who rarely touch the box.

    2. Justin Clift

      Re: Open-source storage that doesn't suck...

      @michaeldexter - The new UI looks pretty, and the new upcoming approach for integrating Docker containers (so people can run things on the appliance) looks nifty.

      One important thing seems to be missing from the new UI though: being able to see things in a list.

      For example, with the current (9.10) interface, when a user clicks on (say) the Storage heading they're shown a list of all the storage in the system, with each row having all of the important fields included.

      With the in-development (10) interface, it doesn't seem to support showing information like that? A person needs to click on each storage element individually to see the important elements for that one.

      Hoping there's a plan to address that, or I'm not seeing something obvious? :)

  13. Dinsdale247

    Hey Trevor, have you seen the NEW ui?

    https://www.youtube.com/watch?v=8TS6vvpP1yQ

    A couple of comments:

    - While the UI is stodgy, the handbook is very complete and accurate, and will be able to walk most people who have some understanding of iSCSI through setup. The documentation is excellent (like FreeBSD's). I was a strictly Windows guy when I started using it. Now I'm a bit of a FreeBSD wonk. :-/

    - To put the "mediocre" performance into perspective, I run an entire Windows development domain with 40 VMs and 5-10 developers on a 3-year-old FreeNAS 8.3 server (a TrueNAS unit without the warranty). I would think there would be more than enough horsepower for significant workloads. Do you have any numbers on what you consider an average user load (I know, I know, "it's hard to estimate blah blah blah")?

    1. Justin Clift

      As a data point, there's a newer video out now, showing some of the things added since the one you're pointing at:

      * https://www.youtube.com/watch?v=FzyMAGbp6_g
