Red Hat banishes Btrfs from RHEL

Red Hat has banished Btrfs, the Oracle-created file system intended to help harden Linux's storage capabilities. The Deprecated Functionality List for Red Hat Enterprise Linux 7.4 explains the decision as follows: The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise …

  1. Chika

    Given SUSE's habitual brown nosing of RedHat, I suspect that it will probably bin btrfs eventually.

    1. thames

      I doubt that Suse will want to spend the resources required to keep Btrfs going. There's no market advantage to having it. I can't see anyone else stepping in to take over either. Suse will deprecate Btrfs at some time appropriate to their release cycle, and then gradually phase it out.

      Given the way that technology is going, I suspect that the future is going to involve file systems that were designed specifically for flash storage. Rotating disk will be used more like tape and have file systems to suit that role.

      1. Chika

        I'm in two minds about that. On one side I totally agree, but on the other hand it depends on whether it has enough people behind it to continue it as a fan supported project like KDE 3, for example. I wouldn't care if it did end up binned though...

      2. Doctor Syntax Silver badge

        "Given the way that technology is going, I suspect that the future is going to involve file systems that were designed specifically for flash storage."

        I think that, in response to malware, we might have to start looking at storage in a new way. Rather than letting any old application write to whatever lump of storage to which the user has access it will need to ask a service to do the writing and the service will ensure that the application has the appropriate credentials.

        1. boltar

          @doctor syntax

          "Rather than letting any old application write to whatever lump of storage to which the user has access it will need to ask a service to do the writing "

          I can't think of any malware or virus that got its way into a system via writing to the filesystem.

          Anyway it would be hideously slow due to paging data around the kernel and because of this will have to have exceptions for programs to write direct (eg for RDBMS's) so making it full of potential security issues which will be yet another thing for admins to worry about.

          1. Doctor Syntax Silver badge

            Re: @doctor syntax

            "I can't think of any malware or virus that got its way into a system via writing to the filesystem."

            And once it gets in it never does stuff like, let's say, overwrite all a user's files?

            1. boltar

              Re: @doctor syntax

              "And once it gets in it never does stuff like, let's say, overwrite all a user's files?"

              And your point is what exactly? Have you never heard of file permissions? Anyway, the filesystem is not a security weakness, it's a basic facility. Also on unix/linux it would be quite easy to intercept any filesystem calls using LD_PRELOAD when test running a binary; having another layer on top of the filesystem solves nothing. Even MS realised this when they dumped WinFS.

        2. elip

          But you've just described exactly what a file system does.

          Whether you have a regular human user account with access to the data, or a service account/application token writing to the data store, the file system is responsible for the reads/writes/access enforcement. Why would we want even more abstraction?

        3. Tom Paine Silver badge

          Er. Isn't that exactly how it works today in POSIX land?

          *Confoosed

        4. Tom Samplonius

          "I think that, in response to malware, we might have to start looking at storage in a new way. Rather than letting any old application write to whatever lump of storage to which the user has access it will need to ask a service to do the writing and the service will ensure that the application has the appropriate credentials."

          Congratulations, you have just discovered SELinux. SELinux can enforce file access on a per application basis, plus network access, ports, etc.

      3. TVU Silver badge

        "I doubt that Suse will want to spend the resources required to keep Btrfs going. There's no market advantage to having it. I can't see anyone else stepping in to take over either. Suse will deprecate Btrfs at some time appropriate to their release cycle, and then gradually phase it out."

        However, they have in-house experts and I expect that they'll keep on working on Btrfs.

        As for Red Hat, I get the distinct impression that they are still not happy with the development and stability of Btrfs and so they are switching to the more mature XFS which has been around for years. They will be developing XFS and their recent acquisition of Permabit is entirely in line with a long term XFS development strategy.

        1. Anonymous Coward

          Given the sort of contributions RedHat has made to the ecosystem (Systemd, anyone?) I doubt that their contributions will be missed in this area. I don't believe they were key to its survival and I'm not sure the community is overly keen on RedHat's direction and rampant self-interest.

      4. Alan Brown Silver badge

        "Given the way that technology is going, I suspect that the future is going to involve file systems that were designed specifically for flash storage."

        That's already where things are headed

        "Rotating disk will be used more like tape and have file systems to suit that role."

        Which is a pretty good description of the area that ZFS fits in (Spinning media fronted by gobs of flash as cache, with checksumming and proactive correction at every level to combat the inevitable errors that creep in every 45TB or so a disk reads. Compression, dedupe and encryption are all optional). Think of it as a faster Hierarchical filesystem

        Tape is still king in the archival format area though - and will be for a long time to come. Nothing beats it for cold storage.
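
        The "checksumming and proactive correction" idea described above can be illustrated with a toy mirror. This is only a sketch under stated assumptions, not ZFS's actual implementation: the function names are hypothetical, SHA-256 is picked for convenience rather than as a claim about ZFS's defaults, and the "disks" are in-memory byte arrays.

```python
import hashlib
from typing import List


def checksum(block: bytes) -> str:
    """Return a SHA-256 hex digest used as the block's checksum."""
    return hashlib.sha256(block).hexdigest()


def read_with_repair(copies: List[bytearray], expected: str) -> bytes:
    """Return a copy that matches the stored checksum, healing bad copies."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("every copy failed checksum verification")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # overwrite the damaged copy with known-good data
    return good


data = b"important payload"
expected = checksum(data)          # stored separately from the data block
mirror = [bytearray(data), bytearray(data)]
mirror[0][0] ^= 0x01               # simulate a single flipped bit on disk 0
recovered = read_with_repair(mirror, expected)
print(recovered == data, bytes(mirror[0]) == data)  # prints: True True
```

        The point of the design is that the checksum lives apart from the block it protects, so a silently corrupted copy can be detected on read and rewritten from a good one.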

      5. pwl

        @thames

        "There's no market advantage to having it. I can't see anyone else stepping in to take over either."

        - every admin command on SLES including OS updates gets a snapshot before & after thanks to btrfs. any errors can be rolled back virtually instantaneously. downtime due to errors is significantly reduced. in the event of catastrophic failure, you can recover from a previous "good" on-disk version of the OS: no reinstall/rebuild needed. only SUSE offers this on enterprise linux

        - redhat's contribution to OSS is always important, but the main contributors to Btrfs were always Facebook, Fujitsu, Intel, Oracle, SanDisk, Netgear, & SUSE ... none of those companies have announced dropping work on the project...

      6. dajoker

        Btrfs Market Advantage

        The market advantage is definitely there if you are interested in enterprise products, which is why it is a little weird that RedHat would not want to be in on that, but their decision to not include it is about one drop in the overall bucket of contributions to the Btrfs filesystem. Go check lkml and see who is involved, and maybe the reasoning for dropping Btrfs becomes clearer: they lack expertise and may want to cut costs rather than employ people who know it well, focusing instead on other technologies that they know better, despite the lost functionality.

        In the meantime, I've saved more than a couple of servers, and my own laptop, from needing to be reinstalled thanks to Btrfs. Whether it's a bad patch (even a bad kernel), a bad use of the great tools that manage the system (which automatically snapshot), or user error in some other way, reverting a snapshot is just awesome. Also I get a lot of pleasure from comparing before/after snapshots from non-RPM-packaged software to see what they really do to my system. Comparing in realtime, both before and after snapshots from the running system, is just wonderful. I've even seen clients using it for forensics, finding out what some miscreant did, or tried to do anyway, to cover up tracks, giving investigators exactly what they needed to prosecute, all because the snapshot recorded changes made (creates, modifies, deletions) that were done to avoid incrimination; it's like somebody making a list of all the things they want to do and handing it to the authorities.

        Disclaimer: I worked at SUSE several years ago, and am now a consultant on, among other things, SUSE and RHEL server products.
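
        The before/after comparison described above (creates, modifies, deletions) can be sketched generically. This is a minimal illustration that assumes two snapshots are simply visible as ordinary directory trees; it does not use any Btrfs or snapper interface, and the function names and demo file names are hypothetical.

```python
import hashlib
import os
import tempfile


def tree_digest(root):
    """Map each file path (relative to root) to a SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as fh:
                digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def diff_snapshots(before_root, after_root):
    """Return sorted lists of created, modified and deleted relative paths."""
    before = tree_digest(before_root)
    after = tree_digest(after_root)
    created = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return created, modified, deleted


# Demo with two throwaway directories standing in for snapshots.
with tempfile.TemporaryDirectory() as before_dir, tempfile.TemporaryDirectory() as after_dir:
    layouts = (
        (before_dir, {"etc.conf": "a=1", "old.log": "x"}),
        (after_dir, {"etc.conf": "a=2", "new.bin": "y"}),
    )
    for root, files in layouts:
        for name, text in files.items():
            with open(os.path.join(root, name), "w") as fh:
                fh.write(text)
    created, modified, deleted = diff_snapshots(before_dir, after_dir)

print(created, modified, deleted)  # prints: ['new.bin'] ['etc.conf'] ['old.log']
```

        Hashing every file is slow on a real system; a copy-on-write filesystem can answer the same question far more cheaply because it already knows which blocks changed between snapshots.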

        1. AdamWill

          Optional

          Note: I work for Red Hat, but not on the relevant teams, so I absolutely don't have the authoritative answer on this, it's just my impression.

          "Go check lkml and see who is involved and maybe the reasoning for dropping Btrfs becomes more-clear as they lack expertise and may want to cut costs rather than employing people who know it well"

          My impression is it's kind of the other way around. Generally we hire people to work on stuff we're interested in shipping. There was a period when there was quite a lot of belief within RH that btrfs was The Future, and IIRC, at this time, we actually did employ multiple folks to work on btrfs full-time. (This was the period when it was a running joke that btrfs was going to be the default in Fedora in the *next* release...every release...it kept getting proposed as a feature, then delayed).

          Over the next several years, enthusiasm for btrfs just kinda generally declined internally, because it always seemed to be perpetually not quite ready and kept running into inconvenient problems like eating people's data. (Again, this is not my main area of focus so this is only a vague impression I have, I'm not in a position to cite lots of data or state this authoritatively; I'm not going to get into a "who's right, SUSE or RH?" debate because I just don't have the expertise and I have nothing at all against SUSE.) In this time, most of the development resources we had for btrfs got reallocated elsewhere.

          So it's not so much "we don't want to ship btrfs because we don't have anyone to work on it" - I mean, it's not like we don't have the money to hire some more storage engineers if we *wanted* to - it's rather more "we're not hiring people to work on btrfs because we decided we don't think it's the right horse to bet on any more".

      7. Anonymous Coward

        Red Hat contributed nothing to BTRFS. This changes nothing.

    2. pwl
      Holmes

      probably shouldn't feed the trolls...

      ...but RedHat tends to follow where SUSE quietly leads without fanfare, such as providing xfs in the standard bundle (rather than 000's extra per cpu), using corosync/pacemaker for HA (after a 10+ year lag), showing an interest in Ceph (2 years after SUSE signed an enterprise deal with inktank), and properly supporting other platforms like IBM z.

      Btrfs isn't going away from SUSE - it's a shame RedHat won't contribute anymore, but since most of the involvement was from Fujitsu, SanDisk, Intel, Netgear, Facebook, and others as well as SUSE, it probably won't slow things down too much.

      Meanwhile, if you want full-OS snapshot/rollback on your Linux, it looks like SUSE will be your only choice for the foreseeable future...

      1. hrudy

        Re: probably shouldn't feed the trolls...

        Having switched recently to openSUSE and SLES, I find Btrfs has quite a learning curve. Some of its features, like subvolumes, almost seem like a solution in search of a problem. Perhaps the btrfs community and SUSE have done a poor job of clearly explaining what the advantages (and liabilities) of btrfs are.

        One enterprise user I talked to said that the metadata filled up on his root volume and locked up his system. I know that you can run janitor programs that balance the filesystem. However, it seems that these systems are tuned to resolve esoteric problems like bitrot and system rollback after updates, but create more issues for sysadmins than they solve. I know that kernel 4.19 has improvements for btrfs. However, unless you are running openSUSE Tumbleweed, or unless SUSE decides to backport these fixes to SLES 15, that is not going to help in the short term.

    3. Anonymous Coward

      Brown-nosing... what a crock

      While it's both easy and popular to think the Linux world revolves around [your distribution of choice here], let's not make vacuous claims about one distro vs. another without at least, you know, evidence. SUSE doesn't follow RHEL, as is evident to anybody familiar with both products. Sure, they have a lot of common products, like Linux, and OpenStack-based things, etc., but the timelines based firmly in reality and documented online show a lot of non-following, at least by SUSE, when it comes to adopting technologies. The filesystem is definitely no exception; XFS was free and commonly used by SLES long before RHEL finally made it available to everybody. Btrfs's primary contributor isn't RH, but SUSE, followed by several others, and then possibly RH gets in there with a few commits now and then.

      In other news, the [pick some government] announced it would no longer support Bitcoin for transactions with the government.

  2. DougS Silver badge

    After so many versions of Fedora that promised btrfs as the default filesystem

    Now they're binning it entirely? I find it hard to believe that lack of encryption was the only reason, especially since 1) every enterprise drive already supports encryption and 2) you can implement it using md.

    I'll bet this has something to do with politics and Oracle.

    1. Anonymous Coward

      Re: After so many versions of Fedora that promised btrfs as the default filesystem

      Er, I think it was Google who aren't keen on it for its lack of encryption, not RedHat.

      Stallman is going to blow a fuse if ZFS gets widely adopted.

      This doesn't bode particularly well for the future of large scale GPL projects. Seems like more and more people are willing to mix code under different licenses even if they're incompatible. The allure of someone else's code can be too great to ignore. If the GPL is to remain influential, and not get routinely ignored, they need to get ZFS in Linux knocked on the head. If ZFS+Linux becomes the norm, unchallenged, it becomes harder to enforce GPL. Especially with court cases like Google vs Oracle which may yet pronounce on loosely similar copyright breaches being "fair use".

      1. Teiwaz Silver badge

        Re: After so many versions of Fedora that promised btrfs as the default filesystem

        This doesn't bode particularly well for the future of large scale GPL projects.

        - Any license issue that gets in the way will eventually get sorted, maybe not GPL, but something.

        I can't actually fathom what was going on around Btrfs - lots of talk about it being the future default filesystem for Linux - yet it was left with Oracle?

        I'm just waiting for the next great Linux schism when it's announced Lennart Poettering is doing a new filesystem for Red Hat.

        1. jdoe.700101
          Trollface

          Re: After so many versions of Fedora that promised btrfs as the default filesystem

          A couple of simple mods to the systemd journal and we have a file system. Then simply add a couple more options to journalctl and we have an ls replacement.

        2. Androgynous Cupboard Silver badge

          Re: After so many versions of Fedora that promised btrfs as the default filesystem

          I'm just waiting for the next great Linux schism when it's announced Lennart Poettering is doing a new filesystem for Red Hat.

          Don't even joke about that. Please.

          1. bobajob12

            Re: After so many versions of Fedora that promised btrfs as the default filesystem

            Hans Reiser has some spare time on his hands.

            1. Anonymous Coward

              Re: After so many versions of Fedora that promised btrfs as the default filesystem

              Hans Reiser has some spare time on his hands.

              That's what I was thinking. ReiserFS was a real killer filesystem....

              1. hititzombisi

                Re: After so many versions of Fedora that promised btrfs as the default filesystem

                15-odd years ago, I was running stuff on ReiserFS, and one of my largest data losses (wrt the existing media available) happened then. At that time I knew Hans was up to no good...

        3. JSTY
          Joke

          Re: After so many versions of Fedora that promised btrfs as the default filesystem

          > I'm just waiting for the next great Linux schism when it's announced Lennart Poettering is doing a new filesystem for Red Hat.

          Don't worry, I'm sure it's on the SystemD roadmap

          (Not even sure if the joke icon is appropriate here ...)

        4. dmacleo

          Re: After so many versions of Fedora that promised btrfs as the default filesystem

          systemdfilefail??

          filePulseFail?

      2. Bronek Kozicki Silver badge

        Re: After so many versions of Fedora that promised btrfs as the default filesystem

        As you surely know, the combination of ZFS+Linux is currently being challenged. It may yet turn out to be a valid and legal combination and frankly, I see nothing wrong with such an outcome.

      3. Anonymous Coward

        @AC

        Stallman is going to blow a fuse if ZFS gets widely adopted.

        Being a huge fan of ZFS myself I'd pay to see that ;)

        But I'm not sure I fully understand. I mean, all because of a license? Because ZFS uses a license (CDDL) which people don't happen to like or anything? What happened to the free software philosophy? Because CDDL is just as much an open source license as the GPL is. Sometimes I can't help but worry that certain people completely lose focus of the eventual goals.

        Seems like more and more people are willing to mix code under different licenses even if they're incompatible.

        Can you imagine that... People apparently like the freedom to use the (open) source as they want to use it. One important note though: it's the GPL which is usually incompatible, not the other licenses. Many open source licenses (I'm mostly familiar with the BSD, CDDL and Apache licenses) have no problem at all with mixing things together, as long as the license continues to get respected.

        I think that's an important aspect here. GPL demands that everything gets redistributed under the GPL (all newly entered code) whereas the other licenses only demand that the original software simply continues to stay licensed under the same license it was given out with. Hardly an unfair demand to make I think, especially if you keep in mind that mixing that with another license is usually no problem.

        1. Jon 37

          Re: @AC

          The issue isn't that "ZFS uses a license (CDDL) which people don't happen to like", the issue is that it is widely believed that distributing a binary that includes both GPL and CDDL parts is not allowed by the licenses, and therefore copyright infringement.

          Only a court can eventually rule on this and give us a definite answer one way or another.

          As for "What happened to the free software philosophy?", certainly the GNU developers were (and are) very careful not to infringe on anyone's copyrights, and if someone offers code freely available subject to certain conditions they would either follow the conditions or not use the code.

          If you don't like that "GPL demands that everything gets redistributed under the GPL", then don't use GPL code. The *BSD distros and kernels are available as an alternative to Linux, and someone could port (maybe has ported?) CDDL-licensed ZFS to them. Alternatively, someone could look at the existing ZFS code and write a decent spec for ZFS, and someone else could implement it in new GPLed code for Linux.

          1. Bronek Kozicki Silver badge

            Re: @AC

            OpenZFS is natively available on FreeBSD. It is the same OpenZFS which is also available for Linux, under "ZFS on Linux" project, and which is included in Ubuntu since version 16.04. The parts of OpenZFS which are interacting with Linux kernel are doing so via modules called Solaris Porting Layer, which are all licensed under GPL (and not CDDL).

            1. oldcoder

              Re: @AC

              I believe you still aren't allowed to redistribute the code.

              Thus you aren't allowed to pass on your BSD code in a duplicated DVD.

            2. TVU Silver badge

              Re: @AC

              "OpenZFS is natively available on FreeBSD. It is the same OpenZFS which is also available for Linux, under "ZFS on Linux" project, and which is included in Ubuntu since version 16.04. The parts of OpenZFS which are interacting with Linux kernel are doing so via modules called Solaris Porting Layer, which are all licensed under GPL (and not CDDL)".

              ^ Absolutely this. This is why it is unlikely that there will be any challenge and why it will be even less likely that any such challenge will succeed.

        2. Anonymous Coward

          Re: @AC

          >> Because CDDL is just as much an open source license as the GPL is. Sometimes I can't help worry that certain people completely lose focus of the eventual goals.

          Those people are called bureaucrats and lawyers. Their goals never change, they exist only to make more work for themselves, and they love arguments like this. Normal people would just say "It's public domain, go have fun".

          All this fannying about over "my license is better than yours" reminds me of nothing more than the various "Palestinian" organisations in Life of Brian.

        3. oldcoder

          Re: @AC

          There is no restriction on the USE of the GPL code.

          There is a restriction that you can't steal the code and make it proprietary.

          The problem is that the CDDL disallows the code from being redistributed...

        4. This post has been deleted by its author

      4. Doctor Syntax Silver badge

        Re: After so many versions of Fedora that promised btrfs as the default filesystem

        "This doesn't bode particularly well for the future of large scale GPL projects."

        Look at any desktop Linux userland and see how many licences are already in play.

      5. Anonymous Coward

        @AC - Re: After so many versions of Fedora that promised btrfs as the default filesystem

        I gave you a down vote wholeheartedly for your failure to see the separation between developers' wishes and lawyers' will. And also for equating 4 lines of code in Google vs Oracle with the whole ZFS code.

        You don't seem to understand much about licensing issues. Problem is not with ZFS adoption in Linux but with rights to distribute ZFS with Linux.

        According to your logic even Microsoft's hand could be forced because people really want to use their technologies for free.

      6. Alan Brown Silver badge

        Re: After so many versions of Fedora that promised btrfs as the default filesystem

        "If the GPL is to remain influential, and not get routinely ignored, they need to get ZFS in Linux knocked on the head. "

        The incompatibility between them arises only if you _distribute_ the two together as binaries. There are a number of packages with this same issue and the response has been pragmatic for years - set up a script to pull in the required items from a separate source _after_ the OS has been installed.

        Apart from the feature work, there's been a _lot_ of work put into ensuring that ZFS doesn't touch GPL symbols in the kernel and the reality is that the FS is mature, reliable plus blazingly fast for spinning media (and even faster if you run it on all-flash systems)

    2. Oh Homer
      Linux

      Re: After so many versions of Fedora that promised btrfs as the default filesystem

      Personally I'm breathing a huge sigh of relief, as the ironically named btrfs is an utter clusterfsck, in my experience.

      Sorry, but any filesystem that gets heavily fragmented by design is definitely not on my Christmas list, especially when this results in a performance degradation so crippling that I have to pull the plug, risking massive filesystem corruption in the process, then don't even have a fully functional (or even safe) fsck utility to recover afterwards.

      I'm shocked that Red Hat took this long to finally ditch it.

      ZFS has its own problems, of course, and I'm not even talking about the dreaded CDDL. For a start, it has a massive memory overhead. It also seems to be impossible to tell how much, if any, free space is available, due to the strange way that it views storage, and worse still there doesn't seem to be any way to reclaim "freed" space either.

      Frankly I'm not sure what problem these highly dysfunctional filesystems are trying to solve, but whatever it is they don't seem to have succeeded, so I'll just stick with ext4. Thanks anyway.
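
      The "fragmented by design" complaint above follows from how copy-on-write handles overwrites: a changed block is written to a new location instead of in place. The toy model below is only a sketch of that effect under simplifying assumptions (a single file, a naive "next free address" allocator, no defragmentation); it says nothing about how btrfs actually allocates space, and all names are hypothetical.

```python
import random


def count_extents(block_map):
    """Count runs of physically contiguous blocks, walking in logical order."""
    extents = 1
    for prev, cur in zip(block_map, block_map[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def simulate(num_blocks, overwrites, copy_on_write, seed=0):
    """Overwrite random blocks of one file and return its extent count."""
    rng = random.Random(seed)
    block_map = list(range(num_blocks))  # logical index -> physical address
    next_free = num_blocks               # naive allocator: append at the end
    for _ in range(overwrites):
        logical = rng.randrange(num_blocks)
        if copy_on_write:
            block_map[logical] = next_free  # new data lands somewhere else
            next_free += 1
        # in-place update: the physical address does not change
    return count_extents(block_map)


in_place = simulate(1000, 200, copy_on_write=False)
cow = simulate(1000, 200, copy_on_write=True)
print(in_place, cow)
```

      In this model the in-place file always stays in one extent, while the copy-on-write file splits into many. Workloads with lots of small random overwrites (databases, VM images) are where this effect bites hardest.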

      1. Alan Brown Silver badge

        Re: After so many versions of Fedora that promised btrfs as the default filesystem

        "ZFS has its own problems, of course, and I'm not even talking about the dreaded CDDL. For a start, it has a massive memory overhead"

        It's designed to be used on FILE servers, not your average desktop system. It also starts on the premise that "Disks are crap, cope with it and don't be a snowflake" - it doesn't demand you put in premium components and then have a tantrum when one goes down. It _expects_ things to break or give errors and heals automatically, on the fly when using commodity components (the only "critical" demand is ECC memory and you're not going to have that on your average desktop or laptop anyway). It plays to the strengths of hard drives and remediates their weaknesses using memory and flash drives to allow sustained simultaneous high IOP and high throughput activity.

        That memory is cache, with even more cache on SSD and it will maximise cache whenever possible, in line with its mission of being a fileserver's filesystem (You can tune the memory usage down but why? The more memory you throw at it, the better it becomes!)

        The advantage of doing it this way isn't obvious on your desktop or laptop (apart from the block checksumming, which is worthwhile by itself), but it shows up in spades when someone puts 200,000 files in one directory or you have thousands of users banging hard all over a 400TB directory tree containing 500 million files. This is why outfits like Lawrence Livermore National Laboratory have invested so much effort into it - and why I dropped £120k a couple of years back on a dedicated ZFS fileserver that has 256GB of RAM and 1TB of fast SSD cache.

        Yes, you could run ext4 on this system - if you don't mind periodic downtime for housekeeping (fsck), plus the cost of a fast hardware RAID controller or the complexity of MD-raid (which only allows 2 parity disks per array instead of ZFS's 3 - and that's a big deal when you run 100TB+ installations, as we've lost RAID6 arrays before), plus LVM, plus myriad individual filesystems to herd. You'd find that the overall performance with the same amount of memory would be substantially lower, and if you want to try and match ZFS you'll have to do a lot of fiddling with dm-cache, or put your writes at risk of power failure/crash. After all that, you'll still have it fall over when a user puts 250,000 files in one directory.

        After spending years nursemaiding systems which suffered poor latency and got temperamental when users piled on the load it's a relief to have one which doesn't suddenly slow down 90%, pause for 4-5 minutes because a user did something stupid, (or become unstable), or be a major headache when a disk flipped a few bits (or died) - and for less money than comparable clustering "solutions" - which bitter experience has shown are not fit for purpose.

        ZFS is the right tool for the job it's designed for. Putting it on a low-load, low-memory desktop or laptop is on par with using a bucketwheel excavator when you only wanted to shift a wheelbarrow load. It will do it and it can be tuned to do it relatively well, but that's not what it's intended for.

        1. Oh Homer
          Headmaster

          Re: "It's designed to be used on FILE servers"

          Tell that to the masses who seem to think that ZFS is simply the better alternative to btrfs on the Linux desktop, for whom ZFS is the standard response to all criticism of btrfs, and who then set about defending the CDDL to allay fears that mass adoption might be stymied by licensing issues.

          No, I don't get it either, which is why I like to constantly remind everyone about how unsuitable ZFS is for the Linux desktop.

    3. John Sanders
      Linux

      Re: After so many versions of Fedora that promised btrfs as the default filesystem

      >> I'll bet this has something to do with politics and Oracle.

      And lack of robust RAID5/6, or robust anything for that matter.

      I was once in love with BTRFS, but time and time again I found myself going back to mdadm + lvm, or just pure lvm.

      I wanted BTRFS to succeed, but how many years has it been in beta?

      In the same time ext4 has gained all sorts of niceties (like built-in ACLs or native encryption) and at the current rate we'll end up getting native snapshots or RAID functionality way before BTRFS is ready.

      And yes I know Ted Ts'o (maintainer of ext4) said BTRFS is the future.

    4. AdamWill

      Re: After so many versions of Fedora that promised btrfs as the default filesystem

      Few points:

      1. If you look at the proposed / accepted 'Features' / 'Changes' for the last several Fedora releases (we changed the name from 'Features' to 'Changes' a few cycles back...) you'll notice that whole circus of 'we promise it's going to be the default next release!...no, wait, we're delaying it again' stopped happening several releases back; it hasn't actually been proposed as a feature/change for several releases. Which some savvy folks interpreted as something of a sign about RH's declining keenness on btrfs, and...well, I guess now it's not revealing any secrets to say they weren't wrong. :P

      2. btrfs isn't being 'binned entirely' from *Fedora*; this announcement is specific to RHEL. Its status in Fedora for a long time has basically been "it's there, and it's approximately as supported as any other filesystem which is included and selectable from the installer but isn't the default", and that's still its status at present. Though I know the installer team has made noises about how their lives would be rather easier if they could kick it out of the installer again, I don't think that idea's live *right now*.

      (Also FWIW, which is very little as I'm certainly not plugged into all the internal channels on this, I don't *think* there's anything particularly political about this, it's purely the case that our storage folks have been kinda gradually losing confidence in btrfs being really good enough for our customers for some time.)

  3. Anonymous Coward

    ZFS is the right choice for a server system

    ZFS contains a superset of btrfs features, plus a great number of nice bits for dealing with large and complex disk subsystems and NFS file serving. If your target is beefy enterprise servers (it's RHEL, after all!), deprecating btrfs in favour of ZFS seems to be an obvious choice.

    On a client system, I'd choose btrfs over ZFS any day - it still has the COW, the snapshots, and the cloning, which accounts for 99% of what I need on a client. It also has a much lighter resource footprint, and is harder to misconfigure.

    Sometimes, an apple is just an apple.

    1. Anonymous Coward

      Re: ZFS is the right choice for a server system

      Actually no.

      ZFS is NOT automatically "the right choice for a server system".

      For a start, the very founding principle of ZFS (that many people forget) is that it was designed as, and continues to be maintained as, a JBOD DAS file system.

      If you are running an abstraction layer over your storage (such as a RAID controller, as many people do), then running ZFS on that is very much not recommended and WILL (not if) come back to kick you in the backside one day.

      1. LDS Silver badge

        maintained as a JBOD DAS file system

        ZFS does implement RAID features at the file system level instead of the hardware level. That's necessary to implement the resiliency features, which are more sophisticated than what RAID implements in its firmware.

        One advantage is you can move the disks from one controller to another, and mount them without issues. One disadvantage is expensive RAID controllers or enclosures may be useless, and the CPU/RAM requirements are high.

        Anyway, data are still distributed across disks. It's not a JBOD where data are on a single disk. You don't create a JBOD at the RAID level, you leave the disks as single ones, and ZFS will manage them.

        1. Dazed and Confused

          Re: maintained as a JBOD DAS file system

          > One disadvantage is expensive RAID controllers or enclosures may be useless, and the CPU/RAM requirements are high.

          CPUs and CPU licenses are far more expensive than a HW RAID controller and not only that they are slower too when it comes to things like the checksum calculations. These jobs are better off offloaded to a dedicated piece of HW IMHO.

          1. fnj

            Re: maintained as a JBOD DAS file system

            > CPUs and CPU licenses are far more expensive than a HW RAID controller and not only that they are slower too when it comes to things like the checksum calculations. These jobs are better off offloaded to a dedicated piece of HW IMHO.

            Years ago, there used to be SOME validity to this. It's long gone now. Today's CPUs can burn through checksumming and parity calculations much faster than crappy RAID controllers can, and the load is inconsequential.

      2. Daniel B.
        Boffin

        Re: ZFS is the right choice for a server system

        "For a start, the very founding principle of ZFS (that many people forget) is that it was designed as, and continues to be maintained as, a JBOD DAS file system."

        This is actually a feature. You simply stick disks into your system, and set up zpools with RAIDZ1/2/3 instead. You'll get exactly the same functionality offered by RAID5/6, but without the dependency on the RAID controller. Ever had a RAID controller failure? Back in 2009, I found out that fakeraid controllers do weird stuff and thus their "RAID" arrays can't be read by other controllers, only the ones from the same brand/chipset you originally used.

        ZFS pools can be imported to any system and will always work.

        So yes, I'd rather have ZFS on raidz2 than a RAID controller that might leave me SOL if it breaks down and I can't get the same chipset when it does.
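
        The single-parity idea behind RAID5 (and, loosely, RAIDZ1) can be illustrated with a few lines of Python. This is only a toy sketch of XOR parity over equal-sized blocks; it is not how ZFS lays data out on disk, and the function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (one block per 'disk')."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block of one failed 'disk' from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Three data disks plus one parity disk; pretend disk 1 has died.
stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
recovered = rebuild(stripe, 1)
```

        Because the parity is plain arithmetic over the blocks, nothing in it depends on a particular controller, which is the portability point being made here.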

        1. Alan Brown Silver badge

          Re: ZFS is the right choice for a server system

          "Back in 2009, I found out that fakeraid controllers do weird stuff and thus their "RAID" arrays can't be read by other controllers, only the ones from the same brand/chipset you originally used."

          More recently I found the same problem with high end Adaptec controllers (£2k apiece at purchase in 2010). Much time and effort was spent trying to reassemble the raidset before giving up, restoring from backup tapes and dropping an HBA into the box. We found that on the originally installed hardware (E5600 based), MD-RAID6 was significantly faster than Adaptec's battery-backed-with-SSD-cache RAID6 controllers that got so hot you could fry an egg on them - and it did that without even running over 25% on one CPU core under full load.

      3. Alan Brown Silver badge

        Re: ZFS is the right choice for a server system

        "If you are running an abstraction layer over your storage (such as a RAID controller, as many people do),"

        Then you are not using ZFS as instructed - but that hasn't stopped a number of vendors selling expensive "ZFS servers" which have hardware RAID arrays in them and end up in various states of borkage under high load, especially coupled with the tendency to skimp on memory and cache drives.

        ZFS is _NOT_ a filesystem.

        ZFS is: An individual disk management system, a RAID manager with up to 3 stripe parity, a volume manager and a filesystem manager all rolled into one (and more).

        Bitter experience has shown that £2k RAID array controllers or £40k FC disk arrays all have severe limitations on their performance. Our old FC disk arrays handle 1/10 the IOPS of the same spindle count running on a ZFS system, and that's down to the inability of the onboard controllers to keep up, the pitiful amount of write cache they offer, and their inability to prevent writes seeking all over the platters.

    2. Tomato42 Silver badge
      Boffin

      Re: ZFS is the right choice for a server system

      Red Hat is promoting LVM + XFS as a replacement for btrfs and ZFS

      1. Bronek Kozicki Silver badge

        Re: ZFS is the right choice for a server system

        ZFS provides strong checksums for protection against bitrot - which is one of the main reasons why people are using it. Neither LVM nor XFS provide such protection, hence I do not see such combination as viable replacement for ZFS. With the increasing data storage needs (but without corresponding increase in medium reliability) protection against bitrot is only going to get more important, so the whole direction seems a bit of a non-starter to me.

        1. jabuzz

          Re: ZFS is the right choice for a server system

          Yes it does. It is called DIF/DIX, and if you actually care about bit rot it is better than anything that ZFS can ever provide. Mostly because ZFS will only tell you that there is a problem *AFTER* the event. That is, if during the write something goes wrong and the data gets corrupted, you will only get to find out when you try to read it back, by which time you won't be able to do anything about it, but you will at least know the data is bad.

          On the other hand DIF/DIX will stop the corrupted data from being written to the storage device (disk, flash, or whatever comes along) in the first place. It will also highlight any corruption to the data while it sits on the storage device. As such it is a *BETTER* solution than ZFS.

          Further, ZFS is based around RAID5/6, which frankly does not scale. Excuse me while I switch to dynamic disk pools.

          1. John Riddoch

            DIF/DIX

            Nope, it's not better than ZFS for data protection if you have mirroring or RAID. Here's why:

            While DIF/DIX will tell you at time of writing, it does sod-all after the fact, so if your data is corrupted due to any other reason, it will merely give an error (probably a SCSI read error, I'd assume). It won't even try to correct the fault.

            Looking at Red Hat's note on it, there are limitations (direct IO on XFS only - see https://access.redhat.com/solutions/41548). ZFS doesn't have those restrictions. The Red Hat doc mentions it as a "new feature in the SCSI standard", so old disks won't support it. ZFS doesn't care what disks you use as long as they appear as an appropriate block/character device.

            If you have ANY data corruption on ZFS, it'll detect it on read and if you have multiple data copies (mirrored, RAID-z or whatever), it'll fix it on the fly. If you only have a single copy, it'll error out and tell you which file(s) are unavailable, prompting you to recover those files.

            Oracle do recommend you run a zpool scrub periodically (once a week on standard disks, once a month on enterprise level storage) to capture errors - that will also automatically fix any errors on the checksums.

            ZFS does have a number of flaws (performance on a full zpool is pretty awful, for example), but it is very good at data integrity.
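
            The detect-on-read and repair-from-a-good-copy behaviour described above can be modelled in a few lines. This is a toy sketch only: the class and method names are invented for illustration, the "disks" are Python dicts, and it says nothing about the real ZFS on-disk format or checksum algorithm.

```python
import hashlib


class MirroredStore:
    """Toy mirror: every block lives on each 'disk'; checksums are kept separately."""

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]
        self.checksums = {}

    def write(self, key, data):
        self.checksums[key] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[key] = data

    def read(self, key):
        """Return verified data, repairing any bad copy from a good one."""
        expected = self.checksums[key]
        good, bad = None, []
        for disk in self.disks:
            if hashlib.sha256(disk[key]).digest() == expected:
                good = disk[key]
            else:
                bad.append(disk)
        if good is None:
            # Single copy (or all copies) damaged: detectable, but not repairable.
            raise IOError(f"block {key!r}: no valid copy left")
        for disk in bad:
            disk[key] = good  # self-heal on the fly
        return good

    def scrub(self):
        """Read every block so latent errors are found and repaired; return repair count."""
        repaired = 0
        for key in self.checksums:
            before = [disk[key] for disk in self.disks]
            self.read(key)
            repaired += sum(1 for disk, old in zip(self.disks, before) if disk[key] != old)
        return repaired
```

            With `copies=2`, corrupting one dict entry and calling `read()` returns the original data and rewrites the bad copy; with `copies=1` the same corruption raises an error naming the block, which mirrors the single-copy case described above.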

            1. jabuzz

              Re: DIF/DIX

              Duh, if you have mirrored disks and DIF/DIX you will get a recovery from the error too. So ZFS is emphatically not better.

            2. Alan Brown Silver badge

              Re: DIF/DIX

              "If you only have a single copy, it'll error out and tell you which file(s) are unavailable, prompting you to recover those files."

              Even if you only have a single drive, the metadata is replicated in several places by default and you can tell ZFS to store multiple copies of the data too. That's available on _top_ of the RAID functionality for times when you're feeling utterly paranoid.

          2. Bronek Kozicki Silver badge

            Re: ZFS is the right choice for a server system

            "if during the write something goes wrong and the data gets corrupted"

            That's one case I do not really care much about, because both software stack and controllers are pretty good at avoiding these kinds of errors (as long as hardware can be trusted - but then ECC memory is not really that expensive and no one really has to use overclocking).

            The kind of bitrot I care about is when my personal videos, pictures, ripped CDs or other data worth archiving are stored on a magnetic medium which then silently gets corrupted a few years down the line. If the data is stored on ZFS with redundancy, then not only will the error be detected, but the original data will also be silently restored from the redundant copies. With filesystems measured in terabytes (like your usual archive of DSLR RAW pictures and a small library of ripped CDs, which I own) this kind of bitrot is all but inevitable. Which is why I'm using ZFS with mirrored disks (and my offsite backups are also on ZFS, although my filer is Linux and my backup is FreeBSD).

            1. jabuzz

              Re: ZFS is the right choice for a server system

              How do you know that the error didn't occur at write time though? You don't. So DIF/DIX will make sure the write was correct *AND* tell you down the line if it is corrupted. Sure ZFS is better than nothing, but if you really care then there are better solutions than ZFS. I guess you could get ZFS to do a verify on write, but performance is going to suffer in that scenario in a way it does not with DIF/DIX.

          3. Alan Brown Silver badge

            Re: ZFS is the right choice for a server system

            "That is if during the write something goes wrong and the data gets corrupted you will only get to find out when you try to read it back, by which time you won't be able to do anything about it, but you will at least know the data is bad."

            Which shows you haven't bothered to familiarise yourself with how ZFS works.

            "Further ZFS is based around RAID5/6, which is frankly does not scale. "

            Which shows the same thing.

            Are you trying to sell the competing software by any chance?

          4. PlinkerTind

            Re: ZFS is the right choice for a server system

            @jabuzz

            This is wrong. DIF/DIX disks do not protect against data corruption sufficiently. Have you ever looked at the specs for disks with DIF/DIX? All enterprise hard disk specs say something like "1 irrecoverable error on every 10^17 read bits" - fibre channel, SAS, etc - all equipped with DIF/DIX. The moral is that those disks also encounter corruption, and when that occurs they cannot repair it. Also, these disks are susceptible to SILENT corruption - corruption that the disks never noticed. That is the worst corruption.

            ZFS detects all forms of corruption and repairs them automatically if you have redundancy (mirror, raid, etc). DIF/DIX disks can not do that. Even if you have a single disk with ZFS, you can provide redundancy by using "copies=2" which makes all data duplicated all over the disk, halving disk storage.

            .

            "...ZFS is based around RAID5/6, which is frankly does not scale..."

            This is plain wrong. A hardware raid card can only manage a few disks, so hw raid cards do not scale. OTOH, ZFS utilizes the disks directly, which means you can connect many SAS expanders and JBOD cards, so a single ZFS server can manage 1,000s of disks or more - you are limited by the number of ports on the server motherboard. ZFS scales well because it can use all the JBOD cards. A single hw raid card cannot connect to all the other hw raid cards - hw raid does not scale. ZFS scales.

            In fact, the IBM Sequoia supercomputer has a Lustre system that uses a ZFS pool with 55 petabytes of data and 1TB/sec bandwidth - can a hardware raid card handle a single petabyte? Or sustain 1TB/sec? Fact is, a CPU is much faster than a hw raid. So a server with TBs of RAM and 100s of cores will always outclass a hw raid card - how can you say that ZFS does not scale? It uses all the resources of the entire server.

            Regarding high RAM requirements, if you google a bit, there are several people running ZFS on a Raspberry Pi with 256MB RAM. How can that be?

            Also, you can change OS and servers without problems with ZFS. Move the disks to another server, or change OS between Solaris, Linux, FreeBSD and MacOS. You are free to choose.

            ZFS is the safest filesystem out there, scales best and is most open. Read the ZFS article on wikipedia for research papers where the scientists compare ZFS against other solutions, such as hw-raid and conclude that ZFS is the safest system out there. CERN has released several research papers saying the same thing - read the wikipedia article on ZFS.
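
            Taking the error-rate figure quoted in this comment at face value ("1 irrecoverable error on every 10^17 read bits"), a rough back-of-the-envelope estimate shows why the rate matters at scale. This is only arithmetic on the numbers given here; the function name and the decimal TB/PB constants are choices made for the example, and real drives quote a range of rates.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def expected_unrecoverable_errors(bytes_read, bits_per_error=10**17):
    """Expected number of unrecoverable read errors for a given amount of data read."""
    return bytes_read * 8 / bits_per_error


# One full read of a 10 TB disk versus one full pass over a 55 PB pool.
per_disk = expected_unrecoverable_errors(10 * TB)
per_pool = expected_unrecoverable_errors(55 * PB)
print(f"10 TB read: {per_disk:.4f} expected errors")
print(f"55 PB read: {per_pool:.1f} expected errors")
```

            On these assumptions a single disk read is very unlikely to hit an error, while one pass over a pool of the size mentioned above would be expected to hit a handful, so something above the disk has to detect and repair them.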

  4. Adam 52 Silver badge

    "Michael Dexter feels that even with a vigorous ZFS community hard at work, we may be approaching the point at which open source file systems are reduced to a non-useful monoculture."

    For those that misread this awkward sentence in the same way I did. His concern was the monoculture and ZFS is the monoculture. The "vigorous ZFS community" has nothing to do with preventing a ZFS monoculture, so the "even with" is curious phrasing.

    1. Tinslave_the_Barelegged

      > For those that misread this awkward sentence in the same way I did.

      Ah, ta for that. I thought he was implying that for some unfathomable reason people would want a free operating system with a non-free (or disputably free) filesystem.

      ZFS is great, I am sure, but a general purpose FS it certainly is not. But Red Hat's decision is pretty inexplicable. What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development? It definitely smells like a political decision.

      1. TVU Silver badge

        "What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development. It definitely smells like a political decision".

        I get the impression that they still don't quite trust it technically, it being the less mature file system; plus, and this is the speculation bit, they don't and can't control the development and direction of Btrfs, hence the move to using and developing XFS in-house.

      2. Gordan

        "What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development. It definitely smells like a political decision."

        There's no harm, but it is a huge amount of labour intensive work to backport patches back into the RH distro kernel. RH don't follow LT kernel releases; they take a snapshot of a kernel with a .0 point release, and after that, everything they merge is cherry picked. This is extremely labour intensive and error prone, and they couldn't care less about it if by a miracle a mispatch doesn't blow up spectacularly at build time due to the very specific kernel config they use. (Example: https://bugzilla.redhat.com/show_bug.cgi?id=773107 )

        In fairness, RH aren't to be singled out for not taking advantage of the, IME, more stable mainline LT kernel trees; most distros seem to engage in this pointless and laborious rejection of upstream kernels for "not invented here" reasons.

        1. Alan Brown Silver badge

          "RH don't follow LT kernel releases, them take a snapshot of a kernel with a .0 point release, and after that, everything they merge is cherry picked. This is extremely labour intensive and error prone,"

          They don't just do that with the kernel.

          EVERY part of RHEL is full of hand-merged backports without bothering to change the major version numbers. Just because something _SAYS_ it's foo version 2.5.5-35.el7_x86_64 doesn't mean that it's not got parts of (or all of) upstream foo version 4.5 merged into it.

          You make changes to a Red Hat system at your peril. Beware, here be dragons. Nothing is what it seems.

          1. Bronek Kozicki Silver badge

            This is why I'm using Arch - my Linux kernels are following kernel releases exactly and are easy to build with my own configuration and fresh selection of version straight from www.kernel.org .

  5. iTheHuman

    So, has anyone informed the Facebook and Oracle devs that the fs they've been working on is finished?

    You know, it's not as though rh has ever been a vigorous backer of btrfs, since most of their fs folks are on team xfs... not to mention clustering filesystems like Ceph and Gluster.

    Btw, some rh dev announced a new project that seems to be aiming for a more UNIX-y zfs (that is, without the layer violations, but with many of the same features). It actually looks kind of interesting, with most of the neat stuff happening in the fs daemon.

    1. iTheHuman

      Here's the link to that other project:

      https://github.com/stratis-storage/stratisd

      1. Bronek Kozicki Silver badge

        hmm .... this filesystem has a dependency on D-Bus. I will ignore it. I do not want to wake up in the world where 1) it gets integrated into systemd and 2) distributions agree that's the only filesystem their users will need.

        1. iTheHuman

          Mmmmmm, ok....

          Thumbs up, I guess?

          You certainly seem like a rational person who can make objective evaluations in technical matters. Your company's IT future is bright.

          1. Bronek Kozicki Silver badge

            I have had unpleasant experiences with D-bus failing and I'd rather keep it away from the filesystems I use, because it was impossible to troubleshoot properly and even a clean system shutdown was difficult (I have enabled SysRq on my system since then). Also, D-bus is a higher abstraction than a filesystem is, so making it a critical dependency in the management of a filesystem turns the dependencies in the system upside down, making it more difficult to recover when things go wrong. I think this is a very rational evaluation.

            1. iTheHuman

              I'm not doubting that you've had problems involving dbus, but I can't say that those were problems caused by dbus (indication of a deeper issue). That you couldn't determine the actual issue provides additional support for my assertion.

              Dbus is just IPC, and stratis is going to make heavy use of IPC because the daemon can hold additional state, so that better global decisions can be made than would otherwise be possible (this is exactly how they plan to take advantage of these well-defined, existing services while enjoying many of the features that monolithic filesystems like zfs/btrfs have, without needing to poke holes through the vfs/block boundary).

              Something else to keep in mind, userspace is far more forgiving of errors than a kernel.

  6. Anonymous Coward

    AdvFS

    It's a shame that AdvFS died along with Tru64.

    http://advfs.sourceforge.net/

    Available under GPL v2 to be picked up and moved forward if anyone is interested...

    1. s2bu

      Re: AdvFS

      AdvFS was amazing. Sure, ZFS is better, but AdvFS brought a lot of features long before anybody else ever did.

  7. Anonymous Coward

    I <3 btrfs

    Been using btrfs for years. Being able to snapshot my root partition, do a distro upgrade, and selectively boot between them with a subvol=whatever in my grub config is awesome. Light years ahead of anything Windows can do.

    1. Daniel B.
      Boffin

      Re: I <3 btrfs

      "Light years ahead of anything Windows can do."

      Everything is light years ahead of anything Windows, period.

      As for snapshots, that's available on ZFS too, mostly because btrfs was originally born as an Open Source equivalent to ZFS, mostly sponsored by Oracle. But then Oracle bought Sun and they got access to ZFS, so btrfs was "no longer important". :(

      I did try btrfs at some point, but it just didn't work well, so I had to move to ZFS. The latter is supported on pretty much every single OS except Windows (again, everyone's light years ahead of Redmond's OS) so it also serves as a multiplatform FS.

  8. Colin McKinnon

    "lack of native file-based encryption unfortunately makes it a nonstarter"

    Yeah - because we really want to embed (lots of incompatible, independently developed implementations of) encryption within the filesystem rather than using well managed code sitting on top of or beneath it.

  9. Phil Bennett

    People are still using btrfs?

    After the RAID5/6 issue which still isn't fixed a *year* later(!), people are still trusting their data to btrfs?

    For small storage, there are loads of options (and boot from ZFSoL is still new enough to be a concern, if not a blocker).

    For huge storage, you probably aren't using ZFS - you're looking at cluster filesystems (gluster, ceph, hdfs etc).

    For medium scale storage, ZFS is hard to beat. Work out a way to get ZFS on Linux compliant (even if that is to reverse engineer it) and move on.

    1. DougMac

      Re: People are still using btrfs?

      > After the RAID5/6 issue which still isn't fixed a *year* later(!), people are still trusting their data to btrfs?

      Umm, the RAID5 issue which hasn't been fixed correctly *since the beginning of the project*.

      The devs have known of conditions which will corrupt RAID5 since the start, and while there was a promising bug fix a while ago, they then found it only fixed one of the bugs, but others are known.

      The people doing btrfs have known about these issues for some time, and they never get properly fixed.

      Most likely, that is why RH is dropping support for it.

    2. Alan Brown Silver badge

      Re: People are still using btrfs?

      "For huge storage, you probably aren't using ZFS - you're looking at cluster filesystems (gluster, ceph, hdfs etc)"

      Guess what works best on the individual nodes underneath the cluster?

      I'm running Gluster on top of ZFS here. It works well.

      1. iTheHuman

        Re: People are still using btrfs?

        Bluestore?

        It's probably not there yet, but it won't be much longer, so, no, zfs isn't needed (and is also not recommended by the ceph folks).

        1. Anonymous Coward

          Re: People are still using btrfs?

          Bluestore is going mainstream with the new version of SUSE Enterprise Storage 5 coming out next month (Ceph based for those unaware).

          I’ve seen some early pre-beta performance data (disclaimer: I work at SUSE). Beats anything we were able to get out of gluster on the same kit hands down!

  10. boltar

    Anyone else just use ext4?

    Seems to have worked fine for us for years.

    1. Dazed and Confused

      Re: Anyone else just use ext4?

      Yepp, it's the default on RHEL/CentOS 6 and that doesn't have systemd so yes I still use a lot of ext4.

      1. Daniel B.
        Happy

        Re: Anyone else just use ext4?

        Ah, I thought I was the only one keeping to RHEL/CentOS 6 to avoid the systemd crap. I'm using a mix of ext4 and xfs on those systems. :)

    2. DougS Silver badge

      Re: Anyone else just use ext4?

      I use ext4 at home, and always thought I might someday switch to btrfs when it became the default in Fedora. Guess I'm going to continue to stick with ext4, I see no benefit in switching to ZFS or XFS.

      1. John H Woods

        Re: Anyone else just use ext4?

        Ext4 locally, ZFS on my fileserver.

        My fileserver snapshots my few TB of RaidZ3 every minute. If I've set it up right, there's no remote admin login, so you need physical access to delete snapshots.

        I cryptolockered the lot from a throwaway VM attached via NFS and it was possible to rapidly recover every single file from snapshots... I didn't even need to restore anything from backup.

        ZFS is marvellous... Let's just get the licence issue resolved...
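
        The recovery described above works because a snapshot is a frozen view that later writes cannot touch. A toy in-memory sketch of that idea follows; the class and its methods are invented for illustration and are not the ZFS tooling, and real snapshots share unchanged blocks on disk rather than copying a Python dict.

```python
from types import MappingProxyType


class SnapshottingFS:
    """Toy filesystem: a live file table plus a list of read-only snapshots."""

    def __init__(self):
        self.live = {}
        self.snapshots = []

    def write(self, path, data):
        self.live[path] = data

    def snapshot(self):
        """Freeze the current file table; return the snapshot's index."""
        # File contents (immutable bytes) are shared, not duplicated.
        self.snapshots.append(MappingProxyType(dict(self.live)))
        return len(self.snapshots) - 1

    def rollback(self, index):
        """Replace the live file table with the contents of a snapshot."""
        self.live = dict(self.snapshots[index])


fs = SnapshottingFS()
fs.write("/photos/cat.raw", b"original image bytes")
taken = fs.snapshot()

# A client with write access scrambles the live copy...
fs.write("/photos/cat.raw", b"ENCRYPTED BY MALWARE")

# ...but the snapshot is untouched, so the file can be rolled back.
fs.rollback(taken)
```

        The client can overwrite anything in the live table, but it has no way to modify or delete the snapshot, which is the same separation as having no remote admin login on the fileserver.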

  11. PNGuinn
    Trollface

    Btrfs

    Doesn't systemd support Btrfs properly then?

    We'll all have to go back to using EMACS.

  12. P.B. Lecavalier

    JFS!

    My attitude toward BTRFS has long been: "This is very promising. We can use it now? It doesn't seem like it's a simple drop-in replacement for ext4... I'll wait for others to try this Kool-Aid." Seems like it does not taste so good after all.

    In the mean time, I'm happy with JFS on my system. And will take the label odd-ball.

  13. iOS6 user

    the King is dead, long live the King

    So Red Hat is officially stepping down from the market area where Solaris reigns.

    It is really funny to see final proof that all the people who have been saying for the last decade that Solaris will soon be a dead/legacy OS were wrong :)

  14. onapstanda
    Unhappy

    Sadly, if it had to stop here

  15. simpfeld

    RH looking at a different solution "Stratis"

    Asking about this elsewhere, it looks like Red Hat are doing work on the "Stratis Storage Project".

    This seems to be a bit of a management system that will allow you to emulate pretty much all the features of a next-generation file system using existing layers (LVM, MD, XFS), but adding things like block-level checksumming to MD/LVM to allow the equivalent of individual file checksums. The argument seemed to be that building this in a single layer, as BTRFS and ZFS do, is too hard; the layering makes the programming/debugging more manageable. I guess the key would be communication between layers: a bad block checksum tells XFS the file is corrupt, etc.

    Details here:

    https://fedoraproject.org/wiki/Changes/StratisStorage

    https://stratis-storage.github.io/StratisSoftwareDesign.pdf
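
    To make the layering guess above concrete, here is a toy sketch of a block layer that verifies checksums and a file layer that turns a bad block into a per-file error. Everything in it (class names, CRC32, the in-memory dicts) is an assumption made for illustration; it is not taken from the Stratis design document.

```python
import zlib


class ChecksumError(Exception):
    """Raised by the block layer when stored data no longer matches its checksum."""

    def __init__(self, block_no):
        super().__init__(f"checksum mismatch in block {block_no}")
        self.block_no = block_no


class BlockLayer:
    """Lower layer: stores blocks and verifies a CRC32 on every read."""

    def __init__(self):
        self.blocks = {}
        self.crcs = {}

    def write_block(self, block_no, data):
        self.blocks[block_no] = data
        self.crcs[block_no] = zlib.crc32(data)

    def read_block(self, block_no):
        data = self.blocks[block_no]
        if zlib.crc32(data) != self.crcs[block_no]:
            raise ChecksumError(block_no)
        return data


class FileLayer:
    """Upper layer: maps file names to block numbers and reports corrupt files."""

    def __init__(self, block_layer):
        self.block_layer = block_layer
        self.extents = {}  # file name -> list of block numbers
        self.next_block = 0

    def write_file(self, name, chunks):
        numbers = []
        for chunk in chunks:
            self.block_layer.write_block(self.next_block, chunk)
            numbers.append(self.next_block)
            self.next_block += 1
        self.extents[name] = numbers

    def read_file(self, name):
        try:
            return b"".join(self.block_layer.read_block(n) for n in self.extents[name])
        except ChecksumError as err:
            # The only cross-layer communication needed: which block was bad.
            raise IOError(f"{name}: corrupt (block {err.block_no})") from err
```

    The file layer never computes a checksum itself; it only needs the block number from the lower layer to say which file is affected, which is the kind of narrow interface the layered approach seems to be betting on.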

    They also seem to have some interest in "BcacheFS". A "Bcachefs" developer says there are fundamental issues with the BTRFS design:

    "btrfs, which was supposed to be Linux's next generation COW filesystem - Linux's answer to zfs. Unfortunately, too much code was written too quickly without focusing on getting the core design correct first, and now it has too many design mistakes baked into the on disk format and an enormous, messy codebase "

    https://www.patreon.com/bcachefs

    Not sure of the truth of this; I don't know enough about it.

Biting the hand that feeds IT © 1998–2019