Red Hat banishes Btrfs from RHEL

Red Hat has banished Btrfs, the Oracle-created file system intended to help harden Linux's storage capabilities. The Deprecated Functionality List for Red Hat Enterprise Linux 7.4 explains the decision as follows: The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise …

Re: ZFS is the right choice for a server system

Yes it does: it is called DIF/DIX, and if you actually care about bit rot it is better than anything that ZFS can ever provide, mostly because ZFS will only tell you that there is a problem *AFTER* the event. That is, if during the write something goes wrong and the data gets corrupted, you will only get to find out when you try to read it back, by which time you won't be able to do anything about it, but you will at least know the data is bad.

On the other hand DIF/DIX will stop the corrupted data from being written to the storage device (disk, flash, or whatever comes along) in the first place. It will also highlight any corruption to the data while it sits on the storage device. As such it is a *BETTER* solution than ZFS.

Further, ZFS is based around RAID5/6, which frankly does not scale. Excuse me while I switch to dynamic disk pools.

0
15

DIF/DIX

Nope, it's not better than ZFS for data protection if you have mirroring or RAID. Here's why:

While DIF/DIX will tell you at the time of writing, it does sod-all after the fact, so if your data is corrupted for any other reason, it will merely give an error (probably a SCSI read error, I'd assume). It won't even try to correct the fault.

Looking at Redhat's note on it, there are limitations (direct I/O on XFS only - see https://access.redhat.com/solutions/41548). ZFS doesn't have those restrictions. The Redhat doc describes it as a "new feature in the SCSI standard", so old disks won't support it. ZFS doesn't care what disks you use as long as they appear as an appropriate block/character device.

If you have ANY data corruption on ZFS, it'll detect it on read and if you have multiple data copies (mirrored, RAID-z or whatever), it'll fix it on the fly. If you only have a single copy, it'll error out and tell you which file(s) are unavailable, prompting you to recover those files.

Oracle do recommend you run a zpool scrub periodically (once a week on standard disks, once a month on enterprise-level storage) to catch errors - that will also automatically repair any checksum errors it finds.
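
(For anyone who hasn't run one, a scrub is a one-liner - the pool name "tank" here is just a placeholder:)

  zpool scrub tank
  zpool status -v tank    # shows scrub progress, plus any files with unrecoverable errors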

ZFS does have a number of flaws (performance on a full zpool is pretty awful, for example), but it is very good at data integrity.

13
0
Silver badge

Re: ZFS is the right choice for a server system

"if during the write something goes wrong and the data gets corrupted"

That's one case I do not really care much about, because both the software stack and the controllers are pretty good at avoiding these kinds of errors (as long as the hardware can be trusted - but then ECC memory is not really that expensive and no one really has to use overclocking).

The kind of bit rot I care about is storing my personal videos, pictures, ripped CDs or other data worth archiving on magnetic media, which then silently gets corrupted a few years down the line. If stored on ZFS with data redundancy, not only will the error be detected, but the original data will also be silently restored from redundant copies. With filesystems measured in terabytes (like your usual archive of DSLR RAW pictures and a small library of ripped CDs which I own), this kind of bit rot is all but inevitable. Which is why I'm using ZFS with mirrored disks (and my offsite backups are also on ZFS, although my filer is Linux and the backup box is FreeBSD).
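
(For anyone curious, a mirrored pool plus a dataset for the archive is roughly this - pool, dataset and device names are only placeholders:)

  zpool create tank mirror /dev/ada0 /dev/ada1
  zfs create tank/archive
  zpool status tank    # shows both sides of the mirror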

6
0

Re: DIF/DIX

Duh, if you have mirrored disks and DIF/DIX you will get a recovery from the error too. So ZFS is emphatically not better.

0
5

Re: ZFS is the right choice for a server system

How do you know that the error didn't occur at write time though? You don't. So DIF/DIX will make sure the write was correct *AND* tell you down the line if it is corrupted. Sure ZFS is better than nothing, but if you really care then there are better solutions than ZFS. I guess you could get ZFS to do a verify on write, but performance is going to suffer in that scenario in a way it does not with DIF/DIX.

0
5
Silver badge

Re: maintained as a JBOD DAS file system

> One disadvantage is expensive RAID controllers or enclosures may be useless, and the CPU/RAM requirements are high.

CPUs and CPU licenses are far more expensive than a HW RAID controller, and not only that, they are slower too when it comes to things like checksum calculations. These jobs are better off offloaded to a dedicated piece of HW, IMHO.

1
5
Boffin

Re: ZFS is the right choice for a server system

"For a start. The very founding principle of ZFS (that many people forget) is that it was designed as, and continues to be maintained as, a JBOD DAS file system."

This is actually a feature. You simply stick disks into your system, and set up zpools with RAIDZ1/2/3 instead. You'll get exactly the same functionality offered by RAID5/6, but without the dependency on the RAID controller. Ever had a RAID controller failure? Back in 2009, I found out that fakeraid controllers do weird stuff and thus their "RAID" arrays can't be read by other controllers, only the ones from the same brand/chipset you originally used.

ZFS pools can be imported to any system and will always work.
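
(Roughly, with illustrative pool and device names:)

  zpool create tank raidz2 da0 da1 da2 da3 da4 da5
  zpool export tank     # before pulling the disks
  zpool import tank     # on the new box; plain "zpool import" lists whatever pools it can see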

So yes, I'd rather have ZFS on raidz2 than a RAID controller that might leave me SOL if it breaks down and I can't get the same chipset when it does.

9
0
Silver badge

Re: ZFS is the right choice for a server system

"If you are running an abstraction layer over your storage (such as a RAID controller, as many people do),"

Then you are not using ZFS as instructed - but that hasn't stopped a number of vendors selling expensive "ZFS servers" which have hardware RAID arrays in them and end up in various states of borkage under high load, especially coupled with the tendency to skimp on memory and cache drives.

ZFS is _NOT_ a filesystem.

ZFS is: an individual disk management system, a RAID manager with up to triple parity, a volume manager, and a filesystem manager, all rolled into one (and more).

Bitter experience has shown that £2k RAID array controllers or £40k FC disk arrays all have severe limitations on their performance. Our old FC disk arrays handle 1/10 the IOPS of the same spindle count running on a ZFS system, and that's down to the inability of the onboard controllers to keep up, the pitiful amount of write cache they offer, and their inability to prevent writes seeking all over the platters.

5
0
Silver badge

Re: ZFS is the right choice for a server system

"That is if during the write something goes wrong and the data gets corrupted you will only get to find out when you try to read it back, by which time you won't be able to do anything about it, but you will at least know the data is bad."

Which shows you haven't bothered to familiarise yourself with how ZFS works.

"Further ZFS is based around RAID5/6, which is frankly does not scale. "

Which shows the same thing.

Are you trying to sell the competing software by any chance?

5
0
Silver badge

Re: DIF/DIX

"If you only have a single copy, it'll error out and tell you which file(s) are unavailable, prompting you to recover those files."

Even if you only have a single drive, the metadata is replicated in several places by default, and you can tell ZFS to store multiple copies of the data too. That's available on _top_ of the RAID functionality for times when you're feeling utterly paranoid.
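
(Something like this, with an illustrative dataset name:)

  zfs set copies=2 tank/important
  zfs get copies tank/important    # confirm the setting; applies to newly written data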

2
0
Silver badge

Re: ZFS is the right choice for a server system

"Back in 2009, I found out that fakeraid controllers do weird stuff and thus their "RAID" arrays can't be read by other controllers, only the ones from the same brand/chipset you originally used."

More recently I found the same problem with high-end Adaptec controllers (£2k apiece at purchase in 2010). Much time and effort was spent trying to reassemble the raidset before giving up, restoring from backup tapes and dropping an HBA into the box. We found that on the originally installed hardware (E5600 based), MD-RAID6 was significantly faster than Adaptec's battery-backed-with-SSD-cache RAID6 controllers that got so hot you could fry an egg on them - and it did that without even running over 25% on one CPU core under full load.

5
0
fnj

Re: maintained as a JBOD DAS file system

"CPUs and CPU licenses are far more expensive than a HW RAID controller, and not only that, they are slower too when it comes to things like checksum calculations. These jobs are better off offloaded to a dedicated piece of HW, IMHO."

Years ago, there used to be SOME validity to this. It's long gone now. Today's CPUs can burn through checksumming and parity calculations much faster than crappy RAID controllers can, and the load is inconsequential.

5
0

Re: ZFS is the right choice for a server system

@jabuzz

This is wrong. DIF/DIX disks do not protect against data corruption sufficiently. Have you ever looked at the specs for disks with DIF/DIX? All enterprise hard disk specs say something like "1 irrecoverable error per 10^17 bits read" - fibre channel, SAS, etc., all equipped with DIF/DIX. The moral is that those disks also encounter corruption, and when that occurs they cannot repair it. Also, these disks are susceptible to SILENT corruption - corruption that the disks never notice. That is the worst kind of corruption.

ZFS detects all forms of corruption and repairs them automatically if you have redundancy (mirror, raid, etc). DIF/DIX disks cannot do that. Even if you have a single disk with ZFS, you can provide redundancy by using "copies=2", which duplicates all data across the disk, halving usable capacity.


"...ZFS is based around RAID5/6, which is frankly does not scale..."

This is plain wrong. A hardware RAID card can only manage a few disks, so HW RAID cards do not scale. OTOH, ZFS utilizes the disks directly, which means you can connect many SAS expanders and JBOD cards, so a single ZFS server can manage thousands of disks or more - you are limited only by the number of ports on the server motherboard. ZFS scales well because it can use all the JBOD cards. A single HW RAID card cannot connect to all the other HW RAID cards - HW RAID does not scale. ZFS scales.

In fact, the IBM Sequoia supercomputer has a Lustre system that uses a ZFS pool holding 55 petabytes of data with 1 TB/sec of bandwidth - can a hardware RAID card handle a single petabyte? Or sustain 1 TB/sec? Fact is, a CPU is much faster than HW RAID, so a server with terabytes of RAM and hundreds of cores will always outclass a HW RAID card - how can you say that ZFS does not scale? It uses all the resources of the entire server.

Regarding high RAM requirements: if you google a bit, there are several people running ZFS on a Raspberry Pi with 256MB of RAM. How can that be?

Also, with ZFS you can change OS and servers without problems. Move the disks to another server, or switch OS between Solaris, Linux, FreeBSD and macOS. You are free to choose.

ZFS is the safest filesystem out there, scales best and is the most open. Read the ZFS article on Wikipedia for research papers in which scientists compare ZFS against other solutions, such as HW RAID, and conclude that ZFS is the safest system out there. CERN has released several research papers saying the same thing - see the Wikipedia article on ZFS.

3
0
Silver badge

"Michael Dexter feels that even with a vigorous ZFS community hard at work, we may be approaching the point at which open source file systems are reduced to a non-useful monoculture."

For those that misread this awkward sentence in the same way I did: his concern was the monoculture, and ZFS is the monoculture. The "vigorous ZFS community" has nothing to do with preventing a ZFS monoculture, so the "even with" is curious phrasing.

4
0

> For those that misread this awkward sentence in the same way I did.

Ah, ta for that. I thought he was implying that for some unfathomable reason people would want a free operating system with a non-free (or disputably free) filesystem.

ZFS is great, I am sure, but a general purpose FS it certainly is not. But Red Hat's decision is pretty inexplicable. What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development? It definitely smells like a political decision.

2
1
TVU
Bronze badge

"What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development. It definitely smells like a political decision".

I get the impression that they still don't quite trust it technically, it being the less mature file system, plus - and this is the speculation bit - they don't and can't control the development and direction of Btrfs, hence the move to using and developing XFS in house.

2
0

"What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development. It definitely smells like a political decision."

There's no harm, but it is a huge amount of labour-intensive work to backport patches into the RH distro kernel. RH don't follow LT kernel releases; they take a snapshot of a kernel at a .0 point release, and after that, everything they merge is cherry-picked. This is extremely labour intensive and error prone, and a mispatch goes unnoticed unless, by some miracle, it blows up spectacularly at build time thanks to the very specific kernel config they use. (Example: https://bugzilla.redhat.com/show_bug.cgi?id=773107 )

In fairness, RH aren't to be singled out for not taking advantage of the, IME, more stable mainline LT kernel trees; most distros seem to engage in this pointless and laborious rejection of upstream kernels for "not invented here" reasons.

3
0
Silver badge

"RH don't follow LT kernel releases, them take a snapshot of a kernel with a .0 point release, and after that, everything they merge is cherry picked. This is extremely labour intensive and error prone,"

They don't just do that with the kernel.

EVERY part of RHEL is full of hand-merged backports without bothering to change the major version numbers. Just because something _SAYS_ it's foo version 2.5.5-35.el7_x86_64 doesn't mean that it's not got parts of (or all of) upstream foo version 4.5 merged into it.

You make changes to a Redhat system at your peril. Beware, here be dragons. Nothing is what it seems.

3
0
Silver badge

This is why I'm using Arch - my Linux kernels follow kernel.org releases exactly and are easy to build with my own configuration, picking a fresh version straight from www.kernel.org.
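
(Roughly like this - the version number is only an example, and the config step is whatever suits you:)

  curl -O https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.12.tar.xz
  tar xf linux-4.12.tar.xz && cd linux-4.12
  make olddefconfig          # or drop in your own .config and run "make oldconfig"
  make -j$(nproc)
  sudo make modules_install install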

0
0

So, has anyone informed the Facebook and Oracle devs that the fs they've been working on is finished?

You know, it's not as though RH has ever been a vigorous backer of btrfs, since most of their fs folks are on team xfs... not to mention clustering filesystems like Ceph and Gluster.

Btw, some RH dev announced a new project that seems to be aiming for a more UNIX-y ZFS (that is, without the layer violations, but with many of the same features). It actually looks kind of interesting, with most of the neat stuff happening in the fs daemon.

6
0

Here's the link to that other project:

https://github.com/stratis-storage/stratisd

2
0
Silver badge

hmm .... this filesystem has a dependency on D-Bus. I will ignore it. I do not want to wake up in a world where 1) it gets integrated into systemd and 2) distributions agree that's the only filesystem their users will need.

13
1

Mmmmmm, ok....

Thumbs up, I guess?

You certainly seem like a rational person who can make objective evaluations in technical matters. Your company's IT future is bright.

1
0
Silver badge

I have had unpleasant experience with D-Bus failing and I'd rather keep it away from the filesystems I use, because it was impossible to troubleshoot properly and even a clean system shutdown was difficult (I have enabled SysRq on my systems since then). Also, D-Bus is a higher abstraction than a filesystem, so making it a critical dependency in the management of a filesystem turns the system's dependencies upside down, making it more difficult to recover when things go wrong. I think this is a very rational evaluation.

1
0

I'm not doubting that you've had problems involving dbus, but I can't say that those were problems caused by dbus rather than an indication of a deeper issue. That you couldn't determine the actual cause provides additional support for my assertion.

Dbus is just IPC, and stratis is going to make heavy use of IPC, as the daemon can hold additional state so that better global decisions can be made than would otherwise be possible (this is exactly how they plan to take advantage of these well-defined, existing services while enjoying many of the features that monolithic filesystems like zfs/btrfs have, without needing to poke holes through the vfs/block boundary).

Something else to keep in mind: userspace is far more forgiving of errors than a kernel.

0
0
Anonymous Coward

AdvFS

It's a shame that AdvFS died along with Tru64.

http://advfs.sourceforge.net/

Available under GPL v2 to be picked up and moved forward if anyone is interested...

6
0

Re: AdvFS

AdvFS was amazing. Sure, ZFS is better, but AdvFS brought a lot of features long before anybody else ever did.

1
0
Anonymous Coward

I <3 btrfs

Been using btrfs for years. Being able to snapshot my root partition, do a distro upgrade, and selectively boot between them with a subvol=whatever in my grub config is awesome. Light years ahead of anything Windows can do.
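
(The gist is something like the below - names and layout are only an example, and the exact subvol= path depends on how your subvolumes are arranged:)

  btrfs subvolume snapshot / /snapshots/root-pre-upgrade
  # with /snapshots existing on the same btrfs filesystem; then, to boot the
  # snapshot, add to the kernel line in the grub config:
  #   rootflags=subvol=snapshots/root-pre-upgrade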

7
0
Boffin

Re: I <3 btrfs

"Light years ahead of anything Windows can do."

Everything is light years ahead of anything Windows, period.

As for snapshots, those are available on ZFS too, not least because btrfs was originally born as an open source equivalent to ZFS, mostly sponsored by Oracle. But then Oracle bought Sun, got access to ZFS, and btrfs was suddenly "no longer important". :(

I did try btrfs at some point, but it just didn't work well, so I had to move to ZFS. The latter is supported on pretty much every single OS except Windows (again, everyone's light years ahead of Redmond's OS) so it also serves as a multiplatform FS.

4
0

"“lack of native file-based encryption unfortunately makes it a nonstarter"

Yeah - because we really want to embed (lots of incompatible, independently developed implementations of) encryption within the filesystem rather than using well-managed code sitting on top of or beneath it.

4
0

People are still using btrfs?

After the RAID5/6 issue which still isn't fixed a *year* later(!), people are still trusting their data to btrfs?

For small storage, there are loads of options (and boot from ZFSoL is still new enough to be a concern, if not a blocker)

For huge storage, you probably aren't using ZFS - you're looking at cluster filesystems (gluster, ceph, hdfs etc)

For medium scale storage, ZFS is hard to beat. Work out a way to get ZFS on Linux licence-compliant (even if that means reverse engineering it) and move on.

6
1

Re: People are still using btrfs?

> After the RAID5/6 issue which still isn't fixed a *year* later(!), people are still trusting their data to btrfs?

Umm, the RAID5 issue which has never worked correctly, *since the beginning of the project*.

The devs have known of conditions which will corrupt RAID5 since the start, and while there was a promising bug fix a while ago, they then found it only fixed one of the bugs, while others remain.

The people doing btrfs have known about these issues for some time, and they never get properly fixed.

Most likely, that is why RH is dropping support for it.

4
0
Silver badge

Re: People are still using btrfs?

"For huge storage, you probably aren't using ZFS - you're looking at cluster filesystems (gluster, ceph, hdfs etc)"

Guess what works best on the individual nodes underneath the cluster?

I'm running Gluster on top of ZFS here. It works well.

2
0

Re: People are still using btrfs?

Bluestore?

It's probably not there yet, but it won't be much longer, so, no, zfs isn't needed (and is also not recommended by the ceph folks).

0
0
Anonymous Coward

Re: People are still using btrfs?

Bluestore is going mainstream with the new version of SUSE Enterprise Storage 5 coming out next month (Ceph based for those unaware).

I’ve seen some early pre-beta performance data (disclaimer: I work at Suse). Beats anything we were able to get out of gluster on the same kit hands down!

1
0
Silver badge

Anyone else just use ext4?

Seems to have worked fine for us for years.

2
2
Silver badge

Re: Anyone else just use ext4?

Yepp, it's the default on RHEL/CentOS 6, and that doesn't have systemd, so yes, I still use a lot of ext4.

3
0
Happy

Re: Anyone else just use ext4?

Ah, I thought I was the only one keeping to RHEL/CentOS 6 to avoid the systemd crap. I'm using a mix of ext4 and xfs on those systems. :)

4
0
Silver badge

Re: Anyone else just use ext4?

I use ext4 at home, and always thought I might someday switch to btrfs when it became the default in Fedora. Guess I'm going to continue to stick with ext4; I see no benefit in switching to ZFS or XFS.

2
0
Silver badge

Re: Anyone else just use ext4?

Ext4 locally, ZFS on my fileserver.

My fileserver snapshots my few TB of RAID-Z3 every minute. If I've set it up right, there's no remote admin login, so you need physical access to delete snapshots.

I cryptolockered the lot from a throwaway VM attached via NFS and it was possible to rapidly recover every single file from snapshots... I didn't even need to restore anything from backup.
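
(For anyone wanting to copy the setup - dataset and snapshot names below are placeholders. Per-minute snapshots are just a cron job, and recovery is a copy out of the hidden snapshot directory:)

  # crontab entry - note the % signs have to be escaped in cron
  * * * * * /sbin/zfs snapshot tank/share@auto-$(date +\%Y\%m\%d-\%H\%M)
  # recovery: read-only copies of every snapshot live under .zfs/snapshot
  ls /tank/share/.zfs/snapshot/
  cp -a /tank/share/.zfs/snapshot/auto-20170801-1200/some/file /tank/share/some/file

(zfs rollback would revert the whole dataset in one go, but it only goes back to the most recent snapshot unless you let it destroy newer ones with -r, so copying files back is usually safer.)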

ZFS is marvellous... Let's just get the licence issue resolved...

1
0
Silver badge
Trollface

Btrfs

Doesn't systemd support Btrfs properly then?

We'll all have to go back to using EMACS.

0
0

JFS!

My attitude toward BTRFS has long been: "This is very promising. We can use it now? It doesn't seem like it's a simple drop-in replacement for ext4... I'll wait for others to try this Kool-Aid." Seems like it does not taste so good after all.

In the meantime, I'm happy with JFS on my system. And I'll take the odd-ball label.

1
0

the King is dead, long live the King

So Red Hat is officially stepping down from the market area where Solaris reigns.

It is really funny to see final proof that all the people who have been saying for the last decade that Solaris would soon be a dead/legacy OS were wrong :)

0
2
Unhappy

Sadly, if it had to stop here

0
0

RH looking at a different solution "Stratis"

Asking about this elsewhere, it looks like Red Hat are doing work on the "Stratis Storage Project".

This seems to be a management system that will allow you to emulate pretty much all the features of a next-generation file system using existing layers (LVM, MD, XFS), but adding things like block-level checksumming to MD/LVM to allow the equivalent of individual file checksums. The argument seems to be that building all this into a single layer like BTRFS or ZFS is too hard; the layering makes the programming/debugging more manageable. I guess the key would be communication between layers - a bad block checksum tells XFS the file is corrupt, etc.

Details here:

https://fedoraproject.org/wiki/Changes/StratisStorage

https://stratis-storage.github.io/StratisSoftwareDesign.pdf

They also seem to have some interest in "bcachefs". The bcachefs developer says there are fundamental issues with the BTRFS design:

"btrfs, which was supposed to be Linux's next generation COW filesystem - Linux's answer to zfs. Unfortunately, too much code was written too quickly without focusing on getting the core design correct first, and now it has too many design mistakes baked into the on disk format and an enormous, messy codebase "

https://www.patreon.com/bcachefs

Not sure of the truth of this; I don't know enough about it.

2
0
