EXT4 filesystem can EAT ALL YOUR DATA

Flaws have been found in the EXT4 filesystem that can cause data loss when running Linux 4.0 and higher. Reports such as this Debian bug report suggest “massive filesystem corruption” is the result of the flaw. The problem appears to strike RAID0 users, on Arch Linux and Debian. Fixes are available, one explained by Lukas …

  1. Anonymous Coward
    Anonymous Coward

    Yikes

    Thankfully I run either XFS or EXT3 (if I dual-boot on Windows) on my machines.

    That said, patching shouldn't take long.

    (Versus the last batch of patches Microsoft pushed out, my work laptop started patching Friday morning and was still telling me it was applying the updates and not to turn off my computer on Monday evening. If I hadn't hard-power-cycled the laptop, it'd still be patching a week later!)

  2. Antonymous Coward
    Stop

    NO NO NO!

    DO NOT UPDATE your kernel!

    The BUG is in "stable" while the FIX has yet to reach Linus.

    DO NOT UPDATE your kernel!

    It's (another) Neil Brown special in the software raid code.

    It:

    Only affects filesystems on software raid (md/raid0) on SSD

    Probably affects other filesystems.

    First appeared in a "bugfix" patch committed in April and has been percolating down through the "stable" branches.

    Is the consequence of a misdirected DISCARD instruction and can be avoided by disabling discard/trim or DOWNGRADING the kernel.

    DO NOT UPDATE your kernel!
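
For anyone wanting to check whether discard is switched on at mount time, here is a minimal Python sketch. It assumes the usual whitespace-separated fstab layout (device, mount point, type, options); the function name `discard_mounts` and the sample entries are made up for illustration. It only looks at the `discard` mount option, so it will not notice a scheduled fstrim run, and it does not check whether a device actually sits on md/raid0.

```python
# Hypothetical sample in fstab layout; the devices and mount points are illustrative.
SAMPLE_FSTAB = """\
# <device>  <mount point>  <type>  <options>  <dump>  <pass>
/dev/md0    /scratch       ext4    defaults,noatime,discard  0  2
/dev/sda1   /boot          ext4    defaults                  0  2
"""


def discard_mounts(fstab_text):
    """Return (device, mount_point) pairs whose options include 'discard'."""
    hits = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # A usable entry needs at least device, mount point, type and options.
        if len(fields) < 4:
            continue
        device, mount_point, _fstype, options = fields[:4]
        # Options are comma-separated; match the exact word, not a substring.
        if "discard" in options.split(","):
            hits.append((device, mount_point))
    return hits


if __name__ == "__main__":
    for device, mount_point in discard_mounts(SAMPLE_FSTAB):
        print(f"{device} mounted on {mount_point} uses the discard option")
```

To run it against a real machine, pass in the contents of /etc/fstab instead of the sample string.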

  3. Michael Thibault
    Trollface

    If

    >If you want to hang on to your data. Which you probably do.

    If it's on a linux system, that is. Which it isn't necessarily. If you know what I mean.

    FTFY.

  4. MrZoolook
    Trollface

    But wait!

    Linux just works...

    1. Anonymous Coward
      Anonymous Coward

      Re: But wait!

      It does… has just worked for me longer than any Windows release has and I'll have a fix in place for this bug quicker than many commercial vendors will.

      1. Anonymous Coward
        Anonymous Coward

        Re: But wait!

        But wait!

        Commercial vendors aren't using ext4...

  5. Joe Montana

    RAID0?

    Surely anyone who's using RAID0 doesn't really care about the integrity of their data in the first place?

    1. Maventi

      Re: RAID0?

      Possibly, but just because loss is tolerable doesn't mean that corruption is. A system using RAID0 as high performance scratch volume may tolerate a complete outage (e.g. due to disk failure), whereas corrupt data could potentially go unnoticed until it breaks something further down the track.

      So the fact that this bug only affects RAID0 doesn't really mitigate its severity.

    2. Anonymous Coward
      Anonymous Coward

      Re: RAID0?

      True, but is RAID-10 also affected?

  6. Anonymous Coward
    Anonymous Coward

    A good reason to give Btrfs the swerve (I think it's still beta) as the odd bug still crops up in EXT4 after all this time.

    1. Anonymous Coward
      Anonymous Coward

      To be fair, this is nothing to do with BTRFS, which I've been watching from a distance for some time.

      I did give it a go on some testing servers at work in a Ceph cluster, running Ubuntu 12.04 LTS but it seemed things were not ready for prime time there, so I stuck with XFS for that, and most of our production stuff these days is XFS. Legacy production is EXT4, hopefully not for too much longer.

      As for BTRFS, I guess I'll transition across to it eventually, now that it's no longer "unstable" (it carried this label for a long time). Best bet is to ensure good backups, then you can recover if things go pear shaped.

      1. Anonymous Coward
        Anonymous Coward

        @stuart

        Btrfs is the default on Opensuse (with XFS) and when I last tried it was set to snapshot by default, quickly gobbling up space in a VM, and it was a right royal pain to switch off as snapper was buggy to start with. Agreed it has lots of positives but there are a few creases to be ironed out. I'll probably give it a spin on the next Suse release but only for test purposes.

        1. cmannett85

          @Mine's a Guinness

          Yeah, the OpenSUSE BTRFS configuration is utterly shafted: every software update triggers a snapshot and changing the root config for snapper doesn't seem to do anything. I'll happily go back to EXT4 next time.

          1. fajensen
            Flame

            Ahh, Those Bright Young Things who hack Linux distributions these days... They are working to make Linux more like Windows.

            1. Fatman
              WTF?

              RE: Those Bright Young Things

              "They are working to make Linux more like Windows."

              Now that would not be a backhanded jab at the developers of systemd, now would it????

  7. Robert Carnegie

    To me, the description looks corrupted

    And the name "Lukas Czerner" doesn't look error-free. I can't tell what it's meant to be though.

    Maybe I should not crassly mock the guy who, apparently, can eat all my data.

  8. arctic_haze
    Linux

    Status on Fedora

    It seems the bug is only in kernel 4.0.* versions and therefore affects only Fedora 22 (which is in a beta version) and some fresh kernel updates on Fedora 21. A patch is already available:

    https://bugzilla.redhat.com/show_bug.cgi?id=1223332

    Luckily for me I have only Fedora 20 boxes. I always prefer to wait six months before upgrading to the new version.

  9. Anonymous Coward
    Anonymous Coward

    There are other bugs as well. Often when I try to use multi-threaded code that has high memory usage the entire Linux OS will crash. There is a possible interaction between the threading system and the memory system in combined usage. If you just tested the threading system that would be fine. If you just tested the memory system that would be fine. I would guess a lot of the instability goes back to the usage of the C programming language and C RT library. There really does need to be a simple, totally defined programming language for low level code. No language designer would be interested in creating that. They want everyone to know how smart they are, and you can't show off by making something basic.

    1. Chemist

      "Often when I try to use multi-threaded code that has high memory usage the entire Linux OS will crash. "

      Strange, even 10+ years ago using a dual xeon Dell workstation with the quite large memory of 2GB (for the time) we ran multithreaded protein modelling software under RedHat for DAYS at ~100% CPU without any probs.

      And now my 8GB 4(8) core i7 laptop (OpenSUSE 13.1) running multithreaded software can render/convert video running at ~90% CPU without issue.

  10. Rob Carriere
    Thumb Down

    Very misleading article

    Five minutes of proper research would have shown it is a bug in the MD RAID0 system that has nothing to do with EXT4 or any other file system.

    Here's the description from Eric Work, who wrote the fix:

    This bug affects systems with kernel 3.19.7+ or 4.0.2+ running any filesystem on top of MD RAID 0 that supports and enables TRIM. No other RAID levels are affected. I believe Intel fakeraid is also affected. If you don't use fstrim or have the 'discard' option enabled in fstab then you wouldn't be affected. Removing these TRIM options is also the workaround.
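
The version thresholds quoted above can be turned into a quick check. This is a rough sketch only: the helper names `parse_kernel_version` and `in_reported_range` are made up here, and the logic encodes nothing beyond the two lower bounds in the quote (3.19.7+ within the 3.19 series, or 4.0.2+). It knows nothing about which later releases carry the fix, and distribution kernels may include backported patches, so treat the result as a hint rather than a verdict.

```python
import re


def parse_kernel_version(release):
    """Extract (major, minor, patch) from a release string such as '4.0.4-2-ARCH'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A release like '4.0' has no patch component; treat it as patch level 0.
    return int(major), int(minor), int(patch or 0)


def in_reported_range(release):
    """True if the release matches the quoted bounds: 3.19.7+ or 4.0.2+."""
    version = parse_kernel_version(release)
    if version[:2] == (3, 19):
        return version >= (3, 19, 7)
    return version >= (4, 0, 2)


if __name__ == "__main__":
    for release in ("3.16.0-4-amd64", "3.19.7", "4.0.1", "4.0.4-2-ARCH"):
        print(release, in_reported_range(release))
```

On a live system the string from `platform.release()` could be passed in. A match only matters if a filesystem on MD RAID 0 also has TRIM enabled, as the quoted description says.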
