Never lose data?
I've learned to never say never around the IT world as Murphy is just a heartbeat away.
Six MIT research boffins have demonstrated a system capable of recovering all data in the event of a crash, something previously confined to high-end theory. In October the team will showcase the first, albeit slow, file system "mathematically guaranteed" not to lose data during crashes. Authors Haogang Chen; Daniel Ziegler; Tej …
Not strictly "boy", but "Out in the Fields" could be rewritten to fit...
"Stored in the fields,
the data's just begun.
Out on the disks,
Records build one by one.
Wait! Back it up!
A thousand files could die each day.
Murphy's just a heartbeat away."
With apologies to Gary Moore and Phil Lynott, obviously
"With a lot of "mathematically proven" systems you end up moving the problems/bugs from the implementation process to the initial specifications, which are often not 100% complete nor correct for anything of reasonable complexity."
I think "moving" is a bit misleading here. Provided the proof is correct (!!), the code will fully comply with the spec, so all the remaining bugs will be in the spec. ;)
Having said that I don't think I've seen a bug-free spec, formal or otherwise. Formal specs do have an advantage in that you can prove that stuff complying with the spec will have particular properties though. While this kind of work may be viewed as esoteric or irrelevant, it should yield a useful model for other folks to compare/apply to real world file systems.
ZFS is paranoid about hardware and connections, and can have multiple layers of redundancy and storage; so why the need for yet another file system?
No system can guarantee never to lose data if bad enough stuff happens locally, which is why sensible enterprise solutions also include replication of data between multiple data centres.
A while back I was tasked with implementing a block-level data storage solution intended for enterprise use (though not actually a filing system). The brief handed to me was a hand-scrawled description of several data structures with arrows pointing to them. Data integrity was addressed by the sentence at the very end: "+ data protection features".
I was very glad to leave that job.
I have not had a system crash corrupt "normal" data in years (there were a handful of occasions where we were using McAfee disk encryption on laptops and the encryption got lost... but a full disk decrypt allowed us to run a checkdisk, and not only was the data recovered but the system booted perfectly).
What I have had, though, is bloody new/recent disks dying: from power outages, from a slight knock well inside their G shock tolerances (running or stopped), or just going into the corner and never coming out of its eternal sulk...
I guess this FS does not address that problem ;) But of course, everyone has backups, don't they. Don't they? Hello? Hello? Echo? Echo echo...
The Be file system BFS (https://en.wikipedia.org/wiki/Be_File_System) was pretty good in the mid-90s - very fast, excellent querying capabilities and the journalling capabilities meant that a common demo was pulling the plug on the machine in mid operation and then bringing it up again and demonstrating that no data had been lost.
Silent corruption is quite a serious problem too (besides crashing disks or power outages). You remember the old VHS cassettes or old Amiga disks? Today you cannot read any of them. Why? Bit rot. Data rots over time, from cosmic radiation, etc. So you need checksums to detect flipped bits. Journaling won't catch silent corruption. Only ZFS does. Read the Wikipedia article on ZFS to see some research on ZFS's superior data corruption protection abilities (better than everything else).
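The checksum idea the comment describes can be sketched in a few lines. This is a minimal illustration using CRC32; real ZFS uses per-block fletcher or SHA-256 checksums stored in the parent block pointer, which this toy does not attempt to model:

```python
import zlib

def store_block(data: bytes) -> tuple[bytes, int]:
    # Keep a checksum alongside the data, computed at write time.
    return data, zlib.crc32(data)

def verify_block(data: bytes, checksum: int) -> bool:
    # On read, recompute the checksum; a mismatch reveals silent
    # corruption that a journal alone would never notice.
    return zlib.crc32(data) == checksum

block, csum = store_block(b"important payroll records")
assert verify_block(block, csum)  # intact data verifies

# Simulate bit rot: flip a single bit in the stored data.
rotted = bytes([block[0] ^ 0x01]) + block[1:]
assert not verify_block(rotted, csum)  # the flipped bit is detected
```

Detection is only half the story, of course: with redundancy (mirrors or parity) the file system can then repair the rotted copy from a good one.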
BTRFS has no research confirming it catches all types of silent data corruption. ZFS is confirmed, in at least two separate research papers, to catch all types of silent data corruption. Read the Wikipedia article for links to the research groups examining ZFS. Btw, Phoronix last month lost a btrfs volume, and in the forum thread there are several similar stories. I wouldn't trust btrfs, but it's your data.
Not losing data written to disk is relatively simple, provided you don't screw the hardware: you make sure all data is findable / retrievable before the data itself is committed.
Software crashes are a bit harder - you need to ensure that you don't overwrite when writing new data, and that you can roll back to previous versions, or roll forward to recover from the incomplete write in a crash.
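The no-overwrite idea above can be sketched with the classic write-then-rename pattern. This is a minimal userland sketch (real file systems do the equivalent at block level with journals or copy-on-write); the file name is just an example:

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    # Never overwrite in place: write the new version to a temp file
    # in the same directory, force it to stable storage, then rename
    # over the target. POSIX rename is atomic, so a crash leaves
    # either the complete old file or the complete new one.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure the bytes reach the disk
        os.replace(tmp, path)     # atomic swap to the new version
    except BaseException:
        os.unlink(tmp)            # crash-safe: old file untouched
        raise

atomic_write("records.db", b"version 2")
```

A crash before the `os.replace` leaves the old file intact (roll back); a crash after it leaves the new file complete (roll forward) - there is no window in which a torn, half-written file is visible under the target name.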
Power outages would be much better handled with more resilient components - e.g. all DIMMs come with at least as much solid state as RAM, plus backup power to ensure the data is flushed to solid state during a failure; a disk has battery power to flush its cache; a CPU has its state written out. Then, when power is restored, each component just needs to restore itself to the known state, and you are good to go.