Write-only memory appears in the wild once again.
"It is not a complete implementation since 'you can create a SHA-256 repository, but will be unable to read it'"
The Git version control system has moved closer to using SHA-256 rather than the compromised SHA-1 as its hash algorithm, to help protect code from tampering. Whenever code is committed into a Git repository, the software calculates and stores a hash value. When you retrieve the code, the hash is recalculated to …
Oh, there's lots of write-only stuff ("memory" or otherwise) about.
In fact, you see it just about any time you read about backups failing to restore after a major hardware, ransomware or other disaster because no one ever thought to actually test them once in a while!
OK, firstly, I'm not knocking them for trying to address it, but the chances of this actually happening are virtually zero. Yes, I know some people will be quick to point out that "virtually" isn't good enough, but in this case these hashes shouldn't be used as a sole security measure anyway.
As far as I understand, the hash is computed from a lot of data, including the commit data (message, author, date, etc.) and other data in the tree. The chances of generating 2 identical hashes whilst manipulating the contents of a file - which would have to be the same length anyway(?) - are so unlikely that I'm unsure why this is being treated as something that needs fixing.
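For anyone wondering what actually goes into the hash, here is a minimal Python sketch of how Git derives an object ID: a short header with the object type and body length, a NUL byte, then the body. The commit body below is deliberately simplified (a real commit also has parent and committer lines), the author line is made up for illustration, and the SHA-256 variant assumes the same header scheme with a different digest.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes, algo: str = "sha1") -> str:
    """Hash '<type> <size>' + NUL + body, the layout Git uses for objects."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


# A blob ID depends only on the file contents (and their length).
blob_id = git_object_id("blob", b"hello\n")

# The ID of a tree with no entries; used here so the commit has a tree to name.
empty_tree_id = git_object_id("tree", b"")

# Simplified commit body: the author line is hypothetical, and parent and
# committer lines are left out to keep the sketch short.
commit_body = (
    f"tree {empty_tree_id}\n"
    "author A Hypothetical <a@example.com> 0 +0000\n"
    "\n"
    "Example message\n"
).encode()
commit_id = git_object_id("commit", commit_body)

print("blob   (sha1)  :", blob_id)
print("commit (sha1)  :", commit_id)
print("blob   (sha256):", git_object_id("blob", b"hello\n", algo="sha256"))
```

Because the commit names its tree by ID, and the tree names its files by ID, changing any file changes every ID above it. The length sits in the header, so content of a different length is simply a different input to the hash.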
The concept of hashes in Git was never "for security". It uses them primarily as a means to reference things in the same way a relational database does with unique primary keys.
Furthermore, one of the advantages of using Git is the speed at which it works. I'm almost certain there'll be a performance hit with this, especially if they're trying to make it backwards compatible.
Maybe when it was designed, but with more and more applications relying on (public) repositories to get code they have to trust, it became far more than that, willingly or unwillingly.
I just wonder why the design is so dependent on a cryptographic algorithm they knew could one day be superseded, as others had been before (MD5...)
Torvalds is all about good ideas, meh execution. Linux was (and is) entirely unoriginal, replicating even then decades-old and largely outmoded ideas about how to build an operating system (we could have something nice that wouldn't need to reboot with every change; yes, Tanenbaum was right). Git is a great concept but very, very rough around the edges. The UI is a mess (go through all the command line arguments and options and try to make sense of it) and any decent designer would have needed no more than a second to conclude that yes, you do want to make that hash algorithm pluggable.
But, to be fair, everybody is abusing Git as just a faster SVN and that does not help a lot either. It was meant, like Mercurial, as a distributed version control system where the primary mode of interaction would be exchange of deltas between (mutually trusting) parties. I bet not 1% of Git users out there know how the email integration works, or even that you can quickly push a commit to a colleague over SSH or something. I do think that makes hash weaknesses worse, because now there's a single point of attack (GitHub, GitLab; is Gitorious still around? You should be able to work without any of these).
"The chances of generating 2 identical hashes whilst manipulating the contents of a file - which would have to be the same length anyway(?) - are so unlikely that I'm unsure why this is being treated as something that needs fixing."
Why would the length, the commit message, or any of the other factors have to be identical? Who's going to notice that a five-year-old commit is slightly longer or has a slightly different message? The biggest problem is making sure it does not disrupt any of the commits that are sitting on top of it.
Why is Linus acting like this is some MASSIVE change?
- add extra field in database for SHA256 hash
- write update script that, for every file in a repository, computes its SHA256 hash and writes it to the new field
- anything that would compute/check an SHA1 hash, continue to do so if the SHA256 field is missing, otherwise use it
Done. It'd take a weekend. Stop being useless.
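Taken literally, the three steps above look something like the Python sketch below. This is only the proposal as written, not how Git stores anything: the `Record` class and the `make_record`, `backfill` and `verify` helpers are hypothetical names invented for illustration. Note that it treats the hash as an extra checksum column stored alongside the data, and says nothing about objects being looked up by their hash.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical database row: content plus its stored checksums."""
    data: bytes
    sha1: str
    sha256: Optional[str] = None  # step 1: the extra field, initially empty


def make_record(data: bytes) -> Record:
    """Create an old-style record that only carries a SHA-1 checksum."""
    return Record(data=data, sha1=hashlib.sha1(data).hexdigest())


def backfill(records: List[Record]) -> None:
    """Step 2: the update script that fills in the missing SHA-256 field."""
    for record in records:
        if record.sha256 is None:
            record.sha256 = hashlib.sha256(record.data).hexdigest()


def verify(record: Record) -> bool:
    """Step 3: check SHA-256 when present, otherwise fall back to SHA-1."""
    if record.sha256 is not None:
        return hashlib.sha256(record.data).hexdigest() == record.sha256
    return hashlib.sha1(record.data).hexdigest() == record.sha1


records = [make_record(b"first file\n"), make_record(b"second file\n")]
print(all(verify(r) for r in records))   # checked with SHA-1
backfill(records)
print(all(verify(r) for r in records))   # checked with SHA-256
```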
So how it works is you have a hash that tells you which file in the directory is which.
When you change the hash function, you can no longer tell which file is which.
This is more of a problem than just switching some function over, and git is fairly modular as is.
So I clone repo-A with hash-sha1; when I commit file b with hash-sha256, repo-A is no longer able to tell that file b is already present.
This is why the article referred to a write-only repo. The easy bit, swapping out a hash function, was done already. The hard bit, managing to still read all the existing Git repos out there, is still being worked on.
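A toy Python sketch of the lookup problem described above, assuming a store that names each object by the hash of its contents. `ObjectStore` is a made-up class for illustration, and it hashes the raw bytes directly rather than using Git's real object format.

```python
import hashlib
from typing import Dict, Optional


class ObjectStore:
    """Toy content-addressed store: the hash of the data IS the lookup key."""

    def __init__(self, algo: str, objects: Optional[Dict[str, bytes]] = None):
        self.algo = algo
        self.objects = {} if objects is None else objects

    def key(self, data: bytes) -> str:
        return hashlib.new(self.algo, data).hexdigest()

    def put(self, data: bytes) -> str:
        k = self.key(data)
        self.objects[k] = data
        return k

    def has(self, data: bytes) -> bool:
        return self.key(data) in self.objects


file_b = b"contents of file b\n"

# repo-A stores file b under its SHA-1 name.
repo_a = ObjectStore("sha1")
repo_a.put(file_b)

# A client looking at the very same objects, but naming them with SHA-256.
sha256_view = ObjectStore("sha256", objects=repo_a.objects)

print(repo_a.has(file_b))       # True: the SHA-1 name is found
print(sha256_view.has(file_b))  # False: same bytes, but the name is unknown
```

The data is all still there; what is missing is a way to translate between the two sets of names, which is the part that is hard to retrofit.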
Exactly. I heard that Git is a convoluted mixture of Perl, Bash and C. Happy debugging! Some people at Facebook were looking to migrate from SVN to Git, but as they required customization, it was easier for them to learn Mercurial from scratch and then write extensions rather than get Git to do their bidding.