Reply to post:

This storage startup dedupes what to do what? How?

Frumious Bandersnatch

Well then, you're going to have to start doing your own research. This is the last time I'm going to spoon-feed you.

I have presented my research.

For a given hash & block size, there are a finite number of blocks that will cause collisions in a given hash. By removing some of that finite set, we have fewer potentials for collision. It is that simple.

Yes, but your argument was about git, not fixed-size blocks. I have pointed out that we are not dealing with finite sets there. Thus, your counting argument is fallacious.

It is clear that collisions are a problem in the general case

And equally clearly (actually, more so), I gave you the equation for quantifying the collision rate and outlined a simple proof that the error rate can be made arbitrarily small for practical input parameters.

I don't know why you have such a problem with understanding this.

I also don't know what your hang-up is with this "but in the general case" line of argument. We agree on the pigeonhole principle (hash collisions must exist) and I think we can both agree that the analysis of the birthday paradox is apropos. That is the general case and I'm confident that I've analysed it correctly and that it vindicates my argument. Of course, I left some small amount of work for you to do to verify that what I said is correct, but that's simple high-school maths.
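For anyone who wants to check the arithmetic: the equation referred to above isn't reproduced in this comment, so the sketch below assumes the textbook birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n distinct blocks and a b-bit hash that behaves like a uniform random function. The function name, the 2**40 block count and the 160/256-bit hash widths are illustrative assumptions, not figures from the thread or from any particular product.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among num_blocks
    distinct inputs hashed to hash_bits bits (birthday bound).

    Assumes the hash output is uniformly distributed, which is an
    idealisation of a real cryptographic hash.
    """
    # p ~= 1 - exp(-n(n-1) / 2^(b+1))
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Hypothetical workload: 2**40 distinct blocks
    for bits in (160, 256):
        p = collision_probability(2 ** 40, bits)
        print(f"{bits}-bit hash, 2**40 blocks: p ~= {p:.3e}")
```

The probability only becomes appreciable once the number of blocks approaches 2^(b/2), which is the sense in which the error rate can be made arbitrarily small by choosing a wide enough hash.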

If you do decide to argue further, please make it clear whether you're arguing about git or block-level hashing. And don't try to bring a fallacious argument from one (i.e., git) across to bolster your argument (such as it is) in the other. Thank you.
