Actually, I think The Onion might have the inside track on that decision.
1933 posts • joined 8 Nov 2007
Bar bitch you ate?
Why should I feel smug? Because I "won" because you decided to rage-quit? Please ...
Well then, you're going to have to start doing your own research. This is the last time I'm going to spoon-feed you.
I have presented my research.
For a given hash & block size, there are a finite number of blocks that will cause collisions in a given hash. By removing some of that finite set, we have fewer potentials for collision. It is that simple
Yes, but your argument was about git, not fixed-sized block. I have pointed out that we are not dealing with finite sets there. Thus, your counting argument is fallacious.
It is clear that collisions are a problem in the general case
And equally clearly (actually, more so), I gave you the equation for quantifying the collision rate and outlined a simple proof that the error rate can be made arbitrarily small for practical input parameters.
I don't know why you have such a problem with understanding this.
I also don't know what is your hang-up with this "but in the general case" line of argument. We agree on the pigeonhole principle (hash collisions must exist) and I think we can both agree that the analysis of the birthday paradox is apropros. That is the general case and I'm confident that I've analysed it correctly and that it vindicates my argument. Of course, I left some small amount of work for you to do to verify that what I said is correct, but that's simple high-school maths.
If you do decide to argue further, please make it clear whether you're arguing about git or block-level hashing. And don't try to bring a fallacious argument from one (ie, git) across to bolster your argument (such that it is) in the other. Thank you.
OK, so I'm going back on my promise to not write any more, but ...
1. Does git's normal input type lead to fewer collisions?
Your line of reasoning about the structure of git's input leading to fewer collisions is pure conjecture. There is no "ipso facto" about your conclusions. You say that subspacing the input space discards potential collisions, but you neglect to consider that it also removes many non-colliding inputs, too. For your argument to work you'd have to explain why subspacing preferentially removes more colliding inputs, proportionately speaking.
In fact, the input to hash algorithms are (in the class of) infinite strings, so the normal rules of logic when dealing with finite numbers simply don't apply. Half of an infinite space is still an infinite space. In those terms, it's hard to see how your counting argument can hold any water on this point.
2. Is assuming hashes are collision-free a "reasonable" assumption in a block-centric app?
I believe that it is, but you have to plug the block size and digest size into the formulas for the birthday paradox so that you get an acceptable collision rate for a given capacity/occupancy (ie, how many blocks you intend to store).
A simple analysis should be able to convince you that doubling the number of possible hash buckets (ie, adding one more bit to the digest) will more than halve the collision rate for practical occupancy levels (which is a minuscule fraction of the hash space). This takes it out of the realm of being a question of pure mathematics and turns it into an applied maths/engineering question. And that's why I'm saying that it's an entirely reasonable assumption to make if you pick the right hash size for the problem at hand.
The actual formula to determine the probability that there is no collision, with a hash of h bits and an occupancy level o is:
(2 h Po) / 2 ho.
where xPy is the Permutation function.
I submit, without proof, that as h increases (with o fixed), the P term will increase faster than twice the rate of increase in the other term, assuming that o is sufficiently small. (in fact, it should increase much faster). Thus we can make the collision probability (1 minus the above) arbitrarily small by making a linear increase in the number of hash bits.
Basing the assumption that a well-chosen digest algorithm will be (practically speaking) collision-free on the above is, I am sure, completely justified.
Re: I'm a little confused
The analogy did put me in mind of dictionary-based compression (including LZW), too, as well as Merkle trees (for the sync problem).
The idea of re-using dictionaries sounds good, but you need good algorithms and data structures to make it tractable. You can't just blindly try compressing all blocks against all dictionaries to find the best compression ratio. That would take forever. I think that some clever tree-based/recursive construction (probably along with Bloom filters or maybe Rice-Golomb encoding) and some dynamic programming could make this work.
OK, this is my final post on the question, Vic. I promise.
I understand better now the point you're making about "entropy". You're saying that, eg, if I hashed all the natural numbers, represented in ascii, in a range (let's say they're zero-padded up to some minimum block length to make a fair comparison) that there will be fewer collisions than if the blocks all contained random 8-bit data. I think that a well-designed hash won't have significantly different collision rates over the two kinds of data for a given capacity range (ie, the number of messages to be stored).
And I'm, saying that, in the general case, [that the risk is "vanishingly small enough"] is simply not true
Mathematically speaking, you're absolutely right. Any hash can and will have collisions as proved by the counting argument/pigeonhole principle (or birthday paradox). All I'm saying is that in practice assuming that the same hash implies the same contents can be a reasonable engineering assumption. As evidenced by how git does it.
The assumptions that git makes on the actual hash size (160 bits) and expected number of commits (and, if I were to accept it, a factor to account for "entropy") aren't going to hold for a massive block-based de-dupe system, but you can plug the numbers into some formulas to find the expected collision rate and choose your digest size to make the risk "low enough" that it's not worth worrying about (eg, a mean time to collision of 50 years, if that's what you want).
I don't know what you're trying to say, there, Vic. By your logic, git having less entropy in the source should imply less entropy in the output hashes, which would in turn imply more collisions.
Anyway, entropy is irrelevant since the point of a good (cryptographic) message digest is to decorrelate patterns in the input from those in the output hash. Entropy measures over the set of output hashes should be the same regardless of input (again, for a "good" hash algorithm).
I'm just making the point that while you can't get away from the pigeonhole principle, if you have vastly more holes than pigeons, you can put some sort of bound on what's an acceptable probability of collision and design your application with the assumption that this risk is "vanishingly small enough".
It's all a trade-off, like Trevor explained in the article and in his post above.
That is not reasonable.
The git tool uses large digests (sha-1) to identify all commits. This is a 160-bit hash and apparently you'd need 2^80 random messages to get a 50% chance of a collision.
I recall reading that git hasn't any safeguards or double-checks to see if two commits hash to the same value. So Linus was just trusting the maths and making a calculated risk. For his purposes, at least, the chance of collision was so vanishingly small that's it's a risk that he was willing to take (for the sake of simplifying the code, one assumes).
I get what you're saying about the pigeonhole principle, it's not always unreasonable to assume that collisions won't happen.
From what I've read, these guys used no encryption at all.
Is "bandwagonesque" something other than a word to describe the minds of politicians (or a Teenage Fanclub album, obviously)?
"kids like legos"
It's "Lego", not "Legos".
Re: Actually I think there should be some form of quality control
To avoid random spamming by the desperate.
I've often thought of exactly the opposite. I should get off my arse(*) one of these days and set up "bewildering-upvotes.com" (**) where each week members vote on the most boring/anodyne post on their favourite site and then everyone goes and gives it a thumbs up. Perfect for sad gits everywhere who fancy themselves as anarchists, but who are too polite to do anything naughty.
* and probably sit back down again if I want to do web coding
** motto: "Dada's not dead; it just smells a little funny" perhaps?
Re: Keep your secret secret
you'll have a million different possibly keys
A 20-bit key is eminently crackable using brute force, especially if the file is of a known type. (<insert shell one-liner calling 'file' program on each one here>)
What you need to do is use "cryptographically secure" RNG like reading /dev/random (which may give the game away) or gather entropy from the running system in much the same way that the kernel and other RNG seeding functions do. Definitely don't use something like the time as the sole value for seeds.
It seems to make sense to have some kind of weight limit, but I'm wondering whether 1kg is a good cut-off? It seems a little on the low side if someone wanted to try their hand at building their own (with Pi and Arduino, say) rather than buying a kit or pre-built model. It seems that an extra 250 to 500 grams would be more practical in these cases.
Re: So this uses an already scarce RAM slot on the MoBo?
HOW DOES THIS WORK DAMMIT?
If they fit in a RAM slot then presumably they look just like regular RAM to the OS.
There are a few ways it could be used:
* reserve it the same way as you would a block of physical memory and have something like /proc/kcore so that userland processes can mmap parts of it (use it like regular memory)
* make it look like a block device that you can put partitions on and mount and so on (treat it like a disk drive)
* keep it all for the kernel and use it transparently in some subsystems like software raid, filesystem journal, disk cache, etc.)
"I promise not to kill anyone" (was Re: Stupid idea)
Killing it if it tries to do one is a terrible idea
Not necessarily. Look at Erlang. There, a major plank in the philosophy for handling errors is to just let the process die, along with any linked processes. Mind you, it also has the concept of supervisors that can restart processes after a shutdown. You have to actively work with this paradigm and code your application to take advantage of this, though. You can't just bolt it onto existing programs.
As for assert(), they do you no good if the code has a buffer overflow or something that wasn't explicitly trapped and the bad guys get to run arbitrary code. The pledge() idea gets around this nicely as injected code is sandboxed and probably won't be able to do much damage.
As for something managing to deliberately kill your anti-virus, the AV vendor can just use the idea from Erlang: let the affected daemon/process die, have a supervisor watching for any termination, log it and restart the process. If it consistently crashes on a given input file, mark that file as possibly containing malware and skip it on subsequent runs. Though really, I don't expect AV software to be crashing on arbitrary input files...
The main downside to this, as I see it, is that it does no good against viruses that are aware of how it works. Simply trace the call to pledge(), replace it with a jump to the viral code and there's no barrier to doing whatever "unsafe" operation it wants. Ditto running any untrusted program, which definitely won't call pledge().
I can see the reason for wanting this to be a system call rather than an external "rights" database. It's a lot simpler, easier to support and very lightweight. It isn't a complete solution, but for certain things (code injection) it's rather a neat defence.
A post with words
It's hard to see how this can succeed
1. I'm sure that technical documentation was available, both from AMD and review sites
2. Marketing bumf (not direct from Intel) often counts hyperthreaded "cores" as full cores
3. I'm sure that he had the opportunity to return the part as not matching what was offered, but declined to do so
4. As the article points out, what constitutes a core isn't well defined (and perhaps main ALU and supporting stuff does count)
5. Probably hard to prove AMD intended to mislead (though perhaps plaintiff doesn't have to go that far)
6. Balancing actual loss versus what he's claiming, did the loss of half an FPU core per "real" core really affect him that much (who saturates their FPU units in desktop/laptop PCs anyway?)
It's not nice when you buy something that doesn't live up to expectations, but seriously, I think this guy doth protest too much.
"presumably only until the police move in"
As I've read elsewhere, there's nothing illegal about remote access tools in most places. I think it needs to be pointed out that the version of the software being sold is distinct from the viral form, which uses the toolkit.
It's trivial to find the site selling "omnirat" and browsing the site (with lynx/w3m/in a VM) lists the features as basically only providing a remote administration tool. No mention of exploits or nefarious installation routes. For comparison, I can still download back orifice from sourceforge (for free).
Implying that the plods should/will take down the vendor's site just demonstrates a fundamental misunderstanding of the above, IMO.
"world leading oversight arrangements"
For if this negligee is fit for the UK queen, then other leaders would scarce deign but to try it on.
For relaxing times
Make it Suntory time.
Walking in the forest when ferns are releasing spores
I think you're talking about bracken? A lot of sources confirm that the spores are indeed carcinogenic, but I also came across this page that “Regarding bracken mentioned in your newsletter, if you are worried about breathing spores, we understand now that bracken doesn't spore in this country [the UK] and its expansion is only by vegetative growth."
Unfortunately, I couldn't find any other corroborating evidence to support that, but check out the article on UK's Cancer Research site, which does suggest wearing masks on dry, hot days, but that it's "unlikely [to be] a major cause of cancer".
Re: So What?
To combat extremism you must first know the mentality of an extremist and in order to know that you must talk to extremists
Exactly this. It reminds me of this rather excellent article about what ISIS is, what it wants and how it intends to get there.
The thing is, much of what's in that article is stuff that I'd never seen discussed on TV. Like how central the idea of a caliphate (and ISIS's self-identification as primarily a religious movement) is, or how ISIS justifies beheadings and other acts of terror (bizarrely, they're seen as a way to avoid bloodshed by hastening everyone's conversion to their brand of Islam), and why direct assault on their territory would actually play into their hands (they feel that it would legitimise their claim of being a valid caliphate).
There are several controversial people interviewed in that article and together they provide a compelling picture of who ISIS are and why they are doing what they're doing. Imagine a situation (like the one reported on here) where even talking with these people could cause journalists themselves to be subjected to police attention and have their livelihoods threatened (what else was on that laptop? would any other person be willing to talk to them again, even on unrelated issues?). In fact, this is the classic example of the "chilling effect" you sometimes read about. How can a free press and proper democratic processes survive when talking to the wrong person results in state-sponsored searches, confiscation of your property and the threat of imprisonment (via not handing over encryption keys, assuming they exist)?
Re: To which I say....
You foisted margarine
Margarine wasn't created for health benefits. It was created "for the armed forces and lower classes" as a substitute for butter [Wikipedia]. I don't think that anyone ever claimed it was healthy.
As Steve Wright said, "Eagles may soar, but weasels don't get sucked into jet engines"
I can only assume that Harding would fail to see the satire in the above.
Re: @Elmer Phud ....now we see ex-squaddies with PTSD trying to get shelter in doorways....."
I knew that Mr. Kipling made exceedingly good cakes, but I never knew he was such a good spinner of yarns.
Re: Calm Your Mind
I know... hence "last week".
A quick shufti in the Perliscope and a foolproof pattern for winning entries stood revealed: [-a-zA-Z0-9,!.:;()'"?/ ]+
"See? There is no brace", repeated Doolittle one more time. Teaching the laser cannon phenomenology was a lot harder than he had imagined.
Re: Calm Your Mind
Sirsasana? That sounds sooo last week.
Nobody could accuse Charles Atlas of being a Luddite after seeing how graciously he yielded to Google Maps.
"I don't know what the heck a "hard disk" is, but if that's what baby wants, then I'm the guy to get it for ya!"
Sadly, all that most VW engines had to look forward to in the aftermath of the emissions scandal was a life of servitude as robot butlers.
Wow. Proctology sure has advanced in the last 100 years.
Fritz Lang's Metropolis, where men were men and women had integrated toast racks.
Disintegrating eye, eh? Wait til you get a load of my smashing arm!
"Dutch" Schulz stood admiring his creation. "Finally," he thought, "I'll never have to stick this thumb in a dyke ever again"
Pierce Brosnan gets a few last-minute acting tips prior to starring in Treehouse of Horror XII.
This "Dolly" he'd ordered from russianbrides.ru sure wasn't like all the other broads.
You think I've got a big bicep, eh? Well whey don't you try playing Counterstrike with a thumb stick this big?
Let's get one thing straight, Glados: you do the working, and I do the ordering!
You got it wrong, bud, I'm gonna probe you later.