* Posts by Frumious Bandersnatch

1933 posts • joined 8 Nov 2007

What the world needs now is Pi, sweet $5 Raspberry Pi Zero

Frumious Bandersnatch
Silver badge

Actually, I think The Onion might have the inside track on that decision.

0
0

Thai women drugged punters 'with Xanax-spiked nipples' – cops

Frumious Bandersnatch
Silver badge

that's a

Bar bitch you ate?

5
0

This storage startup dedupes what to do what? How?

Frumious Bandersnatch
Silver badge

Obligatory

https://xkcd.com/386/

1
0
Frumious Bandersnatch
Silver badge

Why should I feel smug? Because I "won" because you decided to rage-quit? Please ...

0
0
Frumious Bandersnatch
Silver badge

Well then, you're going to have to start doing your own research. This is the last time I'm going to spoon-feed you.

I have presented my research.

For a given hash & block size, there are a finite number of blocks that will cause collisions in a given hash. By removing some of that finite set, we have fewer potentials for collision. It is that simple

Yes, but your argument was about git, not fixed-sized block. I have pointed out that we are not dealing with finite sets there. Thus, your counting argument is fallacious.

It is clear that collisions are a problem in the general case

And equally clearly (actually, more so), I gave you the equation for quantifying the collision rate and outlined a simple proof that the error rate can be made arbitrarily small for practical input parameters.

I don't know why you have such a problem with understanding this.

I also don't know what is your hang-up with this "but in the general case" line of argument. We agree on the pigeonhole principle (hash collisions must exist) and I think we can both agree that the analysis of the birthday paradox is apropros. That is the general case and I'm confident that I've analysed it correctly and that it vindicates my argument. Of course, I left some small amount of work for you to do to verify that what I said is correct, but that's simple high-school maths.

If you do decide to argue further, please make it clear whether you're arguing about git or block-level hashing. And don't try to bring a fallacious argument from one (ie, git) across to bolster your argument (such that it is) in the other. Thank you.

0
0
Frumious Bandersnatch
Silver badge

OK, so I'm going back on my promise to not write any more, but ...

1. Does git's normal input type lead to fewer collisions?

Your line of reasoning about the structure of git's input leading to fewer collisions is pure conjecture. There is no "ipso facto" about your conclusions. You say that subspacing the input space discards potential collisions, but you neglect to consider that it also removes many non-colliding inputs, too. For your argument to work you'd have to explain why subspacing preferentially removes more colliding inputs, proportionately speaking.

In fact, the input to hash algorithms are (in the class of) infinite strings, so the normal rules of logic when dealing with finite numbers simply don't apply. Half of an infinite space is still an infinite space. In those terms, it's hard to see how your counting argument can hold any water on this point.

2. Is assuming hashes are collision-free a "reasonable" assumption in a block-centric app?

I believe that it is, but you have to plug the block size and digest size into the formulas for the birthday paradox so that you get an acceptable collision rate for a given capacity/occupancy (ie, how many blocks you intend to store).

A simple analysis should be able to convince you that doubling the number of possible hash buckets (ie, adding one more bit to the digest) will more than halve the collision rate for practical occupancy levels (which is a minuscule fraction of the hash space). This takes it out of the realm of being a question of pure mathematics and turns it into an applied maths/engineering question. And that's why I'm saying that it's an entirely reasonable assumption to make if you pick the right hash size for the problem at hand.

The actual formula to determine the probability that there is no collision, with a hash of h bits and an occupancy level o is:

(2 h Po) / 2 ho.

where xPy is the Permutation function.

I submit, without proof, that as h increases (with o fixed), the P term will increase faster than twice the rate of increase in the other term, assuming that o is sufficiently small. (in fact, it should increase much faster). Thus we can make the collision probability (1 minus the above) arbitrarily small by making a linear increase in the number of hash bits.

Basing the assumption that a well-chosen digest algorithm will be (practically speaking) collision-free on the above is, I am sure, completely justified.

0
0
Frumious Bandersnatch
Silver badge

Re: I'm a little confused

The analogy did put me in mind of dictionary-based compression (including LZW), too, as well as Merkle trees (for the sync problem).

The idea of re-using dictionaries sounds good, but you need good algorithms and data structures to make it tractable. You can't just blindly try compressing all blocks against all dictionaries to find the best compression ratio. That would take forever. I think that some clever tree-based/recursive construction (probably along with Bloom filters or maybe Rice-Golomb encoding) and some dynamic programming could make this work.

0
0
Frumious Bandersnatch
Silver badge

OK, this is my final post on the question, Vic. I promise.

I understand better now the point you're making about "entropy". You're saying that, eg, if I hashed all the natural numbers, represented in ascii, in a range (let's say they're zero-padded up to some minimum block length to make a fair comparison) that there will be fewer collisions than if the blocks all contained random 8-bit data. I think that a well-designed hash won't have significantly different collision rates over the two kinds of data for a given capacity range (ie, the number of messages to be stored).

And I'm, saying that, in the general case, [that the risk is "vanishingly small enough"] is simply not true

Mathematically speaking, you're absolutely right. Any hash can and will have collisions as proved by the counting argument/pigeonhole principle (or birthday paradox). All I'm saying is that in practice assuming that the same hash implies the same contents can be a reasonable engineering assumption. As evidenced by how git does it.

The assumptions that git makes on the actual hash size (160 bits) and expected number of commits (and, if I were to accept it, a factor to account for "entropy") aren't going to hold for a massive block-based de-dupe system, but you can plug the numbers into some formulas to find the expected collision rate and choose your digest size to make the risk "low enough" that it's not worth worrying about (eg, a mean time to collision of 50 years, if that's what you want).

0
0
Frumious Bandersnatch
Silver badge

I don't know what you're trying to say, there, Vic. By your logic, git having less entropy in the source should imply less entropy in the output hashes, which would in turn imply more collisions.

Anyway, entropy is irrelevant since the point of a good (cryptographic) message digest is to decorrelate patterns in the input from those in the output hash. Entropy measures over the set of output hashes should be the same regardless of input (again, for a "good" hash algorithm).

I'm just making the point that while you can't get away from the pigeonhole principle, if you have vastly more holes than pigeons, you can put some sort of bound on what's an acceptable probability of collision and design your application with the assumption that this risk is "vanishingly small enough".

It's all a trade-off, like Trevor explained in the article and in his post above.

1
0
Frumious Bandersnatch
Silver badge

That is not reasonable.

The git tool uses large digests (sha-1) to identify all commits. This is a 160-bit hash and apparently you'd need 2^80 random messages to get a 50% chance of a collision.

I recall reading that git hasn't any safeguards or double-checks to see if two commits hash to the same value. So Linus was just trusting the maths and making a calculated risk. For his purposes, at least, the chance of collision was so vanishingly small that's it's a risk that he was willing to take (for the sake of simplifying the code, one assumes).

I get what you're saying about the pigeonhole principle, it's not always unreasonable to assume that collisions won't happen.

0
0

Hillary Clinton: Stop helping terrorists, Silicon Valley – weaken your encryption

Frumious Bandersnatch
Silver badge

What?

From what I've read, these guys used no encryption at all.

Is "bandwagonesque" something other than a word to describe the minds of politicians (or a Teenage Fanclub album, obviously)?

4
0

It's come to this for IBM: Watson is now a gimmick app on the iPhone

Frumious Bandersnatch
Silver badge

"kids like legos"

It's "Lego", not "Legos".

http://english.stackexchange.com/questions/10839/legos-not-lego

29
2

Commentard achieves bronze badge, goes directly to jail

Frumious Bandersnatch
Silver badge
Paris Hilton

Re: Actually I think there should be some form of quality control

To avoid random spamming by the desperate.

I've often thought of exactly the opposite. I should get off my arse(*) one of these days and set up "bewildering-upvotes.com" (**) where each week members vote on the most boring/anodyne post on their favourite site and then everyone goes and gives it a thumbs up. Perfect for sad gits everywhere who fancy themselves as anarchists, but who are too polite to do anything naughty.

* and probably sit back down again if I want to do web coding

** motto: "Dada's not dead; it just smells a little funny" perhaps?

3
0

Thanks for playing: New Linux ransomware decrypted, pwns itself

Frumious Bandersnatch
Silver badge

Re: Keep your secret secret

you'll have a million different possibly keys

A 20-bit key is eminently crackable using brute force, especially if the file is of a known type. (<insert shell one-liner calling 'file' program on each one here>)

What you need to do is use "cryptographically secure" RNG like reading /dev/random (which may give the game away) or gather entropy from the running system in much the same way that the kernel and other RNG seeding functions do. Definitely don't use something like the time as the sole value for seeds.

3
0

Irish roll out obligatory drone register

Frumious Bandersnatch
Silver badge

It seems to make sense to have some kind of weight limit, but I'm wondering whether 1kg is a good cut-off? It seems a little on the low side if someone wanted to try their hand at building their own (with Pi and Arduino, say) rather than buying a kit or pre-built model. It seems that an extra 250 to 500 grams would be more practical in these cases.

2
2

Riding on the memory bus: Micron brings out 8GB flash DIMM

Frumious Bandersnatch
Silver badge

Re: So this uses an already scarce RAM slot on the MoBo?

HOW DOES THIS WORK DAMMIT?

If they fit in a RAM slot then presumably they look just like regular RAM to the OS.

There are a few ways it could be used:

* reserve it the same way as you would a block of physical memory and have something like /proc/kcore so that userland processes can mmap parts of it (use it like regular memory)

* make it look like a block device that you can put partitions on and mount and so on (treat it like a disk drive)

* keep it all for the kernel and use it transparently in some subsystems like software raid, filesystem journal, disk cache, etc.)

0
0

Untamed pledge() aims to improve OpenBSD security

Frumious Bandersnatch
Silver badge
Terminator

"I promise not to kill anyone" (was Re: Stupid idea)

Killing it if it tries to do one is a terrible idea

Not necessarily. Look at Erlang. There, a major plank in the philosophy for handling errors is to just let the process die, along with any linked processes. Mind you, it also has the concept of supervisors that can restart processes after a shutdown. You have to actively work with this paradigm and code your application to take advantage of this, though. You can't just bolt it onto existing programs.

As for assert(), they do you no good if the code has a buffer overflow or something that wasn't explicitly trapped and the bad guys get to run arbitrary code. The pledge() idea gets around this nicely as injected code is sandboxed and probably won't be able to do much damage.

As for something managing to deliberately kill your anti-virus, the AV vendor can just use the idea from Erlang: let the affected daemon/process die, have a supervisor watching for any termination, log it and restart the process. If it consistently crashes on a given input file, mark that file as possibly containing malware and skip it on subsequent runs. Though really, I don't expect AV software to be crashing on arbitrary input files...

The main downside to this, as I see it, is that it does no good against viruses that are aware of how it works. Simply trace the call to pledge(), replace it with a jump to the viral code and there's no barrier to doing whatever "unsafe" operation it wants. Ditto running any untrusted program, which definitely won't call pledge().

I can see the reason for wanting this to be a system call rather than an external "rights" database. It's a lot simpler, easier to support and very lightweight. It isn't a complete solution, but for certain things (code injection) it's rather a neat defence.

3
0

Drones are dropping drugs into prisons and the US govt just doesn't know what to do

Frumious Bandersnatch
Silver badge

Bolas

A post with words

0
0

AMD sued: Number of Bulldozer cores in its chips is a lie, allegedly

Frumious Bandersnatch
Silver badge

It's hard to see how this can succeed

1. I'm sure that technical documentation was available, both from AMD and review sites

2. Marketing bumf (not direct from Intel) often counts hyperthreaded "cores" as full cores

3. I'm sure that he had the opportunity to return the part as not matching what was offered, but declined to do so

4. As the article points out, what constitutes a core isn't well defined (and perhaps main ALU and supporting stuff does count)

5. Probably hard to prove AMD intended to mislead (though perhaps plaintiff doesn't have to go that far)

6. Balancing actual loss versus what he's claiming, did the loss of half an FPU core per "real" core really affect him that much (who saturates their FPU units in desktop/laptop PCs anyway?)

It's not nice when you buy something that doesn't live up to expectations, but seriously, I think this guy doth protest too much.

7
6

OmniRAT malware scurrying into Android, PC, Mac, Linux systems

Frumious Bandersnatch
Silver badge

"presumably only until the police move in"

As I've read elsewhere, there's nothing illegal about remote access tools in most places. I think it needs to be pointed out that the version of the software being sold is distinct from the viral form, which uses the toolkit.

It's trivial to find the site selling "omnirat" and browsing the site (with lynx/w3m/in a VM) lists the features as basically only providing a remote administration tool. No mention of exploits or nefarious installation routes. For comparison, I can still download back orifice from sourceforge (for free).

Implying that the plods should/will take down the vendor's site just demonstrates a fundamental misunderstanding of the above, IMO.

4
0

In-a-spin Home Sec: 'We won't be rifling through people's web history'

Frumious Bandersnatch
Silver badge

"world leading oversight arrangements"

For if this negligee is fit for the UK queen, then other leaders would scarce deign but to try it on.

1
0

Kevin Spacey becomes pitch-hound for Swiss security bods

Frumious Bandersnatch
Silver badge

For relaxing times

Make it Suntory time.

2
0

Post-pub nosh neckfiller special: The WHO bacon sarnie of death

Frumious Bandersnatch
Silver badge

Walking in the forest when ferns are releasing spores

I think you're talking about bracken? A lot of sources confirm that the spores are indeed carcinogenic, but I also came across this page that “Regarding bracken mentioned in your newsletter, if you are worried about breathing spores, we understand now that bracken doesn't spore in this country [the UK] and its expansion is only by vegetative growth."

Unfortunately, I couldn't find any other corroborating evidence to support that, but check out the article on UK's Cancer Research site, which does suggest wearing masks on dry, hot days, but that it's "unlikely [to be] a major cause of cancer".

1
0

Cops use terror powers to lift BBC man's laptop after ISIS interview

Frumious Bandersnatch
Silver badge

Re: So What?

To combat extremism you must first know the mentality of an extremist and in order to know that you must talk to extremists

Exactly this. It reminds me of this rather excellent article about what ISIS is, what it wants and how it intends to get there.

The thing is, much of what's in that article is stuff that I'd never seen discussed on TV. Like how central the idea of a caliphate (and ISIS's self-identification as primarily a religious movement) is, or how ISIS justifies beheadings and other acts of terror (bizarrely, they're seen as a way to avoid bloodshed by hastening everyone's conversion to their brand of Islam), and why direct assault on their territory would actually play into their hands (they feel that it would legitimise their claim of being a valid caliphate).

There are several controversial people interviewed in that article and together they provide a compelling picture of who ISIS are and why they are doing what they're doing. Imagine a situation (like the one reported on here) where even talking with these people could cause journalists themselves to be subjected to police attention and have their livelihoods threatened (what else was on that laptop? would any other person be willing to talk to them again, even on unrelated issues?). In fact, this is the classic example of the "chilling effect" you sometimes read about. How can a free press and proper democratic processes survive when talking to the wrong person results in state-sponsored searches, confiscation of your property and the threat of imprisonment (via not handing over encryption keys, assuming they exist)?

10
1

Bacon can kill: Official

Frumious Bandersnatch
Silver badge

Re: To which I say....

You foisted margarine

Margarine wasn't created for health benefits. It was created "for the armed forces and lower classes" as a substitute for butter [Wikipedia]. I don't think that anyone ever claimed it was healthy.

2
4

TalkTalk attack: 'No legal obligation to encrypt customer bank details', says chief

Frumious Bandersnatch
Silver badge

weasel words

As Steve Wright said, "Eagles may soar, but weasels don't get sucked into jet engines"

I can only assume that Harding would fail to see the satire in the above.

0
0

El Reg revisits Battle of Agincourt on 600th anniversary

Frumious Bandersnatch
Silver badge

Re: @Elmer Phud ....now we see ex-squaddies with PTSD trying to get shelter in doorways....."

I knew that Mr. Kipling made exceedingly good cakes, but I never knew he was such a good spinner of yarns.

0
2

Caption this: WIN a 6TB Western Digital Black hard drive with El Reg

Frumious Bandersnatch
Silver badge

Re: Calm Your Mind

I know... hence "last week".

0
0
Frumious Bandersnatch
Silver badge

A quick shufti in the Perliscope and a foolproof pattern for winning entries stood revealed: [-a-zA-Z0-9,!.:;()'"?/ ]+

0
0
Frumious Bandersnatch
Silver badge

"See? There is no brace", repeated Doolittle one more time. Teaching the laser cannon phenomenology was a lot harder than he had imagined.

1
0
Frumious Bandersnatch
Silver badge
Happy

Re: Calm Your Mind

Sirsasana? That sounds sooo last week.

0
0
Frumious Bandersnatch
Silver badge

Nobody could accuse Charles Atlas of being a Luddite after seeing how graciously he yielded to Google Maps.

0
0
Frumious Bandersnatch
Silver badge

"I don't know what the heck a "hard disk" is, but if that's what baby wants, then I'm the guy to get it for ya!"

0
0
Frumious Bandersnatch
Silver badge

Sadly, all that most VW engines had to look forward to in the aftermath of the emissions scandal was a life of servitude as robot butlers.

0
0
Frumious Bandersnatch
Silver badge

Wow. Proctology sure has advanced in the last 100 years.

0
0
Frumious Bandersnatch
Silver badge

Fritz Lang's Metropolis, where men were men and women had integrated toast racks.

0
0
Frumious Bandersnatch
Silver badge

Disintegrating eye, eh? Wait til you get a load of my smashing arm!

0
0
Frumious Bandersnatch
Silver badge

I gave you one thumbs up. I think maybe it deserves another

0
0
Frumious Bandersnatch
Silver badge

"Dutch" Schulz stood admiring his creation. "Finally," he thought, "I'll never have to stick this thumb in a dyke ever again"

0
0
Frumious Bandersnatch
Silver badge

Pierce Brosnan gets a few last-minute acting tips prior to starring in Treehouse of Horror XII.

0
0
Frumious Bandersnatch
Silver badge

This "Dolly" he'd ordered from russianbrides.ru sure wasn't like all the other broads.

0
0
Frumious Bandersnatch
Silver badge

You think I've got a big bicep, eh? Well whey don't you try playing Counterstrike with a thumb stick this big?

0
0
Frumious Bandersnatch
Silver badge

My mate Anatoli Bugorski assured me this is perfectly safe.

0
0
Frumious Bandersnatch
Silver badge

Let's get one thing straight, Glados: you do the working, and I do the ordering!

0
0
Frumious Bandersnatch
Silver badge

You got it wrong, bud, I'm gonna probe you later.

1
0

Bacon as deadly as cigarettes and asbestos

Frumious Bandersnatch
Silver badge

Ne vous inquiétez pas

Maintenant ils disent que la glace est vraiment bonne pour la santé.

1
0

Forums