Indeed
That is a very important distinction.
Given that they haven't said anything, I'm going to bet it's #2; because a good spin-doctor could turn #1 into a 100% success rate.
Engineers and computer scientists say they have devised a novel method for identifying authors of anonymous emails that's reliable enough to be used in courts of law. In a series of papers published over the past few years, the researchers from Concordia University in Montreal have described what they say is the first ever data …
And this is supposed to be good enough for use in court? Holy hell...
With a false positive rate of 20% on such a small sample it's next to useless for picking people out of the general population, surely? All you could hope to get is "this guy we already suspect writes in a similar style to the release", which has got to qualify for pretty weak circumstantial evidence at best.
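To put rough numbers on that, here is a minimal sketch in Python. It takes the "20% false positive" framing at face value, treating 80% as the hit rate and 20% as the chance an innocent person gets flagged, which is my simplification and not something the paper states. The population sizes are purely illustrative.

```python
def posterior_author(prior, hit_rate=0.8, false_positive_rate=0.2):
    """Bayes' rule: P(person is the author | classifier flagged them)."""
    flagged_and_author = hit_rate * prior
    flagged_not_author = false_positive_rate * (1.0 - prior)
    return flagged_and_author / (flagged_and_author + flagged_not_author)


def expected_false_matches(population, false_positive_rate=0.2):
    """Number of innocent people we would expect to be flagged."""
    return (population - 1) * false_positive_rate


if __name__ == "__main__":
    # Assumed scenario: exactly one true author, everyone equally likely a priori.
    for population in (158, 10_000, 1_000_000):
        prior = 1.0 / population
        print(population,
              round(expected_false_matches(population)),
              f"{posterior_author(prior):.4%}")
```

Even with only 158 suspects that works out to about 31 expected false flags, and it only gets worse as the pool grows.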
80 percent is reliable enough for a court of law? 20 percent constitutes reasonable doubt in my mind.
A thesaurus and a list of common spelling and grammar mistakes tied into a random character / word replacement / transposition script ... I'm fairly sure someone has thought of this already.
Your move.
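For what it's worth, the sort of script described above might look something like this minimal Python sketch. The tiny synonym and misspelling tables are made up for illustration (a real one would pull from a proper thesaurus), the function names are hypothetical, and nothing here has been tried against the Concordia method.

```python
import random

# Hypothetical, deliberately tiny lookup tables for illustration only.
SYNONYMS = {
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
    "think": ["reckon", "believe"],
}
MISSPELLINGS = {
    "definitely": "definately",
    "separate": "seperate",
    "their": "thier",
}


def transpose(word, rng):
    """Swap two adjacent characters at a random position."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def obfuscate(text, rate=0.3, seed=None):
    """Alter roughly `rate` of the words: synonym, misspelling or transposition."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        key = word.lower()
        if rng.random() >= rate:
            out.append(word)                        # leave this word alone
        elif key in SYNONYMS:
            out.append(rng.choice(SYNONYMS[key]))   # thesaurus swap
        elif key in MISSPELLINGS:
            out.append(MISSPELLINGS[key])           # common spelling mistake
        else:
            out.append(transpose(word, rng))        # random transposition
    return " ".join(out)


if __name__ == "__main__":
    print(obfuscate("I think their quick fix is definitely big", rate=0.5, seed=42))
```

Whether noise this crude would actually throw off a stylometric classifier is an open question; it only changes surface features, not sentence structure.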
We start with "... that's reliable enough to be used in courts of law."
And end with "When finely tuned, the technique identified the author about 80 percent of the time." Out of a very limited set of 158 suspects, but with at least 200,000 emails to chew on.
This has no teeth.
I hope the 'engineers and computer scientists' behind this research involved linguists at an early stage. There are enough debates among experts in that field as to the authorship of various literary works to call this whole idea into question. For example, how much of the work attributed to Shakespeare was really penned by him? I suspect it is far too easy to mimic the writing style of another person for this to stand up in court.
Whilst 80% for a single document wouldn't be good enough, suppose you have 4? All identified to the same author. That gives you a pretty low chance of the suspect not being the author. Building a case to a "beyond reasonable doubt" level involves multiple levels of evidence. In a harassment case, you're likely to have several emails (otherwise it wouldn't be harassment). That's the point of building a case - if they were all easy, police wouldn't have much to do. It's the same process we use in the brain to decide if something is true or not - we build evidence until we hit above our "truth threshold" and decide it's true.
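A back-of-envelope version of that argument, as a sketch in Python. It assumes each email is attributed independently with the 80% accuracy quoted in the article, and that a non-author gets matched 20% of the time. Both are strong assumptions of mine: emails by the same person probably share whatever quirks fool the classifier, so the errors are likely correlated and the real figures would be less favourable.

```python
def prob_all_wrong(n_emails, accuracy=0.8):
    """Chance that every one of n independent attributions is wrong."""
    return (1.0 - accuracy) ** n_emails


def posterior_after_matches(prior, n_matches, accuracy=0.8):
    """Belief that the suspect is the author after n independent matches.

    Each match multiplies the prior odds by accuracy / (1 - accuracy),
    which is the "build evidence until you pass a threshold" idea.
    """
    odds = prior / (1.0 - prior)
    odds *= (accuracy / (1.0 - accuracy)) ** n_matches
    return odds / (1.0 + odds)


if __name__ == "__main__":
    print(prob_all_wrong(4))
    for n in range(1, 5):
        print(n, posterior_after_matches(prior=0.5, n_matches=n))
```

Note that 0.2 to the fourth power is 0.0016, but that is the chance of four independent misses, not the probability that a named suspect is innocent. The latter also depends on the prior, which is why the second function takes one.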
Under criminal law the onus is for the prosecution to prove "beyond reasonable doubt" - I agree that is not met by this 80% accuracy.
However, in a civil case the standard is only "on the balance of probabilities", so anything over 50% is technically acceptable.
So good enough for a libel or copyright case .....
Oh shit.
You know that hocus-pocus personality test of symmetrical ink-blots that all look like female genitalia, for which you need to answer what this most reminds you of so that you appear as 'normal' as possible to the testers, while trying not to snigger / not get a boner / not spoil the test-paper?
Well, the Rorschach Test is still being widely used in the US & Canada. Small wonder then that this email data-mining technique is also treated as credible evidence.
Based on my previous (named) postings, I wonder if this process would be able to divine my identity from this anonymous post? Ha!
I would see this being used as a tool to reduce the number of possibilities for an author.
In the test they did, where it got 80% right across 200,000 emails, it doesn't sound too useful. But if you apply that figure to identifying the author of 5 emails from that 200,000, and the system comes back with the same name 4 times and a "no match" or a different name for the fifth, I know where I would concentrate other (human) resources.
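As a rough check on the 4-out-of-5 scenario, here is a small Python sketch using the binomial distribution. It assumes the five attributions are independent and each is right 80% of the time, which is my simplification and not something the researchers claim.

```python
from math import comb


def prob_exactly_k_right(n, k, accuracy=0.8):
    """Binomial probability of exactly k correct attributions out of n."""
    return comb(n, k) * accuracy**k * (1.0 - accuracy) ** (n - k)


def prob_at_least_k_right(n, k, accuracy=0.8):
    """Probability of k or more correct attributions out of n."""
    return sum(prob_exactly_k_right(n, i, accuracy) for i in range(k, n + 1))


if __name__ == "__main__":
    print(prob_exactly_k_right(5, 4))
    print(prob_at_least_k_right(5, 4))
```

On those assumptions, 4 or more hits out of 5 comes up about 74% of the time when the named person really is the author, so a 4-and-1 split is the expected pattern and not a fluke.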
process it and then toss it out in whatever language style you wanted.
Some were really, really different. It's a bit like a word processor with a selectable grammar option - business, casual, etc.
Bet that would bugger up even that great university, Concordia. Great for untouched e-mails but undoubtedly beatable if need be.
Businesslike:
I'm concerned that this study doesn't seem to address the fact that people tend to change style depending on the target audience.
vs casual:
Doubtful. We speak differently to different people.
vs. Anonymous:
Bollocks. Do these retards think I'd post words like this if they knew who I was? *
* I know, not too hard to discover, but Anon posters often don't think that far ahead.
It can only tell you which of the suspects has a style most like the author of the email. Or put another way, if the author is not known, this method won't unmask them.
Essentially it could be argued that there is no way to prove that there isn't another person with a similar educational background and a similar writing style.