Data-mining technique outs authors of anonymous email

Engineers and computer scientists say they have devised a novel method for identifying authors of anonymous emails that's reliable enough to be used in courts of law. In a series of papers published over the past few years, the researchers from Concordia University in Montreal have described what they say is the first ever data …

COMMENTS

This topic is closed for new posts.

    1. A J Stiles
      Pint

      Indeed

      That is a very important distinction.

      Given that they haven't said anything, I'm going to bet it's #2; because a good spin-doctor could turn #1 into a 100% success rate.

  1. David Hicks

    So 80% of the time it can pick between a known list of 158 people?

    And this is supposed to be good enough for use in court? Holy hell...

    With a false positive rate of 20% on such a small sample it's next to useless for picking people out of the general population, surely? All you could hope to get is "this guy we already suspect writes in a similar style to the release", which has got to qualify for pretty weak circumstantial evidence at best.

  2. Daniel 20
    Pint

    in courts of law ... ?

    and it gets it right 80% of the time ... ?

    Sounds great!

  3. Anonymous Coward

    Whoa there!

    80 percent is reliable enough for a court of law? 20 percent constitutes reasonable doubt in my mind.

    A thesaurus and a list of common spelling and grammar mistakes tied into a random character / word replacement / transposition script ... I'm fairly sure someone has thought of this already.

    Your move.

  4. Matt 21

    about 80 percent of the time

    Good enough for a court of law? I hope not.

  5. Anonymous Coward

    80% ?

    Excellent news for spam fighters. Not really feasible for court, I'd say.

  6. Notas Badoff
    Boffin

    Between beginning and end - no meat for the court

    We start with "... that's reliable enough to be used in courts of law."

    And end with "When finely tuned, the technique identified the author about 80 percent of the time." Out of a very limited set of 158 suspects, but with at least 200,000 emails to chew on.

    This has no teeth.

  7. Anonymous Coward

    Linguist involvement?

    I hope the 'engineers and computer scientists' behind this research involved linguists at an early stage. There are enough debates among experts in that field as to the authorship of various literary works to call this whole idea into question. For example, how much of the work attributed to Shakespeare was really penned by him? I suspect it is far too easy to mimic the writing style of another person to make this stand up in court.

  8. Tim 54
    Thumb Up

    Not so bad

    Whilst 80% for a single document wouldn't be good enough, suppose you have 4? All identified to the same author. That gives you a pretty low chance of the suspect not being the author. Building a case to a "beyond reasonable doubt" level involves multiple levels of evidence. In a harassment case, you're likely to have several emails (otherwise it wouldn't be harassment). That's the point of building a case - if they were all easy, police wouldn't have much to do. It's the same process we use in the brain to decide if something is true or not - we build evidence until we hit above our "truth threshold" and decide it's true.
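
A rough check on the "suppose you have 4" arithmetic above. This is only a minimal sketch, assuming each attribution is wrong independently with probability 20%. That independence is an assumption the comment does not state, and emails written by one person may well violate it. The function name is hypothetical.

```python
def prob_all_attributions_wrong(accuracy: float, n_documents: int) -> float:
    """Chance that every one of n attributions is wrong.

    Assumes (hypothetically) that errors on separate documents are
    independent and that each document has the same accuracy.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if n_documents < 1:
        raise ValueError("n_documents must be at least 1")
    return (1.0 - accuracy) ** n_documents


# Compare one, two and four emails at the article's quoted 80% accuracy.
for n in (1, 2, 4):
    print(n, round(prob_all_attributions_wrong(0.8, n), 4))
```

Under that assumption, four misses in a row come out at roughly 0.16%. That figure is the chance of four independent errors, not the probability that the suspect is innocent, and it says nothing about an author who is missing from the candidate list.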

  9. Ian Stephenson
    Boffin

    I don't see it specifying "criminal court"

    Under criminal law the onus is on the prosecution to prove "beyond reasonable doubt" - I agree that is not met by this 80% accuracy.

    However, in a civil case it is only "on the balance of probabilities", so anything over 50% is technically acceptable.

    So good enough for a libel or copyright case .....

    Oh shit.

  10. Anonymous Coward

    80%???

    80% probability is enough for American courts?

    That's pathetic!

  11. Anonymous Coward
    Grenade

    Rorschach Test alert!

    You know that hocus-pocus personality test of symmetrical ink-blots that all look like female genitalia, for which you need to answer what this most reminds you of so that you appear as 'normal' as possible to the testers, while trying not to snigger / not get a boner / not spoil the test-paper?

    Well, the Rorschach Test is still being widely used in the US & Canada. Small wonder then that this email data-mining technique is also treated as credible evidence.

    Based on my previous (named) postings, I wonder if this process would be able to divine my identity from this anonymous post? Ha!

    1. Anonymous Coward

      I would say Oliver 7

      But I'm not sure.

  12. Jess--

    not good enough for court

    I would see this being used as a tool to reduce the number of possibilities for an author

    In the test they did, where it got 80% right out of 200,000, it doesn't sound too useful. But if you apply that figure to trying to identify the author of 5 emails amongst that 200,000, and the system comes back with the same name 4 times and a "no match" or different name for the fifth, I know where I would concentrate other (human) resources.

  13. Anonymous Coward

    Known answer

    So they only get 80% correct when they know the answer, what's the hit rate like before they have manipulated the answers?!

    I have a horse tipping algorithm with a 100% success rate, only problem is that the race has to have already been run in order for it to work.

  14. Anonymous Coward

    Even better

    They claim they can identify the person talking in an ENCRYPTED VOIP connection!

    http://ncfta.ca/papers/voip.pdf

  15. JaitcH
    Unhappy

    There was a web site somewhere that would take inputted text ...

    process it and then toss it out in whatever language style you wanted.

    Some were really, really different. It's a bit like a word processor with a selectable grammar option - business, casual, etc.

    Bet that would bugger up even that great university, Concordia. Great for untouched e-mails but undoubtedly beatable if need be.

  16. tfewster
    FAIL

    Concerns/doubtful/bollocks

    Businesslike:

    I'm concerned that this study doesn't seem to address the fact that people tend to change style depending on the target audience.

    vs casual:

    Doubtful. We speak differently to different people.

    vs. Anonymous:

    Bollocks. Do these retards think I'd post words like this if they knew who I was? *

    * I know, not too hard to discover, but Anon posters often don't think that far ahead.

  17. Anonymous Coward
    Thumb Down

    Useless...

    It can only tell you which of the suspects has a style most like the author of the email. Or put another way, if the author is not known, this method won't unmask them.

    Essentially it could be argued that there is no way to prove that there isn't another person with a similar educational background and a similar writing style.
