How to feed and raise a Wikipedia robo-editor

Wikipedia is to put artificial intelligence to the enormous task of keeping the free, editable online encyclopaedia up-to-date, spam-free and legal. The Objective Revision Evaluation Service uses text-processing AI algorithms to scan recent edits for signs that they may be spam, an effort at trolling, part of a revert war ( …

  1. AndyS

    Despite everything this article says, it won't be possible to stop a good troll.

    It may be able to stop mindless vandalism, but the sort of trolling Wikipedia is famous for, the sort which gets facts in obituaries mixed up and credits the wrong people for the invention of irrelevant objects, cannot be spotted by a grammar or syntax checker.

    For example, it might well accept the statement "While an undergraduate, David Cameron a named author on a paper investigating the relationship between the use of Thorium in paper mills, and a statistically significant rise in autism of the children of the employees of those mills."

    On the flip side, if it really did get some intelligence and the ability to parse sentences for plausibility, it would probably reject the sentence "While an undergraduate, as part of a club initiation ceremony, David Cameron is believed to have put his penis in the mouth of a dead pig."

    So while this may marginally improve the readability, I can't see it doing much for the credibility.

    1. Just Enough

      Pardon?

      "For example, it might well accept the statement "While an undergraduate, David Cameron a named author on a paper investigating the relationship between the use of Thorium in paper mills, and a statistically significant rise in autism of the children of the employees of those mills.""

      I suspect it wouldn't, as that sentence makes no grammatical sense.

      1. AndyS

        Re: Pardon?

        Damn you, Skitt, and your grammar-related laws.

    2. Michael Wojcik Silver badge

      Despite everything this article says, it won't be possible to stop a good troll.

      Good thing the article didn't claim it would be, then.

      It may be able to stop mindless vandalism, but the sort of trolling Wikipedia is famous for, the sort which gets facts in obituaries mixed up and credits the wrong people for the invention of irrelevant objects, cannot be spotted by a grammar or syntax checker.

      Good thing the algorithms in question have nothing to do with so-called "grammar or syntax checker[s]", then.

      Of course it's impossible to algorithmically classify all candidate texts as "genuine" or "bogus", because that's not a meaningful decision problem. This sort of complaint - it can't be perfect, so it's irrelevant - is a sophomoric commonplace.

      We can always depend on the Reg readership for voicing it in the comments section, though. Eternal September and all that.

  2. Yugguy

    Hmmm

    Could a robot tell the difference between disk jockey and cock jockey, I wonder?

  3. Ken Hagan Gold badge

    Define false

    Is a false review (or wiki article) one that is factually inaccurate, or one that the author believes is factually inaccurate, or one that is factually accurate but the reader believes to be inaccurate?

    Or, this being prose, not mathematics, is it more than one of the above at the same time?

    Define your terms and *then* tell me whether your algorithm is 90% accurate.
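    To make the point concrete, here is a toy sketch with hypothetical data (the `accuracy` helper and every label are invented for illustration, not taken from any real evaluation). The same five classifier verdicts are scored against two reference labellings that differ only in which definition of "false" they encode.

```python
def accuracy(predicted, truth):
    """Fraction of items where the predicted label matches the reference label."""
    if len(predicted) != len(truth):
        raise ValueError("label lists must be the same length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical classifier verdicts on five reviews (True = flagged as "false").
predicted = [True, False, True, False, False]

# Definition (a): the review is factually inaccurate.
truth_factually_wrong = [True, False, True, False, False]

# Definition (b): the author believes the review is inaccurate.
# The third review is wrong but honestly believed; the fifth is a deliberate
# lie that happens to be true. Both labels flip relative to (a).
truth_author_believes_false = [True, False, False, False, True]

print(accuracy(predicted, truth_factually_wrong))        # 1.0
print(accuracy(predicted, truth_author_believes_false))  # 0.6
```

    Identical output, two different scores: a headline accuracy figure says little until the definition behind the reference labels is pinned down.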

  4. PNGuinn

    A new game ...

    If the great Jimbo's organ really does go down this route, the "AI" will have to be carefully biased to consider the unbiased opinions of all the ruling elite. Which means that it should be possible to have enormous fun gaming the system...

  5. aberglas

    Nothing new, yawn

    Wikipedia has had robots for many years. They do lots of stuff, like finding broken or ambiguous links, or updating articles from databases such as the CIA World Factbook. They also look for blatant spam. Put f**k in as an anon editor and it won't last long.

    The 'bots are slowly getting smarter, that is all. But when they can truly understand what is being written, we will no longer need Wikipedia -- the age of humans will be over...

  6. Androgynous Cupboard Silver badge

    Artificial Intelligence

    Is better than no intelligence at all.

    1. DocJames

      Re: Artificial Intelligence

      Trolls the world over disagree!

  7. Michael Wojcik Silver badge

    Human judges aren't very good either

    humans are excellent at making sense of the nuance of the written word

    No, in fact, we're not, at least by any objective metric. Of course much depends on what you mean by "making sense" - which is not a term of art in Natural Language Processing, or for that matter in linguistics or cognate fields.

    Human judges are wildly inconsistent when interpreting parole (specific instances of language use). That's been demonstrated in study after study. Indeed, the article goes on to refer to a couple, e.g. in identifying "false" online reviews.[1]

    There are micro-inconsistencies, due to ambiguities and other interpretive issues at the phrasal level; and there are systemic macro-inconsistencies in interpreting larger passages or entire texts. The history of literary criticism amply demonstrates that. Vast swathes of theory in various disciplines - linguistics, literature, translation theory, etc - document and discuss the issues.

    NLP techniques for classifying documents this way - between "probably genuine" and "suspect" - actually do the best when they don't try to emulate human judges. The article hints at that too, but without discussing the technology it's hard to give a useful picture. Algorithmic tools such as Support Vector Machines, Maximum Entropy Markov Models, and Latent Semantic Analysis are almost certainly quite different from whatever it is that human readers do; they produce usable results in these applications for just that reason. Bots and trolls are generally optimized for deceiving human judges,[2] so classifiers that use radically different techniques are more likely to spot them.

    Anyone interested in further information on the topic might look up some of the presentations Bing Liu has online. He's one of the experts in finding false online reviews, which is an interesting area because it has strong economic consequences, so there's an active arms race going on. With Wikipedia it's more about ideological axes being ground, and lulz.

    [1] Here "false" generally means "referring to events which did not occur", as when paid shills or bots create positive reviews for one purveyor or negative reviews for competitors.

    [2] Or for performance, taking the "shotgun" approach of making more work than the evaluators can handle. But that's a different problem.
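    For anyone curious what "classifying from statistics rather than reading" looks like, here is a minimal sketch that labels text "genuine" or "suspect" purely from word frequencies. For brevity it swaps in multinomial naive Bayes with add-one smoothing, a simpler technique than the SVMs, MEMMs and LSA named above; the `NaiveBayes` class and the four training snippets are hypothetical toy material, not Wikipedia data and not how ORES actually works.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case whitespace tokenizer; deliberately crude."""
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        # log P(c) plus the sum of log P(token | c); the +1 smoothing stops
        # an unseen word from zeroing out the whole score.
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.total_words[c] + len(self.vocab)
        for tok in tokenize(doc):
            score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))

# Hypothetical toy training set.
train_docs = [
    "the bridge was completed in 1932 and opened to traffic",
    "the company was founded by two engineers in 1998",
    "buy cheap pills now best price click here",
    "click here for cheap watches best deals now",
]
train_labels = ["genuine", "genuine", "suspect", "suspect"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("cheap pills click now"))           # suspect
print(model.predict("the bridge was founded in 1932"))  # genuine
```

    Note that the model only counts words; nothing in it checks whether a sentence is true, which is exactly why such classifiers behave so differently from human readers.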

  8. andy k O'Croydon

    Text-reading robot

    Surely that should have been a picture of Johnny 5. Although I suppose WALL-E looks similar enough.

  9. DCLXV

    Excellent

    Looks like I'll be opening my wallet for Wikipedia once again.
