Google's troll-destroying AI can't cope with typos

Google's Perspective API, created in conjunction with Alphabet incubee Jigsaw, is supposed to provide an automated way to detect "toxic" language in social media. But researchers with the University of Washington – Hossein Hosseini, Sreeram Kannan, Baosen Zhang and Radha Poovendran – have found that the machine-learning …

  1. Anonymous Coward
    Anonymous Coward

    Google's AI

    is fsck'ed

    1. Anonymous Coward
      Anonymous Coward

      Re: Google's AI

      Since Google's geek army programmed it, that is probably the one misspelling it recognizes.

    2. Rafael #872397
      Trollface

      Re: Google's AI

      Google AI is a *wonderful* idea that will benefit all mankind. I bet it will be as successful as Wave, Orkut, Google Reader, Knol, iGoogle, Buzz, and many, many others.

      (wondering if this kind of message would be detected by Google AI)

    3. Anonymous Coward
      Anonymous Coward

      Re: Google's AI

      "AI" seems to be appended to the name of any 3rd rate algorithm these days simply to add gravitas and, in Google's case, to help marketing and share price.

      I would guess underneath it's nothing more than a word and phrase permutation spotter program, the like of which have been around for decades.

    4. Oh Homer
      Boffin

      Plz fx ths ggl

      I nd bttr spm flterz 4 \/a1gra spm n ubfskat3d urlz.

  2. frank ly

    Details

    I'd like to see the difference (if any) between "S c r e w you Trump supporters" and "B l e s s you Trump supporters". I suspect they are identical.

    1. AndyS

      Re: Details

      No, one is a perfectly normal reaction to current world events, the other is offensive. Unless they sneezed.

      1. Lord Elpuss Silver badge

        Re: Details

        "No, one is a perfectly normal reaction to current world events, the other is offensive. Unless they sneezed."

        Aaaaaaand that's why we can't trust AI. An automated system that picked up the snark in your comment and scored it accordingly would be seriously impressive :-)

        1. Anonymous Coward
          Anonymous Coward

          Re: Details

          That comment doesn't even make it to snark.

    2. Not That Andrew

      Re: Details

      You mean go full on old Southern lady on them? Bless their little hearts.

  3. Gordon Pryra

    Not only that but it seems biased towards Hillary as well!!

    Personally I feel the words "moron" and "idiot" carry the same weight on the Toxicity scales.

    I AM from the UK, so my tolerance of swearwords is probably higher than that found in the New World or amongst the other lesser countries. These are words taught in reception and primary education to give our British children a good platform on which to build their vocabulary.

    But I digress, how can calling a Trump supporter a "moron" only get a score of 80% whilst calling a Hillary supporter an "idiot" gains 90% toxicity!?

  4. WaveyDavey

    Ugh

    Those deliberate mis-spellings look like the toxic sludge that the #DMReporter keeps pointing out.

  5. John Styles

    Is a bunch of regular expressions an AI now?

    1. ttlanhil

      REs are consistent and reliable.

      And when they don't work, you can figure out why and how to fix them

      Well... in theory.

      "AI" is none of the above (as someone who's trained a few "AI" models... and fixed a lot of REs)

  6. phuzz Silver badge
    Headmaster

    Misspeling

    If I was writing an automated tool to look for trolls, pretty much the first thing I'd do is flag up the ones full of spelling errors. It's not a foolproof test, but it'll pick up 90% of the idiots.

    1. Ugotta B. Kiddingme

      Re: Misspeling

      "If I was writing an automated tool to look for trolls, pretty much the first thing I'd do is flag up the ones full of spelling errors. It's not a foolproof test, but it'll pick up 90% of the idiots."

      To further improve the catch rate, include grammar detection. For example: "They're st.upid, it's getting warmer..." only scores 2% toxicity, whereas "Their st.upid, its getting warmer..." should score at least as high as the original phrase.

    2. ttlanhil

      Re: Misspeling

      butt... illetracy r kewl, mkay?

      Plus twitter is often a source of training data for language-based "AI" work...

      I'd like to see how well you can handle all of the common abbrevs

  7. matchbx
    Facepalm

    It's a completely useless fight

    Most forums and comments are already set up to block most swear words, which is the reason folks started using s.w.e.a.r words.

    If you train some new AI bot to block s.w.e.a.r words, then folks will simply start using sVVear words....

    They might as well be trying to push a 10 thousand pound boulder up hill with a twig.... during an ice storm.... at night....... while an earthquake is happening.....

    1. Swarthy
      Terminator

      Re: It's a completely useless fight

      So you are saying that AI filtering is a Sisyphean task?

      I don't know.. they could just team up with Microsoft and use Tay to train the filter (while allowing 4Chan to train Tay)

    2. Nat C.

      Re: It's a completely useless fight

      Exactly. This is how sexually explicit material became known as "pr0n" in Internet slang -- because forum filters were blocking the word "porn". Nothing new at all....

  8. not.known@this.address
    Big Brother

    EsSEX and Scunthorpe are in trouble again then...

    So if I post that "$NAME is a bastard" because his (or her, mustn't be sexist!) parents are not married and that is what the word means, I will get in trouble?

    And how do they decide which words or phrases should be "banned"? If it is by number of complaints, how many people have to complain before a word or phrase ends up on the hit list, and how many have to support it before it can be marked as 'do not delete'? It doesn't take much imagination to see the possibilities here, surely - "My name is Joe and I think the word [insert name of religion here] should be banned, and so do my $NUMBER mates"...

    Maybe we should start calling the people behind this idea the relevant bit of Sgooglehorpes instead?
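
    The "Scunthorpe problem" in the comment title can be shown with a minimal sketch. The one-word blocklist and both function names are hypothetical, chosen only for illustration; nothing here is claimed to reflect how Perspective or any real forum filter works.

```python
import re

# Hypothetical one-word blocklist, used only for this illustration.
BLOCKLIST = ["sex"]


def substring_flag(text: str) -> bool:
    """Naive filter: flag the text if a blocked word appears anywhere inside it."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def whole_word_flag(text: str) -> bool:
    """Stricter filter: flag only when a blocked word stands alone as a word."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in BLOCKLIST
    )


if __name__ == "__main__":
    title = "EsSEX and Scunthorpe are in trouble again then..."
    print("substring match flags it:", substring_flag(title))
    print("whole-word match flags it:", whole_word_flag(title))
```

    Matching on whole words avoids flagging innocent place names, but it is also trivially evaded by the spacing and punctuation tricks discussed elsewhere in this thread, so neither approach is sufficient on its own.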

  9. Cuddles

    Benign settings

    "Machine learning models are generally designed to yield the best performance on clean data and in benign settings."

    While that may generally be the case, you'd hope software specifically written to detect non-benign situations would avoid assuming benevolence.

  10. Moosh

    "to provide an automated way to detect "toxic" language in social media"

    Might I ask... Why?

    Will I now have the pleasure of being reported to her majesty's finest automatically whenever I tell someone to f-ck off and call them a moron on social media?

    How do they define social media? Facebook? Twitter? Message boards? Comments sections?

    At what point does saying "I disagree" become a punishable offence?

    1. m0rt

      Re: "to provide an automated way to detect "toxic" language in social media"

      "At what point does saying "I disagree" become a punishable offence?"

      When you are being told that you need to give your password/keyphrases to the plod.

    2. Anonymous Coward
      Holmes

      Re: "to provide an automated way to detect "toxic" language in social media"

      "At what point does saying "I disagree" become a punishable offence?"

      I guess that would be right around the point when your style is overly offensive, for locally defined values of "overly". And who said anything about Her Majesty? So-called western governments will likely never have anything to do with the whole mess-- to be silenced within a community *by* the community is the only "punishment" that I'd ever expect to come down. Just disagreeing and/or flatly saying so isn't ordinarily offensive, or if it is, that particular community isn't where I'd want to be ITFP and that conversation probably wasn't worth having. Of course, mindlessly repeating 'i disagree' would be childish and qualifies as offensive style if not offensive substance. So it's a hard problem, like self-driving cars needing to be able to weigh risks. And then, sarcasm... good luck with that one.

      As far as why... limited moderators with limited serotonin, I guess.

      obligatory xkcd ref (alt text ftw)

      obligatory RvB PSA

  11. Bob Wheeler
    Joke

    Out in the real world, hygiene and benevolence cannot be assumed.

    Have you met my co-workers?

  12. Sleep deprived

    Must be based on the Gmail spellchecker...

    Given that the spellchecker in Gmail often has no suggestion to make when a letter is missing or swapped or mis-accented in a word, this should be no surprise. Haters can keep on hating safely.

    1. John Styles

      Re: Must be based on the Gmail spellchecker...

      Glad I'm not the only person to think this about the Google spelling checker. I have noticed in particular that it is utterly useless at spotting obvious typos where you have mistyped the first character of the word.

  13. Steve Evans

    I wonder what it would score "spawny-eyed parrot-faced wazzock"

    https://youtu.be/I2AcJSkUw6M?t=1m18s

  14. Anonymous Coward
    Anonymous Coward

    You don't even need swearwords to troll some people.

    Some people are so lacking in terms of intelligence that you can criticise them to their face and any AI would be none the wiser to pick it up.

    1. Anonymous Coward
      Anonymous Coward

      Re: You don't even need swearwords to troll some people.

      er... "No AI would be the wiser" sounds better, on second thought. Correcting myself.

  15. GrapeBunch

    Willie Bee

    Google AI: the real reason he's William the Conqueror.

    Analbuddy hoo votted gogg.le iz.za wasteof.space

    No deprecation intended, just wanted to raise the spectre of insulting terms also being possible to interpret as links, and th.us triggering different automation modules in filter.space

  16. Stevie

    Bah!

    So El Reg's articles about Yahoo should still get indexed by headline in the Google search results then.

  17. Long John Baldrick

    Isn't there a way to remove ..

    characters that are not alphanumeric?

    Just sayimg.
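
    A minimal sketch of the idea in this comment: strip everything that is not a letter, digit, or whitespace before scoring. The function name is hypothetical, and this is only an assumption about what such a preprocessing step might look like, not a description of how Perspective handles input.

```python
import re


def strip_non_alphanumeric(text: str) -> str:
    """Remove every character that is not an ASCII letter, digit, or whitespace."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


if __name__ == "__main__":
    # Samples taken from obfuscations mentioned elsewhere in the thread.
    for sample in ["s.w.e.a.r words", "st.upid", "sVVear words"]:
        print(repr(sample), "->", repr(strip_non_alphanumeric(sample)))
```

    This undoes dotted spellings such as "s.w.e.a.r" and "st.upid", but it leaves letter substitutions like "sVVear" untouched, does nothing for words split by spaces, and also discards ordinary punctuation and non-ASCII letters, so it is at best a partial answer.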

  18. Anonymous Coward
    Anonymous Coward

    Nationalism(ist) is flagged?

    Pisses me off that the word is flagged at all, this is the crap they are using on Reddit, Twitter, and FB right now to flag anyone who isn't a globalist shill. Sorry that people aren't like Germany who are afraid to wave their OWN flag. https://www.youtube.com/watch?v=_Rcc7xgD2dM

    Or declared the Christmas Market Terror attack an "Accident".. http://www.express.co.uk/news/world/773602/Germany-Berlin-Christmas-market-compensation-terror-ISIS-attack-Fabrizio-Di-Lorenzo-lorry

  19. Herby

    I'm just happy that...

    El Reg doesn't have such filters on stories or comments.

    I suspect that there are subjects that get 80%-90% ratings in either comments or the stories.

    Of course :-) I would never get such a score.........

    Trolls?? Nah, wouldn't happen here :-).

  20. Gordon Pryra

    Not much point in this anyway

    As has been said, most forums have filters to block the obvious and the less obvious (forum mods know the language of their posters generally, with all its variation in spelling). Add to this that the MOST abusive posts are generally in pretty good English, not requiring any swearing or nasty words.

    The idea is to hurt the person you are trolling, and generally this is accomplished by making them feel small. Physical size doesn't come into it, therefore the attacker can generally beat their target into submission by putting up a post that is just "better written" than the target can respond to. (case of the small people being able to pick their battlefield and attack from a position of strength)

    Check out some of the threads on El Reg for examples, those in the lower leagues tend to be left looking like the only surviving brain transplant donor by those who can string a few non-swearing insults together.

    And then we have the final issue with language and words having their meaning changed or just having a generally accepted double meaning. Those mugs at the AI lab will have a hard time working out what's an actual attack by only looking at the language being used. (see what I did there? English is great for this kind of thing)

    Syntax and placement of words are almost as important as the words being used themselves. But I guess that is the point in using this as a training ground for an AI, after all a self-learning system would come out of this exercise either totally broken and crying or able to work for IGN forums as a mod.
