Potato, potato. Toma6to, I'm going to kill you... How a typo can turn an AI translator against us

Neural-network-based language translators can be tricked into deleting words from sentences or dramatically changing the meaning of a phrase, by strategically inserting typos and numbers. Just like twiddling pixels in a photo, or placing a specially crafted sticker near an object, can make image-recognition systems mistake …

  1. Rich 11 Silver badge

    Hmmm

    Smoe mistkaes are eaiser to corerct tahn otehrs. For a human, anyway.

    1. Duncan Macdonald Silver badge

      Re: Hmmm

      Good one!

      In nature it is rare to have perfect information - an animal that can correctly distinguish a predator from the background has a big survival advantage. Therefore handling incomplete or corrupted data (eg a cougar partially hidden behind a rock) became a necessity. This is why human brains can do general pattern recognition far better than current AI systems.

      1. Charles 9 Silver badge

        Re: Hmmm

        But doesn't that work BOTH ways? Isn't that why, say, zebras have stripes: to throw off the pattern recognition of predators like lions?

        1. Loud Speaker

          Re: Hmmm

          Zebras have stripes to throw off tsetse flies (or test files, depending on your AI).

        2. JLV Silver badge

          Re: Hmmm

          good point.

          but still, you’d figure lions’ pattern matching would have evolved to

          lunch = regex.match(stripes) by now.

          I wonder how you cook up an AI translator that doesn’t bat an eye at a word like ‘Psy6hothearpeiut’. Maybe that’s the price you have to pay to parse online dating profiles like ‘I hope your a happy man. I am a clever women’.

      2. Terry 6 Silver badge

        Re: Hmmm

        Also Kahneman's work indicating that we use two modes of decision-making/thinking: the fast one for instant, emotionally directed, almost instinctive decisions, and the slow, more reasoned one. https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow

      3. Anonymous Coward

        Re: Hmmm

        "This is why human brains can do general pattern recognition far better than current AI systems."

        Except on website form CAPTCHAs.

    2. jgarbo

      Re: Hmmm

      The human brain is a 3,000,000-year-old computer, refined by evolution. The "computer" is a 50-year-old machine refined by humans. The computer's catching up very fast. Soon it'll not only outthink us but outtalk us. Don't relax on your hubris sofa.

      1. tiggity Silver badge

        Re: Hmmm

        I'll be impressed when a computer can match the capabilities of the "average" person (i.e. it does not need to be a world-beating chess player, artist, etc.), both mental and physical (taking up the same volume as a human, with a similar robot body), using a similar amount of energy to a person, and learning the same way a child does (i.e. "self-learning": working things out for itself through interaction with its environment, plus some teaching from others).

        A "jack of all trades" computer-controlled robot, able to respond to changing situations, is the true comparison; a software/hardware combo designed for a specific task (and using lots of power and hardware) is not.

        1. John H Woods Silver badge

          Re: Hmmm

          tiggity: "I'll be impressed when a computer can match the capabilities of the average person"

          You'll be waiting a while. I'll be impressed when they can match the capabilities of the average member of the Corvid family (crows, ravens, jays, magpies).

          1. Cynic_999 Silver badge

            Re: Hmmm

            It's a strange fact that if the first and last letters of a word are correct, the order of the rest of the letters doesn't matter too much, the human brain will interpret it correctly.

            1. Alterhase

              Re: Hmmm

              A former colleague of mine said that, because she was dyslexic, it did not make much difference to her when reading if the letters within a word were out of sequence...

    3. big_D Silver badge

      Re: Hmmm

      Google doesn't even need spelling mistakes, or at least it didn't use to.

      English -> German is very dodgy with Google.

      (NOTE: The following example now works, because I uploaded the correct translations a couple of years back)

      I had to do a quick translation of a handbook I'd written in English into German. I thought I could save a little time and use Google Translate to get the rough text translated and just tidy it up...

      The problem is, Google Translate has real problems with formal English. Abbreviated English is fine, but formal English caused it to ignore the negatives:

      "Do not open the case, high voltage inside" -> "Das Gehäuse öffnen, Starkstrom drinnen" (Open the case, high voltage inside)

      "Don't open the case, high voltage inside" -> "Das Gehäuse nicht öffnen, Starkstrom drinnen" (Do not open the case, high voltage inside)

      Or even funnier:

      "Do not open the case, no user serviceable parts inside" -> "Das Gehäuse öffnen, nicht drinnen"

      (Open the case, nothing inside)

      It had nothing to do with spelling mistakes; it would just ignore certain words, like "not", although why "no user serviceable parts inside" translates to "nothing inside" is anyone's guess.

      1. Terry 6 Silver badge

        Re: Hmmm

        I can make an ejicated guess that " user serviceable"+noun doesn't translate to anything that Google's software can cope with. It's a slightly strange idiom.

        1. cream wobbly

          Re: Hmmm

          If Google's translation AI had a more linguistically informed parser instead of treating them as strings of letters between whitespace, it wouldn't make such fundamental errors.

          The problem will come in about a decade, when they *start* solving that: correct dialectal and idiomatic usage will be simplified to the lowest (American) common (Webster) denominator (slang). By the time they fix it to be more advanced, the damage will be done to the language by pushing our little monkeys to use their cloth mummies instead of going and asking a real teacher.

      2. Anonymous Coward

        Re: Hmmm

        "English -> German is very dodgy with Google."

        It also makes serious mistakes in translations to English from common European languages. It not only omits negatives sometimes - but drops significant words for no apparent reason.

        It often fails to translate fairly obvious German compound words - yet will happily translate the components if they are separated out by a space.

        It appears to have no idea of context. So a church "mass" (German 'Messe') usually comes out as a "fair".
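Splitting a compound before lookup helps because the parts are usually ordinary dictionary words even when the whole compound is rare. A minimal Python sketch of dictionary-based decompounding, assuming a small hypothetical lexicon and a short, non-exhaustive list of linking elements (the -n- in Geigenbauer, the -s- in Arbeitsplatz); the function name and lexicon are illustrative, not taken from any real translator:

```python
from functools import lru_cache
from typing import Optional

# Common German linking elements (Fugenelemente); illustrative, not exhaustive.
LINKING_ELEMENTS = ("", "s", "n", "en", "es")


def split_compound(word: str, lexicon: set[str], min_part: int = 3) -> Optional[list[str]]:
    """Split a compound into known lexicon words, or return None if no split exists."""
    w = word.lower()

    @lru_cache(maxsize=None)
    def split_from(start: int) -> Optional[tuple[str, ...]]:
        if start == len(w):
            return ()  # consumed the whole word
        # Try the longest candidate part first, falling back to shorter ones.
        for end in range(len(w), start + min_part - 1, -1):
            part = w[start:end]
            if part not in lexicon:
                continue
            for link in LINKING_ELEMENTS:
                nxt = end + len(link)
                if not w.startswith(link, end) or (link and nxt == len(w)):
                    continue  # a linking element must sit between two parts
                rest = split_from(nxt)
                if rest is not None:
                    return (part,) + rest
        return None

    parts = split_from(0)
    return list(parts) if parts is not None else None


# Hypothetical toy lexicon; a real splitter would use a full German word list.
GERMAN_LEXICON = {"geige", "bauer", "stark", "strom", "arbeit", "platz"}

for compound in ["Geigenbauer", "Starkstrom", "Arbeitsplatz"]:
    print(compound, "->", split_compound(compound, GERMAN_LEXICON))
```

The sketch returns the first split it finds, which is not necessarily the right one when several are possible; production splitters typically rank competing splits using word frequencies.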

        1. Tony W

          Re: Hmmm

          This is standard Google machismo: they will never admit to not knowing. If Google Maps doesn't recognise an address it will take you somewhere else with a vaguely similar name. If Google Translate isn't sure of how a word fits into the sentence it will silently leave it out of the translation, which is much worse as it isn't always obvious.

    4. DavCrav Silver badge

      Re: Hmmm

      "Smoe mistkaes are eaiser to corerct tahn otehrs."

      Google Translate went with 'Smoe-Fehler sind leichter zu korrigieren.' ('Smoe mistakes are easier to correct.') Apart from 'smoe', it could also correct your spelling mistakes. Must try harder!

    5. Primus Secundus Tertius Silver badge

      Re: Hmmm

      I once read an article that described how each word retained the original first and last letters, but the others were listed in alphabetical order. In this case:

      Smoe maeiksts are eaeisr to cceorrt tahn oehrts. For a hamun aanwyy.
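Primus's rule is mechanical enough to write down. A minimal Python sketch, assuming a word is any run of ASCII letters and that punctuation and spacing stay where they are (the function names are illustrative):

```python
import re


def sort_interior(word: str) -> str:
    """Keep the first and last letters; sort the letters in between alphabetically."""
    if len(word) <= 3:
        return word  # nothing to reorder in words of three letters or fewer
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]


def scramble_sentence(text: str) -> str:
    # Apply the rule to each run of letters, leaving spaces and punctuation untouched.
    return re.sub(r"[A-Za-z]+", lambda m: sort_interior(m.group(0)), text)


print(scramble_sentence("Some mistakes are easier to correct than others. For a human, anyway."))
# prints: Smoe maeiksts are eaeisr to cceorrt tahn oehrts. For a hamun, aanwyy.
```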

      1. Jason Bloomberg Silver badge

        Re: Hmmm

        Written English has a huge amount of redundancy -

        Sm mstks r sr t crct thn thrs. Fr hmns nywy.

        Adding in anything extra to help with context or clarification makes it progressively easier to understand. And speaking what's written with a Scottish accent often helps in my experience.

        1. cream wobbly

          Re: Hmmm

          Same matchsticks ruin sir at cricket thine theirs. Far hymns naughtilywanky.

    6. Mage Silver badge

      Re: Hmmm

      Also, it's more proof that the term "AI" is marketing. Grammar checkers and spell checkers don't seem any better than they were 30 years ago.

      Machine translation is rubbish for translating a book or even half a page of technical info. It's really bad for translating your own language into another for a native speaker of that language.

      AI can't even correct OCR or proofread new writing. You need an experienced human proofreader.

  2. Alpc

    Rubbish in, rubbish out...

    ...at the end of the day, AI is only as good as the human programmers and algorithm creators. Humans make mistakes and, as a consequence, so will AI, unless humans cover all scenarios. But is this possible? Could AI end up creating its own scenarios?

    1. big_D Silver badge

      Re: Rubbish in, rubbish out...

      Also, I always hear people praising Google Translate, MS Translate and various other tools and services, but they are all doing English <-> Spanish or English <-> French, with a bit of Chinese thrown in for luck.

      All of them make a horrible pig's ear of English <-> German.

      1. Primus Secundus Tertius Silver badge

        Re: Rubbish in, rubbish out...

        @big_D

        Yet the roots of the English language are Germanic. I wonder how Google manages English <=> Dutch, since Dutch is the closest related language to English. My own impression of Dutch is that it is one third English, one third German, and one third that I cannot make out.

        It does seem from other comments that formal and informal versions of language need to be treated separately.

        1. big_D Silver badge

          Re: Rubbish in, rubbish out...

          When I hear Dutch people talking, it always sounds like every other word is either English or German. I can usually understand what they are saying, but I can't speak Dutch.

        2. JLV Silver badge

          Re: Rubbish in, rubbish out...

          Ook een beetje Frans (also a bit of French):

          http://www.ezglot.com/etymologies.php?l=nld&l2=fra

        3. cream wobbly

          Re: Rubbish in, rubbish out...

          "My own impression of Dutch is that it is one third English, one third German, and one third that I cannot make out."

          All three are common descendants of Proto-Germanic. English is built on the same structure as Dutch, but has a ton of Scandinavian influence in spelling, and a lorry load of Romance vocabulary and idioms added. Dutch has Gothic and Spanish to thank for its whack pronunciation. German grammar was reformed multiple times so it would be taken seriously like Latin.

        4. Anonymous Coward

          Re: Rubbish in, rubbish out...

          One third English, one third German, and one third the Norwegian dialect of Jæren, south of Stavanger. You need to be a native of the area to realise it.

          https://en.wikipedia.org/wiki/J%C3%A6ren

      2. Alpc

        Re: Rubbish in, rubbish out...

        In my experience - I do quite a bit of translating - Google translate is OK for English to Italian and vice versa but the results do need reviewing and are rarely, if ever, "production" ready. You really need to know both languages in depth to be able to produce translations that are accurate and well written. The AI is getting better but it's far from perfect. Maybe it will be in 10 or so years.

        1. big_D Silver badge

          Re: Rubbish in, rubbish out...

          Agreed. I worked for a short time in a translation bureau. My translations were readable, made sense and were grammatically okay, but they were a long way from what the trained translators were producing.

          And what I was producing was a thousand times better than what Google Translate was dishing up. As you say, it is a long way from being production ready. In most cases, at least with German, you could take about 15% of what it produced as usable text, the rest would need to be re-written from scratch.

  3. Anonymous Coward

    Grammar is also a problem as I found out when someone asked me to help my uncle jack off a horse.

    1. Paul Crawford Silver badge

      Or the Viagra instructions where I misread "take 30 minutes before sex" as "take 30, minutes before sex".

      1. Anonymous Coward

        "take 30, minutes before sex" .......

        that's actually a very hard mistake to make.

    2. jgarbo

      No problem in Surrey, big problem in Wichita.

  4. Paul Crawford Silver badge

    Spelling check?

    The fundamental flaw seems to be that they are trying to map the character stream in one language to another with the neural net, rather than first decoding the words using a dictionary and then learning how to map sentence to sentence?

    Yes, OK, that is the easy way to set up a learning system, so of course everyone would do it for speed...
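Decoding each token against a dictionary before translation can be sketched as a nearest-word lookup by Levenshtein edit distance. A minimal Python sketch; the tiny word list and the distance threshold of 2 are illustrative assumptions, and a real system would need a full lexicon plus word-frequency data:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        prev = curr
    return prev[-1]


def correct_token(token: str, lexicon: list[str], max_distance: int = 2) -> str:
    """Return the closest lexicon word, or the token unchanged if nothing is close enough."""
    lowered = token.lower()
    if lowered in lexicon:
        return token
    best = min(lexicon, key=lambda word: edit_distance(lowered, word))
    return best if edit_distance(lowered, best) <= max_distance else token


# Hypothetical toy lexicon, for illustration only.
LEXICON = ["tomato", "potato", "psychotherapist", "psychopath", "violin", "maker"]
print([correct_token(t, LEXICON) for t in ["Toma6to", "potatoe", "Psy6hothearpeiut"]])
```

The third token shows the limit of the approach: the digit plus the transposed letters put "Psy6hothearpeiut" well beyond the threshold, so it passes through untouched, and raising the threshold makes false corrections more likely.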

    1. Loud Speaker

      Re: Spelling check?

      Just watch the subtitles on a popular news program to see daily demonstrations of this failure mode.

      1. Anonymous Coward

        Re: Spelling check?

        "Just watch the subtitles on a popular news program to see daily demonstrations of this failure mode."

        The BBC iPlayer "Super-squirrels" documentary this week had several typos in the subtitles. It probably wasn't a pure machine translation - as some words had a similar meaning but were totally different in sound and spelling.

        Other typos were not even credible as misheard words - they made no sense in the context of the subtitle.

      2. G7mzh

        Re: Spelling check?

        "Just watch the subtitles on a popular news program to see daily demonstrations of this failure mode."

        I remember one instance on "Question Time" where, on the subject of legalisation of cannabis, the subtitling machine decided to produce "can of piss". I've got a photograph of it somewhere.

        I've always thought it a bit odd that the misprint is invariably a word or phrase less common than the correct one.

  5. Anonymous Coward

    To law enforcement using translation software

    (as they would, by then, legally read all your comms, let alone thoughts)

  6. Joe Werner

    Second example in the pic

    The translation (even the "correct" one) is still wrong...

    Learn (OK, machine-learn) the conjugation of the verbs! Just because English does not have it (except for the third person singular) doesn't mean that other languages lack it as well (the Romance languages have it, at least).

    "sie hat sich zu einer Feministin entwickelt" should be "she turned into a feminist". The translation is grammatically wrong: it claims that "they turned into a feminist" (singular vs. plural), and for that reading the German would have to be "sie _haben_ sich ... entwickelt".

    And what is a "safe feminist"? Does not compute for me, and I am reasonably fluent in both English and German.

  7. Ian Emery Silver badge

    Nothing new here

    I found that out years and years ago; multiple online translation programs converted English "bread and water" into Russian "bad food".

    Completely flummoxed my then Russian girlfriend.

    1. T. F. M. Reader Silver badge

      Re: Nothing new here

      @Ian Emery: I found that out years and years ago; multiple online translation programs converted English "bread and water" in to Russian "bad food".

      Actually, that is likely to be an artefact of an idiom rather than an adversarial typo: for a Russian "to sit/live on bread and water" means to be constantly hungry, either due to extreme poverty or due to some illness or another reason that does not permit one to eat normal food. The translation apparently preferred the idiom (and bungled it a bit) to the literal meaning.

      1. DavCrav Silver badge

        Re: Nothing new here

        "Actually, that is likely to be an artefact of an idiom rather than an adversarial typo: for a Russian "to sit/live on bread and water" means to be constantly hungry, either due to extreme poverty or due to some illness or another reason that does not permit one to eat normal food. "

        Also in English. 'Bread and water' rarely means actual bread and actual water.

        1. HandleAlreadyTaken

          Re: Nothing new here

          From a Romanian friend, here's a catastrophically bad Google translation: https://translate.google.com/translate?sl=ro&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=https%3A%2F%2Fwww.gustos.ro%2Fretete-culinare%2Fchec-cu-nuci-si-rahat.html&edit-text=

          "Rahat" is the Romanian word for Turkish delight. It is also a euphemism for excrement. Google chooses the idiom instead of the main meaning, with hilarious results.

          On the same page, Google's advice to "do the dick test to check if it's baking" should have been "do the toothpick test".

          1. This post has been deleted by its author

    2. DiViDeD Silver badge

      Re: Nothing new here

      Wasn't there a machine translation doing the rounds years (OK, decades) ago about how

      "The spirit is willing, but the flesh is weak"

      became

      "The wine is agreeable, but the meat has spoiled"

      ?

      1. Terry 6 Silver badge

        Re: Nothing new here

        For any translation system to work, it would need to identify and account for non-literal phrases and words that adopt specific meanings in specific contexts. In other words, a bloody big dataset within the main dataset.

  8. gBone

    Google's Korean to English translator dunt need numbers to spout totally weird stuff ^^^

  9. Dr Dan Holdsworth Silver badge

    Time to do things the easier way

    The problem with going from one human language to another is that the tenses, idioms and meanings are all slightly different between human languages. One less well known way to cope with this is to translate every input language into an interlingua called Lojban which is syntactically unambiguous. Naturally, a fairly short idiom in, say, English tends to translate to a fairly long sequence of Lojban, but this is necessary to preserve the syntax.

    From the Lojban interlingua, you then translate into the destination language, making as good an attempt at preserving the meaning and syntax as you can. This sounds a roundabout method, but it breaks the translation difficulty down into two easier halves; translate into Lojban, and translate from Lojban into the target language.

    This sort of thing isn't a new concept. Science uses a hodge-podge of mostly Latin plus some Greek as a universally understood language.
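The two-stage idea can be shown with a deliberately tiny example. The Python sketch below uses a hand-made dataclass as a stand-in for the interlingua (it is not Lojban), handles only instructions of the form "(do not | don't) <verb> the <noun>", and reuses big_D's case warning; all names and the two-word German lexicon are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """Toy language-neutral representation: an instruction, its object, its polarity."""
    action: str
    obj: str
    negated: bool


def analyse_english(sentence: str) -> Meaning:
    """Stage 1: English -> representation. Only handles '[do not|don't] <verb> the <noun>'."""
    words = sentence.lower().replace("don't", "do not").rstrip(".!").split()
    negated = words[:2] == ["do", "not"]
    if negated:
        words = words[2:]
    action, _article, obj = words  # unpacking fails (ValueError) unless exactly three words remain
    return Meaning(action, obj, negated)


# Hypothetical two-word German lexicon for the generator.
GERMAN = {"open": "öffnen", "case": "das Gehäuse"}


def generate_german(meaning: Meaning) -> str:
    """Stage 2: representation -> German instruction (infinitive style, as on warning labels)."""
    words = [GERMAN[meaning.obj]]
    if meaning.negated:
        words.append("nicht")  # negation is an explicit field, so the generator cannot drop it
    words.append(GERMAN[meaning.action])
    text = " ".join(words)
    return text[0].upper() + text[1:]


for english in ["Do not open the case", "Don't open the case", "Open the case."]:
    print(english, "->", generate_german(analyse_english(english)))
```

Because both English phrasings reduce to the same representation, the formal/contracted difference disappears before generation; the hard part in practice is writing an analyser that covers real English rather than one sentence pattern.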

    1. Primus Secundus Tertius Silver badge

      Re: Time to do things the easier way

      @Dr Dan

      Once upon a time, anyone who was anyone in Europe understood Latin. Peasant languages such as English were not rated for serious discourse.

      1. Clarecats

        Re: Time to do things the easier way

        "Once upon a time, anyone who was anyone in Europe understood Latin. Peasant languages such as English were not rated for serious discourse."

        As seen in Shogun.

    2. Jason Bloomberg Silver badge

      Re: Time to do things the easier way

      "idioms and meanings are all slightly different between human languages"

      I'll be checking that momentarily.

      1. David Paul Morgan

        Re: Time to do things the easier way

        perhaps we should table that?

        1. Michael H.F. Wilkinson Silver badge

          Re: Time to do things the easier way

          Well, as the saying goes: The other Shaltanac's joopleberry shrub is always a more mauvy shade of pinky-russet

          I'll get me coat. Doffs hat (grey Tilley once more) to the late, great Douglas Adams.

    3. Charles 9 Silver badge

      Re: Time to do things the easier way

      Makes me wonder what happens if the source language MEANS to leave something ambiguous, leaving nothing for even an interlingua to pick up, or if it's a term that requires a lot of outside cultural context to understand.

  10. Anonymous Coward

    AI

    All this proves is that the translation is artificial but not intelligent, if it can be fooled so easily.

  11. GIRZiM Bronze badge

    Wie bitter?

    "Er ist Geigenbauer und Psy6hothearpeiut", and getting the translation: "He is a brick maker and a psychopath."

    Psy6hothearpeiut -> Psychopath, okay, maybe.

    Geigenbauer (violin maker) -> brick maker: no, not unless the original was mistranslated as "He is a brick maker and a psychotherapist."

    Anyway, perhaps they're focussing on the wrong things when training the nets: aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.

    But, if Faecesbook and text messages are anything to go by, it won't matter anyway: pretty soon, we'll all be able to read the Canterbury Tales in the original Chaucerian vernacular with no trouble at all. Or at least no more trouble than we have reading what people write on Fakebook and in their texts anyway.

    Dyslexic (n.) - The lingua franca of the Internet.

    Right. that's enough of this nonsense; Imma read 'Mick's Bath' by Frankincense Barleycorn - l8rz, Pepys.

  12. Giovani Tapini

    Isn't this research

    just about passing a Turing test? It sounds like the same principle as trying to defeat CAPTCHAs, etc.

  13. silverfern

    Where's Stanley Unwin when we need him?

  14. Roy Badami

    Interestingly I observed something similar with Google Translate as a learner of German.

    So, by way of background, there is a long-standing convention in German that if you can't represent umlauts for whatever reason, you write the letter 'e' after the vowel to show that it should have had an umlaut on it. I got pretty used to the fact, over the years, that Google search perfectly understands this convention, and that I can perform searches of the German web using this convention with no loss of functionality (which is convenient since on many systems umlauts are hard to type with a UK or US keyboard layout).

    When I started using Google Translate as a convenient way to look up words or phrases I naturally assumed that it would, similarly, understand the convention. In fact, I discovered that I often got poorer (and sometimes very bizarre) translations if I used the 'e' convention rather than typing the actual umlaut. Of course, this makes sense. The 'e' spellings would be very rare in the corpora that the machine learning algorithms were trained on, so using that spelling gave pretty random results.

    I *think* they may have fixed this now..... Hopefully....
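The asymmetry Roy ran into is easy to reproduce: writing an umlaut as a digraph is mechanical, but undoing it is not, because 'ae', 'oe' and 'ue' also occur in ordinary words such as Feuer and Bauer. A minimal Python sketch, assuming a caller-supplied word list to pick the right spelling (the function names and word list are hypothetical):

```python
import itertools
import re

# The usual substitutions when umlauts cannot be typed (ß -> ss is a separate convention, left out).
UMLAUT_TO_DIGRAPH = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
DIGRAPH_TO_UMLAUT = {digraph: umlaut for umlaut, digraph in UMLAUT_TO_DIGRAPH.items()}
DIGRAPH_PATTERN = re.compile("(" + "|".join(DIGRAPH_TO_UMLAUT) + ")")


def to_digraphs(text: str) -> str:
    """Forward direction: each umlaut becomes one fixed digraph."""
    return "".join(UMLAUT_TO_DIGRAPH.get(ch, ch) for ch in text)


def naive_restore(text: str) -> str:
    """Reverse every digraph blindly; wrong for words that genuinely contain 'ae', 'oe' or 'ue'."""
    return DIGRAPH_PATTERN.sub(lambda m: DIGRAPH_TO_UMLAUT[m.group(0)], text)


def restore_with_lexicon(word: str, lexicon: set[str]) -> str:
    """Try restoring each subset of digraphs (unchanged spelling first); return a known word."""
    pieces = DIGRAPH_PATTERN.split(word)  # matched digraphs land at the odd indices
    slots = range(1, len(pieces), 2)
    for choice in itertools.product((False, True), repeat=len(slots)):
        candidate = list(pieces)
        for slot, restore in zip(slots, choice):
            if restore:
                candidate[slot] = DIGRAPH_TO_UMLAUT[candidate[slot]]
        spelling = "".join(candidate)
        if spelling in lexicon:
            return spelling
    return word  # nothing matched: leave the input alone


WORDS = {"Feuer", "Bauer", "München", "schön"}  # hypothetical word list
for typed in ["Feuer", "Bauer", "Muenchen", "schoen"]:
    print(typed, naive_restore(typed), restore_with_lexicon(typed, WORDS))
```

Without the word-list check, the reverse mapping turns Feuer into Feür and Bauer into Baür, which is the kind of garbage a system trained mostly on real umlaut spellings then has to cope with.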

  15. Anonymous Coward

    So spell check them already

    and look for ones that incorporate numbers
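The number check is cheap to sketch. A minimal Python version, assuming whitespace tokenisation and treating any token that mixes letters and digits as suspect; it flags Psy6hothearpeiut and Toma6to, but also legitimate tokens such as a handle like G7mzh, so flagged words are better routed to a spelling corrector or a human than silently dropped:

```python
import string


def suspicious_tokens(text: str) -> list[str]:
    """Return whitespace-separated tokens that mix letters and digits, e.g. 'Toma6to'."""
    flagged = []
    for raw in text.split():
        token = raw.strip(string.punctuation)  # ignore surrounding commas, quotes, full stops
        has_letter = any(ch.isalpha() for ch in token)
        has_digit = any(ch.isdigit() for ch in token)
        if has_letter and has_digit:
            flagged.append(token)
    return flagged


print(suspicious_tokens("Er ist Geigenbauer und Psy6hothearpeiut"))
print(suspicious_tokens("Potato, potato. Toma6to, I'm going to kill you"))
```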

  16. Anonymous Coward

    Machine translation

    "Machine translation is used to connect people and share information, but when the translation is wrong, the opposite can happen."

    Well indeed. Fill in the missing word in the well known phrase or saying belowe. No prizes, it's just four fun. It's not even new news, but there may be extra pints for anyone correctly idenfitying the year ink westion. And what do pints make? Fridays.

    “Meanwhile, the poor [blank], by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.”

    Thanks for all the fiche, as the field circus peeps used to say.

    1. GIRZiM Bronze badge

      Re: Machine translation

      Ooh! Ooh! Me! I know!

      "Meanwhile, the poor baby fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."

    2. DropBear Silver badge

      Re: Machine translation

      "Thanks for all the fiche, as the field circus peeps used to say."

      You can have it, everything is on microfiche these days anyway...

  17. Stevie Silver badge

    Bah!

    I'm flashing on an old Far Side cartoon in which a guy wading ashore on a desert island is greeted by an obviously long-marooned ventriloquist, complete with dummy.

    Ventriloquist: Hello, stranger, what's your name?

    Dummy: Run! He's crazy!

    Ventriloquist: Ha Ha! Be quiet, Gus. Come and sit here, stranger.

    Dummy: Run! He'll eat you! He ate that other guy only last week!

    Ventriloquist: Shut up, Gus! Haha!

    Dummy: He's mad I tell you!


Biting the hand that feeds IT © 1998–2019