Smoe mistkaes are eaiser to corerct tahn otehrs. For a human, anyway.
Neural-network-based language translators can be tricked into deleting words from sentences or dramatically changing the meaning of a phrase, by strategically inserting typos and numbers. Just like twiddling pixels in a photo, or placing a specially crafted sticker near an object, can make image-recognition systems mistake …
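For the curious: each misspelling in the opening line is a single swap of two adjacent letters inside the word, with the first and last letters left alone. Here is a minimal Python sketch of generating that kind of noise. The function names and the swap-only noise model are assumptions for illustration. It is one simple kind of perturbation, not the method used in the work described above, which also inserts numbers.

```python
import random


def swap_interior(word: str, i: int) -> str:
    """Swap the letters at positions i and i+1, keeping the first and last letters fixed."""
    if not 1 <= i < len(word) - 2:
        raise ValueError("both swapped letters must be inside the word")
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb(sentence: str, rate: float = 0.5, seed: int = 0) -> str:
    """Give roughly `rate` of the words (4+ characters) one random interior swap.

    Naive whitespace split: trailing punctuation counts as the word's last character.
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    out = []
    for word in sentence.split():
        if len(word) >= 4 and rng.random() < rate:
            word = swap_interior(word, rng.randrange(1, len(word) - 2))
        out.append(word)
    return " ".join(out)
```

swap_interior("correct", 3) gives "corerct", exactly as in the opening line. With rate=0.0 the sentence comes back unchanged; higher rates give more of the longer words one swap each.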
In nature it is rare to have perfect information, so an animal that can still distinguish a predator from the background has a big survival advantage. Handling incomplete or corrupted data (e.g. a cougar partially hidden behind a rock) therefore became a necessity. This is why human brains can do general pattern recognition far better than current AI systems.
but still, you’d figure lions’ pattern matching would have evolved to
lunch = regex.match(stripes) by now.
I wonder how you cook up an AI translator that doesn't bat an eye at a word like ‘Psy6hothearpeiut’. Maybe that’s the price of being able to parse online dating profiles like ‘I hope your a happy man. I am a clever women’.
I'll be impressed when a computer can match the capabilities of the "average" person (i.e. it doesn't need to be a world-beating chess player, artist etc.), both mentally and physically (taking up the same volume as a human, with a similar robot body), using a similar amount of energy to a person, and learning the way a child does (i.e. "self-learning": working things out for itself through interaction with its environment, plus some teaching from others).
A "jack of all trades" computer-controlled robot, able to respond to changing situations, is the true comparison; a software/hardware combo designed for a specific task (and using lots of power and hardware) is not.
Google doesn't even need spelling mistakes to get things wrong, or at least it didn't use to.
English -> German is very dodgy with Google.
(NOTE: The following example now works, because I uploaded the correct translations a couple of years back)
I had to do a quick translation of a handbook I'd written in English into German. I thought I could save a little time and use Google Translate to get the rough text translated and just tidy it up...
The trouble is, Google Translate has real problems with formal English. Contracted English ("don't") was fine, but the formal "do not" caused it to ignore the negatives:
"Do not open the case, high voltage inside" -> "Das Gehäuse öffnen, Starkstrom drinnen"
(Open the case, high voltage inside)
"Don't open the case, high voltage inside" -> "Das Gehäuse nicht öffnen, Starkstrom drinnen"
(Don't open the case, high voltage inside)
Or even funnier:
"Do not open the case, no user serviceable parts inside" -> "Das Gehäuse öffnen, nicht drinnen"
(Open the case, nothing inside)
The problem had nothing to do with spelling mistakes; it would just ignore certain words, like "not". Why "no user serviceable parts inside" translates to "nothing inside" is anyone's guess, though.
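The dropped-negation failure described above is easy to screen for, crudely. A minimal sketch, assuming hand-picked (and deliberately incomplete) English and German negation lists: it only flags output where the source contains a negation and the translation contains no negation word at all.

```python
from __future__ import annotations

import re

# Hand-picked, deliberately incomplete negation lists (an assumption for this sketch).
EN_NEGATIONS = {"not", "no", "never", "nothing", "don't", "doesn't"}
DE_NEGATIONS = {"nicht", "kein", "keine", "keinen", "nie", "nichts"}


def words(text: str) -> set[str]:
    """Lowercased word tokens; apostrophes are kept so "don't" stays one token."""
    return {w.lower() for w in re.findall(r"[\w']+", text)}


def negation_dropped(source_en: str, target_de: str) -> bool:
    """True if the English source has a negation word but the German output has none."""
    return bool(words(source_en) & EN_NEGATIONS) and not (words(target_de) & DE_NEGATIONS)
```

It would not flag the "no user serviceable parts" example, where "nicht" survives but lands in the wrong clause. Catching that needs actual parsing.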
If Google's translation AI had a more linguistically informed parser, instead of treating sentences as strings of letters between whitespace, it wouldn't make such fundamental errors.
The problem will come in about a decade, when they *start* solving that: correct dialectal and idiomatic usage will be simplified to the lowest (American) common (Webster) denominator (slang). By the time they make it more sophisticated, the damage will have been done to the language, by pushing our little monkeys to use their cloth mummies instead of going and asking a real teacher.
"English -> German is very dodgy with Google."
It also makes serious mistakes in translations into English from common European languages. It not only sometimes omits negatives, but also drops significant words for no apparent reason.
It often fails to translate fairly obvious German compound words, yet will happily translate the components if they are separated by a space.
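Splitting compounds before lookup is one common way around this. Here is a minimal sketch, assuming a tiny hypothetical lexicon and handling only the linking "-s-" (as in "Arbeitszeit"). It tries the longest known first part and backtracks if the remainder can't be covered.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy lexicon; a real splitter would use a full German word list.
LEXICON = frozenset({"gehäuse", "deckel", "stark", "strom", "arbeit", "zeit",
                     "hoch", "spannung", "schutz"})
# Linking elements allowed between parts; only "-s-" is handled in this sketch.
LINKERS = ("", "s")


def split_compound(word: str, lexicon: frozenset[str] = LEXICON) -> list[str] | None:
    """Split a compound into lexicon words, or return None if it can't be fully covered."""
    w = word.lower()

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[str, ...] | None:
        if i == len(w):
            return ()
        for j in range(len(w), i, -1):  # try the longest prefix first
            part = w[i:j]
            if part not in lexicon:
                continue
            for link in LINKERS:
                if w.startswith(link, j):
                    rest = split_from(j + len(link))
                    if rest is not None:
                        return (part, *rest)
        return None  # nothing worked from here: caller backtracks

    parts = split_from(0)
    return list(parts) if parts is not None else None
```

Real German has other linking elements ("-es-", "-n-", "-en-" and more) and ambiguous splits, so a serious splitter typically ranks candidate splits, for example by word frequency.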
It appears to have no idea of context, so a church "mass" (German 'Messe') usually comes out as a "fair".
This is standard Google machismo: they will never admit to not knowing. If Google Maps doesn't recognise an address, it will take you somewhere else with a vaguely similar name. If Google Translate isn't sure how a word fits into the sentence, it will silently leave it out of the translation, which is much worse, as it isn't always obvious.
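Even a very crude model of context can pick between the two senses of "Messe". This sketch scores each sense by how many of its cue words appear in the sentence, a simplified Lesk-style overlap heuristic. The cue-word lists are hypothetical and chosen by hand.

```python
from __future__ import annotations

import re

# Hypothetical sense inventory: German cue words that suggest each English sense.
SENSES = {
    "messe": {
        "mass": {"kirche", "sonntag", "priester", "gottesdienst", "orgel"},
        "trade fair": {"aussteller", "halle", "besucher", "stand", "frankfurt"},
    },
}


def choose_sense(word: str, sentence: str) -> str | None:
    """Pick the sense whose cue words overlap most with the sentence's words."""
    senses = SENSES.get(word.lower())
    if senses is None:
        return None
    context = {w.lower() for w in re.findall(r"\w+", sentence)}
    # On a tie (including zero overlap), max() returns the first sense listed.
    return max(senses, key=lambda sense: len(senses[sense] & context))
```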
Written English has a huge amount of redundancy:
Sm mstks r sr t crct thn thrs. Fr hmns nywy.
Adding in anything extra to help with context or clarification makes it progressively easier to understand. And speaking what's written with a Scottish accent often helps in my experience.
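The disemvowelled line above seems to follow a simple rule: drop a, e, i, o and u, then collapse doubled letters ("correct" becomes "crct"). That rule is inferred from the example rather than stated in the comment. A minimal Python version:

```python
import re


def strip_vowels(text: str) -> str:
    """Drop a, e, i, o, u (either case), then collapse runs of the same letter; 'y' is kept."""
    no_vowels = re.sub(r"[aeiouAEIOU]", "", text)
    return re.sub(r"([A-Za-z])\1+", r"\1", no_vowels)
```

Going the other way, from "crct" back to "correct", is the part a human reader does effortlessly and a program needs a dictionary and context for.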
Also, it's more proof that the term "AI" is marketing. Grammar checkers and spelling checkers don't seem any better than they were 30 years ago.
Machine translation is rubbish for translating a book, or even half a page of technical information. It's really bad for translating from your own language into text meant for a native speaker of another.
AI can't even correct OCR output or proofread new writing. You need an experienced human proofreader.
Also, I always hear people praising Google Translate, MS Translate and various other tools and services, but they are all doing English <-> Spanish or English <-> French, with a bit of Chinese thrown in for luck.
All of them make a horrible pig's ear of English <-> German.
Yet the roots of the English language are Germanic. I wonder how Google manages English <=> Dutch, since Dutch is the closest related language to English. My own impression of Dutch is that it is one third English, one third German, and one third that I cannot make out.
It does seem from other comments that formal and informal versions of a language need to be treated separately.
"My own impression of Dutch is that it is one third English, one third German, and one third that I cannot make out."
All three descend from Proto-Germanic. English is built on the same structure as Dutch, but has a ton of Scandinavian influence in spelling, plus a lorry load of Romance vocabulary and idioms. Dutch has Gothic and Spanish to thank for its whack pronunciation. German grammar was reformed multiple times so that it would be taken as seriously as Latin.
In my experience - I do quite a bit of translating - Google translate is OK for English to Italian and vice versa but the results do need reviewing and are rarely, if ever, "production" ready. You really need to know both languages in depth to be able to produce translations that are accurate and well written. The AI is getting better but it's far from perfect. Maybe it will be in 10 or so years.
Agreed. I worked for a short time in a translation bureau. My translations were readable, made sense and were grammatically okay, but they were a long way from what the trained translators were producing.
And what I was producing was a thousand times better than what Google Translate was dishing up. As you say, it is a long way from being production-ready. In most cases, at least with German, you could take about 15% of what it produced as usable text; the rest would need to be rewritten from scratch.
The fundamental flaw seems to be that they are trying to map the character stream of one language to another with a neural net, rather than first decoding the words using some dictionary and then learning how to map sentence to sentence?
Yes, OK, that is the easy way to set up a learning system so of course everyone would do that for speed...
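The "decode the words first using some dictionary" step can be sketched with difflib.get_close_matches from Python's standard library, which snaps each token to its most similar vocabulary word. The tiny vocabulary is hypothetical and the 0.6 cutoff is simply difflib's default. This only illustrates the idea; it makes no claim about how Google Translate works internally.

```python
from __future__ import annotations

import difflib

# Hypothetical vocabulary; a real system would need a full lexicon plus context.
VOCAB = ["some", "mistakes", "are", "easier", "to", "correct", "than", "others"]


def normalise(sentence: str, vocab: list[str] = VOCAB, cutoff: float = 0.6) -> list[str]:
    """Replace each token with its closest vocabulary word, or keep it if nothing is close enough."""
    out = []
    for token in sentence.lower().split():
        best = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        out.append(best[0] if best else token)
    return out
```

Fed the scrambled opening line of this article (without its final full stop), this recovers every word. A token like 'Psy6hothearpeiut' would simply pass through unchanged unless the vocabulary had a close enough candidate.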
"Just watch the subtitles on a popular news program to see daily demonstrations of this failure mode."
The BBC iPlayer "Super-squirrels" documentary this week had several typos in the subtitles. It probably wasn't pure machine translation, as some words had a similar meaning but were totally different in sound and spelling.
Other typos were not even credible as misheard words - they made no sense in the context of the subtitle.
Just watch the subtitles on a popular news program to see daily demonstrations of this failure mode.
I remember one instance on "Question Time" where, during a discussion of the legalisation of cannabis, the subtitling machine decided to produce "can of piss". I've got a photograph of it somewhere.
I've always thoufght it a bit odd that the misprint is invariably a word or phrase less common than the corect one.
The translation (even the "correct" one) is still wrong...
Learn (OK, machine-learn) the conjugation of the verbs! Just because English doesn't have it (except for the third person singular) doesn't mean other languages lack it as well (the Romance languages have it, at least).
"sie hat sich zu einer Feministin entwickelt" should be "she turned into a feminist". The translation is grammatically wrong: it claims "they turned into a feminist" (plural instead of singular), and for that the German would have had to read "sie _haben_ sich ... entwickelt".
And what is a "safe feminist"? Does not compute for me, and I am reasonably fluent in both English and German.
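The agreement point above is mechanical enough to sketch. After "sie", a third-person singular verb means "she" and a plural one means "they". This is a minimal sketch, assuming a tiny hand-written table of verb forms. The capitalised formal "Sie" ("you") also takes the plural verb, so that case can't be told apart from "they" without more context.

```python
from __future__ import annotations

# Hypothetical, tiny tables of present-tense verb forms (an assumption for this sketch).
SINGULAR_FORMS = {"hat", "ist", "wird"}     # er/sie/es forms -> "she"
PLURAL_FORMS = {"haben", "sind", "werden"}  # plural sie forms -> "they"


def english_for_sie(clause: str) -> str | None:
    """Resolve a clause-initial "sie" to "she" or "they" from the verb right after it."""
    words = clause.split()
    if len(words) < 2 or words[0].lower() != "sie":
        return None
    verb = words[1].lower()
    if verb in SINGULAR_FORMS:
        return "she"
    if verb in PLURAL_FORMS:
        # Formal "Sie" ("you") also takes the plural verb; not distinguished here.
        return "they"
    return None
```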
@Ian Emery: I found that out years and years ago; multiple online translation programs converted the English "bread and water" into the Russian for "bad food".
Actually, that is likely to be an artefact of an idiom rather than an adversarial typo: for a Russian "to sit/live on bread and water" means to be constantly hungry, either due to extreme poverty or due to some illness or another reason that does not permit one to eat normal food. The translation apparently preferred the idiom (and bungled it a bit) to the literal meaning.
"Actually, that is likely to be an artefact of an idiom rather than an adversarial typo: for a Russian "to sit/live on bread and water" means to be constantly hungry, either due to extreme poverty or due to some illness or another reason that does not permit one to eat normal food. "
Also in English. 'Bread and water' rarely means actual bread and actual water.
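The idiom explanation can be illustrated with greedy longest-match lookup over a phrase table. If multi-word entries are tried first, "bread and water" is swallowed whole as an idiom before the literal words get a chance. The phrase table is hypothetical and uses English placeholder glosses rather than Russian. Neural translators don't use an explicit table like this, so it only illustrates the preference.

```python
from __future__ import annotations

# Hypothetical phrase table: one idiomatic multi-word entry plus literal single-word entries.
PHRASES: dict[tuple[str, ...], str] = {
    ("bread", "and", "water"): "<idiom: meagre diet>",
    ("bread",): "<bread>",
    ("and",): "<and>",
    ("water",): "<water>",
}
MAX_LEN = max(len(key) for key in PHRASES)


def gloss(sentence: str, prefer_idioms: bool = True) -> list[str]:
    """Greedy left-to-right lookup; with prefer_idioms, the longest matching phrase wins."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        lengths = range(min(MAX_LEN, len(words) - i), 0, -1) if prefer_idioms else [1]
        for n in lengths:
            key = tuple(words[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            out.append(words[i])  # unknown word passed through unchanged
            i += 1
    return out
```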
From a Romanian friend, here's a catastrophically bad Google translation: https://translate.google.com/translate?sl=ro&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=https%3A%2F%2Fwww.gustos.ro%2Fretete-culinare%2Fchec-cu-nuci-si-rahat.html&edit-text=
"Rahat" is the Romanian word for Turkish delight. It is also a euphemism for excrement. Google chooses the euphemism instead of the main meaning, with hilarious results.
On the same page, Google's advice to "do the dick test to check if it's baking" should instead read "do the toothpick test".
The problem with going from one human language to another is that tenses, idioms and meanings all differ slightly between languages. One less well-known way to cope with this is to translate every input language into an interlingua called Lojban, which is syntactically unambiguous. Naturally, a fairly short idiom in, say, English tends to translate into a fairly long sequence of Lojban, but this is necessary to preserve the syntax.
From the Lojban interlingua, you then translate into the destination language, preserving the meaning and syntax as well as you can. This sounds like a roundabout method, but it breaks the translation problem into two easier halves: translating into Lojban, and translating from Lojban into the target language.
This sort of thing isn't a new concept. Science uses a hodge-podge of mostly Latin plus some Greek as a universally understood language.
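The two-halves idea can be sketched at the level of single words. This is a toy under stated assumptions: the concept labels (DOG, CAT) are hypothetical stand-ins for an interlingua, not real Lojban. A real interlingua would encode whole sentence structure rather than word-for-word glosses.

```python
from __future__ import annotations

# Hypothetical word-level lexicons mapping into shared concept labels.
# The labels stand in for an interlingua; they are not real Lojban.
TO_PIVOT: dict[str, dict[str, str]] = {
    "en": {"dog": "DOG", "cat": "CAT"},
    "de": {"hund": "DOG", "katze": "CAT"},
    "nl": {"hond": "DOG", "kat": "CAT"},
}
# Invert each lexicon to go from the pivot back out to a language.
FROM_PIVOT = {lang: {concept: word for word, concept in lex.items()}
              for lang, lex in TO_PIVOT.items()}


def translate_word(word: str, src: str, tgt: str) -> str:
    concept = TO_PIVOT[src][word.lower()]  # first half: source -> interlingua
    return FROM_PIVOT[tgt][concept]        # second half: interlingua -> target


def systems_needed(n: int) -> tuple[int, int]:
    """(direct, pivot): number of one-way translators needed for n languages."""
    return n * (n - 1), 2 * n
```

The counting function shows the other attraction of a pivot: n languages need n(n-1) one-way direct systems but only 2n via an interlingua, which is 552 versus 48 for 24 languages.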
Biting the hand that feeds IT © 1998–2019