After just a couple years of practice, Google can claim to produce the best computer-generated language translations in the world - in languages their boffin creators don't even understand. Last summer, Google took top honors at a bake-off competition sponsored by the American agency NIST between machine-translation engines, …
Well done for an excellent, in-depth article
Garbage in, garbage out?
If you gave nonsense phrases to a human translator, shouldn't the translation be equally nonsensical?
I suppose you expect a "signalling NaN" feature or an exception raised... but just as with computer arithmetic back in the Fortran days, overflows and divide-by-zeroes used to be up to the user/programmer to detect. The additional "feature" will come easily enough over time, but there'll probably be some instances where people really can use the unchecked result... if it buys them some efficiency... and is non-life-threatening when in error.
I'm reminded of Scott Adams' comments on the Turing test - there's no point in coming up with fancy algorithms to get a computer to pass the Turing test: just make it ignore any questions it's asked, and whinge about its job.
Language - especially conversational language - simply doesn't follow rules, regardless of what my Latin teacher always told me...
'the automatic-translation engines they constructed triumphed by sheer brute-force statistical extrapolation rather than "understanding".'
This assumes that human "understanding" is anything more than brute-force statistical extrapolation. I'll grant that what we do naturally may be an order of magnitude more complex than what we're currently able to program computers to do, but we don't know enough about how our own brains work to say that we don't work on similar principles.
In fact, Google's translation success could be seen as supporting this possibility...but then again, what does that say about my understanding of this stuff?
I recently evaluated several free translation engines available on the market, using the same simple paragraph of Spanish text as the source and English as the target language (you can always do that even if, like most people in English-speaking countries, you speak only one language fluently).
Google produced 8 style errors and 2 grammar errors on the paragraph, while another engine (Paralink) made only 1 style error and 0 grammar errors, a massive difference. It shows that a competition in which computer programs score other computer programs (with a metric like BLEU) is pretty much meaningless.
The problem is obviously, as you rightfully point out, with the purely statistical approach which should be combined with case-based (exception-based) reasoning. Also, the Google team would greatly benefit from reading Pinker's book 'Words and Rules' which takes you through the history of different methodologies in linguistics from Chomsky to (statistical) neural networks and shows how the two can be combined.
Disclaimer: I do not work for Paralink nor do I own their stock; just wanted to set the record straight that Google is not the only game in town, nor is it even best or market leader. Now the problem is that it is hard to find Paralink using Google search, but that's a whole different story :). But thankfully we still have other search engines...
"This assumes that human "understanding" is anything more than brute-force statistical extrapolation. I'll grant that what we do naturally may be an order of magnitude more complex than what we're currently able to program computers to do, but we don't know enough about how our own brains work to say that we don't work on similar principles."
Wrong, I'm afraid - we know far more about the way the brain understands language than most people think! Try reading "The Language Instinct" by Steven Pinker and you will see that it has been proved that the human brain has a set of clever rules that are used for building and understanding language. Sure, there is stuff that we don't know, but it's amazing how much is known!
...the above sentence being Chomsky's famous demolition of Skinnerian theories of language acquisition - Google being Skinner in this instance.
Chomsky demonstrated that humans understand grammar, not positional statistical correlation between words, by writing a sentence consisting of pairs of words all of which had zero correlation in the extant English corpus:
colourless - green
green - ideas
ideas - sleep
sleep - furiously
Native English speakers comprehend this as syntactically correct - albeit semantically nonsensical.
You cannot write a Skinnerian translator!
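For what it's worth, the zero-correlation point is easy to demonstrate with a toy bigram count; the mini-corpus below is invented purely for illustration, standing in for "the extant English corpus":

```python
from collections import Counter

# Hypothetical mini-corpus standing in for "the extant English corpus".
corpus = ("green grass grows . new ideas spread . people sleep soundly . "
          "the colourless liquid boiled . he ran furiously .").split()

bigram_counts = Counter(zip(corpus, corpus[1:]))

sentence = "colourless green ideas sleep furiously".split()
for pair in zip(sentence, sentence[1:]):
    # Each word occurs somewhere in the corpus, but no adjacent pair does.
    print(pair, "corpus count:", bigram_counts[pair])
```

Every word of the sentence occurs in the corpus, yet every adjacent pair has a count of zero, so a model trained purely on observed word-pair statistics gives the sentence no support, even though it is perfectly grammatical.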
Another interesting feature of the Google 'translator': take the title of this article, translate it from English to another language and then back, and you'll end up in a completely different place in English. Try it with slightly longer text and it just gets worse - this would be the first thing I'd fix.
English: How Google translates without understanding
French: Comment Google traduit sans arrangement
Back to English: How Google translated without arrangement
There are plenty of words for which there is not a 1:1 correlation between two languages, and even concepts for which the words might be shared in one language but not in another. For example, I have heard that in Russian, a common word for "proper" in the sense of correct behavior is the same as a word for "convenient". Thus if I were to ask someone to bribe another for me, and they were to respond (in Russian) that to do so would be improper, a statistical machine translator would probably translate the response as the person saying that it would be inconvenient for them to do it. Since it requires a reasonable amount of common sense, it would be hard for a computer to understand the context of the discussion and so figure out which word is correct, especially if the context has to be inferred from a previous paragraph, or a separate piece of text altogether.
I bet that if you translated something using any machine translator into written Hebrew, and then back into any western language of your choice, the tenses would be completely mangled, since Hebrew relies on the reader figuring out the tense and putting in the right inflection (consider the story of Jesus preaching in the synagogue in Nazareth).
A better means of translation would be to write sentences in some form of internationalised programming language, which unambiguously describes the relation between concepts, and the tenses involved. Then it would be a far simpler task to compile the text into readable text. Of course, you would still want to have original-language text for anything where exact replication of the text is necessary, such as poetry, legal documents and so forth.
Your discussion of BLEU scores is incorrect. You wrote:
> But all BLEU really measures is word-by-word similarity: are the
> same words present in both documents, somewhere?
The documents being compared (the machine translation, with one or more reference translations produced by humans) are first "segmented". Generally, this means broken up into units smaller than an entire document, typically sentences, and those units aligned (not necessarily one-to-one, since one translation may use a single sentence where another translation breaks the thought into two or more sentences; and of course a bad translation may omit a sentence entirely).
> "Wander" doesn't get partial credit for "stroll," nor "sofa" for "couch."
One way to get around this problem is to use multiple reference translations, on the assumption that different human translators may choose different synonyms.
> The complementary problem is that BLEU can give a high
> similarity score to nonsensical language which contains
> the right phrases in the wrong order...
> Now here is a possible garbled output which would get the
> very same score:
> "was being led to the calm as he was would take carry
> him seemed quite when taken"
Your statement is correct, your example is not, if by "phrases" you mean sequences of > 1 word. The reason is that your example shares (as far as I can see) no sequence of two or more words with the better sentence (which I omitted here). That is, the BLEU score compares not only 1-grams (that is, word correspondences), but also N-grams where N > 1. So your example would be penalized for not having any N-grams in common with the better translation where N > 1.
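To make the n-gram point concrete, here is a toy BLEU-style scorer (heavily simplified: one reference, no brevity penalty, no smoothing; the sentences are invented, since the article's original example sentence was omitted above):

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Toy BLEU: geometric mean of clipped n-gram precisions, single
    reference, no brevity penalty, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        precisions.append(overlap / total if total else 0.0)
    if min(precisions) == 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "he seemed quite calm as he was being led away"
scrambled = "led was he being quite away seemed calm he as"
print(bleu(ref, ref))        # 1.0: perfect match at every n-gram order
print(bleu(scrambled, ref))  # 0.0: every 1-gram matches, but no 2-gram does
```

The scrambled candidate matches every 1-gram of the reference but no 2-gram, so its clipped 2-gram precision is zero and the geometric mean collapses to zero; real BLEU implementations add smoothing so that a single empty n-gram order is less catastrophic.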
There is a good discussion of BLEU scores on Wikipedia.
As the article says, it's possible to do useful machine translation without using grammars or parsing, much to the distress of linguists.
It's also possible to usefully support open vocabulary, executable English, without grammars or dictionaries, again to the distress of linguists.
This is demonstrated in the online Internet Business Logic system.
It works as a kind of Wiki for executable English content. Shared use is free.
It's at www.reengineeringllc.com , with examples that one can view, run and change, using a browser. You can also write and run your own examples, again using a browser.
Not recommended for linguists with high blood pressure (:-)
>There are plenty of words for which there is not a 1:1 correlation between two languages
Look up "fertility statistical machine translation" in Google.
> One way to get around this problem is to use multiple reference translations, on the assumption that different human translators may choose different synonyms.
Another way to get around it would be to use METEOR (which has a WordNet synonymy plugin), and which correlates with human judgement significantly better than BLEU both at the corpus and sentence level.
Of course once a "standard" has been set it is difficult to go back and change it.
> Another interesting feature of the Google 'translator' is that take the title of this article, translate it from English to another language and then back and you'll end up in a completely different place in English.
Round-trip translations are a notoriously terrible way of evaluating machine translation output. Harold Somers (2005) discusses this in "Round-Trip Translation: What Is It Good For?" -- he uses BLEU scores (strangely), but the introductory criticism is good.
And seeing as everyone seems to be plugging their stuff, I might as well plug Apertium (http://apertium.sourceforge.net), a free software / open source machine translation engine/toolbox for closely (and not so closely) related languages. Unlike Google, it's not cagey about how it works (or in some cases doesn't) ;)
My wife recently started a Spanish user group to assist daycare providers that don't speak English as a first language.
She needed to have all the documentation that goes with daycare licensing translated, as well as creating a newsletter in Spanish and English.
Although she does speak Spanish fairly fluently she decided to save a whole bunch of typing and simply use a translator for most of the electronic documents.
She also decided this would save time on the newsletter, as she could write it out in English, then have the translator provide the Spanish copy.
All I can say is that Microsoft translates English to Spanish and vice versa slightly more efficiently, and a heck of a lot more grammatically correctly, than Google's amusing attempts in this field.
So perhaps Google should stick to penguin genocide and leave the language translation to others. (No, Microsoft's effort wasn't anywhere near perfect either, but it did require a lot less correction.)
I just tried Paralink, and found it pathetic. I use Google's translator in the way described at the end of the article, to provide a rough framework or gist, and then make sense of the output accordingly. I often look at a sentence or phrase from the Google output and compare it with the Spanish original to see how to refine the Google version. Paralink's output, though, was dire. Not only does the web translator apparently handle much smaller blocks of text than Google, its vocabulary seems to be much smaller, with "prima" being translated as "premium" instead of "cousin", and some verbal phrases with "el (+ verb form)" being left untranslated entirely. If one has any grasp of the target language, I think Google's service is great, often providing a framework within which it is possible to finish the translation job yourself. From what I've seen, Paralink isn't there yet. YMMV, of course.
Every once in a while, I get a product that was made in the Orient (I don't discriminate here) that has its "English" instructions ALL mangled up. While the words may be correct, the order is (to say the least) wacky. It sounds like Yoda is writing the words and then making them worse (just for effect!). I suspect that the original (oriental) language had the words (symbols) in a similar order, but they need some "glossing" to be anywhere close to "native". One of these days the translator people will get more and more paired-language documents to train their translators, but until then, I'm hoping that a good person who is fluent in both languages does the translation. It will turn out better that way.
An alternative is that we all learn Esperanto (it was a good idea at the time!). As someone said, every language needs an army behind it (ebonics/jive notwithstanding).
The problem here isn't that computers like rules and language isn't rule based.
Firstly, language is VERY rule based, and is built on tons of rules. It's all rules. That's why we can understand each other. Nothing pisses me off more than people characterizing language as rule-free gibberish. If it was, it wouldn't work.
The problem is that a lot of communication references and is about things that the computer has absolutely no context of.
The interpretation of time as a thing in motion, and the arrow comparison, may be entirely culture-dependent. Culture is full of things like that. The computer would have NO trouble translating that literally, but is a native speaker of the target language gonna read that and think "what the fuck?" Things like that get screwed up by humans just as frequently.
Secondly, with that in mind, rating a computer translator's performance on the metric of "translating things without any context of what the data is about" is always going to make something look bad. It misses the point of these devices by about a mile.
It's merely a convenience tool.
I think the real special feature of Google's idea is that if it works decently within the limits of what machine translators can do, they can simply feed it mountains of data to teach it a new language, or the latest slang and changes, instead of having an army of professors making lexical dictionaries and programming all these rules manually.
Think of it as a cost effectiveness issue. Being able to have a code monkey feed it Polish and Icelandic texts to get a Polish-Icelandic translator means they can develop a translation system that covers far more languages with much less cost.
You could even make it translate between English dialects just for fun. It's all the same to it.
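The "feed it mountains of parallel text" approach can be sketched with a toy word-alignment estimator in the spirit of IBM Model 1; the sentence pairs below are invented, and real systems use millions of pairs plus much richer models:

```python
from collections import defaultdict

# Invented toy parallel corpus: the engine sees only raw sentence pairs.
pairs = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a book", "un libro"),
]

# EM in the spirit of IBM Model 1: estimate t(f|e), the probability that
# target word f translates source word e, from co-occurrence alone.
t = defaultdict(lambda: 1.0)  # flat initial translation table

for _ in range(10):
    count = defaultdict(float)
    total = defaultdict(float)
    for e_sent, f_sent in pairs:
        es, fs = e_sent.split(), f_sent.split()
        for f in fs:
            z = sum(t[(f, e)] for e in es)  # normalise over possible alignments
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

# Alignment emerges with no dictionary: 'casa' pairs off with 'house'.
print(t[("casa", "house")] > t[("casa", "the")])  # True
```

After a few EM iterations, "casa" aligns more strongly with "house" than with "the", purely from co-occurrence statistics; no dictionary or grammar was supplied.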
When I was ten, I moved to a different culture and eventually became fluent in both languages. After a while I noticed that when I was asked to translate from one language to the other ... I didn't! I simply said whatever it was in the other language.
With computers, it will always just be a hack. Until a computer can understand thoughts and feelings, it won't be able to express them in any language.
As others have said, there is so much culture involved. Sometimes I think that translation is a myth and the best you can do is to find common experiences.
Well done! Sensible articles on translation are exceedingly rare, and even more so in technical publications.
A friend of mine (another translator) thinks (seriously) this is because humans are hard-wired not to understand how language really works. Because language is the main mechanism of social interaction, an understanding of how it works would undermine its efficacy for fulfilling its intended purpose. And it's true: nothing seems to be as misunderstood as language. The few of us who do understand (your author evidently included) must consider ourselves mutants.
If anyone's interested in reading more, they might like these two articles of mine:
These are examples of the ambiguity of language. The first is a case of the same string of letters representing different words (which may or may not have the same pronunciation). I remember a science fiction story I read years ago in which a blueprint of a device was written in Russian. Due to compartmentalization restrictions, the Russian wording was copied into a word list, which was then translated from Russian to English for passing on to someone who could then analyze parts of the translated blueprint (no one saw the complete blueprint, just sections of it). Because the words were translated with no context, there were problems such as the translation ending up with the string "lead" as both the metal and the technical term for a wire (i.e. a lead lead, a wire made of lead). The flies example uses the string "flies" both as a verb to connote movement and to designate an insect class (modified by the designation "fruit"). Words cannot be translated/interpreted in isolation but need to be viewed in context so that the proper meaning is assigned to them for purposes of the translation. Science fiction writer Piers Anthony makes use of this type of word play in his Xanth series.
As to "white house" and "casa blanca": there was an incident during WWII when there was a secret meeting of the Allied leaders in Casablanca (where they could have been attacked and killed by the Germans if the meeting plans became known). As it happened, a spy reported the plans to the Germans, but due to encoding, decoding, and translation into German, the reference to the meeting being held in Casablanca ended up getting reported to the spy's controller as being held in [the] White House (i.e. Washington DC/USA).
Rules, yes, but self-adapting rules, and not rules in the form of what most people would consider as "grammar". Language operates at a much deeper level, as you can see from the fact that good translations hardly ever reproduce the most apparent grammatical structures of the original text.
On the UN producing "expert" translation, I wouldn't count on it. Most UN and EU translations are better than machine translation in degree only, not in essence. They are by and large atrociously overliteral, and have little in common with natural language.
If language is algorithmic at all (and I don't think it is), it can only be so at a degree of complexity that defies reverse engineering along the lines of an electronic translator. Nobody has ever come close to writing a full grammar of any language, and I suspect the very nature of language (total open-ended versatility) is such that no such grammar can exist. This is because meaning is not encapsulated in the words of the speaker but revealed solely in the response of the listener. Words only mean what people take them to mean.
That is the first insurmountable problem for electronic translation. The second is that meaning is distributed across huge expanses of discourse. In the case of spoken language, it is distributed beyond phonetics into prosody, then beyond prosody into gesture. Written language uses a whole panoply of devices to simulate the effects of prosody and even gesture, and I don't see how an algorithmic approach could possibly allow for this.
Google's approach is a good one. Translation is very similar to code breaking, so use similar algorithms.
However, when you already know things about the languages, you can incorporate this knowledge. For example, give it a dictionary and a thesaurus, and teach it a little about grammar in each language. Then it can put things in (some sort of) context.
But let's look at it this way. Assuming there is life outside this planet, and we someday meet them, how do we communicate? Wouldn't this approach be a way to get the very first insights into how they communicate? Sure, it wouldn't be perfect, but it would help.
It will never be perfect. I do believe that language is based on hard and fast rules, but humans don't like rules. It's like my music composition teacher said: "You've got to know the rules, THEN you can break them." We continually go against the rules with language, make up new words, say things wrong. Computers won't keep up with that, but Google's translator can still do its job: giving you a rough guide of what is said.