Google's robot army learns Spanish

If you want to learn another language, you need to spend time in the country, talk to people, get drunk and attempt to order complex drinks, and eventually read that country's great works of literature – unless you're Google, that is. In a recent paper, three Googlers outlined a new approach to machine-based translation that …

COMMENTS

This topic is closed for new posts.
  1. Mephistro
    Happy

    "ROMANES EUNT DOMUS"

    comprender? ==> comprendéis?

    1. Andrew Moore

      Re: "ROMANES EUNT DOMUS"

      si, yo comprendo

    2. Neoc

      Re: "ROMANES EUNT DOMUS"

      Shirley: "Romani domum ite" (as a declarative), based on Latin sentence structure.

  2. Neil Barnes Silver badge

    One problem with this approach

    is the requirement for an extremely large data set from which to derive the original statistical relationships, and the necessary hardware to crunch through it all. While this might work using a distributed system, I can't see it as a standalone quite yet...

    In some ways it's similar to the statistical n-gram spelling correction Google researchers proposed a few years back; effective, but requiring a huge database to work.

    Nonetheless, interesting stuff.

    1. Anonymous Coward

      Re: One problem with this approach

      Good thing they have a huge distributed system, then, I guess?

    2. Voland's right hand Silver badge

      Re: One problem with this approach

      You will be surprised at the size of the dataset. It is much smaller than you would expect.

      Right, where should we start - this method is not original. This is the Schliemann Method. Schliemann used this method to be a successful translator/trade rep for various German traders and manufacturer associations. This allowed him to collect the capital needed to go and play with amateur archeology and discover the lost civilizations of Troy and Mycenae.

      The reason why the size of the dataset needed is much smaller than you would expect is that most languages belong to a handful of language groups. Example - if you know one language from each Indo-European subgroup you start understanding the whole group (even if you cannot speak all of the languages properly). A neural network can pick up the similarities very easily so nothing surprising here.

  3. jake Silver badge

    The gootards really need shunning/quashing ...

    ... they are trying to become a .gov, in the Nineteen Eighty-Four sense.

    Evil, evil stuff, no matter how you look at it.

    1. Vociferous

      Re: The gootards really need shunning/quashing ...

      Yeah, translating languages - what utter evilness. They should be shot.

      1. Steve Crook

        Re: The gootards really need shunning/quashing ...

        Didn't god punish humanity by making us all speak different languages?

        You don't suppose this project is going to be like the Nine Billion Names Of God when Google make that last entry in their database and everyone can speak to everyone else...

        1. Graham Dawson Silver badge

          Re: The gootards really need shunning/quashing ...

          They have a solution. Google Stars. Somehow it makes sense.

  4. Andrew Moore

    When I'm stuck for a Spanish word, I usually take the English word and add an -o or -a (or -ado, -ando or -amente depending on context) to the end of it. Most of the time it works out.

    1. Anonymous Dutch Coward
      Pint

      Romanes optimes sunt ;)

      Long live the Norman conquest as well as the infiltration of Latin/Romance terms into English!

      (Actually, knowing some English really helped me learn Latin and French, so thanks for that!)

      (Title: yes, well, had to stay in line with Romanes eunt domus, right?)

      (Icon: no vinum?... well beer it is then)

    2. James Micallef Silver badge
      Happy

      "Romano goando homeamente"

      hmm, no, that doesn't really work does it?

  5. Dan 55 Silver badge
    Trollface

    ... and this is why Google Translate works so well.

    1. Vociferous

      > this is why Google Translate works so well

      It's not that bad. For some languages, e.g. Spanish and German, you get a decent translation. Some other languages, like Chinese, still have quite a bit to go, tho.

      1. Dan 55 Silver badge

        Understandable, yes. Decent, I beg to differ. There's still a fair way to go yet.

  6. Vociferous

    The Culture

    Have you guys read The Culture books by Iain Banks?

    Every time I read about Google's projects, I'm reminded of The Culture.

  7. Anonymous Dutch Coward

    Vectors!?!

    Is it me or does the article just start babbling about vectors without explaining what vectors are meant? Probably in the articles the Googlers published, but who RTFA when you can complain?

    1. Anonymous Coward

      Re: Vectors!?!

      Usually with this sort of thing it's vectors of features, as detected by, well, various feature detectors that they came up with and found were suitable to describe words.

      Pointless to try to describe all of them, probably, but you're right, they could have given us a couple of examples.

    2. The Indomitable Gall

      Re: Vectors!?!

      Pretty obscure reference, to be fair. I got what they mean by the example "king - man + woman = queen", whereas the silly "graphs" were more confusing than anything.

      A vector, by definition, is simply a move through n-dimensional space.

      The mind-twister here is that the "dimensionality" of a word is kind of arbitrary, because the component parts of the meaning change from word to word.

      The example used of "king" (or "queen") tells us not only gender, but also the importance of the person, the nature of the constitution of the place.

      The weirdest thing about vectors in a lot of AI applications is that they've mostly abandoned the idea of axes -- notice that the vector has to subtract "man" as well as adding "woman", because the system doesn't recognise the existence of a gender "axis".

      Instead, we have a selection of "features" that are measurable only in terms of presence or absence.
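To make the "king - man + woman = queen" arithmetic concrete, here is a minimal sketch. The vectors are hand-made toy values, not output from any trained model, and the three dimensions are given readable labels only so the example can be followed; as noted above, a real system learns its dimensions and does not hand them out as named axes. The `analogy` helper and the `VECTORS` table are hypothetical names made up for this illustration.

```python
import math

# Toy, hand-made vectors (assumed values, not from a real model).
# The dimensions are labelled only for readability:
# [royalty, maleness, femaleness]
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))
```

Note that the lookup has to subtract "man" and add "woman" as two separate steps; nothing in the code knows that those two words sit at opposite ends of one gender axis.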

      1. BristolBachelor Gold badge

        @Indomitable Gall

        Ah the Gender axis. I remember a sketch a very long time ago (Smith & Jones era?) where someone asked if they had any kids, said "Yes, three - one of each; boy, girl & hairdresser".

        However your point about axes is spot on.

  8. John Deeb

    All your translations are belong to us

    Then again, to make a really good translation a very highly evolved AI would still be needed first (chicken-and-egg thing?). Proper language is intricately part of the whole consciousness and self-awareness thing or whatever it is (I as language construct). Meaning as context (per Donaldson). Without proper context evaluation language has little meaning beyond boring scientific manuals which are already written in English in many cases.

    It's ambitious as a project and no doubt useful things might be developed but quality translations won't be one of them. Then again, many settle for dumb translations so the solution might also be to create more dumb users with dumb, modest needs and the translation machines will start working better already!

    1. David 164

      Re: All your translations are belong to us

      I don't think it will be that hard for Google to develop a way to analyse the context of a text. It's not that hard to tell poetry from a legal document from a love letter and then apply the correct translation, or the translation most likely to be used for that kind of document. I did read once that Google was asking users to describe the document they wanted translated, whether it was legal, poetry, formal or informal, but I have never seen this option on Google Translate.

      Handling real time voice to voice translation may be trickier to do.

    2. Anonymous Coward

      Re: All your translations are belong to us

      "It's ambitious as a project and no doubt useful things might be developed but quality translations won't be one of them. Then again, many settle for dumb translations so the solution might also be to create more dumb users with dumb, modest needs and the translation machines will start working better already!"

      More quality is better than less. Before Google Translate, machine translations were often literally unusable. As someone who has to communicate with speakers of foreign languages frequently (via e-mail) I don't know how I would get along without Google Translate. You apparently don't have the same business need.

    3. Vociferous

      Re: All your translations are belong to us

      Computers can be really good at determining context. Remember that IBM's 'Watson' won Jeopardy, which is pretty much all about context.

  9. Christoph

    I wonder if this would help interpret written material in extinct languages where we have a few known words? Though there might not be a big enough data set.

    1. Anonymous Coward

      I can imagine that they could run extinct languages against many different modern languages to get some useful information about etymology that humans wouldn't otherwise be able to work out because nobody speaks that many languages.

    2. The Indomitable Gall

      No use...

      "I wonder if this would help interpret written material in extinct languages where we have a few known words? Though there might not be a big enough data set."

      This stuff, along with existing Google Translate technology, relies on a massive monolingual dataset as well as a smaller bilingual one. We don't have enough data.

  10. Forget It

    This seems to be putting Semantic back into the Statistical

    of Statistical Machine Translation

  11. Eponymous Cowherd

    All I know is.....

    my hovercraft is full of eels.

    1. Anonymous Coward

      Re: All I know is.....

      Hearing that, my nipples explode with delight!

  12. Anonymous Coward

    Can it handle a poor starting material

    If you start with an urban polyglot description of last night's Coronation Street (or an interchange between football pundits on Match of the Day) rather than an encyclopedia article on the history of the steam engine does it produce reasonable copy?

    ie at what point do idiom and dialect defeat google's best efforts to reduce us down to the lowest common denominator advertising receptacle?

    1. Mephistro

      Re: Can it handle a poor starting material

      "...at what point do idiom and dialect defeat Google's best efforts...?"

      Idioms and dialects can defeat ANY translator, not only Google Translate. Even native speakers have trouble with that. As for the 'advertising receptacle' part, I partly agree, but just translating ads is not enough, as most ads need to be tailored for their audience, and that includes taking into account lots of cultural differences, not only language.

      1. Blofeld's Cat
        Facepalm

        Re: Can it handle a poor starting material

        "Idioms and dialects can defeat ANY translator..."

        I was once at a conference in Germany where a particularly long-winded speaker was expressing himself.

        The simultaneous English translation went silent about half way through, and just as I thought it had broken, the translator came back on sounding somewhat exasperated.

        "For god's sake man - get to the verb!"

        1. Ian 55

          Re: Can it handle a poor starting material

          "The Belgian minister has just made a joke - it would be polite to laugh."

    2. Anonymous Coward

      Re: Can it handle a poor starting material

      "ie at what point do idiom and dialect defeat google's best efforts to reduce us down to the lowest common denominator advertising receptacle?"

      Be as cynical as you want, I appreciate that Google Translate lets me communicate (not perfectly, but usually effectively) with speakers of various languages that I don't speak.

    3. The Indomitable Gall

      Re: Can it handle a poor starting material

      Anything that works for vocabulary can work for idioms, which are often as not nothing more than multiword "words".

  13. Anonymous John

    "Exploiting Similarities among Languages for Machine Translation"

    What about the lack of similarities? Take Scottish Gaelic.

    No indefinite article, no words for "yes" and "no" as such, no present tenses apart from the two verbs for "to be", no verb "to have", a separate set of numbers from two to ten used only for counting people, etc.

    1. Anonymous Coward

      Re: "Exploiting Similarities among Languages for Machine Translation"

      And I guess you think Spanish grammar is the same as English? Heh... gender inflected articles and nouns, verb conjugation beyond adding an -s for third person singular, three different kinds of past tense that are all used in different circumstances than our two, a separate set of conjugations for subjunctive and conditional cases, etc.

      No languages are going to be translatable just by looking up words verbatim in a dictionary. But it doesn't hurt to be able to do so.

    2. The Indomitable Gall

      Re: "Exploiting Similarities among Languages for Machine Translation"

      "No indefinite article, no words for "yes" and "no" as such, no present tenses apart from the two verbs for "to be", no verb "to have","

      The technique is for guessing at translations of unknown vocabulary. Normally when natural language processing guys talk about vocabulary, they're talking about words with an independent and relatively unambiguous meaning -- so-called "lexical words", eg "cat", "hamburger", "galactic". The other class of words is called "function words", and these are the grammatical glue that has next-to-no meaning outside of its context -- eg "me", "now", "would" etc. Within natural language processing, these are often not even considered "words" because they follow directly from grammatical rules, and there is very little choice when using them.

      These "function words" also form a closed set -- compare the number of pronouns in any given language with the number of common nouns. It is therefore efficient to deal with these more explicitly than lexical words, and even if you're doing pure statistical translation, all of the function words in a language are likely to turn up in your training data (and if not, you've not got enough data) -- so these things are not going to be "unknown vocabulary", and this technique doesn't apply to them anyhow.

      To use an example of how vectors would work to translate between very different structures, consider disease.

      Say the software knows how to translate "I am hungry" to Gaelic, but doesn't know how to translate the word "thirsty" from English to Gaelic.

       I am hungry -- tha an t-acras orm (lit. is the hunger on_me)

      However, the system does know that the only difference between "hungry" and "thirsty" is that "hungry" is about food and "thirsty" is about drink, so the software can generate a vector (-food, +drink) that given "hungry" as its input/starting point will give "thirsty" as its output/endpoint.

      Now that same vector will of course also go from "hunger" to "thirst", so it doesn't matter that the Gaelic equivalent of the phrase uses a noun instead of an adjective.

      Very clever stuff.
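The hungry/thirsty walk-through above can be sketched in a few lines. This is only an illustration under loud assumptions: the vectors are invented toy values, and the English and Gaelic words are assumed to already live in one shared, aligned space, which a real system would have to learn from data. The names `ENGLISH`, `GAELIC` and `translate_by_offset` are hypothetical. The Gaelic nouns used are acras (hunger), pathadh (thirst) and cadal (sleep).

```python
import math

# Invented toy vectors; dimensions loosely mean [food, drink, is_noun].
# Assumption: both vocabularies share one aligned space.
ENGLISH = {
    "hungry":  [1.0, 0.0, 0.0],   # adjective about food
    "thirsty": [0.0, 1.0, 0.0],   # adjective about drink
}
GAELIC = {
    "acras":   [0.9, 0.1, 1.0],   # hunger (noun)
    "pathadh": [0.1, 0.9, 1.0],   # thirst (noun)
    "cadal":   [0.1, 0.1, 1.0],   # sleep (noun), a distractor
}


def translate_by_offset(known_src, unknown_src, known_tgt):
    """Guess the target-language word for unknown_src.

    Takes the offset known_src -> unknown_src in the source language
    (here roughly -food, +drink), applies it to the known target word,
    and returns the nearest other target word.
    """
    offset = [u - k for k, u in zip(ENGLISH[known_src], ENGLISH[unknown_src])]
    target = [t + o for t, o in zip(GAELIC[known_tgt], offset)]
    candidates = [w for w in GAELIC if w != known_tgt]
    return min(candidates, key=lambda w: math.dist(target, GAELIC[w]))


print(translate_by_offset("hungry", "thirsty", "acras"))
```

The offset is computed between two English adjectives but applied to a Gaelic noun, which is the point made above: the same (-food, +drink) step works regardless of the part of speech on either side.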

  14. Suburban Inmate

    teh lulz

    At least I'll still have FaceBing's hilarious "translations" (in the way McDonalds is "food") for a good giggle.

    1. Anonymous Coward

      Re: teh lulz

      +1

      Sigh, why did Microsoft have to invest in Facebook.

      I'm tired of copying and pasting stuff from Facebook into Google Translate in order to understand it.

  15. mIRCat
    Alien

    The only obvious use.

    But will it be effective when our future alien overlords are descending from orbit?

  16. CCCP

    Failure seems to be black/white + SEO

    We tried using google translate in a commercial context for user reviews - a lot of them in fact. The problem was using human translation was too expensive.

    At the time google was about 85% "good". The problem was the 15% "terrible". You can't put content in front of users that has a 1/7 chance of making them laugh out loud for the wrong reasons.

    Second point is speculative - will google apply a discount or penalty for sites that use google translate to pretend they have original content in more languages, like user reviews?

  17. joanbee

    Googlespeak?

    So you have a list of words or phrases that mean the same thing. That list represents a concept. Somewhere out in database land that list has an identifier. What happens if you start using the list identifier to communicate? Maybe Google should work up a human-usable symbology for these. Lingua Google...

  18. Martin Budden Silver badge

    Yes, but

    can it translate YouTube comments into English?

  19. AceRimmer1980
    Alien

    Worf, at Khitomer

    sharpening his bat'leth..
