Amazon: IVONA bevy of 'all natural' blabber babes to beat Siri

Amazon bought text-to-speech company IVONA systems on Wednesday, the online book-floggers have announced. The acquisition fuelled rumours that Amazon, the quietest member of The Gang of Four, is planning a rival to Apple's talking assistant Siri. Amazon announced the deal with Ivona Software yesterday on its website, but didn' …

COMMENTS

This topic is closed for new posts.
  1. Dave 126 Silver badge

    This is a fun article, about how Siri grew out of a DARPA project, was intended to do much more than she does now, and after Apple bought her (snatching her away from Verizon's Android handsets) they curbed her abilities and potty mouth. She seems to have plenty of cousins, though.

    http://www.huffingtonpost.com/2013/01/22/siri-do-engine-apple-iphone_n_2499165.html

  2. Piro

    I must admit, their text-to-speech is pretty damn good, and I've tried it out on my Android handset also.

    1. danR2
      Meh

      Ivona the best I've heard.

      Over a short span, one of the U.S. female voices is quite indistinguishable from real (although one of the male voices does not quite convince) in that if I did not know beforehand, I'd assume it was real.

      However, natural reading of various TYPES of texts (news, non-fiction, novels) requires adjustment. The voice for a Jane Eyre or Mickey Spillane reading would really have to change from that for a science book. Even a dull Librivox speaker will put some life into the characters. You just can't do that with TTS, probably never will. One has to understand each person in the story, and the inflection and modulation also change from sentence to sentence. Affect is something inferred from the text, it cannot be calculated by an algorithm.

      1. Michael Wojcik Silver badge

        Re: Ivona the best I've heard.

        "Even a dull Librivox speaker will put some life into the characters. You just can't do that with TTS, probably never will."

        "Never will" is rather too strong. While we're still very far away from a decent understanding of affect, much less a formal model of it that would let us implement it algorithmically, a lot could be done in this area with predictive models. NLP researchers have made significant progress in recent years in systems that can model rudiments of narrative; going from there to building decent predictive models of the purported emotional states of characters is not a great leap. They wouldn't be nuanced, and no one would claim this represents "understanding" emotion, but if the model predicts a valid emotional overtone for the subject text most of the time, that's good enough for TTS purposes.

        Then it's a matter of varying the TTS prosodic parameters to convey that emotional overtone. That's tricky in itself, because prosody is not terribly well understood; for example, linguists who study prosody still can't agree on what in English pronunciation actually conveys "stress". But again you can build a pretty effective predictive model without fully understanding the domain being modeled.

        "Affect is something inferred from the text, it cannot be calculated by an algorithm."

        That's a fallacious argument, unless you can prove that human readers don't "calculate" affect "by an algorithm".
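
        The two-step idea above (predict an emotional overtone for the text, then vary the prosodic parameters to convey it) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the keyword lists stand in for a real predictive model, the labels and the rate/pitch/volume numbers are arbitrary, and none of the names correspond to IVONA's or anyone else's actual API.

```python
from dataclasses import dataclass

# Hypothetical cue words standing in for a trained predictive model.
EMOTION_CUES = {
    "angry": {"furious", "shouted", "slammed", "rage"},
    "sad": {"wept", "alone", "grief", "sighed"},
    "happy": {"laughed", "delighted", "smiled", "joy"},
}


@dataclass(frozen=True)
class Prosody:
    rate: float    # speaking-rate multiplier (1.0 = neutral)
    pitch: float   # pitch shift in semitones
    volume: float  # gain multiplier


# Illustrative settings only; the values are arbitrary.
PROSODY_FOR = {
    "neutral": Prosody(1.0, 0.0, 1.0),
    "angry": Prosody(1.15, 1.0, 1.3),
    "sad": Prosody(0.85, -2.0, 0.8),
    "happy": Prosody(1.1, 2.0, 1.1),
}


def predict_emotion(sentence: str) -> str:
    """Step 1: guess an emotional overtone by counting cue words."""
    words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
    scores = {label: len(words & cues) for label, cues in EMOTION_CUES.items()}
    best = max(scores, key=scores.get)
    # Fall back to neutral when no cue word matched at all.
    return best if scores[best] > 0 else "neutral"


def prosody_for(sentence: str) -> Prosody:
    """Step 2: map the predicted overtone to prosodic parameters."""
    return PROSODY_FOR[predict_emotion(sentence)]
```

        A real system would swap the keyword lookup for a statistical model and feed the resulting parameters to the synthesizer; the point is only that the prediction needs to be right most of the time, not that it "understands" anything.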

  3. Destroy All Monsters Silver badge
    Paris Hilton

    The real question is...

    Does she lick the microphone?

    1. TheOtherHobbes

      Re: The real question is...

      With what? A motherboard?

      1. Anonymous Coward

        Re: The real question is...

        In this context it is spelled motherbeard.

  4. Tiny Iota
    WTF?

    eh?

    "a 12-year-old startup" ?

    1. danR2

      Re: eh?

      They must have fixed it.

  5. Anonymous Coward

    Christ on a bike..

    "American English, Ivy" is the stuff of nightmares, beware..

  6. Anonymous Coward 101

    Go on, who else went to the website and got the voices to swear?

  7. Anonymous Coward

    And German sounds so natural...

    But it could be due to the fact that they are all androids from the dark side of the moon.

  8. Jamie Kitson

    IVONA Pun!

    I can't believe you didn't do any IVONA puns :(

    1. diodesign (Written by Reg staff) Silver badge

      Re: IVONA Pun!

      'Ivona bevy of babes' not good enough?

      Don't you dare ruin my Friday!

      C.

      1. danR2

        Re: IVONA Pun!

        I thought it was 'bevy of blabber babes'. That makes my day.

  9. Rob Carriere

    I tried a couple of voices on: "I can say fairly complex, as opposed to complicated, sentences without stumbling, or indeed any other kind of awkwardness." and I'm impressed with how natural that comes out.

    Now the Dutch voices on "Ik kan nogal complexe, maar niet gecompliceerde, zinnen zeggen zonder te struikelen, of wat voor foutjes dan ook." (not quite a translation, but the same structure) are clearly robots as soon as you hit the first inter-word pause. Also they make pronunciation errors (wrong vowel in "te"). But, again, the overall inflexion of the sentence sounds right.

    These guys are onto something.

    1. Frederic Bloggs

      I did a load of work on this in 1994 and, clearly, they have some better inflection and frequency models than we did then. Also the pacing for English (and probably Polish) is much better than it was then. However, it's clear that the phoneme splitting and reconstruction is not always being done correctly, which probably reflects on the language skills of the people doing this tedious and exacting work. The corpus of sentences being split may also vary quite a bit in size for each language. That will make quite a difference when doing contextual reconstruction.

      1. danR2

        The real problem is not realistic phono-syntactic synthesis, but the pragmatics of style. It will practically take full-blown AI, or tagged text, to properly render the ever-changing affect implied by characters' content in a novel.

        Amazon books are heavily dependent on fiction readers. After several chapters, only the sight-impaired are going to stick with the voices. People are just going to continue sight-reading.

    2. danR2
      Facepalm

      I checked their website, and unless I missed something, they've made a strange oversight: the languages are all European. Amazon will miss a huge market in India, Japan, Korea, and of course, China.

      1. Michael Wojcik Silver badge

        "I checked their website, and unless I missed something, they've made a strange oversight: the languages are all European."

        Call me crazy, but I suspect this "strange oversight" is what we sometimes call "developing what you have the expertise and resources to develop".

        "Amazon will miss a huge market in India, Japan, Korea, and of course, China."

        Perhaps - one can only hope - Amazon will be able to supply additional resources and expertise?

  10. Eenymeeny
    Headmaster

    "It even has two Welsh-speaking voices - available in either gender."

    Either sex, you mean.

    1. Dave 126 Silver badge

      There are more Welsh speakers in the US than there are in Wales, plus those living in Patagonia... so what accent does it have?

  11. Arbee

    It was very impressive, but still not a patch on Rutger Hauer...

    "I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched c-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost in time, like tears in rain. Time to die."

    1. Arbee
      Thumb Down

      Also, she says Tannhäuser like a Canadian: Tannhoooooooser.

      1. danR2

        Standard Canadian would be Tann-how-ser, with slight lip-rounding on the ending diphthong. Of course extreme East-coast Canadian might sound a bit like 'hooo', or Oirish.

  12. Destroy All Monsters Silver badge
    Angel

    Hello and again welcome to the Aperture Science Computer-Aided Enrichment Centre. We hope your brief detention in the relaxation vault has been a pleasant one. Your specimen has been processed and we are now ready to begin the test proper. Before we start, however, keep in mind that although fun and learning are the primary goals of the enrichment centre activities, serious injuries may occur. For your own safety, and the safety of others, please refrain from GFRXTHGGGHHHNAAAK

  13. FrankJ2

    I wonder if all the big tech companies are going to buy TTS companies now? Maybe iSpeech?

Biting the hand that feeds IT © 1998–2019