Total recog: British AI makes universal speech breakthrough

Speechmatics, the company founded by British neural network pioneer Tony Robinson, has made major advances in speech recognition. Speechmatics’ Automatic Linguist can now add a new language to its system automatically – without human intervention or tuning – in about a day, crunching through 46 new languages in just six weeks …

  1. colinb

    Positive

    Given the silly negativity on AI around here this is a welcome article.

    Technology in this area is moving forward and will simply be better than humans in a lot of areas and we need to deal with that.

    The news this morning was of cancer scans misread by untrained junior doctors.

    Since missing a cancer like this could be a death sentence, and there is a reported backlog of 100,000 scans waiting to be checked, pattern recognition in this area could be a life saver.

    If you think machines can't do this, you'd be wrong. (https://blogs.nvidia.com/blog/2017/10/30/detecting-lung-cancer/)

    What's missing are medical professionals with some vision.

    1. Anonymous Coward
      Anonymous Coward

      Re: Positive

      Non-AI-based methods for this have been around for a while too.

      Ages ago, I worked on a project where digital images from slides of cells stained with reagents were assessed by cancer specialists as to how likely they were to be cancerous.

      A variety of tissue types and staining reagents were used.

      We developed algorithms (one per tissue type / reagent combination) - a bit of pattern recognition and image analysis, heavily informed by what clinicians said they used as "key signs" to guide them.

      These were tried on the old images and on a new set, to see how well the algorithms compared to clinicians on images they hadn't been tuned against.

      A few refining cycles later, there was good agreement.

      This was then used, not as an auto-diagnosis system, but in the real world to flag when the "automated" system and a doctor gave markedly different assessments. The task was then to work out whether it was system error or human error - flagged images were scheduled for checks by multiple different clinicians.

      Inspecting biopsies (and any similar "is this likely to be cancerous?" inspection) is slow, tedious and mentally fatiguing, and human error can occur (especially after several hours) - having automated systems to help can only be a good thing.
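      The flagging step described above boils down to a simple disagreement check between two scores per image. A minimal sketch - the function name, scores and threshold below are all invented for illustration, not from the actual project:

```python
def flag_for_review(automated_scores, clinician_scores, threshold=0.3):
    """Return indices of images where the automated likelihood-of-cancer
    score and the clinician's score differ markedly (by more than threshold)."""
    flagged = []
    for i, (auto, human) in enumerate(zip(automated_scores, clinician_scores)):
        if abs(auto - human) > threshold:
            flagged.append(i)
    return flagged

# Example: scores on a hypothetical 0..1 "likely to be cancerous" scale.
auto = [0.1, 0.8, 0.45, 0.9]
human = [0.15, 0.2, 0.5, 0.85]
print(flag_for_review(auto, human))  # -> [1]: only image 1 disagrees markedly
```

      In the workflow described, each flagged index would then be routed to multiple clinicians for an independent second look, rather than being auto-diagnosed.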

      AC obv

      1. colinb

        Re: Positive

        Right, no one is saying it's a 100% replacement (yet), but part of a checks-and-balances system.

        In this case automated checks on the junior doctors' scans could have flagged this up much earlier to skilled people.

        Given that solutions like yours already exist, the blockage here is culture and attitude. Tech won't be given a chance to threaten or supplant doctor supremacy anytime soon. Since it's free, much lip service will be paid.

        Having said that my doctor is excellent.

      2. Destroy All Monsters Silver badge
    2. macjules Silver badge

      Re: Positive

      Technology in this area is moving forward and will simply be better than humans in a lot of areas and we need to deal with that.

      Well, yes there is that. We need a lot more Dr Tony Robinsons in this world if we are going to create an AI that is truly beneficial to society, and not just another Googlebot or Siribot (but obviously not in Scotland or Tyneside).

      Speaking of the chocolate octopus, how long will it be before Google buy Speechmatics?

  2. Nik 2

    Mandarin?

    Is that the language that Civil Service documents are written in?

    On a more serious note [really? - ed], one only has to embed this into a SoC and miniaturise it to have a digital babelfish. Responsible for "more and bloodier wars than anything else in the history of creation", IIRC. What could possibly go wrong?

    Technically pretty impressive though.

  3. SwizzleStick

    A.I. for P.M.

    I'd vote for that

    1. Anonymous Coward
      Anonymous Coward

      Re: A.I. for P.M.

      An AI would be more human....

  4. Mage Silver badge

    Noam Chomsky

    Human Languages (spoken or signed) are at some deeper level more similar than different.

    Non-humans often have vocabularies and can even learn phrases and signs from humans, but not one has shown the ability to use language in the proper sense.

    Interesting research.

  5. Destroy All Monsters Silver badge
    Thumb Up

    Good Work

    To stay on subject, I will just leave this here:

    Bob Coecke: "Quantum algorithms for compositional natural language processing"

  6. Evil Auditor Silver badge

    That really is impressive work!

    But, to confirm colinb (...silly negativity on AI around here...), I find the term AI is used generously. Too generously, in fact. Admittedly, I've probably missed the latest development in AI. But my impression still is that what is named AI is just pattern recognition, even if sophisticated pattern recognition.

    A board of wood with a square hole does pattern recognition, too.

    1. colinb

      Sure, this is pattern recognition and just part of AI.

      Our brains are pattern-matching machines; there is no general reasoning without abstraction of patterns, combined with the rules and storage that make up our 'mental model'.

      Cognition and reasoning are the next stage up, and there is work in that area, e.g. https://opencog.org/. This needs more theory but also better processing power; it's mainly academics at the moment, but money will flow to commercialise/privatise it eventually.

    2. Michael Wojcik Silver badge

      my impression still is that what is named AI is just pattern recognition, even if sophisticated pattern recognition

      I don't think that's an accurate characterization. There's a great deal of research being done in machine learning that goes far beyond pattern recognition. You might want to have a look through the archives at the morning paper.

      For example, researchers have built NLP-processing stacks that can do semantic algebra. You can pose a query to the engine like "king - man + woman" and it will respond with "queen", having learned (without supervision) to associate vectors of concepts with various words. This has also been done with images; there's a paper demonstrating a network that given an image of someone's face can synthesize an image of them wearing sunglasses - and if you give it an image of someone wearing sunglasses, it can produce an image without them.[1,2]
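      The "king - man + woman" arithmetic can be sketched with toy vectors. The two dimensions and their values below are invented for illustration; real systems (word2vec and the like) learn hundreds of dimensions from large text corpora:

```python
import numpy as np

# Toy word vectors: two made-up dimensions, "royalty" and "maleness".
vecs = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def analogy(a, b, c):
    """Answer 'a - b + c = ?' by nearest cosine neighbour, excluding the inputs."""
    target = vecs[a] - vecs[b] + vecs[c]
    def cos(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(vecs[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```

      With learned embeddings the arithmetic is only approximate, which is why real systems return the nearest neighbour rather than an exact match.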

      Broadly speaking, a lot of the work done with generative networks can't reasonably be described as pattern matching.

      It's also worth noting that ML-based applications which are not, under any reasonable definition, pattern matching have been around for decades. Even calling something like document summarization "pattern matching" is a stretch.

      [1] Obviously that requires some "guesswork" on the part of the network, since it's extrapolating the features it can't "see", such as eye shape and color. But it does an uncannily good job. Again, this is unsupervised; the network stack derived a model from a large corpus of facial images and learned the differences between "with sunglasses" and "without sunglasses".

      [2] I've forgotten which paper did the sunglasses thing, but here's a nice TMP post on some of the interesting things being done with DNNs.

  7. Anonymous Coward
    Anonymous Coward

    News just in - shoulder-pads fashionable again...

    Oddly enough we did something _pretty similar_ 20 years ago at Cambridge (when Tony was there - he's a top bloke). Nokia wanted a way of building a multi-lingual voice recognition capability without starting from scratch every time, to reduce implementation costs. And lo, that we did.

  8. john mullee

    Gulag Archipelago

    Wasn't electronic speech pattern identification the project that Solzhenitsyn wrote about so inspiringly?

    1. Mage Silver badge

      Re: Gulag Archipelago

      It probably was translated to English from Russian.

      I must get the newer version of The First Circle. The difference between that establishment and certain US companies is that you can leave. Though they might sue you afterwards. They will insist they own all your ideas. On the plus side, they don't physically torture or shoot you for "wrong-think".

  9. Frumious Bandersnatch Silver badge

    So how does it work?

    Viterbi decoders?

    1. Michael Wojcik Silver badge

      Re: So how does it work?

      I don't think so. Viterbi decoders (someone correct me if I'm wrong here) are basically MLE (Maximum Likelihood Estimation) processes for Markov models (HMMs or MEMMs or otherwise).

      This article (and some other bits I've skimmed about Speechmatics) mentions deep recurrent neural networks. Like a Markov model, a neural network has a bunch of hidden nodes with weighted edges between them; but where an MM's edges represent probabilities of transitions between states, the edges in an NN carry weights that feed into each node's activation. So with an NN you can have multiple nodes "active" at any step, whereas an MM has only one current state.

      Then add recurrent (feedback) connections - which are what make it a recurrent neural network - trained with back-propagation through time (the bit Hinton's talking about), and you have the weights on the edges constantly changing during training, as opposed to the typical MM, which is trained once and static from then on.
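      The structural difference can be sketched in a few lines: a Markov model sits in one discrete state at a time, while an RNN carries a whole hidden vector forward from step to step. All weights below are invented placeholders:

```python
import numpy as np

def rnn_step(x, h, W_x, W_h, b):
    """One step of a vanilla RNN: the new hidden state depends on both
    the current input x and the previous hidden state h (the recurrence)."""
    return np.tanh(W_x @ x + W_h @ h + b)

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))  # input-to-hidden weights
W_h = rng.normal(size=(4, 4))  # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

h = np.zeros(4)                # hidden state persists across the sequence
for x in [np.ones(3), np.zeros(3), np.ones(3)]:
    h = rnn_step(x, h, W_x, W_h, b)
print(h.shape)  # hidden state is a vector, not a single discrete state
```

      Training then consists of adjusting W_x, W_h and b by back-propagation through time; inference alone, as here, just runs the recurrence forward.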

      The Viterbi algorithm is basically a tweak to the Forward algorithm - where the latter finds the total probability of a given observation sequence, Viterbi finds the single most likely hidden-state sequence that explains it. I think in theory it should be possible to convert any RNN to an MM (and thus to a Viterbi decoder), but you'd have a combinatorial increase in the number of states.

      Viterbi decoders are still used in ML, for example in the simpler stages of natural language processing such as part-of-speech identification (for many applications, not universally). But there's too much ambiguity in natural language to do much NLP with them. A lot of simple sentiment analysis used to be done with HMMs or MEMMs, for example, but it can't cope with sarcasm, references to other subjects, etc. (There was a good paper in CACM a few years ago about a new approach to implementing Rhetorical Structure Theory for sentiment analysis that demonstrates some of the issues.)
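      A toy Viterbi decoder along those lines, applied to a part-of-speech-style tagging task. All probabilities here are invented for illustration, not from any real model:

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a sequence of observation indices."""
    n_states, T = len(start_p), len(obs)
    prob = np.zeros((T, n_states))             # best path prob ending in state s
    back = np.zeros((T, n_states), dtype=int)  # backpointers for path recovery
    prob[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, T):
        for s in range(n_states):
            scores = prob[t - 1] * trans_p[:, s] * emit_p[s, obs[t]]
            back[t, s] = np.argmax(scores)
            prob[t, s] = scores[back[t, s]]
    # Trace the best path backwards from the most probable final state.
    path = [int(np.argmax(prob[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy tagging example: states 0=Noun, 1=Verb; three possible observations.
start = np.array([0.6, 0.4])
trans = np.array([[0.3, 0.7],    # Noun -> Noun/Verb
                  [0.8, 0.2]])   # Verb -> Noun/Verb
emit = np.array([[0.7, 0.2, 0.1],   # P(obs | Noun)
                 [0.1, 0.3, 0.6]])  # P(obs | Verb)
print(viterbi([0, 2, 0], start, trans, emit))  # -> [0, 1, 0]: Noun Verb Noun
```

      The Forward algorithm would replace the argmax with a sum over predecessor states - the whole difference between "how likely is this output" and "what's the best explanation of it".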

      One of the things that's interesting to me about Speechmatics is that they're using a deep RNN stack rather than deep convolutional neural nets. CNNs were the Next Big Thing for a while, with e.g. Google pushing them heavily. (Extremely simplified: CNNs use convolution layers - basically signal-shape matchers - rather than recurrence.) But there's so much research in ML that deep-learning stack architecture has become really complex, with dozens of layers that mix various kinds of networks with mixing layers and such.

      And now we have things like generative adversarial networks, where one network produces a stew of real and forged data, and another network tries to distinguish the two, and they learn from each other. Interesting stuff.

  10. Anonymous Coward
    Anonymous Coward

    Thanks El Reg for the interesting and honest article - great read!

    "The example of seeing something once and realising its significance... We have to step back and investigate a lot of things that don’t work well as neural nets."

    Axios link:

    "The bottom line: Other scientists at the conference said back-propagation still has a core role in AI's future. But Hinton said that, to push materially ahead, entirely new methods will probably have to be invented. "Max Planck said, 'Science progresses one funeral at a time.' The future depends on some graduate student who is deeply suspicious of everything I have said."

  11. Tom 7 Silver badge

    Looking good!

    Now, can you just feed it any language and it can work out which one it is?

    1. AlgernonFlowers4

      Re: Looking good!

      Yes it can definitely say that’s language that is

  12. Francis Boyle Silver badge

    Idea for a Candid Camera style video

    Approach random people in the street and ask them in English how to say "thank you" in English. See if they do any better than a computer.

  13. Sirius Lee

    Adding languages is great but what is the quality of the translation like?

  14. RealBigAl

    Obligatory Burnistoun sketch

    https://youtu.be/sAz_UvnUeuU
