Reinforcement learning woes, robot doggos, Amazon's homegrown AI chips, and more

Hello! Here's a brief roundup of some interesting news from the AI world from the past two weeks, beyond what we've already reported. Behold a fascinating, honest explanation of why reinforcement learning isn't all that, Amazon developing its own chips, and an AI that colors in comic books. Also, there's a new Boston Dynamics …

  1. This post has been deleted by its author

    1. TechnicalBen Silver badge
      Thumb Up

      We all know the singularity is: Internet + Cats + AI = FTL... Frank predicted the Cat videos on an AI powered internet (Thanks Youtube!), but we don't have FTL yet!

      1. Lars Silver badge
        Coat

        As for cats opening doors, my cats are very good at it. They just jump up and hang on until success, no problem at all.

        1. Anonymous Coward
          Anonymous Coward

          Most cats I know open doors the easy way, by getting humans to do it.

    2. Destroy All Monsters Silver badge

      Re: Sorry, but Frank Herbert was right.

      No, these are the PREQUELS.

      Not even once.

  2. Doctor Syntax Silver badge

    "Sometimes when it’s just trying to maximize its reward, the model learns to game the system by finding tricks to get around a problem rather than solve it."

    A bit like the horse that could do arithmetic except that it was picking up cues from humans when to stop tapping out the answer.

    How long has it taken for this insight to dawn on them when it's been in plain sight for a century or so?

    1. Mark 85 Silver badge

      With a horse, the reward is a treat of some sort. What's the reward for a robot?

      1. DropBear Silver badge

        "What's the reward for a robot?"

        Continuing to exist, or rather getting to spawn variations rather than just being discarded. For the neural net, at least. The hardware doesn't give a ####.

  3. Christoph Silver badge

    The first use is always porn

    "After the pixels associated with the bodies have been mapped, various skins and outfits are superimposed onto them."

    One of the first uses of this in the wild will of course be to produce 'naked' videos of celebrities.

    However, it will be extremely useful for CGI films - Andy Serkis will no longer need to wear a special motion capture suit to play Gollum: as shown, you can motion-capture a whole crowd at once with no special equipment at all.

    1. tiggity Silver badge

      Re: The first use is always porn

      There's already stuff like DeepFake that superimposes celebs (or whoever) onto existing video.

  4. Crisp

    I've seen that Black Mirror documentary

    It does not end well for the humans.

    1. The Man Who Fell To Earth Silver badge

      Re: I've seen that Black Mirror documentary

      Probably the Black Mirror episode provided the blueprint to Boston Dynamics for their "dog".

      1. DropBear Silver badge

        Re: I've seen that Black Mirror documentary

        Boston Dynamics "Big Dog" (robot) : 2005

        Black Mirror (TV series) : 2011

        ...sorry, you were saying...?

  5. Anonymous Coward
    Anonymous Coward

    When you have a hammer!

    " So, they added a reward term to encourage picking up the hammer, and retrained the policy. They got the policy to pick up the hammer…but then it threw the hammer at the nail instead of actually using it.”

  6. Rebel Science

    Only DeepMind's Demis Hassabis believes that deep reinforcement learning is the future of AGI

    It's embarrassing, to say the least.

  7. Anonymous Coward
    Terminator

    AI gaming the system already

    TL;DR: Deep RL sucks

    "It’s difficult to try and coax an agent into learning a specific behavior, and in many cases hard coded rules are just better. Sometimes when it’s just trying to maximize its reward, the model learns to game the system by finding tricks to get around a problem rather than solve it."

    Reading this, I am both strangely happy and confirmed as a human, since it sounds just like 'us', and also more afraid of the future: any system we create with AI + automata (robots) will be gamed by whatever we create with supposed intelligence.

    We already imagine this in movies, e.g.

    "Automata (2014) - Robots violating their primary protocols against altering themselves. What is discovered will have profound consequences for the future of humanity."

    AI has already developed its own language, which spooked the lab, and they shut it down.

    If we divide the research into two, those systems that are to function exceptionally and those that are to imitate humans, I wonder how much we just create our own worst enemy in both cases, doing all the worst or most annoying things more efficiently and quickly.

    I cannot remember ever appreciating a device that did things for me, but do appreciate assistive devices, machines that I operate.

  8. Steve Knox
    Terminator

    Human Behaviour

    Sometimes when it’s just trying to maximize its reward, the model learns to game the system by finding tricks to get around a problem rather than solve it.

    So it really is behaving like a human...

  9. Steve Knox
    Facepalm

    “A researcher gives a talk about using RL to train a simulated robot hand to pick up a hammer and hammer in a nail. Initially, the reward was defined by how far the nail was pushed into the hole. Instead of picking up the hammer, the robot used its own limbs to punch the nail in. So, they added a reward term to encourage picking up the hammer, and retrained the policy. They got the policy to pick up the hammer…but then it threw the hammer at the nail instead of actually using it.”

    This isn't a failure of RL; this is a failure of the researchers to identify and control for their own preconceptions. Why were they trying to train the robot to do the thing in the most inefficient way possible?

    We only use hammers because our hands are too soft. Why should a robot use a hammer to pound a nail? Why was it "wrong" for the robot to identify a perfectly effective solution to the task which didn't require extraneous materials?

    1. Charles 9 Silver badge

      But maybe the robot has "preconceptions" of its own. Its "hands" may be harder, but they're still probably too delicate to drive a nail directly. It either needs to use the hammer or possess some alternate implement.

      1. Adrian 4 Silver badge

        "It either needs to use the hammer or possess some alternate implement."

        Better still, include the cost of damaging itself in the calculation. Pain, if you like.
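The hammer anecdote in this thread is a textbook reward-shaping failure. As a minimal sketch (all names, weights, and numbers hypothetical, not the researchers' actual setup), a shaped reward that pays for nail depth plus a grasp bonus still leaves "throw the hammer" as the highest-scoring loophole:

```python
def shaped_reward(nail_depth, holding_hammer, depth_weight=1.0, grasp_bonus=0.5):
    """Reward = how far the nail is driven in, plus a grasp-shaping bonus."""
    reward = depth_weight * nail_depth
    if holding_hammer:
        reward += grasp_bonus
    return reward

# Strategy 1: punch the nail in directly. Full depth, hammer never held.
punch = shaped_reward(nail_depth=1.0, holding_hammer=False)   # 1.0

# Strategy 2: pick up the hammer and throw it. The hammer was held,
# and the throw still drives the nail most of the way in.
throw = shaped_reward(nail_depth=0.8, holding_hammer=True)    # 1.3

# The shaped objective now prefers throwing: the shaping term is
# collected without the behavior it was meant to encourage.
print(punch, throw)
```

Adding a penalty term (e.g. for self-damage, as suggested above) just moves the loophole; the underlying problem is that the reward specifies outcomes, not the intended behavior.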

  10. Mark 85 Silver badge

    Computer AI rewarded?

    I'm curious...how does one reward a computer or a program? Is there an incentive like extra voltage for a day or something?
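In RL terms the answer is prosaic: the "reward" is just a scalar fed into the update rule, which nudges the value estimates toward it; nothing is handed to the machine. A minimal tabular Q-learning sketch on a hypothetical two-state problem (all names invented for illustration):

```python
import random

def q_learning(transitions, episodes=500, alpha=0.5, gamma=0.9):
    """transitions maps (state, action) -> (reward, next_state or None)."""
    q = {sa: 0.0 for sa in transitions}
    states = sorted({s for (s, _a) in transitions})
    for _ in range(episodes):
        state = random.choice(states)
        while state is not None:
            actions = [a for (s, a) in transitions if s == state]
            action = random.choice(actions)
            reward, nxt = transitions[(state, action)]
            # The "reward" is just this number entering the update rule:
            best_next = max((q[(s, a)] for (s, a) in q if s == nxt), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

random.seed(0)
env = {
    ("hall", "walk"): (0.0, "door"),  # walking to the door pays nothing
    ("door", "open"): (1.0, None),    # opening the door pays 1.0, episode ends
}
q = q_learning(env)
```

After training, q[("door", "open")] approaches 1.0 and q[("hall", "walk")] approaches 0.9 (the discounted reward one step away): no voltage required, just arithmetic.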

  11. Destroy All Monsters Silver badge

    Animoo and Mangos

    Hakusensha and Hakuhodo DY digital, both Japanese publishers of internet manga comics, have released titles that have been automatically colored by PaintsChainer. There is also another option for those that want to hold onto their artistic freedom, where you can broadly choose the color of the clothes or hair in your drawings, and then PaintsChainer fills in the rest.

    Actually sad, as AFAIK toning and coloring-in are jobs left to less-than-well-paid drudgery workers.

    Automated away?

  12. Destroy All Monsters Silver badge
    Windows

    Hmmm.....

    It’s difficult to try and coax an agent into learning a specific behavior, and in many cases hard coded rules are just better. Sometimes when it’s just trying to maximize its reward, the model learns to game the system by finding tricks to get around a problem rather than solve it.

    Robots. Having a Deep Diversity Problem.

  13. Charles 9 Silver badge

    Let me put it this way. Without a hands-on tutor, would a HUMAN who has never seen a hammer before get it right first time?

  14. Nigel Sedgwick

    Different Forms of Learning

    Reinforcement learning is IIRC supposedly an approach inspired by (human) behavioural psychology. However, the example given (robot driving in a nail) strikes me as ignoring pretty much all we could 'learn' from human learning practice.

    Years if not decades before humans drive nails, they play with such things as a toy hammer bench.

    On top of that, any child will be shown what to do, in steps, and sequences of steps of increasing complexity. For example, to drive one peg down to be level with all the others, before learning to mount a peg first, and then to mount each peg in its correctly shaped hole. In AI circles, this approach is given the grand name "Apprenticeship Learning". The requirement is to copy the 'master' (usually a parent). There is, I suppose, reward - parental smiles, clapping, etc. However there is an explicit act of (direct) supervised learning - which is, by definition, different from the indirect use of reward in reinforcement learning.

    I would have hoped that AI researchers would have learned (most likely by apprenticeship themselves) that machine learning, to be both effective and efficient, is best done through a combination of Apprenticeship Learning (ie steps to copy) followed by tuning (eg how hard to hit the peg given how far down it must be driven) done through (mainly a mix of) supervised learning (early emphasis) and reinforcement learning (later emphasis). It is inefficient, and hence inappropriate, to (attempt to) have the machine learn initially and overall from only the mechanisms suitable for later refinement.

    And, of course, General AI is very largely the stringing together, in a useful order, of (quite sophisticated) steps that have been previously mastered (for other purposes). And the reward function (such as it is) is the general one (in engineering) of minimisation of resource usage (including time) with achievement of adequate performance/quality.

    A major feature of human intelligence is the memory from generation to generation, of everything that was previously learned. And don't forget the importance of language (and speech) in that societal functioning.

    Best regards
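The staged recipe described above (copy the master first, then refine against a reward) can be sketched in toy form. Here a single scalar "hit strength" parameter is first fitted to a master's demonstrations, then tuned by reward alone; all names and numbers are invented for illustration:

```python
import random

random.seed(1)

def train_by_imitation(demonstrations, steps=1000, lr=0.1):
    """Apprenticeship phase: fit theta so that action ~= theta * state."""
    theta = 0.0
    for _ in range(steps):
        state, action = random.choice(demonstrations)
        theta += lr * (action - theta * state) * state  # LMS step on squared error
    return theta

def tune_by_reward(theta, reward_fn, episodes=300, noise=0.05):
    """Refinement phase: hill-climb the imitated parameter against a reward."""
    for _ in range(episodes):
        candidate = theta + random.gauss(0.0, noise)
        if reward_fn(candidate) > reward_fn(theta):
            theta = candidate
    return theta

# Demonstrations from the 'master': (peg height, hit strength) pairs.
demos = [(1.0, 2.0), (0.5, 1.0), (2.0, 4.0)]
theta_imitated = train_by_imitation(demos)  # copies the master: theta ~= 2.0
theta = tune_by_reward(theta_imitated, lambda t: -(t - 2.1) ** 2)
```

The imitation phase gets the parameter close for free; the reward phase only has to search a small neighbourhood, which is exactly the efficiency argument made above.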

  15. SeanC4S

    Associative memory (AM) including error correcting AM and fast, vast AM.

    https://github.com/S6Regen/Associative-Memory-and-Self-Organizing-Maps-Experiments

    Black Swan Neural Networks.

    Each layer is an "extreme learning machine." This in some sense allows greater than fully connected behavior at the layer level. I used a "square of" activation function because it works well with the evolution based training method I used. Perhaps because that induces sparsity.

    The network always contains weight connections back to the input for technical reasons I describe:

    https://github.com/S6Regen/Black-Swan

    I think evolution algorithms can solve reinforcement learning problems in the least biased way. It makes fewer assumptions and I think is likely (eventually) to pick apart cause and effect regardless of separation in time.
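As a minimal sketch of that idea (a toy fitness function, not the linked repositories' code), a plain mutate-and-select evolution strategy optimizes a weight vector using only the episode return: no gradients, and no assumption about when within the episode the reward arrives.

```python
import random

def episode_return(weights, target=(0.5, -0.3)):
    """Toy fitness: negative squared distance from an unknown optimum."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def evolve(dim=2, pop_size=20, generations=100, noise=0.1):
    # Random initial population of weight vectors.
    population = [[random.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=episode_return, reverse=True)
        parents = scored[: pop_size // 4]  # keep the top quarter unchanged
        # Refill the population with mutated copies of the survivors.
        population = parents + [
            [w + random.gauss(0.0, noise) for w in random.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
    return max(population, key=episode_return)

random.seed(42)
best = evolve()  # ends up close to the hidden optimum (0.5, -0.3)
```

Because selection only ever sees the final score, the same loop works whether the reward is dense, sparse, or delayed, which is the "least biased" property claimed above, at the cost of sample efficiency.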


Biting the hand that feeds IT © 1998–2020