Reinforcement learning woes, robot doggos, Amazon's homegrown AI chips, and more

Steve Knox
Facepalm

“A researcher gives a talk about using RL to train a simulated robot hand to pick up a hammer and hammer in a nail. Initially, the reward was defined by how far the nail was pushed into the hole. Instead of picking up the hammer, the robot used its own limbs to punch the nail in. So, they added a reward term to encourage picking up the hammer, and retrained the policy. They got the policy to pick up the hammer…but then it threw the hammer at the nail instead of actually using it.”

This isn't a failure of RL; this is a failure of the researchers to identify and control for their own preconceptions. Why were they trying to train the robot to do things the most inefficient way possible?

We only use hammers because our hands are too soft. Why should a robot use a hammer to pound a nail? Why was it "wrong" for the robot to identify a perfectly effective solution to the task which didn't require extraneous materials?
