When we talk about positive reinforcement, the first thing that comes to mind is probably dogs. Ivan Pavlov, famous for his experiments with the dogs that now bear his name, was a pioneer of behavioural psychology. More than a century after Pavlov's experiments, computer scientists at Johns Hopkins University showed a robot how to teach itself several new tricks, including stacking blocks. They used positive reinforcement, an approach familiar to anyone who has used treats to change a dog's behaviour. With this method, the robot, named Spot, was able to learn in days what typically takes a month.
The study is published in IEEE Robotics and Automation Letters.
The challenge with this approach is that a computer (or robot), unlike humans and animals, which are born with highly intuitive brains, is essentially a tabula rasa and must learn everything from scratch. Real learning often happens through trial and error, and figuring out how robots can learn efficiently from their mistakes is still an open research question.
To continue the comparison with dogs, the robot's reward is a number of points rather than a treat.
To stack blocks, Spot had to learn to focus on constructive actions. As the robot explored the blocks, it quickly learned that correct stacking moves earned high points, while incorrect ones earned nothing. The approach proved far more successful than expected. This prototype gives a sense of how robots can learn from their mistakes in all kinds of situations, an ability that is critical for designing robots that can adapt to new environments.
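The points-for-progress idea can be illustrated with a deliberately tiny sketch. It is not the Johns Hopkins team's method: it uses plain tabular Q-learning, a standard textbook reinforcement learning algorithm, on a hypothetical toy task in which an agent builds a three-block tower. The action names, the reward of 1 point per correctly stacked block, the rule that a toppled tower ends the attempt with no points, and all hyperparameters (alpha, gamma, epsilon) are illustrative assumptions.

```python
import random

# Hypothetical toy task (an assumption, not the study's setup): build a tower of 3 blocks.
GOAL_HEIGHT = 3
# "stack" is deliberately listed last, so the untrained agent does not pick it by default.
ACTIONS = ["push_aside", "knock_over", "stack"]

def step(height, action):
    """Return (next_height, reward, done) for the toy environment."""
    if action == "stack":
        next_height = height + 1
        return next_height, 1.0, next_height == GOAL_HEIGHT  # correct move earns a point
    if action == "push_aside":
        return height, 0.0, False  # unproductive move earns nothing
    return 0, 0.0, True  # "knock_over": tower falls, attempt ends with no points

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = {(h, a): 0.0 for h in range(GOAL_HEIGHT) for a in ACTIONS}
    for _ in range(episodes):
        height = 0
        for _ in range(50):  # cap the length of each attempt
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random move
            else:
                action = max(ACTIONS, key=lambda a: q[(height, a)])  # exploit best-known move
            next_height, reward, done = step(height, action)
            future = 0.0 if done else max(q[(next_height, a)] for a in ACTIONS)
            # Nudge the estimate toward the points earned plus discounted future points.
            q[(height, action)] += alpha * (reward + gamma * future - q[(height, action)])
            height = next_height
            if done:
                break
    return q

def greedy_policy(q):
    """The move the agent currently rates highest at each tower height."""
    return {h: max(ACTIONS, key=lambda a: q[(h, a)]) for h in range(GOAL_HEIGHT)}

q_table = train()
print(greedy_policy(q_table))
```

Ending the attempt when the tower falls matters in this sketch: if knocking the tower over simply reset it, an agent rewarded per stacked block could collect more points by repeatedly toppling and rebuilding than by finishing the tower.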