Neuroscience

Positive Reinforcement: A New Look at an Old Concept

Reward is only half of the neural activation animals need to learn.

Key points

  • New practices in positive reinforcement training allow animals to learn more effectively.
  • Prediction is just as important as reward in positive reinforcement training.
  • Lures do not elicit the neural activation or dopamine release that leads to strong learning.
  • Rewards do not have to be edible. In fact, training power can be wasted with too many treats.
This palomino looks ready to learn!
Source: Xiang Gao/Unsplash

Animal training leans toward positive reinforcement (PR) for many reasons. It teaches good behavior in ways that are safer, more pleasant, quicker, and more effective than negative reinforcement or punishment. We also know more now about dopamine release in the animal brain, the neural mechanism underlying the success of PR training. When neurons release the “liquid joy” of dopamine, learning becomes much stronger.

I advocate training horses by reward for all these reasons, but my work doesn’t always dovetail with common PR techniques. I use food rarely, avoid lures, and modulate the horse’s expectations of reinforcement. My reason for developing new practices has to do with new research concerning the effect of surprise on the animal brain’s reward systems. Standard PR training as used on horses today works—but it’s not as effective as it could be.

Understanding positive reinforcement

Dopamine release depends not just on receiving a reward, but also on an animal’s expectation about the value and delivery of that reward. Neuroscientist Wolfram Schultz, mathematician Peter Dayan, and psychiatrist Ray Dolan showed that every PR training trial produces two distinct bursts of neural activation in an animal’s brain. One is linked to receiving the reward. Nothing new there: it’s simply a sign that neurons fire and release dopamine when a reward is given. That’s the basis of PR training as it has been used on horses for several years.

The second burst of neural activation is linked to the animal’s prediction of the reward. That’s the new part, and it’s important when training all kinds of animals. Specifically, the greatest surge of dopamine is released when a reward comes as a surprise. So for each positive behavior, we have to consider what the animal expects and how that expectation might match up with reality.

Let’s suppose a horse—or dog, chimp, dolphin, whatever—expects a reward upon performing a known task. And let’s imagine the horse knows what the reward will be—a stroke on the neck, a moment to relax, or a morsel of food that the standard PR trainer is holding. When that reward is delivered, there’s no surprise. Given the evidence that expectation has its own special neural activation, we now know that the lack of surprise produces less dopamine than a surprising reward would have. Which, in turn, produces less learning. Hmm, that’s not what we want!
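For readers who like to see the arithmetic, here is a minimal sketch of that idea, assuming a simple "delta rule" in the spirit of reward prediction error. It is an illustration only, not the model from the research cited below. The function names, the reward value of 1.0, and the learning rate of 0.5 are all hypothetical choices made for the example.

```python
def prediction_error(received: float, expected: float) -> float:
    """How much better (or worse) the reward was than the animal expected."""
    return received - expected


def update_expectation(expected: float, received: float,
                       learning_rate: float = 0.5) -> float:
    """Shift the expectation part of the way toward what actually arrived.

    The learning_rate is an assumed value chosen for illustration.
    """
    return expected + learning_rate * prediction_error(received, expected)


def run_trials(rewards, expected=0.0, learning_rate=0.5):
    """Return the surprise (prediction error) on each trial in order."""
    errors = []
    for received in rewards:
        errors.append(prediction_error(received, expected))
        expected = update_expectation(expected, received, learning_rate)
    return errors


# The same reward on every trial: surprise shrinks as it becomes expected.
print(run_trials([1.0, 1.0, 1.0, 1.0]))  # [1.0, 0.5, 0.25, 0.125]
```

In this toy version, the first reward is a complete surprise, and each identical repeat is less surprising than the one before. A reward the animal fully expects produces a prediction error of zero.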

Horses are much more likely to repeat behaviors whose rewards exceed their expectations. But there’s another problem with typical PR training, too: Many so-called “rewards” are actually lures. A reward appears immediately after good behavior; a lure is evident before or during behavior. When a trainer holds a handful of treats and doles out one after another to shape a horse’s behavior, there is nothing surprising about them.

Suppose I want to catch a horse in a large pasture. I walk in carrying a nice fat carrot in my hand. I shake the lure around, showing it to the horse before calling her to come to me. She moseys over, because hey, I’m holding a carrot! This carrot offers the horse no surprise. It is unlikely to activate either the horse’s reward system or the prediction system, so little if any learning will occur.

Now, let’s use that same carrot as a reward. I enter the pasture and call the horse. The carrot is hidden deep inside my pocket, and the horse is too far away to smell it inside its zip-locked bag.

But she’s curious about me. She moseys over to investigate. When she arrives at my side, I pull the carrot from my pocket and give it to her. Wow! The horse is amazed, it’s so good, so fresh, makes such a delightful “snap” when she bites it! By making the reward surprising, and by offering it only after the desired behavior occurs, we have increased our training power dramatically. A lure wouldn’t have had anywhere near the same effect.

The surprise of receiving the carrot this time, as a reward rather than a lure, will activate the equine reward and prediction systems, causing strong dopamine release in the horse’s brain and cementing the lesson. The next time I call, the horse will be more likely to come to me, because true brain-based learning has occurred.

Key takeaways

By the way, I used a carrot in this example, but food is not necessary for training by reward. In fact, it reduces our training power over time. I’ll explain that in my next Psychology Today article. Stay tuned!

To sum up, when learning by positive reinforcement, animals need both components of the brain’s reward system, reward and prediction, to be activated. Trainers have to manage their own choice and delivery of rewards, but it is equally important to manage the horse’s prediction of those rewards. After all, horse brain, human brain! Right?

References

Schultz, W., Dayan, P., & Montague, P. R. (March 14, 1997). “A Neural Substrate of Prediction and Reward,” Science, 275, 1593-1599.

Gadye, L. (December 21, 2021). “Discovering Dopamine’s Role in Reward Prediction Error,” Brainfacts/SfN. https://www.brainfacts.org/brain-anatomy-and-function/genes-and-molecul…

More from Janet L. Jones Ph.D.