Recent research from Harvard University and Keio University presents a novel link between dopamine-based reward learning and machine learning

A recent research paper found a relationship between dopaminergic activity and the temporal-difference (TD) learning algorithm, providing fundamental insights into how the brain links time-separated cues and rewards.

Dopamine is a common neurotransmitter, and the word “dopaminergic” means “related to dopamine” (literally, “acting on dopamine”). Dopaminergic drugs and behaviors increase the brain’s dopamine-related activity, and dopaminergic brain circuits facilitate dopamine-mediated signaling.

Research in neuroscience and psychology has consistently shown how critical rewards are in helping humans and other animals learn behaviors that help them survive. It is well recognized that dopamine neurons, the neurons in the mammalian central nervous system that release dopamine, are primarily responsible for reward-based learning. When a mammal receives an unexpected reward, these neurons respond with a rapid burst of activity known as phasic activation.

To develop effective machine learning models capable of handling difficult tasks, computer scientists have sought to artificially replicate the neurological underpinnings of reward-based learning in mammals. The temporal-difference (TD) learning algorithm is a well-known machine learning technique thought to mimic the functioning of dopaminergic neurons.

Credit: Amo et al. | Source: https://www.nature.com/articles/s41593-022-01109-2

The researchers explored a potential link between the TD machine learning algorithm and reward-based learning in the mammalian brain. Their study, published in the journal Nature Neuroscience, could provide new insights into how the brain creates connections between stimuli and time-separated rewards.

TD learning algorithms are a family of model-free reinforcement learning techniques that learn to make predictions from how their estimates change over time, rather than from a model of the environment. Instead of waiting for a final outcome, TD approaches continually revise their estimates, using the difference between successive predictions (the TD error) as a learning signal.
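As a rough illustration, here is a minimal sketch of the tabular TD(0) value update in Python (the variable names and the tiny cue–delay–reward chain are illustrative assumptions, not code from the paper):

```python
# Minimal sketch of a tabular TD(0) update (illustrative, not the
# authors' implementation). The TD error compares the current value
# estimate with the reward plus the discounted value of the next state.

ALPHA = 0.1  # learning rate
GAMMA = 0.9  # discount factor

def td0_update(values, state, reward, next_state):
    """Update the value of `state` from one observed transition."""
    td_error = reward + GAMMA * values[next_state] - values[state]
    values[state] += ALPHA * td_error
    return td_error

# Toy cue -> delay -> end chain; the reward arrives on entering "end".
values = {"cue": 0.0, "delay": 0.0, "end": 0.0}
print(td0_update(values, "delay", reward=1.0, next_state="end"))    # 1.0
print(td0_update(values, "cue", reward=0.0, next_state="delay"))    # 0.09
```

Repeating such updates over many transitions propagates the value of the reward backward along the chain, which is the mechanism behind the prediction discussed below.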

The parallels between TD learning algorithms and reward-learning dopamine neurons in the brain have recently come to light in several studies. However, one specific prediction of the algorithm has only occasionally been examined in neuroscience studies.

According to the algorithm, when an agent learns to associate a time-separated cue and reward, the dopamine signal should gradually shift backward in time, from the moment of reward to the moment of the cue, over multiple trials. Previous studies, however, had failed to observe this key prediction.
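To make the predicted shift concrete, the following small simulation (a sketch of the textbook TD account, not the study's code; the step count and learning rate are arbitrary assumptions) tracks where the largest TD error, the analogue of the phasic dopamine response, falls on each trial of a cue–delay–reward sequence:

```python
# Illustrative simulation of the backward shift of the TD error across
# trials (a sketch of the TD prediction, not the study's code). Steps
# 0..T-1 are time points from cue onset; the reward arrives on the
# transition out of the last step.

T = 5        # time steps between cue (step 0) and reward
ALPHA = 0.3  # learning rate
GAMMA = 1.0  # no discounting, for clarity
values = [0.0] * (T + 1)  # values[T] is the terminal state, fixed at 0.0

for trial in range(1, 31):
    td_errors = []
    for t in range(T):
        reward = 1.0 if t == T - 1 else 0.0
        td_error = reward + GAMMA * values[t + 1] - values[t]
        values[t] += ALPHA * td_error
        td_errors.append(td_error)
    if trial in (1, 5, 15, 30):
        peak = td_errors.index(max(td_errors))
        print(f"trial {trial:2d}: largest TD error at step {peak}")

# Early in training the largest TD error sits at the reward step; with
# repeated trials it migrates backward until it sits at the cue (step 0),
# mirroring the backward shift the paper looked for in dopamine activity.
```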

In their paper, the researchers report experiments in which untrained mice learned to link olfactory cues to water rewards. Once the mice began to associate particular scents with receiving water, they showed anticipatory licking after merely smelling the relevant odor.

During the experiments, the mice were exposed to the pre-reward odor and the reward at varying times; in other words, the researchers changed the interval between the mice’s exposure to the smell and their receipt of the water reward.

They found that dopaminergic activity was initially concentrated at the time of reward delivery and gradually shifted earlier, toward the cue, as training progressed. This showed that the timing of dopaminergic responses in the brain can change as mice first learn the links between odors and rewards, just as TD learning algorithms predict.

The scientists conducted further experiments to determine whether this shift also occurred in mice that had previously been trained on similar odor–reward associations and were then given reversal tasks. During the delay period, they observed a temporal shift in the animals’ dopamine signals comparable to the one seen when the animals first learned the associations, but it occurred more quickly.

Overall, the data show that repeated trials of associative learning caused a backward shift in the timing of dopamine activity in the mouse brain. This shift closely mirrors the progression of the TD error in TD learning algorithms.

The insights gathered by this team of scientists could aid future studies of the parallels between reward learning in the mammalian brain and TD reinforcement learning strategies. This could advance our understanding of how the brain learns from rewards and could also inspire new TD learning algorithms.

This article is written as a summary article by Marktechpost Staff based on the research paper 'A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning'. All credit for this research goes to the researchers on this project. Check out the paper and reference article.
