Department News

COMMON CURRENCY FOR REWARD AND PUNISHMENT [UCHIDA LAB]

Considering both good and bad outcomes is fundamental for future planning and decision-making. When animals forage, they may consider how tasty, abundant, and nutritious the food is. At the same time, they must also consider how many predators are around and how dangerous the location is. How does the brain handle such multi-dimensional factors?

Dopamine is important for reward-related behaviors. Electrophysiology studies have shown that dopamine neurons are excited by reward-predicting cues and by unpredicted rewards. Importantly, dopamine neurons distinguish neither the modality of a stimulus (e.g. visual or olfactory) nor the kind of reward (e.g. food or drink), but instead encode a more abstract quantity: the reward prediction error, the discrepancy between the predicted and actual value of reward. Thus, dopamine neurons represent the subjective value of reward along a single dimension, taking into account taste, amount, probability, and so on.
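The prediction-error idea above can be sketched as a simple Rescorla-Wagner-style update. This is a minimal illustration of the general concept, not the study's analysis code; the learning rate and reward values are arbitrary assumptions.

```python
# Minimal sketch of a reward prediction error (RPE) update, in the spirit
# of Rescorla-Wagner / temporal-difference learning.
# All numbers here are illustrative, not taken from the study.

def rpe_update(predicted_value, actual_reward, learning_rate=0.1):
    """Return the prediction error and the updated value estimate."""
    prediction_error = actual_reward - predicted_value  # dopamine-like signal
    new_value = predicted_value + learning_rate * prediction_error
    return prediction_error, new_value

# A cue initially predicts nothing; a reward of 1.0 repeatedly follows it.
value = 0.0
for trial in range(100):
    error, value = rpe_update(value, actual_reward=1.0)

# After learning, the cue's value approaches 1.0 and the error shrinks
# toward 0: a fully predicted reward no longer surprises the system.
```

This captures why well-trained dopamine neurons respond to the predictive cue rather than the reward itself: once the prediction is accurate, the error at reward delivery vanishes.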

While it is widely accepted that dopamine neurons encode reward prediction errors, how they respond to aversive stimuli remains controversial. The complication may stem from the diversity of dopamine neurons: some are excited and others inhibited by an aversive stimulus. Another hypothesis proposes that dopamine neurons encode only reward and do not encode aversiveness at all. Moreover, other cell types, such as GABA neurons, surround dopamine neurons in the midbrain. Previous studies relied on indirect methods to identify dopamine neurons, which may have contributed to the inconsistent results across studies.

In this study, Hideyuki Matsumoto, our postdoc, unambiguously identified 72 dopamine neurons in the ventral tegmental area (VTA) using an optogenetic tagging technique and examined their responses to an aversive stimulus. He found that dopamine neurons were indeed inhibited by an aversive stimulus and by a cue predicting it. Most dopamine neurons encoded rewarding, neutral, and aversive stimuli monotonically. Further, when a single cue probabilistically predicted either a rewarding or an aversive outcome, responses to that cue fell between the responses to a reward-predicting cue and those to an aversiveness-predicting cue. In other words, dopamine neurons integrated information across both valences, rewarding and aversive.
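The intermediate response to a mixed cue is what one would expect if rewarding and aversive outcomes share one signed value axis. The sketch below illustrates that arithmetic; the probabilities and outcome values are made up for illustration and are not the study's actual task parameters.

```python
# Sketch of a "common currency" value computation: rewarding and aversive
# outcomes placed on one signed value axis. All numbers are illustrative,
# not taken from the study.

def expected_value(outcomes):
    """Expected value of a cue, given (probability, signed_value) pairs."""
    return sum(p * v for p, v in outcomes)

reward_cue = [(1.0, +1.0)]               # always rewarded
aversive_cue = [(1.0, -1.0)]             # always an air puff (negative value)
mixed_cue = [(0.5, +1.0), (0.5, -1.0)]   # probabilistically either outcome

# The mixed cue's expected value (0.0) lies between the pure aversive cue
# (-1.0) and the pure reward cue (+1.0), mirroring the intermediate
# dopamine responses to the mixed cue reported in the study.
```

A one-dimensional signed value like this is what lets a single neural signal rank any pair of cues, regardless of whether their outcomes are appetitive or aversive.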

However, when Hideyuki repeated a similar experiment, he did not obtain consistent results. This was not due to ambiguous identification of dopamine neurons, as he applied optogenetic identification in both experiments. He realized, however, that the two experiments differed in reward probability, in addition to other small differences. Could reward context cause the inconsistencies between his two experiments, and also with previous studies? He next directly compared dopamine activity in two tasks that differed only in reward probability. The results supported this idea: most dopamine neurons encoded the negative value of an aversive stimulus in the low-reward context, whereas half of them lost this ability in the high-reward context. Further, dopamine responses were unreliable when animals did not engage in the task (e.g. when they did not show anticipatory blinking after an air puff-predicting cue or anticipatory licking after a water-predicting cue). Together, this series of experiments warns that we must consider the animal's state and training history, even in seemingly simple tasks such as classical conditioning.

Of note, many electrophysiology studies are conducted in an unnaturally high-reward context in order to motivate animals. In these settings, some dopamine neurons do not respond consistently to aversive stimuli. By contrast, in low-reward contexts, dopamine neurons integrate information about rewarding and aversive stimuli in one dimension. Thus, this study expanded the idea of a “reward prediction error” into a more general “value prediction error” in VTA dopamine neurons. Dopamine as a common currency for all good and bad could be an ideal signal to guide various behaviors such as learning and economic choice.

Read more in eLife or download PDF

(l to r) Mitsuko Watabe-Uchida, Hideyuki Matsumoto, and Naoshige Uchida (not shown Ju Tian)
