Imagine you’re at your favorite dive bar, and you order a margarita. Usually, the bartender arrives with your drink within 10 minutes, but occasionally, she forgets your order if the bar is busy. As 10 minutes pass, you start to wonder if the busy bartender has forgotten your order, or if your margarita is just taking a long time to make. Should you approach the bar again, or continue waiting? Five more minutes go by. When the bartender finally arrives with your drink, you are pleasantly surprised.
We use this example to illustrate a concept called hidden state inference. The scenario above has two possible states: ‘my drink will arrive’ and ‘my drink will not arrive’. The state of the environment is hidden because you cannot know for certain which state you are in (until you receive your drink). So, the state of the environment must be inferred, and this inference must be updated over time as new information is observed. Mathematically, hidden state inference can be described using a belief state: a probability distribution over hidden states. At first, your belief state may be 90%-10%, with 90% optimistically allotted to the ‘my drink will arrive’ state. As you wait, your belief state becomes more pessimistic, perhaps drifting to 50%-50%, resulting in greater surprise when your drink actually arrives later. We constantly use hidden state inference to evaluate options and make decisions in uncertain natural environments, yet little is known about where belief states are computed in the brain. Our recent study suggests that the medial prefrontal cortex (mPFC) is critical in shaping belief states across time.
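For the quantitatively inclined, the update above is just Bayes’ rule, and a few lines of Python make it concrete. This is a minimal sketch: the 90% prior comes from the example, but the service-time model (drinks arriving around 10 minutes, give or take) is an assumption we made up purely for illustration.

```python
from scipy import stats

# Two hidden states: the order was taken ('my drink will arrive')
# or forgotten ('my drink will not arrive').
PRIOR_ARRIVE = 0.9                  # initial optimism, from the example
# Hypothetical service-time model: if the order was taken, the drink
# arrives after roughly 10 minutes (our assumption, for illustration).
prep_time = stats.norm(loc=10, scale=4)

def belief_drink_will_arrive(minutes_waited):
    """P(order was taken | still no drink after `minutes_waited`)."""
    # Likelihood of still waiting under each hidden state:
    # if the order was taken, P(no drink yet) is the survival function;
    # if the order was forgotten, still waiting is certain (likelihood 1).
    p_wait_if_taken = prep_time.sf(minutes_waited)
    numerator = PRIOR_ARRIVE * p_wait_if_taken
    return numerator / (numerator + (1 - PRIOR_ARRIVE) * 1.0)

for t in (0, 5, 10, 15):
    print(f"after {t:2d} min: P(drink will arrive) = {belief_drink_will_arrive(t):.2f}")
# Starts near 0.90; falls to about 0.49 after 15 minutes of waiting,
# matching the drift toward 50%-50% described above.
```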
A challenge in studying the neural basis of hidden state inference is that the belief state is typically unobservable. One strategy is to measure dopamine signals, which are sensitive to the animal’s beliefs about whether reward will be delivered at a particular moment in time. Midbrain dopamine neurons signal reward prediction errors: actual minus expected reward. Dopamine responses are therefore smaller when reward is expected, and larger when it is not.

We trained thirsty mice on two classical conditioning tasks. In both tasks, the timing of a water reward relative to a sensory cue was jittered across trials: rewards arrived as early as 1.2 s or as late as 2.8 s after cue onset. In one task, reward was delivered on 100% of trials; in the other, on 90% of trials. The 90%-rewarded task, by analogy to the bar example, requires hidden state inference, because the animal cannot know for certain whether it is in the ‘water will arrive’ or the ‘water will not arrive’ state. In contrast, the 100%-rewarded task is fully observable (imagine instead placing an order at a reliable upscale restaurant), because the animal knows for certain that a reward will arrive after cue onset.

Recording from dopamine neurons of mice trained on these two tasks, we found strikingly different signaling patterns across time. In the 100%-rewarded task, dopamine signals grew smaller as a function of time: the dopamine signal was largest when reward was delivered at the earliest possible time, and smaller when reward was delivered later. Thus, in the 100%-rewarded task, expectation grows as a function of time. Analogously, after ordering at an upscale restaurant, you increasingly expect the waiter to bring out your food the longer you wait. In contrast, in the 90%-rewarded task, dopamine signals grew larger as a function of time: dopamine signals were smallest when reward was delivered at the earliest possible time, and larger when reward was delivered later. As in the dive bar example, this result could be modeled by taking hidden state inference into account: by very late timepoints, the animal’s belief state shifts to acknowledge the possibility of reward omission (in our example, reward omission is akin to the bartender forgetting your order). When a reward does arrive, it is therefore very surprising and evokes a large dopamine response.
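To see how the same belief-state arithmetic produces opposite trends in the two tasks, here is a small sketch in the spirit of the one above. This is not the study’s actual model (the study used a richer temporal-difference model over belief states), and the discretized-Gaussian spread of reward times is our illustrative assumption. The sketch tracks the probability of the ‘water will arrive’ state given that no water has arrived yet:

```python
import numpy as np

# Nine possible reward times after cue onset (s), jittered across trials.
times = np.linspace(1.2, 2.8, 9)
# Illustrative assumption: reward times follow a discretized Gaussian
# centered in the middle of the interval.
p_time = np.exp(-0.5 * ((times - 2.0) / 0.4) ** 2)
p_time /= p_time.sum()

def belief_reward_coming(p_reward, t):
    """P('water will arrive' | cue on, no reward delivered before t)."""
    # Reward times that have already passed without reward are ruled out
    # (small tolerance guards against float comparison).
    remaining = p_time[times >= t - 1e-9].sum()
    num = p_reward * remaining
    # The alternative hidden state is omission, with prior 1 - p_reward.
    return num / (num + (1 - p_reward))

for t in (1.2, 2.0, 2.8):
    b100 = belief_reward_coming(1.0, t)   # fully observable task
    b90 = belief_reward_coming(0.9, t)    # hidden-state task
    print(f"t = {t:.1f} s   100% task: {b100:.2f}   90% task: {b90:.2f}")
# 100% task: the belief stays at 1.00 throughout; only the growing
# temporal expectation matters, so later rewards are less surprising.
# 90% task: the belief falls from 0.90 to about 0.20 by 2.8 s, so a
# late reward is far more surprising and evokes a larger dopamine response.
```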
Upon inactivating the mPFC, we found that dopamine signals no longer grew larger as a function of time in the 90%-rewarded task. In fact, some dopamine neurons recorded in the 90%-rewarded task showed reward responses that grew smaller as a function of time, as in the 100%-rewarded task. In other words, with the mPFC inactivated, some dopamine neurons in the 90%-rewarded task signaled as if the belief state failed to shift toward the possibility of reward omission, and therefore displayed firing patterns similar to those observed in the 100%-rewarded task. In contrast, inactivating the mPFC in the 100%-rewarded task, which was fully observable and did not require hidden state inference, left dopamine responses unchanged. Our results therefore point to a circuit in which the mPFC shapes the belief state conveyed to downstream subcortical circuits.
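Continuing the sketch above (and still purely as an illustration, not a claim about mechanism), the inactivation result corresponds to evaluating the 90%-rewarded task with the omission state dropped from the belief update:

```python
# Continues the sketch above: clamping the omission prior to zero mimics
# a belief state frozen in the 'water will arrive' state, as if mPFC
# no longer conveyed the possibility of omission (our illustration only).
for t in (1.2, 2.0, 2.8):
    intact = belief_reward_coming(0.9, t)   # omission entertained
    frozen = belief_reward_coming(1.0, t)   # omission never entertained
    print(f"t = {t:.1f} s   intact: {intact:.2f}   mPFC inactivated: {frozen:.2f}")
```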
Making decisions under uncertain conditions is part of everyday life. The first step in acting under uncertainty is to infer the underlying hidden state of the environment. For instance, if your belief state favors the possibility that your drink will arrive, it wouldn’t make sense to bother the bartender again. Alternatively, if you have been waiting for 15 minutes and your belief state now favors the possibility that your drink has been forgotten, you should check on your order. By inactivating the mPFC, we tricked dopamine neurons into signaling as if the mouse’s belief state were erroneously frozen in time, favoring the ‘my drink will arrive’ state. Our finding places the mPFC at the center of the neural circuitry underlying hidden state inference, an essential computation for making decisions under uncertainty.