Imagine you are waiting for the 2PM subway train. Based on your experience, you know that the train always arrives between 1:55PM and 2:05PM. You glance at your watch—it’s 1:55PM, and you return to reading your newspaper. Several minutes later, you check your watch again. Now it’s 2:05PM, and you move closer to the edge of the subway platform. You stare expectantly into the tunnel, and sure enough, you see the train’s lights approaching.
Now imagine a near-identical scenario, in which the train usually arrives between 1:55PM and 2:05PM, but occasionally doesn’t come. At 2:05PM, you check your watch and sigh dejectedly. Is the train late, or is it not coming at all? Maybe it’s time to start planning an alternate route.
These scenarios illustrate that we constantly infer when and if events will occur. By 2:05PM, we increasingly anticipated the reliable train’s arrival because it always arrives by that time. In contrast, we became increasingly pessimistic in the case of the unreliable train. Based on our prior knowledge of arrival timing and probability, we inferred that the train wouldn’t come. If the unreliable train did show up at 2:05PM, it would be a pleasant surprise.
Our new study suggests that a group of cells located deep in the midbrain broadcasts a ‘surprise’ signal that, much like our reasoning on the platform, draws on prior knowledge to make further inferences.
We recorded from midbrain dopamine neurons while thirsty mice performed a classical conditioning task. Rather than waiting for trains to arrive, the mice learned to anticipate water rewards after being presented with certain odors. If the mouse unexpectedly received a reward, dopamine neurons produced a large positive response. If the mouse predicted a reward following an odor presentation, dopamine neurons produced a smaller response. These dopamine signals are called ‘reward prediction errors’ (RPEs), and they represent the discrepancy between actual and expected reward. By signaling surprising positive outcomes, positive dopamine RPEs are thought to reinforce behaviors leading to favorable consequences.
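In reinforcement-learning textbooks, this discrepancy is usually written as δ = r − V, where r is the reward actually received and V is the reward that was expected, and positive errors nudge expectations (and behavior) upward. A minimal sketch of that textbook delta rule (illustrative only; the learning rate and values are hypothetical, not taken from our study):

```python
def reward_prediction_error(actual_reward, expected_reward):
    # delta = r - V: positive when the outcome is better than expected,
    # negative when worse, and zero when the reward is fully predicted
    return actual_reward - expected_reward

def update_expectation(expected_reward, rpe, learning_rate=0.2):
    # positive RPEs pull the expectation upward, so a once-surprising
    # reward becomes predicted after repeated pairings
    return expected_reward + learning_rate * rpe

expectation = 0.0  # naive mouse: no reward expected after the odor
rpe = reward_prediction_error(1.0, expectation)  # surprise reward, large RPE
expectation = update_expectation(expectation, rpe)
```

With repeated cue-reward pairings, the expectation climbs and the RPE at reward delivery shrinks toward zero, which is why a well-predicted reward evokes only a small dopamine response.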
In our study, we trained mice on classical conditioning tasks that mirrored the timing unpredictability of the train scenarios. On any given trial, the time interval between cue and reward was chosen randomly from a normal distribution. In the first task, reward was always delivered (100% rewarded), similar to the reliable train. In the second task, reward was occasionally omitted (90% rewarded), similar to the unreliable train. We found that dopamine RPEs exhibited a striking difference between these two tasks. In the 100% rewarded task, dopamine RPEs were largest if reward was delivered early, and smallest if reward was delivered late. In other words, RPEs became smaller as time elapsed, indicating that reward expectation grew as a function of time. This result parallels the first train example: we increasingly expect the reliable train to arrive as time passes.

In the 90% rewarded task, this trend flipped: dopamine RPEs were smallest if reward was delivered early, and largest if reward was delivered late. Dopamine RPEs became larger as time passed, indicating that reward expectation decreased as a function of time. This result tracks our inference in the second train example: as time passes, our belief that the train is simply late yields to the belief that the train will not arrive, making it quite surprising if the train actually arrives at 2:05PM.
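The flip between the two tasks can be illustrated with a toy ‘hazard’ computation: the moment-to-moment probability that reward arrives right now, given that it has not arrived yet. This is a simplification for illustration, not the exact model from our study, and the mean, spread, and time points below are hypothetical. When reward is certain, the hazard keeps climbing as the reward-time distribution is used up; when reward is sometimes omitted, a long empty wait increasingly signals an omission trial, so the hazard eventually falls:

```python
import math

def normal_pdf(t, mu, sigma):
    # density of the Gaussian reward-time distribution at time t
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_sf(t, mu, sigma):
    # survival function: probability the reward time is later than t
    return 0.5 * (1.0 - math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def reward_hazard(t, p_reward, mu=1.5, sigma=0.5):
    """P(reward arrives at time t | no reward yet), when reward occurs
    with probability p_reward at a time drawn from Normal(mu, sigma).
    The (1 - p_reward) term is the belief mass on 'omission trial'."""
    arriving_now = p_reward * normal_pdf(t, mu, sigma)
    still_waiting = p_reward * normal_sf(t, mu, sigma) + (1.0 - p_reward)
    return arriving_now / still_waiting

# p_reward = 1.0: the hazard rises with elapsed time, so late rewards
# are well anticipated and evoke small RPEs.
# p_reward = 0.9: past the typical reward time, the omission explanation
# takes over and the hazard falls, so a late reward is a big surprise.
```

In this sketch, anticipation tracks the hazard and the dopamine RPE at reward delivery tracks the remaining surprise, reproducing the opposite time courses we observed in the two tasks.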
Our result shows that dopamine RPEs are exquisitely sensitive to inferences about event timing and probability. Although this result may seem intuitive—even obvious—it provides a key theoretical advance for reinforcement learning. Traditionally, the brain’s reinforcement learning circuitry has been thought to cache cue-reward associations independently of any inference about the environment. Such a system would be just as surprised by the train arriving at 1:55PM as at 2:05PM, in either of the above scenarios. Our data argue against this simple model and suggest that the brain’s reinforcement learning circuitry taps into inferences about an uncertain environment.
In order to compute prediction errors, the midbrain dopamine system must be able to access accurate predictions. Our results suggest that the dopamine system benefits from the brain’s ability to make inferences across time, ensuring that these predictions are as accurate as possible—even when outcomes are uncertain.