Decision Making
Drift-diffusion models, evidence accumulation, reward processing, and the role of dopamine in value-based decisions
The Neuroscience of Choice
Every action requires a decision — from saccading toward a target to choosing between career paths. The brain must integrate uncertain sensory evidence, weigh costs and benefits, and commit to a course of action in a timely manner. Remarkably, simple mathematical models of evidence accumulation capture both the behavioral (choice probabilities, reaction times) and neural (ramping activity in decision-related areas) signatures of decision-making.
This chapter covers the drift-diffusion model and its neural implementation, the speed-accuracy tradeoff, reward-based decision-making via dopamine signaling, and the computational principles of optimal decision-making under uncertainty.
1. The Drift-Diffusion Model
The drift-diffusion model (DDM), also called the sequential probability ratio test (SPRT) in its optimal form, describes decisions as a process of accumulating noisy evidence over time until a decision threshold is reached. Originally developed in statistical decision theory (Wald, 1947), it was connected to neural activity by Gold and Shadlen (2001).
Derivation 1: The DDM and Its Solution
The decision variable $x(t)$ accumulates evidence according to:
$$dx = \mu \, dt + \sigma \, dW$$
where $\mu$ is the drift rate (evidence strength), $\sigma$ is the diffusion coefficient (noise), and $dW$ is a Wiener process increment. The decision is made when $x(t)$ first reaches either boundary: $+a$ (choice A) or $-a$ (choice B), starting from $x(0) = z$. The probability of choosing A is:
$$P(A) = \frac{1 - \exp\left(-2\mu (z + a) / \sigma^2\right)}{1 - \exp\left(-4\mu a / \sigma^2\right)}$$
For an unbiased starting point ($z = 0$), this simplifies to:
$$P(A) = \frac{1}{1 + \exp(-2\mu a / \sigma^2)}$$
This is a logistic function of the signal-to-noise ratio $\mu a / \sigma^2$. The mean decision time (identical for correct and error responses in the unbiased DDM) is
$$\langle T \rangle = \frac{a}{\mu} \tanh\left(\frac{\mu a}{\sigma^2}\right),$$
which decreases with evidence strength $\mu$ and increases with threshold $a$.
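As a check on these formulas, the following sketch simulates the DDM with Euler steps and compares the simulated choice probability and mean decision time against the analytic expressions. All parameter values are illustrative, not taken from any experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: drift, noise, bound, time step (seconds).
mu, sigma, a, dt = 0.8, 1.0, 1.2, 1e-3
n_trials = 20000

# Vectorized Euler simulation: every trial diffuses until it hits +a or -a.
x = np.zeros(n_trials)
t = np.zeros(n_trials)
active = np.ones(n_trials, dtype=bool)
choice_A = np.zeros(n_trials, dtype=bool)
rt = np.zeros(n_trials)
while active.any():
    k = active.sum()
    x[active] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(k)
    t[active] += dt
    up = active & (x >= a)      # hit upper bound: choice A
    dn = active & (x <= -a)     # hit lower bound: choice B
    choice_A[up] = True
    rt[up | dn] = t[up | dn]
    active &= ~(up | dn)

p_sim, t_sim = choice_A.mean(), rt.mean()
p_theory = 1.0 / (1.0 + np.exp(-2 * mu * a / sigma**2))
T_theory = (a / mu) * np.tanh(mu * a / sigma**2)
print(f"P(A):    simulated {p_sim:.3f}, analytic {p_theory:.3f}")
print(f"mean DT: simulated {t_sim:.3f} s, analytic {T_theory:.3f} s")
```

The small residual discrepancy comes from time discretization: in discrete steps the process slightly overshoots the bound before it is detected.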
2. Speed-Accuracy Tradeoff
A fundamental constraint in decision-making is the speed-accuracy tradeoff (SAT): faster decisions are less accurate, and more accurate decisions take longer. The DDM provides an elegant account: the decision threshold $a$ controls the tradeoff.
Derivation 2: Optimal Threshold and Reward Rate Maximization
If the decision-maker receives reward $R$ for correct responses and penalty $-C$ for errors, with a non-decision time $T_{\text{nd}}$ (sensory encoding + motor execution), the reward rate is:
$$\rho(a) = \frac{R \cdot P_c(a) - C \cdot (1 - P_c(a))}{\langle T(a) \rangle + T_{\text{nd}}}$$
Setting $d\rho/da = 0$ yields the optimal threshold. Writing $\rho = N/D$ with $N = R \, P_c - C \, (1 - P_c)$ and $D = \langle T \rangle + T_{\text{nd}}$, the first-order condition $N' D = N D'$ becomes:
$$(R+C)\,\frac{dP_c}{da} = \rho^* \cdot \frac{d\langle T \rangle}{da}$$
where $\rho^* = \rho(a^*)$ is the reward rate at the optimum.
This implicit equation shows that the optimal threshold increases with the reward-to-cost ratio $R/C$ and with non-decision time $T_{\text{nd}}$ (because time is already "wasted" on non-decision processes, it pays to be more accurate). Experimentally, subjects adjust their threshold in response to speed vs. accuracy instructions, with corresponding changes in LIP neural thresholds (Heitz and Schall, 2012).
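A minimal numerical sketch of this argument: using the closed-form accuracy and mean decision time from Derivation 1, scan a grid of thresholds and locate the reward-rate maximum for several non-decision times. All parameter values are illustrative.

```python
import numpy as np

# Closed-form DDM accuracy and mean decision time (symmetric bounds, unbiased start).
def accuracy(a, mu, sigma):
    return 1.0 / (1.0 + np.exp(-2 * mu * a / sigma**2))

def mean_dt(a, mu, sigma):
    return (a / mu) * np.tanh(mu * a / sigma**2)

def reward_rate(a, mu=1.0, sigma=1.0, R=1.0, C=0.0, T_nd=0.3):
    pc = accuracy(a, mu, sigma)
    return (R * pc - C * (1 - pc)) / (mean_dt(a, mu, sigma) + T_nd)

# Scan thresholds: the reward-rate-maximizing a* grows with non-decision time.
a_grid = np.linspace(0.01, 3.0, 3000)
a_stars = []
for T_nd in (0.1, 0.3, 1.0):
    a_star = a_grid[np.argmax(reward_rate(a_grid, T_nd=T_nd))]
    a_stars.append(a_star)
    print(f"T_nd = {T_nd:.1f} s  ->  optimal threshold a* = {a_star:.2f}")
```

The printed thresholds increase with $T_{\text{nd}}$, matching the qualitative prediction in the text.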
3. Neural Evidence Accumulation
Neurons in the lateral intraparietal area (LIP) of macaque monkeys show ramping activity during perceptual decisions that closely matches the DDM. Shadlen and Newsome (1996, 2001) recorded LIP neurons during a random dot motion discrimination task, finding that: firing rates ramp toward a threshold that is invariant across difficulty levels, the ramp slope increases with motion coherence (evidence strength), and reaction time corresponds to when neural activity reaches the threshold.
Derivation 3: Mutual Inhibition Model of Evidence Accumulation
A biologically plausible implementation uses two competing neural populations with mutual inhibition. Let $r_1, r_2$ be the firing rates of populations favoring choices 1 and 2:
$$\tau \frac{dr_1}{dt} = -r_1 + f\left(w_{EE} r_1 - w_{EI} r_2 + I_1 + \sigma_n \eta_1(t)\right)$$
$$\tau \frac{dr_2}{dt} = -r_2 + f\left(w_{EE} r_2 - w_{EI} r_1 + I_2 + \sigma_n \eta_2(t)\right)$$
where $I_1 - I_2$ represents the evidence favoring choice 1, and $f$ is a nonlinear transfer function. Linearizing about the symmetric state, the difference $\Delta r = r_1 - r_2$ approximately follows a DDM:
$$\tau \frac{d\Delta r}{dt} \approx (I_1 - I_2) + (w_{EE} + w_{EI} - 1)\Delta r + \text{noise}$$
Note that inhibiting the competitor acts as effective self-excitation of the difference mode, so $w_{EI}$ enters with a positive sign. When $w_{EE} + w_{EI} < 1$, the difference mode is stable and leaky, with effective time constant $\tau_{\text{eff}} = \tau / (1 - w_{EE} - w_{EI})$. As $w_{EE} + w_{EI} \to 1$, $\tau_{\text{eff}}$ diverges and the system approaches a perfect integrator; the integration timescale can therefore be much longer than the membrane time constant, explaining slow evidence accumulation over hundreds of milliseconds.
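The slow-integration claim can be checked directly. The sketch below simulates the two-population model with a rectified-linear $f$ and near-balanced weights (all values assumed for illustration; with these inputs the rates stay in the linear range), and compares the simulated difference $r_1 - r_2$ against the linear prediction with $\tau_{\text{eff}} = \tau / (1 - w_{EE} - w_{EI})$.

```python
import numpy as np

# Illustrative parameters: membrane tau of 10 ms, near-balanced weights.
tau, dt = 10.0, 0.05              # ms
w_EE, w_EI = 0.40, 0.55           # w_EE + w_EI = 0.95, close to the integration point
I0, dI = 1.0, 0.04                # common input and evidence (I1 - I2)
f = lambda x: np.maximum(x, 0.0)  # rectified-linear transfer function

r1 = r2 = 0.0
T = 100.0                         # total simulated time, ms
for _ in range(int(T / dt)):
    inp1 = w_EE * r1 - w_EI * r2 + I0 + dI / 2
    inp2 = w_EE * r2 - w_EI * r1 + I0 - dI / 2
    r1 += dt / tau * (-r1 + f(inp1))
    r2 += dt / tau * (-r2 + f(inp2))

# Linear prediction for the difference mode.
tau_eff = tau / (1 - w_EE - w_EI)                     # ~200 ms >> membrane tau
dr_pred = dI / (1 - w_EE - w_EI) * (1 - np.exp(-T / tau_eff))
print(f"tau_eff = {tau_eff:.0f} ms (membrane tau = {tau:.0f} ms)")
print(f"simulated r1 - r2 = {r1 - r2:.4f}, linear prediction = {dr_pred:.4f}")
```

With these weights the effective integration timescale is about twenty times the membrane time constant.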
4. Dopamine and Value-Based Decision-Making
Value-based decisions require comparing the expected values of different options. Dopamine neurons in the ventral tegmental area (VTA) and substantia nigra encode reward prediction errors (RPEs), providing the teaching signal that updates value estimates.
Derivation 4: Softmax Action Selection from Value Estimates
Given learned values $Q(a_i)$ for actions $a_1, \ldots, a_K$, the brain must select an action. The softmax (Boltzmann) policy balances exploitation and exploration:
$$P(a_i) = \frac{\exp(\beta \, Q(a_i))}{\sum_{j=1}^{K} \exp(\beta \, Q(a_j))}$$
where $\beta$ is the inverse temperature. The values are updated with a delta rule (the one-step, bandit form of TD learning):
$$Q(a_i) \leftarrow Q(a_i) + \alpha \left[r - Q(a_i)\right]$$
The RPE $\delta = r - Q(a_i)$ matches the phasic dopamine response: firing above baseline for unexpected rewards ($\delta > 0$), below baseline for omitted rewards ($\delta < 0$), and at baseline for fully predicted rewards ($\delta = 0$). The explore-exploit balance controlled by $\beta$ maps to prefrontal cortex modulation of decision circuits.
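The combination of softmax selection and RPE-driven updates can be sketched as a simple bandit simulation. The three-armed task and all parameter values here are hypothetical, chosen only to illustrate the learning loop.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-armed bandit: true mean rewards (assumed for illustration).
true_means = np.array([0.2, 0.5, 0.8])
alpha, beta = 0.1, 3.0            # learning rate, inverse temperature
Q = np.zeros(3)                   # learned action values

def softmax(q, beta):
    z = np.exp(beta * (q - q.max()))   # subtract max for numerical stability
    return z / z.sum()

for trial in range(2000):
    p = softmax(Q, beta)               # Boltzmann policy over actions
    a = rng.choice(3, p=p)
    r = rng.normal(true_means[a], 0.1)  # noisy reward
    delta = r - Q[a]                    # reward prediction error (dopamine-like)
    Q[a] += alpha * delta

print("learned Q:", np.round(Q, 2), " true means:", true_means)
print("final choice probabilities:", np.round(softmax(Q, beta), 2))
```

Positive $\delta$ raises the chosen action's value, negative $\delta$ lowers it, and the softmax gradually concentrates choices on the best arm while still sampling the others.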
Derivation 5: Urgency Signal and Collapsing Bounds
In many real decisions, deliberation time is costly. The urgency-gating model proposes that evidence is multiplied by a time-dependent urgency signal $u(t)$:
$$x(t) = u(t) \cdot \int_0^t e(s) \, ds$$
Equivalently, the decision can be modeled with collapsing (time-dependent) boundaries:
$$a(t) = a_0 \exp(-t / \tau_{\text{collapse}})$$
The optimal collapse rate depends on the prior distribution of difficulties. For a Bayesian decision-maker with a known prior over drift rates $p(\mu)$, the optimal (generally time-dependent) boundary maximizes the prior-averaged reward rate:
$$a^* = \arg\max_{a(\cdot)} \left[\frac{\langle R \cdot P_c(\mu, a) \rangle_\mu}{\langle T(\mu, a) \rangle_\mu + T_{\text{nd}}}\right]$$
where the maximization is over boundary trajectories $a(t)$ and can be carried out by dynamic programming.
When easy and hard trials are intermixed, collapsing bounds outperform fixed bounds because they prevent excessive time on impossible trials. Neural evidence for urgency signals has been found in the supplementary eye field and caudate nucleus.
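A quick simulation illustrates this advantage. The sketch below intermixes easy, hard, and zero-drift ("impossible") trials and compares the reward rate of a fixed bound against an exponentially collapsing one. All parameters, including the collapse time constant, are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, dt, T_nd, a0 = 1.0, 2e-3, 0.3, 1.5
drifts = [0.0, 0.5, 2.0]          # intermixed difficulties, incl. impossible trials

def reward_rate(tau_collapse=None, n=3000):
    """Total reward / total time; tau_collapse=None means a fixed bound."""
    total_reward, total_time = 0.0, 0.0
    for mu in drifts:
        x = np.zeros(n); t = np.zeros(n)
        active = np.ones(n, dtype=bool)
        hit_upper = np.zeros(n, dtype=bool)
        while active.any():
            k = active.sum()
            x[active] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(k)
            t[active] += dt
            bound = a0 if tau_collapse is None else a0 * np.exp(-t / tau_collapse)
            up = active & (x >= bound)
            dn = active & (x <= -bound)
            hit_upper[up] = True
            active &= ~(up | dn)
        if mu > 0:                    # upper bound is the correct answer
            reward = hit_upper.sum()
        else:                         # zero-drift trials rewarded at chance
            reward = (rng.random(n) < 0.5).sum()
        total_reward += reward
        total_time += (t + T_nd).sum()
    return total_reward / total_time

rr_fixed = reward_rate(None)
rr_collapse = reward_rate(tau_collapse=2.0)
print(f"reward rate, fixed bound:      {rr_fixed:.3f} per s")
print(f"reward rate, collapsing bound: {rr_collapse:.3f} per s")
```

The collapsing bound sacrifices a little accuracy on hard trials but stops wasting time on zero-drift trials, which raises the overall reward rate.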
5. Historical Development
- 1947: Abraham Wald develops the sequential probability ratio test (SPRT), proving it is optimal for sequential hypothesis testing.
- 1978: Ratcliff introduces the diffusion model for memory retrieval, launching its application in psychology.
- 1996: Shadlen and Newsome record LIP neurons during motion discrimination, finding ramping activity consistent with evidence accumulation.
- 1997: Schultz, Dayan, and Montague show that dopamine neurons encode reward prediction errors consistent with TD learning.
- 2003: Mazurek et al. demonstrate that LIP neurons implement a decision threshold mechanism.
- 2006: Bogacz et al. show that balanced mutual-inhibition accumulator models reduce to the drift-diffusion model and hence implement the SPRT.
- 2012: Heitz and Schall show that the speed-accuracy tradeoff modulates neural thresholds in the frontal eye field.
- 2015: Hanks et al. discover urgency signals in caudate that implement collapsing decision bounds.
6. Applications
Clinical Psychiatry
DDM parameters decompose behavioral deficits in ADHD (reduced threshold), depression (reduced drift rate), and anxiety (biased starting point). This enables computational phenotyping for precision psychiatry.
Addiction
Dopamine dysfunction in addiction disrupts value-based decision-making, biasing choices toward immediate rewards. RL models reveal altered RPE signaling and discount factors in substance use disorders.
Autonomous Systems
Sequential sampling models inspire decision algorithms for self-driving cars and robots that must make rapid decisions under uncertainty, balancing speed and accuracy in time-critical situations.
Behavioral Economics
Neuroeconomics uses DDM parameters to explain choice anomalies (framing effects, loss aversion). Value signals in ventromedial prefrontal cortex and striatum provide neural correlates of subjective utility.
7. Computational Exploration
Decision Making: Drift-Diffusion, Reward Learning, and Evidence Accumulation
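A self-contained sketch in the spirit of this section: using the closed-form DDM expressions from Derivation 1 with an assumed linear coherence-to-drift scaling $\mu = k \cdot c$ (the scaling constant and other parameters are illustrative), print the predicted psychometric (accuracy vs. coherence) and chronometric (decision time vs. coherence) curves.

```python
import numpy as np

# Assumed linear coherence-to-drift scaling mu = k * c; illustrative parameters.
k, sigma, a = 10.0, 1.0, 1.0
coherences = np.array([0.016, 0.032, 0.064, 0.128, 0.256, 0.512])

mu = k * coherences
acc = 1.0 / (1.0 + np.exp(-2 * mu * a / sigma**2))       # psychometric curve
dt_mean = (a / mu) * np.tanh(mu * a / sigma**2)          # chronometric curve

print("  coh    P(correct)   mean DT (s)")
for c, p, t in zip(coherences, acc, dt_mean):
    print(f"{c:6.3f}     {p:.3f}        {t:.3f}")
```

Accuracy rises and decision time falls monotonically with coherence, the signature pattern observed in the random dot motion task.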
Chapter Summary
- Drift-diffusion model: decisions as noisy evidence accumulation to a threshold, with accuracy $P(A) = 1/(1 + e^{-2\mu a/\sigma^2})$.
- Speed-accuracy tradeoff: the threshold $a$ controls the balance; optimal threshold maximizes reward rate.
- Neural implementation: competing accumulators with mutual inhibition implement the DDM through slow integration dynamics.
- Dopamine and RPEs: phasic dopamine encodes $\delta = r + \gamma V(s') - V(s)$, driving value-based learning.
- Collapsing bounds: time-dependent urgency signals optimize decisions when deliberation is costly.