Decision Making
Drift-diffusion models, evidence accumulation, reward processing, and the role of dopamine in value-based decisions
The Neuroscience of Choice
Every action requires a decision — from saccading toward a target to choosing between career paths. The brain must integrate uncertain sensory evidence, weigh costs and benefits, and commit to a course of action in a timely manner. Remarkably, simple mathematical models of evidence accumulation capture both the behavioral (choice probabilities, reaction times) and neural (ramping activity in decision-related areas) signatures of decision-making.
This chapter covers the drift-diffusion model and its neural implementation, the speed-accuracy tradeoff, reward-based decision-making via dopamine signaling, and the computational principles of optimal decision-making under uncertainty.
1. The Drift-Diffusion Model
The drift-diffusion model (DDM), also called the sequential probability ratio test (SPRT) in its optimal form, describes decisions as a process of accumulating noisy evidence over time until a decision threshold is reached. Originally developed in statistical decision theory (Wald, 1947), it was connected to neural activity by Gold and Shadlen (2001).
Derivation 1: The DDM and Its Solution
The decision variable $x(t)$ accumulates evidence according to:
$$dx = \mu \, dt + \sigma \, dW$$
where $\mu$ is the drift rate (evidence strength), $\sigma$ is the diffusion coefficient (noise), and $dW$ is a Wiener process increment. The decision is made when $x(t)$ first reaches either boundary: $+a$ (choice A) or $-a$ (choice B), starting from $x(0) = z$. The probability of choosing A is:
$$P(A) = \frac{1 - \exp\left(-2\mu (z + a) / \sigma^2\right)}{1 - \exp\left(-4\mu a / \sigma^2\right)}$$
For an unbiased starting point ($z = 0$), this simplifies to:
$$P(A) = \frac{1}{1 + \exp(-2\mu a / \sigma^2)}$$
This is a logistic function of the signal-to-noise ratio $\mu a / \sigma^2$. The mean decision time (identical for correct and error responses in the unbiased DDM) is
$$\langle T \rangle = \frac{a}{\mu} \tanh\left(\frac{\mu a}{\sigma^2}\right),$$
which decreases with evidence strength $\mu$ and increases with threshold $a$.
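As a check on these formulas, the following sketch simulates the DDM with Euler steps and compares the simulated choice probability and mean decision time against the analytic expressions. All parameter values are illustrative, not taken from any experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: drift, noise, bound, time step (seconds).
mu, sigma, a, dt = 0.8, 1.0, 1.2, 1e-3
n_trials = 20000

# Vectorized Euler simulation: every trial diffuses until it hits +a or -a.
x = np.zeros(n_trials)
t = np.zeros(n_trials)
active = np.ones(n_trials, dtype=bool)
choice_A = np.zeros(n_trials, dtype=bool)
rt = np.zeros(n_trials)
while active.any():
    k = active.sum()
    x[active] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(k)
    t[active] += dt
    up = active & (x >= a)      # hit upper bound: choice A
    dn = active & (x <= -a)     # hit lower bound: choice B
    choice_A[up] = True
    rt[up | dn] = t[up | dn]
    active &= ~(up | dn)

p_sim, t_sim = choice_A.mean(), rt.mean()
p_theory = 1.0 / (1.0 + np.exp(-2 * mu * a / sigma**2))
T_theory = (a / mu) * np.tanh(mu * a / sigma**2)
print(f"P(A):    simulated {p_sim:.3f}, analytic {p_theory:.3f}")
print(f"mean DT: simulated {t_sim:.3f} s, analytic {T_theory:.3f} s")
```

The small residual discrepancy comes from time discretization: in discrete steps the process slightly overshoots the bound before it is detected.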
2. Speed-Accuracy Tradeoff
A fundamental constraint in decision-making is the speed-accuracy tradeoff (SAT): faster decisions are less accurate, and more accurate decisions take longer. The DDM provides an elegant account: the decision threshold $a$ controls the tradeoff.
Derivation 2: Optimal Threshold and Reward Rate Maximization
If the decision-maker receives reward $R$ for correct responses and penalty $-C$ for errors, with a non-decision time $T_{\text{nd}}$ (sensory encoding + motor execution), the reward rate is:
$$\rho(a) = \frac{R \cdot P_c(a) - C \cdot (1 - P_c(a))}{\langle T(a) \rangle + T_{\text{nd}}}$$
Setting $d\rho/da = 0$ yields the optimal threshold. Writing $\rho = N/D$ with $N = R \, P_c - C \, (1 - P_c)$ and $D = \langle T \rangle + T_{\text{nd}}$, the first-order condition $N' D = N D'$ becomes:
$$(R+C)\,\frac{dP_c}{da} = \rho^* \cdot \frac{d\langle T \rangle}{da}$$
where $\rho^* = \rho(a^*)$ is the reward rate at the optimum.
This implicit equation shows that the optimal threshold increases with the reward-to-cost ratio $R/C$ and with non-decision time $T_{\text{nd}}$ (because time is already "wasted" on non-decision processes, it pays to be more accurate). Experimentally, subjects adjust their threshold in response to speed vs. accuracy instructions, with corresponding changes in LIP neural thresholds (Heitz and Schall, 2012).
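A minimal numerical sketch of this argument: using the closed-form accuracy and mean decision time from Derivation 1, scan a grid of thresholds and locate the reward-rate maximum for several non-decision times. All parameter values are illustrative.

```python
import numpy as np

# Closed-form DDM accuracy and mean decision time (symmetric bounds, unbiased start).
def accuracy(a, mu, sigma):
    return 1.0 / (1.0 + np.exp(-2 * mu * a / sigma**2))

def mean_dt(a, mu, sigma):
    return (a / mu) * np.tanh(mu * a / sigma**2)

def reward_rate(a, mu=1.0, sigma=1.0, R=1.0, C=0.0, T_nd=0.3):
    pc = accuracy(a, mu, sigma)
    return (R * pc - C * (1 - pc)) / (mean_dt(a, mu, sigma) + T_nd)

# Scan thresholds: the reward-rate-maximizing a* grows with non-decision time.
a_grid = np.linspace(0.01, 3.0, 3000)
a_stars = []
for T_nd in (0.1, 0.3, 1.0):
    a_star = a_grid[np.argmax(reward_rate(a_grid, T_nd=T_nd))]
    a_stars.append(a_star)
    print(f"T_nd = {T_nd:.1f} s  ->  optimal threshold a* = {a_star:.2f}")
```

The printed thresholds increase with $T_{\text{nd}}$, matching the qualitative prediction in the text.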
3. Neural Evidence Accumulation
Neurons in the lateral intraparietal area (LIP) of macaque monkeys show ramping activity during perceptual decisions that closely matches the DDM. Shadlen and Newsome (1996, 2001) recorded LIP neurons during a random dot motion discrimination task, finding that: firing rates ramp toward a threshold that is invariant across difficulty levels, the ramp slope increases with motion coherence (evidence strength), and reaction time corresponds to when neural activity reaches the threshold.
Derivation 3: Mutual Inhibition Model of Evidence Accumulation
A biologically plausible implementation uses two competing neural populations with mutual inhibition. Let $r_1, r_2$ be the firing rates of populations favoring choices 1 and 2:
$$\tau \frac{dr_1}{dt} = -r_1 + f\left(w_{EE} r_1 - w_{EI} r_2 + I_1 + \sigma_n \eta_1(t)\right)$$
$$\tau \frac{dr_2}{dt} = -r_2 + f\left(w_{EE} r_2 - w_{EI} r_1 + I_2 + \sigma_n \eta_2(t)\right)$$
where $I_1 - I_2$ represents the evidence favoring choice 1, and $f$ is a nonlinear transfer function. Linearizing about the symmetric state, the difference $\Delta r = r_1 - r_2$ approximately follows a DDM:
$$\tau \frac{d\Delta r}{dt} \approx (I_1 - I_2) + (w_{EE} + w_{EI} - 1)\Delta r + \text{noise}$$
Note that inhibiting the competitor acts as effective self-excitation of the difference mode, so $w_{EI}$ enters with a positive sign. When $w_{EE} + w_{EI} < 1$, the difference mode is stable and leaky, with effective time constant $\tau_{\text{eff}} = \tau / (1 - w_{EE} - w_{EI})$. As $w_{EE} + w_{EI} \to 1$, $\tau_{\text{eff}}$ diverges and the system approaches a perfect integrator; the integration timescale can therefore be much longer than the membrane time constant, explaining slow evidence accumulation over hundreds of milliseconds.
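The slow-integration claim can be checked directly. The sketch below simulates the two-population model with a rectified-linear $f$ and near-balanced weights (all values assumed for illustration; with these inputs the rates stay in the linear range), and compares the simulated difference $r_1 - r_2$ against the linear prediction with $\tau_{\text{eff}} = \tau / (1 - w_{EE} - w_{EI})$.

```python
import numpy as np

# Illustrative parameters: membrane tau of 10 ms, near-balanced weights.
tau, dt = 10.0, 0.05              # ms
w_EE, w_EI = 0.40, 0.55           # w_EE + w_EI = 0.95, close to the integration point
I0, dI = 1.0, 0.04                # common input and evidence (I1 - I2)
f = lambda x: np.maximum(x, 0.0)  # rectified-linear transfer function

r1 = r2 = 0.0
T = 100.0                         # total simulated time, ms
for _ in range(int(T / dt)):
    inp1 = w_EE * r1 - w_EI * r2 + I0 + dI / 2
    inp2 = w_EE * r2 - w_EI * r1 + I0 - dI / 2
    r1 += dt / tau * (-r1 + f(inp1))
    r2 += dt / tau * (-r2 + f(inp2))

# Linear prediction for the difference mode.
tau_eff = tau / (1 - w_EE - w_EI)                     # ~200 ms >> membrane tau
dr_pred = dI / (1 - w_EE - w_EI) * (1 - np.exp(-T / tau_eff))
print(f"tau_eff = {tau_eff:.0f} ms (membrane tau = {tau:.0f} ms)")
print(f"simulated r1 - r2 = {r1 - r2:.4f}, linear prediction = {dr_pred:.4f}")
```

With these weights the effective integration timescale is about twenty times the membrane time constant.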
4. Dopamine and Value-Based Decision-Making
Value-based decisions require comparing the expected values of different options. Dopamine neurons in the ventral tegmental area (VTA) and substantia nigra encode reward prediction errors (RPEs), providing the teaching signal that updates value estimates.
Derivation 4: Softmax Action Selection from Value Estimates
Given learned values $Q(a_i)$ for actions $a_1, \ldots, a_K$, the brain must select an action. The softmax (Boltzmann) policy balances exploitation and exploration:
$$P(a_i) = \frac{\exp(\beta \, Q(a_i))}{\sum_{j=1}^{K} \exp(\beta \, Q(a_j))}$$
where $\beta$ is the inverse temperature. The values are updated with a delta rule (the one-step, bandit form of TD learning):
$$Q(a_i) \leftarrow Q(a_i) + \alpha \left[r - Q(a_i)\right]$$
The RPE $\delta = r - Q(a_i)$ matches the phasic dopamine response: firing above baseline for unexpected rewards ($\delta > 0$), below baseline for omitted rewards ($\delta < 0$), and at baseline for fully predicted rewards ($\delta = 0$). The explore-exploit balance controlled by $\beta$ maps to prefrontal cortex modulation of decision circuits.
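The combination of softmax selection and RPE-driven updates can be sketched as a simple bandit simulation. The three-armed task and all parameter values here are hypothetical, chosen only to illustrate the learning loop.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-armed bandit: true mean rewards (assumed for illustration).
true_means = np.array([0.2, 0.5, 0.8])
alpha, beta = 0.1, 3.0            # learning rate, inverse temperature
Q = np.zeros(3)                   # learned action values

def softmax(q, beta):
    z = np.exp(beta * (q - q.max()))   # subtract max for numerical stability
    return z / z.sum()

for trial in range(2000):
    p = softmax(Q, beta)               # Boltzmann policy over actions
    a = rng.choice(3, p=p)
    r = rng.normal(true_means[a], 0.1)  # noisy reward
    delta = r - Q[a]                    # reward prediction error (dopamine-like)
    Q[a] += alpha * delta

print("learned Q:", np.round(Q, 2), " true means:", true_means)
print("final choice probabilities:", np.round(softmax(Q, beta), 2))
```

Positive $\delta$ raises the chosen action's value, negative $\delta$ lowers it, and the softmax gradually concentrates choices on the best arm while still sampling the others.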
Derivation 5: Urgency Signal and Collapsing Bounds
In many real decisions, deliberation time is costly. The urgency-gating model proposes that evidence is multiplied by a time-dependent urgency signal $u(t)$:
$$x(t) = u(t) \cdot \int_0^t e(s) \, ds$$
Equivalently, the decision can be modeled with collapsing (time-dependent) boundaries:
$$a(t) = a_0 \exp(-t / \tau_{\text{collapse}})$$
The optimal collapse rate depends on the prior distribution of difficulties. For a Bayesian decision-maker with a known prior over drift rates $p(\mu)$, the optimal (generally time-dependent) boundary maximizes the prior-averaged reward rate:
$$a^* = \arg\max_{a(\cdot)} \left[\frac{\langle R \cdot P_c(\mu, a) \rangle_\mu}{\langle T(\mu, a) \rangle_\mu + T_{\text{nd}}}\right]$$
where the maximization is over boundary trajectories $a(t)$ and can be carried out by dynamic programming.
When easy and hard trials are intermixed, collapsing bounds outperform fixed bounds because they prevent excessive time on impossible trials. Neural evidence for urgency signals has been found in the supplementary eye field and caudate nucleus.
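A quick simulation illustrates this advantage. The sketch below intermixes easy, hard, and zero-drift ("impossible") trials and compares the reward rate of a fixed bound against an exponentially collapsing one. All parameters, including the collapse time constant, are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, dt, T_nd, a0 = 1.0, 2e-3, 0.3, 1.5
drifts = [0.0, 0.5, 2.0]          # intermixed difficulties, incl. impossible trials

def reward_rate(tau_collapse=None, n=3000):
    """Total reward / total time; tau_collapse=None means a fixed bound."""
    total_reward, total_time = 0.0, 0.0
    for mu in drifts:
        x = np.zeros(n); t = np.zeros(n)
        active = np.ones(n, dtype=bool)
        hit_upper = np.zeros(n, dtype=bool)
        while active.any():
            k = active.sum()
            x[active] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(k)
            t[active] += dt
            bound = a0 if tau_collapse is None else a0 * np.exp(-t / tau_collapse)
            up = active & (x >= bound)
            dn = active & (x <= -bound)
            hit_upper[up] = True
            active &= ~(up | dn)
        if mu > 0:                    # upper bound is the correct answer
            reward = hit_upper.sum()
        else:                         # zero-drift trials rewarded at chance
            reward = (rng.random(n) < 0.5).sum()
        total_reward += reward
        total_time += (t + T_nd).sum()
    return total_reward / total_time

rr_fixed = reward_rate(None)
rr_collapse = reward_rate(tau_collapse=2.0)
print(f"reward rate, fixed bound:      {rr_fixed:.3f} per s")
print(f"reward rate, collapsing bound: {rr_collapse:.3f} per s")
```

The collapsing bound sacrifices a little accuracy on hard trials but stops wasting time on zero-drift trials, which raises the overall reward rate.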
5. Historical Development
- 1947: Abraham Wald develops the sequential probability ratio test (SPRT), proving it is optimal for sequential hypothesis testing.
- 1978: Ratcliff introduces the diffusion model for memory retrieval, launching its application in psychology.
- 1996: Shadlen and Newsome record LIP neurons during motion discrimination, finding ramping activity consistent with evidence accumulation.
- 1997: Schultz, Dayan, and Montague show that dopamine neurons encode reward prediction errors consistent with TD learning.
- 2003: Mazurek et al. demonstrate that LIP neurons implement a decision threshold mechanism.
- 2006: Bogacz et al. show that balanced mutual-inhibition accumulator models reduce to the drift-diffusion model and hence implement the SPRT.
- 2012: Heitz and Schall show that the speed-accuracy tradeoff modulates neural thresholds in the frontal eye field.
- 2015: Hanks et al. discover urgency signals in caudate that implement collapsing decision bounds.
6. Applications
Clinical Psychiatry
DDM parameters decompose behavioral deficits in ADHD (reduced threshold), depression (reduced drift rate), and anxiety (biased starting point). This enables computational phenotyping for precision psychiatry.
Addiction
Dopamine dysfunction in addiction disrupts value-based decision-making, biasing choices toward immediate rewards. RL models reveal altered RPE signaling and discount factors in substance use disorders.
Autonomous Systems
Sequential sampling models inspire decision algorithms for self-driving cars and robots that must make rapid decisions under uncertainty, balancing speed and accuracy in time-critical situations.
Behavioral Economics
Neuroeconomics uses DDM parameters to explain choice anomalies (framing effects, loss aversion). Value signals in ventromedial prefrontal cortex and striatum provide neural correlates of subjective utility.
7. Computational Exploration
Decision Making: Drift-Diffusion, Reward Learning, and Evidence Accumulation
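A self-contained sketch in the spirit of this section: using the closed-form DDM expressions from Derivation 1 with an assumed linear coherence-to-drift scaling $\mu = k \cdot c$ (the scaling constant and other parameters are illustrative), print the predicted psychometric (accuracy vs. coherence) and chronometric (decision time vs. coherence) curves.

```python
import numpy as np

# Assumed linear coherence-to-drift scaling mu = k * c; illustrative parameters.
k, sigma, a = 10.0, 1.0, 1.0
coherences = np.array([0.016, 0.032, 0.064, 0.128, 0.256, 0.512])

mu = k * coherences
acc = 1.0 / (1.0 + np.exp(-2 * mu * a / sigma**2))       # psychometric curve
dt_mean = (a / mu) * np.tanh(mu * a / sigma**2)          # chronometric curve

print("  coh    P(correct)   mean DT (s)")
for c, p, t in zip(coherences, acc, dt_mean):
    print(f"{c:6.3f}     {p:.3f}        {t:.3f}")
```

Accuracy rises and decision time falls monotonically with coherence, the signature pattern observed in the random dot motion task.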
Chapter Summary
- Drift-diffusion model: decisions as noisy evidence accumulation to a threshold, with accuracy $P(A) = 1/(1 + e^{-2\mu a/\sigma^2})$.
- Speed-accuracy tradeoff: the threshold $a$ controls the balance; optimal threshold maximizes reward rate.
- Neural implementation: competing accumulators with mutual inhibition implement the DDM through slow integration dynamics.
- Dopamine and RPEs: phasic dopamine encodes $\delta = r + \gamma V(s') - V(s)$, driving value-based learning.
- Collapsing bounds: time-dependent urgency signals optimize decisions when deliberation is costly.