Module 0

Bayes & Beliefs

The Bayesian-brain hypothesis starts from a simple premise: to interact with the world, a brain must infer hidden causes from noisy observations. This module introduces Bayes’ theorem, the frequentist-Bayesian divide, generative vs. discriminative models, and the idea of a belief as a probability distribution over states of the world — the conceptual scaffolding for modules 1–8.

1. Bayes’ Theorem

For hidden cause H and observation D:

\[ P(H\mid D) \;=\; \frac{P(D\mid H)\,P(H)}{P(D)},\qquad P(D) = \int P(D\mid H')P(H')\,dH' \]

Four standard terms: the prior P(H) is what the agent believed before the observation; the likelihood P(D|H) is how probable the observation is under each hypothesis; the evidence P(D) is a normaliser that makes the posterior sum (or integrate) to one; the posterior P(H|D) is the updated belief. Sequential observations compound by iterative updating: yesterday’s posterior is today’s prior.
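The "yesterday’s posterior is today’s prior" rule can be checked numerically. The following is a minimal sketch, not part of the module's own script: the three candidate coin biases, the flip sequence, and the helper name bayes_update are all illustrative assumptions. It updates a discrete belief one observation at a time and compares the result with a single batch update on all the data.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Posterior over discrete hypotheses from a prior and P(D | H)."""
    unnormalised = likelihood * prior   # numerator: P(D | H) * P(H)
    evidence = unnormalised.sum()       # P(D), the normaliser
    return unnormalised / evidence

# Hypothetical setup: three candidate coin biases, equally likely a priori
biases = np.array([0.2, 0.5, 0.8])      # P(heads) under each hypothesis
prior = np.array([1 / 3, 1 / 3, 1 / 3])
flips = [1, 1, 0, 1]                    # 1 = heads, 0 = tails (made-up data)

# Sequential updating: each posterior becomes the next prior
belief = prior.copy()
for flip in flips:
    likelihood = biases if flip == 1 else 1 - biases
    belief = bayes_update(belief, likelihood)

# Batch updating: one update with the likelihood of the whole sequence
n_heads = sum(flips)
n_tails = len(flips) - n_heads
batch = bayes_update(prior, biases**n_heads * (1 - biases)**n_tails)

print('sequential:', np.round(belief, 4))
print('batch:     ', np.round(batch, 4))
```

The two results agree because the flips are assumed conditionally independent given the hypothesis, so the likelihood of the sequence factorises into a product of per-flip likelihoods.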

2. Beliefs as Distributions

Unlike the binary “I believe X” of everyday talk, a Bayesian belief is a probability distribution: P(θ) assigns probability to each possible value of the world parameter θ. The mean gives the best guess; the variance gives the uncertainty, so a narrow distribution means high confidence. Updating a belief is updating a distribution, not flipping a boolean. This is the essential conceptual move of the Bayesian brain.
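To make the mean-and-variance reading concrete, here is a small sketch that stores a belief about θ as weights on a grid. The grid resolution, the flat prior, the counts of 7 successes and 3 failures, and the helper names normalise and summarise are assumptions chosen for illustration.

```python
import numpy as np

# Belief over theta represented as probability mass on a grid in [0, 1]
theta = np.linspace(0.0, 1.0, 1001)

def normalise(weights):
    """Scale non-negative weights so they sum to one."""
    return weights / weights.sum()

def summarise(belief):
    """Return (mean, variance) of a grid belief over theta."""
    mean = np.sum(theta * belief)
    var = np.sum((theta - mean) ** 2 * belief)
    return mean, var

# Flat prior: every value of theta equally plausible (an assumption)
prior = normalise(np.ones_like(theta))

# Illustrative data: 7 successes and 3 failures from Bernoulli(theta)
successes, failures = 7, 3
posterior = normalise(prior * theta**successes * (1 - theta)**failures)

prior_mean, prior_var = summarise(prior)
post_mean, post_var = summarise(posterior)
print(f'prior:     mean={prior_mean:.3f}, var={prior_var:.4f}')
print(f'posterior: mean={post_mean:.3f}, var={post_var:.4f}')
```

The posterior mean moves toward the observed success rate and the variance shrinks: the whole distribution has been updated, not a single yes/no value.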

3. Generative vs. Discriminative

A generative model represents the joint distribution P(D, H), that is, how observations arise from causes. A discriminative model represents only the conditional P(H|D), as a classifier does. A generative model can be inverted to infer causes from data, can produce data, can reason about counterfactuals, and can support active exploration (Module 5). Mainstream deep learning leaned largely on discriminative models until the diffusion/generative-AI turn of the 2020s; the Bayesian brain has been generative throughout.
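The distinction can be sketched with a two-state toy world. Everything below is a hypothetical illustration: the probabilities, the variable names, and the helper sample are assumptions, not values from this module. The generative model stores P(H) and P(D|H), so it can both produce data and be inverted; a discriminative model would store only the final table P(H|D).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative model: H in {0, 1} is the hidden cause,
# D in {0, 1} is the observation
p_h = np.array([0.7, 0.3])              # P(H)
p_d_given_h = np.array([[0.9, 0.1],     # P(D | H=0)
                        [0.2, 0.8]])    # P(D | H=1)

# Joint distribution P(H, D): rows index H, columns index D
joint = p_h[:, None] * p_d_given_h

def sample(n):
    """Forward direction: draw causes, then observations given causes."""
    h = rng.choice(2, size=n, p=p_h)
    d = (rng.random(n) < p_d_given_h[h, 1]).astype(int)
    return h, d

# Reverse direction: invert the joint to get P(H | D) by Bayes' theorem
p_d = joint.sum(axis=0)                 # evidence P(D)
p_h_given_d = joint / p_d               # each column is P(H | D=d)

h_samples, d_samples = sample(10_000)
print('P(H=1 | D=1) =', round(float(p_h_given_d[1, 1]), 4))
print('empirical P(D=1) =', d_samples.mean(), 'vs model', round(float(p_d[1]), 4))
```

A discriminative model that kept only p_h_given_d could still classify an observation, but it would have no way to run sample, since the information about how D arises from H is not stored.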

Simulation: Belief Update & Conjugate Priors

import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file, no display needed
import matplotlib.pyplot as plt

# Classic medical-test problem and belief update
# P(disease) = 0.01, test sensitivity 0.95, specificity 0.95
p_d = 0.01
sens = 0.95
spec = 0.95

# Posterior after a single positive test (same value as p_post[1] computed below)
p_pos_given_d = sens
p_pos_given_nd = 1 - spec
p_pos = p_pos_given_d*p_d + p_pos_given_nd*(1-p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos

# Sequential updating with repeated positive tests
tests = np.arange(0, 6)
p_post = [p_d]
for _ in tests[1:]:
    prior = p_post[-1]
    p_pos = sens*prior + (1-spec)*(1-prior)
    p_post.append(sens*prior / p_pos)

# Beta prior -> Beta posterior (Bernoulli likelihood)
x = np.linspace(0, 1, 300)
from scipy.special import beta as beta_fn
priors = [(2, 8), (1, 1), (10, 10)]
obs = (7, 3)   # 7 successes, 3 failures

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5), facecolor='#0a0a1a')
for ax in (ax1, ax2):
    ax.set_facecolor('#111827'); ax.tick_params(colors='#cbd5e1')
    for s in ax.spines.values(): s.set_color('#334155')
    ax.grid(True, color='#334155', alpha=0.3)

ax1.plot(tests, p_post, color='#a78bfa', lw=2.6, marker='o')
ax1.set_xlabel('Positive tests observed', color='#cbd5e1')
ax1.set_ylabel('P(disease)', color='#cbd5e1')
ax1.set_title(f'Base rate {p_d:.0%}, sens {sens:.0%}, spec {spec:.0%}',
              color='#c4b5fd', fontweight='bold')
ax1.annotate(f'After 1 test: {p_post[1]:.2f}', (1, p_post[1]),
             xytext=(10, 8), textcoords='offset points', color='#c4b5fd')

for (a, b), col in zip(priors, ['#38bdf8', '#fbbf24', '#f87171']):
    prior = x**(a-1) * (1-x)**(b-1) / beta_fn(a, b)
    ap = a + obs[0]; bp = b + obs[1]
    post = x**(ap-1) * (1-x)**(bp-1) / beta_fn(ap, bp)
    ax2.plot(x, prior/prior.max(), color=col, ls='--', alpha=0.5)
    ax2.plot(x, post/post.max(), color=col, lw=2.4,
             label=f'Beta({a},{b}) -> Beta({ap},{bp})')
ax2.set_xlabel('theta', color='#cbd5e1')
ax2.set_ylabel('Normalised density', color='#cbd5e1')
ax2.set_title('Beta-Bernoulli conjugate update',
              color='#c4b5fd', fontweight='bold')
ax2.legend(facecolor='#1e293b', edgecolor='#334155', labelcolor='#cbd5e1')

plt.tight_layout()
plt.savefig('output.png', dpi=120, bbox_inches='tight', facecolor='#0a0a1a')
print(f'After 1 positive test: P(disease)={p_post[1]:.3f}')
print(f'After 3 positive tests: P(disease)={p_post[3]:.3f}')

4. Frequentist vs. Bayesian

Frequentist statistics treats parameters as fixed but unknown; probability is the long-run frequency of an event under repeated sampling. Bayesian statistics treats parameters as random variables and probability as a degree of belief, expressed as a distribution. For the Bayesian-brain hypothesis the frequentist framework is a poor fit: a single-trial cognitive judgement has no natural interpretation as “repeated sampling.” Bayesian inference is therefore taken as the natural computational substrate of belief.
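A small numerical contrast may help. The sketch below uses a hypothetical data set of three successes in three trials and compares a frequentist point estimate (the maximum-likelihood proportion) with a Bayesian posterior under a flat Beta(1, 1) prior. The data and the choice of prior are assumptions for illustration only.

```python
# Hypothetical data: 3 successes in 3 trials (illustrative numbers only)
successes, failures = 3, 0
n = successes + failures

# Frequentist point estimate: the maximum-likelihood proportion
mle = successes / n

# Bayesian estimate: Beta(1, 1) prior (an assumption) gives a Beta(a, b) posterior
a = 1 + successes
b = 1 + failures
post_mean = a / (a + b)
post_var = a * b / ((a + b) ** 2 * (a + b + 1))
post_sd = post_var ** 0.5

print(f'MLE: {mle:.3f}')
print(f'Posterior: Beta({a},{b}), mean {post_mean:.3f}, sd {post_sd:.3f}')
```

With so little data the maximum-likelihood estimate sits at the boundary value 1.0, while the posterior keeps a best guess of 0.8 together with an explicit spread, which is the graded kind of belief the module describes.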

