Part III: Computational Neuroscience | Chapter 2

Neural Networks in Neuroscience

Perceptrons, Hopfield networks, Boltzmann machines, and predictive coding

Brain-Inspired Computation

Neural network models bridge neuroscience and artificial intelligence. From Rosenblatt's perceptron to modern deep learning, these models are both tools for understanding the brain and engineering artifacts inspired by it. This chapter focuses on network architectures with deep neuroscience roots: associative memories (Hopfield networks), stochastic networks (Boltzmann machines), and the predictive coding framework that has become central to theories of cortical computation.

Each model captures a different aspect of neural computation: pattern completion, probabilistic inference, unsupervised feature learning, and hierarchical prediction. Together, they provide a computational language for understanding how the brain represents and processes information.

1. Perceptrons and Linear Classifiers

The perceptron (Rosenblatt, 1958) is the simplest neural network: a single layer of input-to-output connections with a threshold nonlinearity. Despite its simplicity, the perceptron learning theorem guarantees convergence for linearly separable problems, and the model captures essential features of feedforward processing in sensory cortex.

Derivation 1: Perceptron Convergence and Capacity

A perceptron with weights $\mathbf{w}$ classifies input $\mathbf{x}$ as:

$$y = \text{sign}(\mathbf{w} \cdot \mathbf{x})$$

The learning rule updates weights on misclassified patterns: $\mathbf{w} \leftarrow \mathbf{w} + \eta \, t^{(\mu)} \mathbf{x}^{(\mu)}$, where $t^{(\mu)}$ is the target label. The convergence theorem (Novikoff, 1962) bounds the number of weight updates by $R^2 / \gamma^2$, where $R = \max_\mu \|\mathbf{x}^{(\mu)}\|$ and $\gamma$ is the margin of the best separating hyperplane. The storage capacity (Cover, 1965) for random patterns is:

$$P_{\max} = 2N$$

where $N$ is the input dimension. This result follows from Cover's function-counting theorem and is closely related to the VC dimension of linear classifiers ($N+1$). Neuroscience relevance: the cerebellar Purkinje cell receives input from ~200,000 parallel fibers and learns binary classifications (LTD vs. no LTD), functioning as a biological perceptron with very high capacity.
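The learning rule above can be sketched in a few lines of NumPy. This is a minimal toy example on a linearly separable problem; the cluster offset, learning rate, and dimensions are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def perceptron_train(X, t, eta=1.0, max_epochs=100):
    """Rosenblatt's rule: update weights only on misclassified patterns."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, t):
            if np.sign(w @ x) != target:   # pattern misclassified
                w += eta * target * x      # w <- w + eta * t * x
                errors += 1
        if errors == 0:                    # converged: every pattern correct
            break
    return w

# Toy linearly separable data: label = sign of the (offset) first coordinate
X = rng.normal(size=(50, 5))
X[:, 0] += np.where(rng.random(50) < 0.5, 3.0, -3.0)  # two separated clusters
t = np.sign(X[:, 0])

w = perceptron_train(X, t)
acc = np.mean(np.sign(X @ w) == t)
print(f"training accuracy: {acc:.2f}")
```

Because the data are separable, the convergence theorem guarantees the loop terminates with all patterns classified correctly.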

2. Hopfield Networks and Associative Memory

Hopfield networks (1982) model content-addressable (associative) memory, where a stored pattern can be retrieved from a partial or noisy cue. The network's dynamics minimize an energy function, with stored memories corresponding to energy minima (attractors).

Derivation 2: Energy Function and Convergence Proof

The Hopfield energy function is:

$$E = -\frac{1}{2}\sum_{i \neq j} w_{ij} s_i s_j - \sum_i \theta_i s_i$$

With asynchronous updates $s_i \leftarrow \text{sign}(\sum_j w_{ij} s_j + \theta_i)$ and symmetric weights $w_{ij} = w_{ji}$, the energy change when neuron $i$ flips is:

$$\Delta E_i = -\Delta s_i \left(\sum_j w_{ij} s_j + \theta_i\right) \leq 0$$

This is negative or zero because $\Delta s_i$ has the same sign as the local field $h_i = \sum_j w_{ij} s_j + \theta_i$. Since the energy is bounded below and decreases monotonically, the network converges to a fixed point. With Hebbian weights $w_{ij} = \frac{1}{N}\sum_\mu \xi_i^\mu \xi_j^\mu$, stored patterns are local energy minima up to a capacity of $P \approx 0.138N$ (Amit, Gutfreund, and Sompolinsky, 1985), and the basins of attraction have radius up to ~0.5: at low loading, cues with nearly 50% of their bits corrupted can still be corrected.
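Hebbian storage and asynchronous retrieval can be written out directly. A minimal sketch with illustrative sizes (loading $P/N = 0.05$, comfortably below capacity) and 20% of the cue's bits flipped:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 200, 10                               # N neurons, P stored patterns
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian weights w_ij = (1/N) sum_mu xi_i xi_j, zero self-connections
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

def energy(s):
    """Hopfield energy (thresholds set to zero here)."""
    return -0.5 * s @ W @ s

def recall(cue, n_sweeps=10):
    """Asynchronous sign updates until the state settles."""
    s = cue.copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Corrupt 20% of the bits of pattern 0 and retrieve it from the noisy cue
cue = patterns[0].copy()
flip = rng.choice(N, size=N // 5, replace=False)
cue[flip] *= -1

recovered = recall(cue)
overlap = (recovered @ patterns[0]) / N      # 1.0 means perfect recall
print(f"overlap with stored pattern: {overlap:.2f}")
```

The retrieval dynamics never increase the energy, so `energy(recovered)` is at most `energy(cue)`, illustrating the convergence proof above.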

Modern Hopfield networks (Ramsauer et al., 2021) replace the quadratic energy with an exponential interaction, $E = -\operatorname{lse}(\beta, \mathbf{W}^T \mathbf{s})$, where lse is the log-sum-exp function and $\beta$ an inverse temperature. This achieves exponential capacity $P \sim e^{N/2}$ and connects the retrieval update directly to transformer attention.
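A single retrieval step of the modern Hopfield update is exactly softmax attention over the stored patterns. A minimal sketch with arbitrary dimensions and an illustrative $\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 64))               # 16 stored patterns (rows), dim 64
query = X[3] + 0.3 * rng.normal(size=64)    # noisy cue for pattern 3
beta = 4.0                                  # inverse temperature

# One update: s_new = X^T softmax(beta * X s) -- the attention form of retrieval
logits = beta * (X @ query)
attn = np.exp(logits - logits.max())        # stable softmax over stored patterns
attn /= attn.sum()
retrieved = attn @ X

cos = retrieved @ X[3] / (np.linalg.norm(retrieved) * np.linalg.norm(X[3]))
print(f"cosine similarity with stored pattern: {cos:.3f}")
```

With a large enough $\beta$, the softmax is nearly one-hot on the best-matching pattern, so a single step suffices for retrieval.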

3. Boltzmann Machines

Boltzmann machines (Hinton and Sejnowski, 1983) extend Hopfield networks with stochastic dynamics and hidden units. They can learn probability distributions over visible patterns, performing unsupervised learning of data statistics — a capability linked to cortical generative models.

Derivation 3: Boltzmann Learning Rule

The probability of a state $\mathbf{s}$ follows the Boltzmann distribution:

$$P(\mathbf{s}) = \frac{1}{Z} \exp\left(-E(\mathbf{s}) / T\right)$$

For visible units $\mathbf{v}$ and hidden units $\mathbf{h}$, the log-likelihood of data is maximized by the gradient:

$$\Delta w_{ij} = \eta \left(\langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}}\right)$$

The "data" phase clamps visible units and samples hidden units (the positive, wake-like phase), while the "model" phase runs the network freely (the negative, sleep-like phase). This two-phase scheme has neuroscience parallels: cortical activity during waking (data-driven) and during REM sleep (generative replay).

In Restricted Boltzmann Machines (RBMs), no intra-layer connections exist, making the conditional distributions factorize: $P(h_j = 1 | \mathbf{v}) = \sigma(\sum_i w_{ij} v_i + b_j)$. This enables efficient contrastive divergence training (Hinton, 2002).
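Contrastive divergence on a tiny RBM can be sketched as follows (CD-1, i.e. a single Gibbs step for the "model" phase; the layer sizes, learning rate, and two training patterns are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, eta = 6, 4, 0.1
W = 0.1 * rng.normal(size=(n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

# Two binary training patterns the RBM should learn to reconstruct
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)

for step in range(2000):
    v0 = data[step % 2]
    # Positive ("data") phase: hidden units driven by the clamped visibles
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(n_hid) < ph0).astype(float)
    # Negative ("model") phase: one Gibbs step back to the visibles (CD-1)
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(n_vis) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)
    # <s_i s_j>_data - <s_i s_j>_model, restricted to visible-hidden pairs
    W += eta * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_v += eta * (v0 - v1)
    b_h += eta * (ph0 - ph1)

recon = sigmoid(sigmoid(data @ W + b_h) @ W.T + b_v)  # mean-field reconstruction
print(np.round(recon, 2))
```

The factorized conditionals $P(h_j = 1 \mid \mathbf{v})$ and $P(v_i = 1 \mid \mathbf{h})$ appear as the two `sigmoid` calls, which is what makes each Gibbs half-step a single matrix multiply.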

4. Predictive Coding

Predictive coding (Rao and Ballard, 1999; Friston, 2005) proposes that the cortex implements a hierarchical generative model, continuously predicting its inputs and updating predictions based on prediction errors. This framework unifies perception, attention, and learning under a single computational principle.

Derivation 4: Hierarchical Predictive Coding

In a two-level predictive coding hierarchy, level 2 generates predictions for level 1. The prediction error at level 1 is:

$$\epsilon_1 = \mathbf{x}_1 - f(\boldsymbol{\mu}_2)$$

where $\mathbf{x}_1$ is the input, $\boldsymbol{\mu}_2$ is the level-2 representation, and $f$ is the generative mapping. Level 2 carries its own error, $\boldsymbol{\epsilon}_2 = \boldsymbol{\mu}_2 - \boldsymbol{\eta}_2$, measured against its prior expectation $\boldsymbol{\eta}_2$. The free energy (variational bound) is:

$$F = \frac{1}{2}\left[\boldsymbol{\epsilon}_1^T \boldsymbol{\Sigma}_1^{-1} \boldsymbol{\epsilon}_1 + \boldsymbol{\epsilon}_2^T \boldsymbol{\Sigma}_2^{-1} \boldsymbol{\epsilon}_2 + \ln|\boldsymbol{\Sigma}_1| + \ln|\boldsymbol{\Sigma}_2|\right]$$

The representations update by gradient descent on free energy:

$$\dot{\boldsymbol{\mu}}_2 = -\frac{\partial F}{\partial \boldsymbol{\mu}_2} = \mathbf{D}_f^T \boldsymbol{\Sigma}_1^{-1} \boldsymbol{\epsilon}_1 - \boldsymbol{\Sigma}_2^{-1} \boldsymbol{\epsilon}_2$$

where $\mathbf{D}_f = \partial f / \partial \boldsymbol{\mu}_2$ is the Jacobian. This maps onto cortical microcircuits: superficial layers signal prediction errors (forward), deep layers signal predictions (backward), and precision-weighting ($\boldsymbol{\Sigma}^{-1}$) implements attention. The scheme explains extra-classical receptive field effects, mismatch negativity, and repetition suppression.
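The two-level update scheme can be simulated directly for a linear generative mapping $f(\boldsymbol{\mu}_2) = A\boldsymbol{\mu}_2$, so that the Jacobian is just $A$. The dimensions, precisions (identity matrices), and step size below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Linear generative mapping f: level-2 causes -> level-1 predictions
A = rng.normal(size=(8, 3))                   # Jacobian D_f = A
f = lambda mu: A @ mu

mu_true = np.array([1.0, -2.0, 0.5])          # hidden cause to be inferred
x1 = f(mu_true) + 0.05 * rng.normal(size=8)   # noisy sensory input

Sigma1_inv = np.eye(8)                        # sensory precision
Sigma2_inv = np.eye(3)                        # prior precision
mu_prior = np.zeros(3)                        # level-2 prior expectation

mu2 = np.zeros(3)
for _ in range(500):                          # gradient descent on free energy
    eps1 = x1 - f(mu2)                        # level-1 prediction error
    eps2 = mu2 - mu_prior                     # level-2 (prior) error
    dmu2 = A.T @ Sigma1_inv @ eps1 - Sigma2_inv @ eps2
    mu2 += 0.02 * dmu2

print(np.round(mu2, 2))                       # near mu_true, shrunk to the prior
```

At convergence this recovers the MAP estimate $(A^T\boldsymbol{\Sigma}_1^{-1}A + \boldsymbol{\Sigma}_2^{-1})^{-1} A^T\boldsymbol{\Sigma}_1^{-1}\mathbf{x}_1$, i.e. the true cause shrunk slightly toward the prior; increasing the sensory precision $\boldsymbol{\Sigma}_1^{-1}$ weights the error term more, which is the precision-as-attention reading.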

Derivation 5: Free Energy Principle and Active Inference

Friston's free energy principle (2006) generalizes predictive coding to action. An agent minimizes variational free energy $F$ through both perception (updating beliefs $\boldsymbol{\mu}$) and action (changing sensory input $\mathbf{x}$):

$$F = D_{\text{KL}}[q(\theta) \| p(\theta | \mathbf{x})] - \ln p(\mathbf{x})$$

Since $D_{\text{KL}} \geq 0$, minimizing $F$ maximizes model evidence $\ln p(\mathbf{x})$. Action $a$ minimizes expected free energy:

$$G(a) = \underbrace{D_{\text{KL}}[q(\mathbf{x}|a) \| p(\mathbf{x})]}_{\text{risk (pragmatic)}} + \underbrace{H[q(\mathbf{x}|a)]}_{\text{ambiguity (epistemic)}}$$

This decomposes action selection into exploitation (achieving preferred outcomes) and uncertainty reduction (preferring actions with predictable consequences); in formulations over hidden states, the epistemic term becomes an expected information gain, providing a principled account of curiosity-driven behavior. Active inference has been applied to motor control, decision-making, and psychiatric disorders (autism as altered precision, schizophrenia as aberrant prediction errors).
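The decomposition can be illustrated with a discrete toy example of action selection. The two actions, their predicted outcome distributions $q(\mathbf{x}|a)$, and the preference prior $p(\mathbf{x})$ are invented purely for illustration:

```python
import numpy as np

# Prior preferences over 3 outcomes: the agent wants outcome 0
p_pref = np.array([0.8, 0.1, 0.1])

# Predicted outcome distributions q(x|a) for two candidate actions
q = {
    "safe": np.array([0.40, 0.30, 0.30]),   # spread out: ambiguous outcomes
    "goal": np.array([0.85, 0.10, 0.05]),   # concentrated on preferred outcome
}

def expected_free_energy(qx, p):
    risk = np.sum(qx * np.log(qx / p))      # KL[q(x|a) || p(x)] -- pragmatic term
    ambiguity = -np.sum(qx * np.log(qx))    # H[q(x|a)]          -- entropy term
    return risk + ambiguity

G = {a: expected_free_energy(qx, p_pref) for a, qx in q.items()}
best = min(G, key=G.get)                    # action with lowest G is selected
print(best, {a: round(g, 3) for a, g in G.items()})
```

The "goal" action wins on both counts: its outcomes match the preferences (low risk) and are more predictable (low entropy).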

5. Historical Development

  • 1943: McCulloch and Pitts propose the first mathematical neuron model, showing that networks of threshold units can compute any logical function.
  • 1958: Rosenblatt introduces the perceptron and proves the convergence theorem.
  • 1982: Hopfield introduces the energy-based associative memory network, connecting neural networks to statistical physics.
  • 1983: Hinton and Sejnowski introduce Boltzmann machines, combining neural networks with probabilistic inference.
  • 1995: Dayan et al. develop the Helmholtz machine with wake-sleep learning, a precursor to variational autoencoders.
  • 1999: Rao and Ballard propose predictive coding as a model for cortical visual processing.
  • 2006: Friston formulates the free energy principle as a unifying theory of brain function.
  • 2021: Ramsauer et al. connect modern Hopfield networks to transformer attention, bridging neuroscience and deep learning.

6. Applications

Computational Psychiatry

Predictive coding models formalize psychiatric disorders as aberrant inference: schizophrenia as weakened priors, autism as excessive precision on sensory prediction errors, and depression as pessimistic generative models.

Generative AI

Boltzmann machines and their energy functions inspired modern energy-based models and diffusion models. The Helmholtz machine's wake-sleep algorithm prefigured the recognition-generation structure of variational autoencoders. Brain-inspired architectures continue to inform AI design.

Memory Prosthetics

Hopfield network theory guides the design of associative memory prosthetics and content-addressable storage systems. Understanding memory capacity limits informs the number of memories a prosthetic can support.

Sensory Neuroprosthetics

Predictive coding principles improve neural decoding by accounting for top-down predictions. Active inference frameworks enable closed-loop prosthetics that simultaneously decode intention and update sensory feedback.

7. Computational Exploration

Neural Networks: Perceptron, Hopfield, Boltzmann Machines, and Predictive Coding


Chapter Summary

  • Perceptrons learn linear classifications with capacity $P_{\max} = 2N$; the cerebellar Purkinje cell functions as a biological perceptron.
  • Hopfield networks store associative memories as energy minima with capacity $P \approx 0.138N$ and content-addressable retrieval.
  • Boltzmann machines learn probability distributions via wake-sleep dynamics, paralleling cortical generative models.
  • Predictive coding: the cortex minimizes prediction errors across a hierarchy, unifying perception, attention, and learning.
  • Free energy principle: agents minimize variational free energy through both perception and action (active inference).