Part III: Computational Neuroscience | Chapter 2

Neural Networks in Neuroscience

Perceptrons, Hopfield networks, Boltzmann machines, and predictive coding

Brain-Inspired Computation

Neural network models bridge neuroscience and artificial intelligence. From Rosenblatt's perceptron to modern deep learning, these models are both tools for understanding the brain and engineering artifacts inspired by it. This chapter focuses on network architectures with deep neuroscience roots: associative memories (Hopfield networks), stochastic networks (Boltzmann machines), and the predictive coding framework that has become central to theories of cortical computation.

Each model captures a different aspect of neural computation: pattern completion, probabilistic inference, unsupervised feature learning, and hierarchical prediction. Together, they provide a computational language for understanding how the brain represents and processes information.

1. Perceptrons and Linear Classifiers

The perceptron (Rosenblatt, 1958) is the simplest neural network: a single layer of input-to-output connections with a threshold nonlinearity. Despite its simplicity, the perceptron learning theorem guarantees convergence for linearly separable problems, and the model captures essential features of feedforward processing in sensory cortex.

Derivation 1: Perceptron Convergence and Capacity

A perceptron with weights $\mathbf{w}$ classifies input $\mathbf{x}$ as:

$$y = \text{sign}(\mathbf{w} \cdot \mathbf{x})$$

The learning rule updates weights on misclassified patterns: $\mathbf{w} \leftarrow \mathbf{w} + \eta \, t^{(\mu)} \mathbf{x}^{(\mu)}$, where $t^{(\mu)}$ is the target label. The convergence theorem (Novikoff, 1962) bounds the number of weight updates by $R^2 / \gamma^2$, where $R = \max_\mu \|\mathbf{x}^{(\mu)}\|$ and $\gamma$ is the margin of the best separating hyperplane. The storage capacity (Cover, 1965) for random patterns is:

$$P_{\max} = 2N$$

where $N$ is the input dimension. This result follows from Cover's function-counting theorem and is closely related to the VC dimension of linear classifiers ($N+1$). Neuroscience relevance: the cerebellar Purkinje cell receives input from ~200,000 parallel fibers and learns binary classifications (LTD vs. no LTD), functioning as a biological perceptron with very high capacity.
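The learning rule above can be sketched in a few lines of NumPy. This is a minimal toy example on a linearly separable problem; the cluster offset, learning rate, and dimensions are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def perceptron_train(X, t, eta=1.0, max_epochs=100):
    """Rosenblatt's rule: update weights only on misclassified patterns."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, t):
            if np.sign(w @ x) != target:   # pattern misclassified
                w += eta * target * x      # w <- w + eta * t * x
                errors += 1
        if errors == 0:                    # converged: every pattern correct
            break
    return w

# Toy linearly separable data: label = sign of the (offset) first coordinate
X = rng.normal(size=(50, 5))
X[:, 0] += np.where(rng.random(50) < 0.5, 3.0, -3.0)  # two separated clusters
t = np.sign(X[:, 0])

w = perceptron_train(X, t)
acc = np.mean(np.sign(X @ w) == t)
print(f"training accuracy: {acc:.2f}")
```

Because the data are separable, the convergence theorem guarantees the loop terminates with all patterns classified correctly.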

2. Hopfield Networks and Associative Memory

Hopfield networks (1982) model content-addressable (associative) memory, where a stored pattern can be retrieved from a partial or noisy cue. The network's dynamics minimize an energy function, with stored memories corresponding to energy minima (attractors).

Derivation 2: Energy Function and Convergence Proof

The Hopfield energy function is:

$$E = -\frac{1}{2}\sum_{i \neq j} w_{ij} s_i s_j - \sum_i \theta_i s_i$$

With asynchronous updates $s_i \leftarrow \text{sign}(\sum_j w_{ij} s_j + \theta_i)$ and symmetric weights $w_{ij} = w_{ji}$, the energy change when neuron $i$ flips is:

$$\Delta E_i = -\Delta s_i \left(\sum_j w_{ij} s_j + \theta_i\right) \leq 0$$

This is negative or zero because $\Delta s_i$ has the same sign as the local field $h_i = \sum_j w_{ij} s_j + \theta_i$. Since the energy is bounded below and decreases monotonically, the network converges to a fixed point. With Hebbian weights $w_{ij} = \frac{1}{N}\sum_\mu \xi_i^\mu \xi_j^\mu$, stored patterns are local energy minima up to a capacity of $P \approx 0.138N$ (Amit, Gutfreund, and Sompolinsky, 1985), and the basins of attraction have radius up to ~0.5: at low loading, cues with nearly 50% of their bits corrupted can still be corrected.
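Hebbian storage and asynchronous retrieval can be written out directly. A minimal sketch with illustrative sizes (loading $P/N = 0.05$, comfortably below capacity) and 20% of the cue's bits flipped:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 200, 10                               # N neurons, P stored patterns
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian weights w_ij = (1/N) sum_mu xi_i xi_j, zero self-connections
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

def energy(s):
    """Hopfield energy (thresholds set to zero here)."""
    return -0.5 * s @ W @ s

def recall(cue, n_sweeps=10):
    """Asynchronous sign updates until the state settles."""
    s = cue.copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Corrupt 20% of the bits of pattern 0 and retrieve it from the noisy cue
cue = patterns[0].copy()
flip = rng.choice(N, size=N // 5, replace=False)
cue[flip] *= -1

recovered = recall(cue)
overlap = (recovered @ patterns[0]) / N      # 1.0 means perfect recall
print(f"overlap with stored pattern: {overlap:.2f}")
```

The retrieval dynamics never increase the energy, so `energy(recovered)` is at most `energy(cue)`, illustrating the convergence proof above.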

Modern Hopfield networks (Ramsauer et al., 2021) replace the quadratic energy with an exponential interaction, $E = -\operatorname{lse}(\beta, \mathbf{W}^T \mathbf{s})$, where lse is the log-sum-exp function and $\beta$ an inverse temperature. This achieves exponential capacity $P \sim e^{N/2}$ and connects the retrieval update directly to transformer attention.
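A single retrieval step of the modern Hopfield update is exactly softmax attention over the stored patterns. A minimal sketch with arbitrary dimensions and an illustrative $\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 64))               # 16 stored patterns (rows), dim 64
query = X[3] + 0.3 * rng.normal(size=64)    # noisy cue for pattern 3
beta = 4.0                                  # inverse temperature

# One update: s_new = X^T softmax(beta * X s) -- the attention form of retrieval
logits = beta * (X @ query)
attn = np.exp(logits - logits.max())        # stable softmax over stored patterns
attn /= attn.sum()
retrieved = attn @ X

cos = retrieved @ X[3] / (np.linalg.norm(retrieved) * np.linalg.norm(X[3]))
print(f"cosine similarity with stored pattern: {cos:.3f}")
```

With a large enough $\beta$, the softmax is nearly one-hot on the best-matching pattern, so a single step suffices for retrieval.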

3. Boltzmann Machines

Boltzmann machines (Hinton and Sejnowski, 1983) extend Hopfield networks with stochastic dynamics and hidden units. They can learn probability distributions over visible patterns, performing unsupervised learning of data statistics — a capability linked to cortical generative models.

Derivation 3: Boltzmann Learning Rule

The probability of a state $\mathbf{s}$ follows the Boltzmann distribution:

$$P(\mathbf{s}) = \frac{1}{Z} \exp\left(-E(\mathbf{s}) / T\right)$$

For visible units $\mathbf{v}$ and hidden units $\mathbf{h}$, the log-likelihood of data is maximized by the gradient:

$$\Delta w_{ij} = \eta \left(\langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}}\right)$$

The "data" phase clamps visible units and samples hidden units (the positive, wake-like phase), while the "model" phase runs the network freely (the negative, sleep-like phase). This two-phase scheme has neuroscience parallels: cortical activity during waking (data-driven) and during REM sleep (generative replay).

In Restricted Boltzmann Machines (RBMs), no intra-layer connections exist, making the conditional distributions factorize: $P(h_j = 1 | \mathbf{v}) = \sigma(\sum_i w_{ij} v_i + b_j)$. This enables efficient contrastive divergence training (Hinton, 2002).
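Contrastive divergence on a tiny RBM can be sketched as follows (CD-1, i.e. a single Gibbs step for the "model" phase; the layer sizes, learning rate, and two training patterns are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, eta = 6, 4, 0.1
W = 0.1 * rng.normal(size=(n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

# Two binary training patterns the RBM should learn to reconstruct
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)

for step in range(2000):
    v0 = data[step % 2]
    # Positive ("data") phase: hidden units driven by the clamped visibles
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(n_hid) < ph0).astype(float)
    # Negative ("model") phase: one Gibbs step back to the visibles (CD-1)
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(n_vis) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)
    # <s_i s_j>_data - <s_i s_j>_model, restricted to visible-hidden pairs
    W += eta * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_v += eta * (v0 - v1)
    b_h += eta * (ph0 - ph1)

recon = sigmoid(sigmoid(data @ W + b_h) @ W.T + b_v)  # mean-field reconstruction
print(np.round(recon, 2))
```

The factorized conditionals $P(h_j = 1 \mid \mathbf{v})$ and $P(v_i = 1 \mid \mathbf{h})$ appear as the two `sigmoid` calls, which is what makes each Gibbs half-step a single matrix multiply.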

4. Predictive Coding

Predictive coding (Rao and Ballard, 1999; Friston, 2005) proposes that the cortex implements a hierarchical generative model, continuously predicting its inputs and updating predictions based on prediction errors. This framework unifies perception, attention, and learning under a single computational principle.

Derivation 4: Hierarchical Predictive Coding

In a two-level predictive coding hierarchy, level 2 generates predictions for level 1. The prediction error at level 1 is:

$$\epsilon_1 = \mathbf{x}_1 - f(\boldsymbol{\mu}_2)$$

where $\mathbf{x}_1$ is the input, $\boldsymbol{\mu}_2$ is the level-2 representation, and $f$ is the generative mapping. Level 2 carries its own error, $\boldsymbol{\epsilon}_2 = \boldsymbol{\mu}_2 - \boldsymbol{\eta}_2$, measured against its prior expectation $\boldsymbol{\eta}_2$. The free energy (variational bound) is:

$$F = \frac{1}{2}\left[\boldsymbol{\epsilon}_1^T \boldsymbol{\Sigma}_1^{-1} \boldsymbol{\epsilon}_1 + \boldsymbol{\epsilon}_2^T \boldsymbol{\Sigma}_2^{-1} \boldsymbol{\epsilon}_2 + \ln|\boldsymbol{\Sigma}_1| + \ln|\boldsymbol{\Sigma}_2|\right]$$

The representations update by gradient descent on free energy:

$$\dot{\boldsymbol{\mu}}_2 = -\frac{\partial F}{\partial \boldsymbol{\mu}_2} = \mathbf{D}_f^T \boldsymbol{\Sigma}_1^{-1} \boldsymbol{\epsilon}_1 - \boldsymbol{\Sigma}_2^{-1} \boldsymbol{\epsilon}_2$$

where $\mathbf{D}_f = \partial f / \partial \boldsymbol{\mu}_2$ is the Jacobian. This maps onto cortical microcircuits: superficial layers signal prediction errors (forward), deep layers signal predictions (backward), and precision-weighting ($\boldsymbol{\Sigma}^{-1}$) implements attention. The scheme explains extra-classical receptive field effects, mismatch negativity, and repetition suppression.
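The two-level update scheme can be simulated directly for a linear generative mapping $f(\boldsymbol{\mu}_2) = A\boldsymbol{\mu}_2$, so that the Jacobian is just $A$. The dimensions, precisions (identity matrices), and step size below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Linear generative mapping f: level-2 causes -> level-1 predictions
A = rng.normal(size=(8, 3))                   # Jacobian D_f = A
f = lambda mu: A @ mu

mu_true = np.array([1.0, -2.0, 0.5])          # hidden cause to be inferred
x1 = f(mu_true) + 0.05 * rng.normal(size=8)   # noisy sensory input

Sigma1_inv = np.eye(8)                        # sensory precision
Sigma2_inv = np.eye(3)                        # prior precision
mu_prior = np.zeros(3)                        # level-2 prior expectation

mu2 = np.zeros(3)
for _ in range(500):                          # gradient descent on free energy
    eps1 = x1 - f(mu2)                        # level-1 prediction error
    eps2 = mu2 - mu_prior                     # level-2 (prior) error
    dmu2 = A.T @ Sigma1_inv @ eps1 - Sigma2_inv @ eps2
    mu2 += 0.02 * dmu2

print(np.round(mu2, 2))                       # near mu_true, shrunk to the prior
```

At convergence this recovers the MAP estimate $(A^T\boldsymbol{\Sigma}_1^{-1}A + \boldsymbol{\Sigma}_2^{-1})^{-1} A^T\boldsymbol{\Sigma}_1^{-1}\mathbf{x}_1$, i.e. the true cause shrunk slightly toward the prior; increasing the sensory precision $\boldsymbol{\Sigma}_1^{-1}$ weights the error term more, which is the precision-as-attention reading.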

Derivation 5: Free Energy Principle and Active Inference

Friston's free energy principle (2006) generalizes predictive coding to action. An agent minimizes variational free energy $F$ through both perception (updating beliefs $\boldsymbol{\mu}$) and action (changing sensory input $\mathbf{x}$):

$$F = D_{\text{KL}}[q(\theta) \| p(\theta | \mathbf{x})] - \ln p(\mathbf{x})$$

Since $D_{\text{KL}} \geq 0$, minimizing $F$ maximizes model evidence $\ln p(\mathbf{x})$. Action $a$ minimizes expected free energy:

$$G(a) = \underbrace{D_{\text{KL}}[q(\mathbf{x}|a) \| p(\mathbf{x})]}_{\text{risk (pragmatic)}} + \underbrace{H[q(\mathbf{x}|a)]}_{\text{ambiguity (epistemic)}}$$

This decomposes action selection into exploitation (achieving preferred outcomes) and uncertainty reduction (preferring actions with predictable consequences); in formulations over hidden states, the epistemic term becomes an expected information gain, providing a principled account of curiosity-driven behavior. Active inference has been applied to motor control, decision-making, and psychiatric disorders (autism as altered precision, schizophrenia as aberrant prediction errors).
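The decomposition can be illustrated with a discrete toy example of action selection. The two actions, their predicted outcome distributions $q(\mathbf{x}|a)$, and the preference prior $p(\mathbf{x})$ are invented purely for illustration:

```python
import numpy as np

# Prior preferences over 3 outcomes: the agent wants outcome 0
p_pref = np.array([0.8, 0.1, 0.1])

# Predicted outcome distributions q(x|a) for two candidate actions
q = {
    "safe": np.array([0.40, 0.30, 0.30]),   # spread out: ambiguous outcomes
    "goal": np.array([0.85, 0.10, 0.05]),   # concentrated on preferred outcome
}

def expected_free_energy(qx, p):
    risk = np.sum(qx * np.log(qx / p))      # KL[q(x|a) || p(x)] -- pragmatic term
    ambiguity = -np.sum(qx * np.log(qx))    # H[q(x|a)]          -- entropy term
    return risk + ambiguity

G = {a: expected_free_energy(qx, p_pref) for a, qx in q.items()}
best = min(G, key=G.get)                    # action with lowest G is selected
print(best, {a: round(g, 3) for a, g in G.items()})
```

The "goal" action wins on both counts: its outcomes match the preferences (low risk) and are more predictable (low entropy).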

5. Historical Development

  • 1943: McCulloch and Pitts propose the first mathematical neuron model, showing that networks of threshold units can compute any logical function.
  • 1958: Rosenblatt introduces the perceptron and proves the convergence theorem.
  • 1982: Hopfield introduces the energy-based associative memory network, connecting neural networks to statistical physics.
  • 1983: Hinton and Sejnowski introduce Boltzmann machines, combining neural networks with probabilistic inference.
  • 1995: Dayan et al. develop the Helmholtz machine with wake-sleep learning, a precursor to variational autoencoders.
  • 1999: Rao and Ballard propose predictive coding as a model for cortical visual processing.
  • 2006: Friston formulates the free energy principle as a unifying theory of brain function.
  • 2021: Ramsauer et al. connect modern Hopfield networks to transformer attention, bridging neuroscience and deep learning.

6. Applications

Computational Psychiatry

Predictive coding models formalize psychiatric disorders as aberrant inference: schizophrenia as weakened priors, autism as excessive precision on sensory prediction errors, and depression as pessimistic generative models.

Generative AI

Boltzmann machines and their energy functions inspired modern energy-based models and diffusion models. The Helmholtz machine's wake-sleep algorithm prefigured the recognition-generation structure of variational autoencoders. Brain-inspired architectures continue to inform AI design.

Memory Prosthetics

Hopfield network theory guides the design of associative memory prosthetics and content-addressable storage systems. Understanding memory capacity limits informs the number of memories a prosthetic can support.

Sensory Neuroprosthetics

Predictive coding principles improve neural decoding by accounting for top-down predictions. Active inference frameworks enable closed-loop prosthetics that simultaneously decode intention and update sensory feedback.

7. Computational Exploration

Neural Networks: Perceptron, Hopfield, Boltzmann Machines, and Predictive Coding


Chapter Summary

  • Perceptrons learn linear classifications with capacity $P_{\max} = 2N$; the cerebellar Purkinje cell functions as a biological perceptron.
  • Hopfield networks store associative memories as energy minima with capacity $P \approx 0.138N$ and content-addressable retrieval.
  • Boltzmann machines learn probability distributions via wake-sleep dynamics, paralleling cortical generative models.
  • Predictive coding: the cortex minimizes prediction errors across a hierarchy, unifying perception, attention, and learning.
  • Free energy principle: agents minimize variational free energy through both perception and action (active inference).