Module 8: Multimodal Integration & Predictive Brains

The peripheral receptors of modules 1–7 deliver streams of heterogeneous signals—photons, sound pressure, vibrations, electric fields, magnetic vectors, chemical identity, temperature. A unified percept emerges only when these streams are integrated. This capstone module derives the Stein–Meredith principles of multisensory enhancement, reviews the architectures of the superior colliculus, optic tectum, and insect mushroom body, grounds perception in the Rao–Ballard predictive-coding framework, and explores the binding problem, cross-modal plasticity, and the engineering analogues in modern multimodal neural networks.

1. The Superior Colliculus / Optic Tectum

The superior colliculus (SC) in mammals and its homologue the optic tectum in non-mammalian vertebrates are the canonical multisensory convergence sites. Superficial layers receive retinal input and encode a topographic map of visual space; deeper layers carry auditory, somatosensory, and (in snakes) thermal maps, all spatially registered with the visual map.

Stein & Meredith (1993) synthesized a decade of electrophysiology in the cat SC to articulate the four principles of multisensory integration, subsequently validated across species:

Spatial rule: stimuli from the same spatial location integrate multiplicatively; spatially disparate stimuli produce sub-additive responses or inhibition.
Temporal rule: stimuli within a temporal window of ∼300 ms integrate; outside this window the responses are independent.
Inverse effectiveness: weaker unimodal inputs produce larger proportional enhancement when combined—multisensory integration matters most near threshold.
Modality matching: integration is strongest when individual modalities alone are ambiguous, implementing a form of Bayesian cue combination.

\[ \text{Enhancement} = \frac{R_{\text{multi}} - R_{\text{max-uni}}}{R_{\text{max-uni}}} \times 100\%\]

Typical SC multisensory neurons exhibit 100–1200% enhancement at low unimodal intensities

The Bayesian interpretation (Ernst & Banks 2002; Alais & Burr 2004) frames multisensory integration as maximum-likelihood estimation with precision-weighted averaging of modality-specific estimates:

\[ \hat{x} = \frac{\sigma_A^{-2} \hat{x}_A + \sigma_V^{-2} \hat{x}_V}{\sigma_A^{-2} + \sigma_V^{-2}}, \qquad \sigma_{\hat{x}}^{-2} = \sigma_A^{-2} + \sigma_V^{-2}\]

the combined estimate has smaller variance than either modality alone

2. Multimodal Integration Across Taxa

Bats: Echolocation + Vision

Bats (Myotis, Pteropus) dominate short-range navigation with echolocation (module 2), but vision provides long-range orientation via landscape silhouettes (Suthers & Wallin 1968). Megachiroptera (fruit bats) rely heavily on vision, while microchiroptera switch modalities depending on task: foraging microbats use primarily sonar, but return to a home roost using vision.

Dolphins: Sonar + Vision + Electroreception

The Guiana dolphin (Sotalia guianensis) integrates at least three modalities: sonar (acoustic), vision (photic), and electroreception (Czech-Damal et al. 2012). Each modality contributes best in different regimes: vision in clear water, sonar in turbidity, electroreception at close range for buried prey.

Crocodiles: Dome Pressure Receptors

Integumentary sensory organs on the jaws of crocodilians contain trigeminal free nerve endings sensitive to both water pressure (mechanoreception) and temperature (thermoreception) (Leitch & Catania 2012). The brain integrates these two modalities with visual and vibrational cues from prey splashes.

Snakes: Thermal + Visual

As discussed in module 7, the snake optic tectum overlays pit-organ thermal input onto retinal visual input through neural circuits that literally align the two topographic maps (Hartline et al. 1978). The snake perceives a fused thermovisual image—a beautiful demonstration of how integration can occur at the topographic-map level well before cortex.

3. Predictive Coding and the Bayesian Brain

Rao & Ballard (1999) formalised cortex as a hierarchical prediction machine. Higher levels generate predictions\(\mu_l = f(r_{l+1})\) of lower-level activity; lower-level neurons compute prediction errors \(\epsilon_l = r_l - \mu_l\) which ascend the hierarchy to update higher-level representations. Only the errors propagate bottom-up, a striking computational efficiency principle that matches the sparse firing patterns observed in real cortex.

\[ \frac{dr_l}{dt} = -\kappa\,r_l + W_{l,l+1}^{\top} \epsilon_l - W_{l-1,l}\, \mu_{l-1}\]

Generalised gradient-descent dynamics minimise expected free energy

Friston (2010) extended predictive coding into the free-energy principle: any self-organising system minimises variational free energy \(F\), an upper bound on the surprise\(-\ln p(\text{sensation})\). Cognitive, physiological, and evolutionary adaptation all reduce to free-energy minimisation across different temporal scales. The framework unifies perception (inference over latent causes), action (active inference over policies), learning (parameter updates), and homeostasis (interoceptive regulation).

Attention as precision weighting: the predictive coding framework naturally incorporates attention as modulation of the precision (inverse variance) of prediction errors. Attending to a stimulus means weighting its prediction errors more heavily in the inference update. This implements Bayesian-optimal cue selection and explains the electrophysiology of attentional modulation of V4 and MT responses (Feldman & Friston 2010).

4. Thalamic Gating and Corticothalamic Loops

Every modality (except olfaction) passes through the thalamus en route to cortex. The specific relay nuclei (LGN for vision, MGN for hearing, VPL/VPM for somatosensation) are not passive relays: corticothalamic feedback from layer VI of cortex is typically an order of magnitude more abundant than thalamocortical inputs to cortex. The reticular thalamic nucleus (TRN) exerts inhibitory gating over all relay nuclei.

Sherman & Guillery (2006) articulated the first-order vs. higher-order thalamus distinction: first-order nuclei (LGN, MGN, VPL) carry peripheral information to cortex; higher-order nuclei (pulvinar, MD) carry cortico-cortical information through the thalamus. Multimodal binding may partially rely on thalamic hub neurons in the pulvinar (Saalmann et al. 2012) that synchronise activity across visual, auditory, and attentional cortical regions.

5. Cross-Modal Plasticity

Congenitally blind individuals show robust auditory and tactile activation of primary visual cortex (V1), as if the “visual” cortex had been repurposed for other modalities (Sadato et al. 1996). Braille reading activates V1 in blind readers, and lesions of occipital cortex impair Braille. The rewiring is mediated by pre-existing but typically weak cross-modal connections that are strengthened in the absence of visual drive (Bavelier & Neville 2002).

Congenitally deaf individuals similarly recruit auditory cortex for visual and tactile processing, and show superior performance in peripheral visual attention and motion detection. These findings suggest that cortical architecture is defined by its input statistics and local circuitry rather than by a modality label, consistent with metamodal theories of cortex.

Sensory substitution devices such as the BrainPort(tongue-array visual display; Bach-y-Rita 1969; commercialised 2015) allow blind users to perceive visual scenes through tactile stimulation, illustrating how cortex can extract meaningful spatial information from any high-bandwidth sensory channel.

6. Non-Mammalian Integrative Centers

Avian Wulst and DVR

Birds lack the six-layered neocortex of mammals but perform equivalent cognitive feats (tool use in corvids, vocal learning in songbirds, mental rotation in pigeons). The avian wulst (dorsal pallium) and dorsal ventricular ridge (DVR) are nuclear-organised structures homologous to specific mammalian cortical regions, as established by Medina & Reiner (2000). Multisensory integration in birds is centred on the nidopallium caudolaterale (NCL), functionally analogous to mammalian prefrontal cortex.

Insect Mushroom Bodies

In insects, the mushroom body (MB) is the primary integration and associative-learning center. Kenyon cells in the MB receive input from olfactory, visual, and mechanosensory projection neurons; their sparse, high-dimensional code supports one-trial associative memory (Caron et al. 2013; Aso et al. 2014). Bouzaiane et al. (2015) mapped individual MBON outputs in Drosophila and showed that distinct MBONs drive approach versus avoidance behavior, with dopamine modulating their synaptic weights.

The MB architecture anticipates several key ideas in modern ML: sparse random projections for dimensionality expansion, Hebbian plasticity at mushroom-body output synapses for learning, and a value-prediction signal carried by dopaminergic neurons directly analogous to temporal-difference reward prediction.

7. Machine-Learning Analogues

The deep-learning revolution of the last decade has produced striking computational analogues to biological multimodal integration. Transformer attention (Vaswani et al. 2017) implements precision-weighted mixing of tokens, mathematically similar to the precision-weighted cue combination of Ernst & Banks (2002). CLIP (Radford et al. 2021) learns joint embeddings of images and text through contrastive training, analogous to cross-modal associative learning in SC/MB. Perceiver IO (Jaegle et al. 2021) processes heterogeneous modalities through cross-attention to a shared latent space, functionally similar to thalamo-cortical hub routing.

Variational autoencoders (Kingma & Welling 2013) minimise a free-energy objective nearly identical in mathematical form to the Friston free-energy principle. Deep predictive coding networks (Lotter et al. 2017) instantiate Rao–Ballard dynamics for video prediction, and their learned features qualitatively resemble responses in mammalian visual cortex.

The convergence between neuroscience and ML is methodological, not accidental: both fields are attempting to solve the same problem of extracting causal structure from high-dimensional noisy sensory data. The comparative-sensory-biology tradition matters because it demonstrates that the solution depends critically on the statistics of the ecologically relevant signal.

8. Interoception and Homeostatic Sensing

Beyond the exteroceptive modalities covered in modules 1–7, the brain integrates a continuous stream of interoceptive signals conveying visceral state: blood pressure (baroreceptors), blood chemistry (chemoreceptors of the carotid body), gut pH and osmolality (vagal afferents), core temperature, and pain. A. D. Craig (2003) argued that interoception is an essential substrate for emotion and selfhood, projecting via lamina I of the spinal cord to the insular cortex in humans.

The vagus nerve carries bidirectional information between brain and viscera, including the gut microbiome: bacteria-produced metabolites (short-chain fatty acids, serotonin) signal through vagal sensory terminals and influence mood and behaviour (Cryan & Dinan 2012). Interoception is thus not only a sensory modality but the neural substrate of the gut–brain axis.

From the free-energy-principle view, interoception and exteroception share the same computational architecture: the brain minimises prediction errors about both internal and external causes. Homeostasis becomes a special case of active inference, with the autonomic nervous system issuing motor commands that keep physiological variables within their predicted envelopes (Seth 2013).

9. The Binding Problem and Consciousness

Crick & Koch (1990) articulated the binding problem: how does the brain combine features encoded by disparate neurons into a unified conscious percept? Candidate solutions include temporal synchrony (40-Hz gamma oscillations; Singer 1993), hierarchical integration (Treisman’s feature-integration theory), and dynamic-core or global-workspace models (Dehaene & Changeux 2011; Tononi’s integrated information theory, 2004).

No single theory has been decisively validated, but all converge on the view that conscious perception correlates with widespread, recurrent, long-range integration across cortical and thalamic networks, not with the activity of any single module. Tononi’s integrated information \(\Phi\) quantifies the information a system generates “above and beyond” its parts—mathematically, the amount of causal irreducibility:

\[ \Phi = \min_{P \in \text{partitions}} D_{\text{KL}}\!\left[p(X) \,\|\, \prod_i p(X_i)\right]\]

High \(\Phi\) systems integrate information in ways that cannot be decomposed into independent parts

10. Brain-Computer Interfaces and Restored Sensation

The mapping between peripheral receptor codes and central percepts is sufficiently understood to enable sensory prostheses. Cochlear implants (Djourno & Eyriès 1957; Simmons 1966; clinical FDA approval 1984) stimulate the auditory nerve directly through an electrode array in the scala tympani, bypassing damaged hair cells. Over 700,000 devices have been implanted globally.

Retinal implants (Argus II; subretinal prostheses) stimulate surviving retinal ganglion cells to restore partial vision in patients with retinitis pigmentosa. Optogenetic prostheses express light-activated channelrhodopsins in surviving retinal neurons, allowing external light to drive the circuit (Sahel et al. 2021 demonstrated partial visual recovery in a blind patient).

Somatosensory feedback for upper-limb prostheses is achieved by microstimulation of median and ulnar nerve fascicles or direct stimulation of S1 in tetraplegic patients (Flesher et al. 2016). Restoring touch to a neuroprosthesis enables dexterous manipulation impossible under open-loop control.

Each of these technologies relies on the comparative-sensory-biology insights of earlier modules: the spectral and temporal codes at the receptor level (modules 1–7) must be re-created in appropriate detail for the central circuits to reconstruct meaningful percepts. Success requires understanding not only what the receptor detects but how the brain expects it to be encoded.

8a. Multisensory Illusions

Perceptual illusions reveal multisensory principles with particular clarity.

McGurk effect (McGurk & MacDonald 1976): a visual “ga” combined with an auditory “ba” is perceived as “da”—the brain fuses conflicting modalities into a new percept that matches neither.
Ventriloquist effect: vision dominates sound localisation when the visual estimate is more reliable; Alais & Burr (2004) showed the integration is statistically optimal.
Rubber-hand illusion (Botvinick & Cohen 1998): synchronous touching of a visible rubber hand and the hidden real hand leads to the sensation that the rubber hand is one’s own.
Double-flash illusion (Shams et al. 2000): a single visual flash paired with two auditory beeps is perceived as two flashes—audition biases vision when vision is ambiguous.
Thermal grill illusion (Craig & Bushnell 1994): alternating warm and cool bars applied to the skin are perceived as burning hot, exploiting TRPV1/TRPM8 interactions in spinal dorsal horn.

Each of these illusions arises when the multisensory posterior places weight on a combination of cues inconsistent with any single modality’s independent estimate. From a Bayesian perspective, illusions are not failures of perception; they are the correct inferences under mildly incorrect priors.

Simulation 1: Stein–Meredith Multisensory Integration

We reproduce the four Stein–Meredith principles—inverse effectiveness, spatial rule, temporal rule, and modality matching—in a minimal Hill-nonlinearity model of superior colliculus neurons.

Python

script.py160 lines

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# Superior Colliculus / Optic Tectum Multisensory Integration
# Meredith and Stein (1983, 1996) discovered that SC neurons receiving
# convergent visual, auditory, and somatosensory inputs display:
#   (1) multisensory enhancement: response to combined stimulus > sum of
#       unimodal responses (superadditivity);
#   (2) inverse effectiveness: the weaker each unimodal response, the
#       greater the proportional enhancement;
#   (3) spatial rule: enhancement requires inputs from same spatial region;
#   (4) temporal rule: inputs within a narrow temporal window integrate.
# We reproduce all four principles in a simple integrate-and-fire model.

np.random.seed(42)

# Unimodal response as a nonlinear function of stimulus intensity
def unimodal_response(I, I0=0.5, n=2):
    # Hill-type nonlinearity
    return I**n / (I**n + I0**n)

# Multisensory integration with superadditive gain g(x)
def multisensory_response(V, A, g_max=2.2, I_half=0.25):
    r_V = unimodal_response(V)
    r_A = unimodal_response(A)
    # Superadditive gain is inversely proportional to the larger unimodal response
    u_max = np.maximum(r_V, r_A)
    gain = 1 + (g_max - 1) * I_half / (I_half + u_max)
    return (r_V + r_A) * gain, r_V, r_A

# ---- 1. Inverse effectiveness sweep ----
I_stim = np.linspace(0.01, 1.5, 60)
ME_list = []
for I in I_stim:
    r_comb, r_V, r_A = multisensory_response(I, I)
    sum_uni = r_V + r_A
    ME = (r_comb - sum_uni) / max(sum_uni, 1e-9) * 100
    ME_list.append(ME)
ME_list = np.array(ME_list)

# ---- 2. Spatial rule (maps) ----
# 2D receptive-field map: SC has a topographic map of space
Nx, Ny = 80, 80
x = np.linspace(-1, 1, Nx)
y = np.linspace(-1, 1, Ny)
X, Y = np.meshgrid(x, y)

# Visual stimulus at (0.2, 0.1)
v_src = (0.2, 0.1)
RF_V = np.exp(-((X - v_src[0])**2 + (Y - v_src[1])**2) / (2 * 0.2**2))

# Auditory stimulus at (0.25, 0.0) - close spatially
a_src_close = (0.25, 0.0)
RF_A_close = np.exp(-((X - a_src_close[0])**2 + (Y - a_src_close[1])**2) / (2 * 0.3**2))

# Auditory stimulus at (-0.6, -0.5) - far spatially
a_src_far = (-0.6, -0.5)
RF_A_far = np.exp(-((X - a_src_far[0])**2 + (Y - a_src_far[1])**2) / (2 * 0.3**2))

combined_close = RF_V + RF_A_close + 2.0 * RF_V * RF_A_close  # superadditive term
combined_far = RF_V + RF_A_far + 2.0 * RF_V * RF_A_far

# ---- 3. Temporal rule ----
dt = 1.0  # ms
t = np.arange(-300, 700, dt)
def gauss_pulse(mu, sig=60):
    return np.exp(-(t - mu)**2 / (2 * sig**2))
V_pulse = gauss_pulse(0)
# Auditory pulse offsets
offsets = np.arange(-250, 260, 25)
ME_time = []
for off in offsets:
    A_pulse = gauss_pulse(off)
    r_comb = V_pulse + A_pulse + 1.8 * V_pulse * A_pulse
    ME_time.append(r_comb.max() / (V_pulse + A_pulse).max() - 1)
ME_time = np.array(ME_time) * 100

# ---- 4. Stein-Meredith enhancement index as function of intensity ratio ----
# Fix V_intensity, sweep A_intensity
V_fixed = 0.5
A_sweep = np.linspace(0.05, 1.5, 50)
EI = []
for A in A_sweep:
    r_c, r_V, r_A = multisensory_response(V_fixed, A)
    sum_u = r_V + r_A
    EI.append(100 * (r_c - sum_u) / max(sum_u, 1e-9))
EI = np.array(EI)

fig, axes = plt.subplots(2, 2, figsize=(13, 10))
fig.patch.set_facecolor('#0a0a1a')
for ax in axes.ravel():
    ax.set_facecolor('#0a0a1a')
    ax.tick_params(colors='#94a3b8')
    for spine in ax.spines.values():
        spine.set_color('#334155')

ax1, ax2, ax3, ax4 = axes.ravel()

# Panel 1: Inverse effectiveness
ax1.plot(I_stim, ME_list, color='#5eead4', linewidth=2.3)
ax1.axhline(y=0, color='#94a3b8', linestyle=':', alpha=0.5)
ax1.fill_between(I_stim, ME_list, 0, where=(ME_list > 0),
                 color='#5eead4', alpha=0.15)
ax1.set_xlabel('Unimodal stimulus intensity (equal V=A)', color='#94a3b8')
ax1.set_ylabel('Multisensory enhancement (%)', color='#94a3b8')
ax1.set_title('Principle 1: Inverse effectiveness',
              color='#5eead4', fontsize=12, fontweight='bold')
ax1.grid(True, color='#334155', alpha=0.3)

# Panel 2a: spatial map, close configuration
im1 = ax2.imshow(combined_close, extent=[-1, 1, -1, 1], cmap='inferno',
                 origin='lower', aspect='equal')
ax2.scatter([v_src[0]], [v_src[1]], color='#5eead4', marker='o', s=80,
            edgecolor='white', linewidth=1, label='Visual')
ax2.scatter([a_src_close[0]], [a_src_close[1]], color='#f472b6', marker='^', s=80,
            edgecolor='white', linewidth=1, label='Auditory (aligned)')
ax2.set_title('Principle 2: Spatial rule (aligned -> enhancement)',
              color='#5eead4', fontsize=11, fontweight='bold')
ax2.set_xlabel('Azimuth (arb.)', color='#94a3b8')
ax2.set_ylabel('Elevation (arb.)', color='#94a3b8')
ax2.legend(facecolor='#1e293b', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=9)

# Panel 3: temporal rule
ax3.plot(offsets, ME_time, color='#a78bfa', linewidth=2.3)
ax3.axhline(y=0, color='#94a3b8', linestyle=':', alpha=0.5)
ax3.axvspan(-150, 150, alpha=0.1, color='#a78bfa', label='temporal window')
ax3.set_xlabel('Auditory onset relative to visual (ms)', color='#94a3b8')
ax3.set_ylabel('Peak enhancement (%)', color='#94a3b8')
ax3.set_title('Principle 3: Temporal window',
              color='#5eead4', fontsize=12, fontweight='bold')
ax3.grid(True, color='#334155', alpha=0.3)
ax3.legend(facecolor='#1e293b', edgecolor='#334155', labelcolor='#cbd5e1')

# Panel 4: Enhancement index vs intensity
ax4.plot(A_sweep, EI, color='#facc15', linewidth=2.3, label='Enhancement index')
ax4.axhline(y=0, color='#94a3b8', linestyle=':', alpha=0.5)
ax4.axvline(x=V_fixed, color='#f43f5e', linestyle='--', alpha=0.6,
            label=f'V fixed at {V_fixed}')
ax4.set_xlabel('Auditory stimulus intensity', color='#94a3b8')
ax4.set_ylabel('Enhancement index (%)', color='#94a3b8')
ax4.set_title('Principle 4: Cross-modal matching',
              color='#5eead4', fontsize=12, fontweight='bold')
ax4.grid(True, color='#334155', alpha=0.3)
ax4.legend(facecolor='#1e293b', edgecolor='#334155', labelcolor='#cbd5e1')

plt.suptitle('Stein-Meredith multisensory integration principles in SC / tectum',
             color='#e2e8f0', fontsize=14, fontweight='bold')
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.savefig('output.png', dpi=120, bbox_inches='tight', facecolor='#0a0a1a')

area_ME = np.trapezoid(ME_list, I_stim)
print('Multisensory integration simulation complete.')
print(f'Max inverse-effectiveness enhancement: {ME_list.max():.1f}%')
print(f'Integrated enhancement curve (np.trapezoid): {area_ME:.2f}')
print(f'Temporal window halfwidth (ms): ~150')
print(f'Near-aligned multisensory peak: {combined_close.max():.3f}')
print(f'Far-spatial multisensory peak: {combined_far.max():.3f}')

Click Run to execute the Python code

Code will be executed with Python 3 on the server

10a. Physics of Temporal Integration Windows

The temporal window in multisensory integration (∼300 ms in mammals) is not a biological accident but reflects the physics of signal propagation in the environment. Sound travels at 343 m/s in air; light essentially instantaneously. A bird striking water at 10 m produces a visual splash ∼0 ms after the event and an acoustic signal ∼29 ms later. The 300 ms window comfortably covers both signals while excluding unrelated events separated by more than ∼100 m of path-length.

Fish lateral line integrates with vision over a much shorter window (∼50 ms) because water-borne acoustic and hydrodynamic signals propagate faster (1480 m/s) and the spatial scales of relevance are smaller. Bats match visual and echolocation echoes with sub-millisecond precision, reflecting their short-range, high-speed foraging.

\[ \Delta t_{\text{window}} \sim \frac{L_{\max}}{\min(c_{\text{light}}, c_{\text{sound}})} + \tau_{\text{neural}}\]

Temporal window = cross-modal propagation delay + neural integration time

Simulation 2: Predictive Coding Network with Cross-Modal Priors

A two-layer Rao–Ballard hierarchy receives a noisy visual input; a cross-modal prior from thermo- or chemoreception biases the latent code. The simulation shows prediction-error descent, attentional precision weighting, and the quantitative benefit of cross-modal priors in accelerating convergence.

Python

script.py155 lines

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# Rao-Ballard Predictive Coding Network
# A 2-layer hierarchy: lower layer L1 produces the sensory reconstruction,
# higher layer L2 sends top-down predictions to L1. Prediction-error units
# carry the residual mismatch. Cross-modal priors reshape the percept.
#
# References: Rao & Ballard (1999); Friston (2010); Keller & Mrsic-Flogel (2018).

np.random.seed(0)

# ---- Layer dimensions ----
N1 = 64    # L1 sensory units
N2 = 16    # L2 hidden units
T = 200    # timesteps

# Generative weights U: L2 -> L1 (what we predict about L1 from L2 code)
U = np.random.randn(N1, N2) * 0.3
# Fix for readability: make columns orthogonal-ish
U, _ = np.linalg.qr(U)
U = U[:, :N2]

# Visual input: a low-dim signal embedded in high-dim sensory array
true_r2 = np.zeros(N2)
true_r2[2] = 1.5
true_r2[7] = -0.9
true_r2[11] = 0.6
visual_input = U @ true_r2 + 0.35 * np.random.randn(N1)

# Cross-modal prior from module 7 (thermoreception): we simulate an auditory/thermal
# prior that boosts prior belief on r2[2] and r2[11] (e.g., same object).
cross_modal_prior = np.zeros(N2)
cross_modal_prior[2] = 0.4
cross_modal_prior[11] = 0.3

# ---- Inference as iterative prediction-error minimisation ----
# r1 = U r2 + e1;   dr2/dt = -lambda2 (r2 - prior) + U.T e1
lr_r2 = 0.03
lambda2 = 0.05

r2 = np.zeros(N2)
r2_trace = np.zeros((T, N2))
err_trace = np.zeros(T)
reconstruction_trace = np.zeros((T, N1))

for t in range(T):
    pred = U @ r2
    err = visual_input - pred
    r2 = r2 + lr_r2 * (U.T @ err - lambda2 * (r2 - cross_modal_prior))
    r2_trace[t] = r2
    err_trace[t] = np.sqrt(np.mean(err**2))
    reconstruction_trace[t] = pred

# Control: no cross-modal prior
r2_np = np.zeros(N2)
r2_trace_np = np.zeros((T, N2))
err_trace_np = np.zeros(T)
for t in range(T):
    pred = U @ r2_np
    err = visual_input - pred
    r2_np = r2_np + lr_r2 * (U.T @ err - lambda2 * r2_np)
    r2_trace_np[t] = r2_np
    err_trace_np[t] = np.sqrt(np.mean(err**2))

# ---- Demonstration of attention as precision weighting ----
# Boost precision on predicted dimensions by scaling gradient
attention_gain = np.ones(N2)
attention_gain[2] = 3.0      # attention to feature 2
attention_gain[11] = 3.0
r2_att = np.zeros(N2)
r2_trace_att = np.zeros((T, N2))
for t in range(T):
    pred = U @ r2_att
    err = visual_input - pred
    r2_att = r2_att + lr_r2 * attention_gain * (U.T @ err - lambda2 * r2_att)
    r2_trace_att[t] = r2_att

# ---- Plotting ----
fig, axes = plt.subplots(2, 2, figsize=(13, 10))
fig.patch.set_facecolor('#0a0a1a')
for ax in axes.ravel():
    ax.set_facecolor('#0a0a1a')
    ax.tick_params(colors='#94a3b8')
    for spine in ax.spines.values():
        spine.set_color('#334155')

ax1, ax2, ax3, ax4 = axes.ravel()

# Panel 1: prediction-error convergence
ax1.plot(err_trace, color='#5eead4', linewidth=2.2, label='With cross-modal prior')
ax1.plot(err_trace_np, color='#a78bfa', linewidth=2.2, label='No prior')
ax1.set_xlabel('Iteration', color='#94a3b8')
ax1.set_ylabel('RMS prediction error', color='#94a3b8')
ax1.set_title('Panel A: Error minimisation',
              color='#5eead4', fontsize=12, fontweight='bold')
ax1.grid(True, color='#334155', alpha=0.3)
ax1.legend(facecolor='#1e293b', edgecolor='#334155', labelcolor='#cbd5e1')

# Panel 2: latent r2 dynamics
for i in [2, 7, 11]:
    ax2.plot(r2_trace[:, i], linewidth=2, label=f'r2[{i}] (prior boosted)' if i != 7 else f'r2[{i}] (no prior)')
ax2.axhline(y=true_r2[2], color='#5eead4', linestyle=':', alpha=0.5)
ax2.axhline(y=true_r2[7], color='#a78bfa', linestyle=':', alpha=0.5)
ax2.axhline(y=true_r2[11], color='#f472b6', linestyle=':', alpha=0.5)
ax2.set_xlabel('Iteration', color='#94a3b8')
ax2.set_ylabel('Latent code amplitude', color='#94a3b8')
ax2.set_title('Panel B: Hidden code r2 converges',
              color='#5eead4', fontsize=12, fontweight='bold')
ax2.grid(True, color='#334155', alpha=0.3)
ax2.legend(facecolor='#1e293b', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=9)

# Panel 3: reconstruction over time
timepoints = [0, 10, 50, 199]
colors_r = ['#a78bfa', '#f472b6', '#facc15', '#5eead4']
ax3.plot(visual_input, color='#f43f5e', linewidth=1.5, alpha=0.8, label='Input')
for tt, c in zip(timepoints, colors_r):
    ax3.plot(reconstruction_trace[tt], color=c, linewidth=1.5, alpha=0.9,
             label=f'Reconstruction t={tt}')
ax3.set_xlabel('L1 unit index', color='#94a3b8')
ax3.set_ylabel('Activation', color='#94a3b8')
ax3.set_title('Panel C: Reconstruction dynamics',
              color='#5eead4', fontsize=12, fontweight='bold')
ax3.grid(True, color='#334155', alpha=0.3)
ax3.legend(facecolor='#1e293b', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=9)

# Panel 4: attention
ax4.plot(np.abs(r2_trace[-1]), 'o-', color='#5eead4', linewidth=2, label='Default')
ax4.plot(np.abs(r2_trace_att[-1]), 's-', color='#f472b6', linewidth=2, label='With attention on 2,11')
ax4.plot(np.abs(true_r2), 'x', color='#facc15', markersize=10, label='Ground truth')
ax4.set_xlabel('Latent unit index', color='#94a3b8')
ax4.set_ylabel('|r2| at convergence', color='#94a3b8')
ax4.set_title('Panel D: Attentional precision weighting',
              color='#5eead4', fontsize=12, fontweight='bold')
ax4.grid(True, color='#334155', alpha=0.3)
ax4.legend(facecolor='#1e293b', edgecolor='#334155', labelcolor='#cbd5e1', fontsize=9)

plt.suptitle('Predictive coding with cross-modal priors (Rao-Ballard / Friston)',
             color='#e2e8f0', fontsize=14, fontweight='bold')
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.savefig('output.png', dpi=120, bbox_inches='tight', facecolor='#0a0a1a')

int_err = np.trapezoid(err_trace, np.arange(T))
int_err_np = np.trapezoid(err_trace_np, np.arange(T))
print('Predictive coding simulation complete.')
print(f'Integrated error (np.trapezoid), with prior:    {int_err:.2f}')
print(f'Integrated error (np.trapezoid), without prior: {int_err_np:.2f}')
print(f'Relative benefit of cross-modal prior: {100*(int_err_np-int_err)/int_err_np:.1f}%')
print(f'Final latent code (truth -> recovery):')
for i in range(N2):
    if abs(true_r2[i]) > 0.01 or abs(r2_trace[-1, i]) > 0.1:
        print(f'  idx {i}: truth={true_r2[i]:+.2f}  recovered={r2_trace[-1, i]:+.2f}')

Click Run to execute the Python code

Code will be executed with Python 3 on the server

11. Futures: Novel Senses, Enhanced Senses

The logic of sensory substitution raises the tantalising possibility of novel senses: augmenting humans with modalities no primate has ever possessed. Nagel and Bach-y-Rita hinted at this decades ago, and contemporary prototypes include:

North Sense (Cyborg Nest): a chest-implanted magnetometer that vibrates when the wearer faces north, granting a continuous magnetoceptive percept (cf. module 5).
feelSpace belt: a vibrotactile compass worn around the waist; long-term wearers report a spatial percept of cardinal direction.
Infrared vision demonstrated in rats by Thomson et al. (2013) via neuroprosthetic coupling of IR photodiode output to S1 microstimulation—the animals learned to use IR within days.
Expanded colour vision in tetrachromat humans; Jameson et al. (2001) identified rare carriers of four functional cone pigments.

Each of these interventions forces us to confront deep questions about the neural plasticity that supports new sensory codes, the ethical implications of altering the human sensorium, and the boundary—if any—between natural evolution and cybernetic augmentation.

12. Synthesis of the Course

Across the nine modules of this course, we traced the flow of information from photons, phonons, pressure waves, electrical fields, magnetic lines, molecules, and heat at the receptor, through peripheral encoding, central map alignment, cortical expansion recoding, and Bayesian integration into unified percepts. Several organising principles emerge repeatedly.

Matched filters: every receptor is shaped by the signal statistics of its ecological niche. Mantis-shrimp cones, bee UV-sensitive rhodopsin, bat cochlear sharpening at echolocation frequencies, pit-organ TRPA1 tuned to 37°C mammalian targets—each is a custom-built matched filter.

Combinatorial coding: from OR-glomerulus patterns (module 6) to V1 orientation columns (module 1) to cortical population codes for location (module 5), evolution favours combinatorial representations because they scale exponentially with resources.

Bayesian integration: from Ernst–Banks cue combination to Rao–Ballard predictive coding, the brain’s central computation is probabilistic inference over latent causes given sensory evidence.

Convergent evolution: IR detection in snakes, vampire bats, and fire-chasing beetles; electroreception in fish, platypuses, and bees; magnetoreception in birds, turtles, and mole-rats—nature repeatedly finds multiple solutions to the same sensing problem.

This course thus equips you with both the molecular and computational vocabulary to understand any sensory system, biological or engineered, and to design novel interfaces between them. The final frontier, as always, is the brain itself: the substrate that turns all these heterogeneous receptor signals into the integrated, subjective, meaningful experience we call perception.

13. Developmental Emergence of Multisensory Integration

Multisensory integration is not innate in the mature form. Wallace & Stein (1997) showed that cat SC neurons gain the Stein–Meredith properties only after several weeks of postnatal visual, auditory, and tactile experience. Raising kittens in darkness eliminates the spatial rule. Early unisensory experience is required to calibrate the multimodal receptive-field topography.

In humans, infants show crude audiovisual integration by 4 months but lack the Ernst–Banks optimality until about age 8–10 years (Gori et al. 2008). The slow development reflects the need to learn the relative reliabilities of each modality—reliabilities that themselves change as the sensory organs mature.

Critical-period plasticity in multisensory circuits is gated by parvalbumin-positive interneurons and the maturation of perineuronal nets (Hensch 2005). Reopening these critical periods pharmacologically is a promising (though risky) avenue for adult sensory rehabilitation after stroke or peripheral damage.

14. Disorders of Multisensory Integration

Abnormal multisensory integration is a signature of several clinical conditions. Autism spectrum disorder is associated with widened temporal binding windows (Stevenson et al. 2014), reduced reliance on cross-modal priors, and atypical sensory precision weighting. Van de Cruys et al. (2014) formalised this within predictive coding as high and inflexible precision on prediction errors: small sensory mismatches attract disproportionate attention and are experienced as overwhelming.

Schizophrenia shows the opposite signature in some dimensions: weakened sensory priors (reduced rubber-hand illusion, altered ventriloquist effect) consistent with aberrant precision models of positive symptoms (hallucinations as excessively-precise sensory predictions). Sterzer et al. (2018) reviewed predictive coding accounts of psychosis.

Synaesthesia—cross-modal experience (e.g., coloured letters, shaped tastes)— may reflect reduced cross-modal pruning during development. Genetic studies suggest variants in the 2q24 locus predispose to grapheme-colour synaesthesia (Asher et al. 2009). Synaesthetes show enhanced multisensory integration and memory performance on cross-modal tasks.

Each of these conditions—autism, schizophrenia, synaesthesia—supports the central claim that multisensory integration is a computationally precise process that can go wrong along specific, quantifiable dimensions. Far from being a soft psychological concept, it is a concrete neural computation with identifiable molecular and circuit substrates.

Key References

• Stein, B. E. & Meredith, M. A. (1993). The Merging of the Senses. MIT Press.

• Meredith, M. A. & Stein, B. E. (1983). “Interactions among converging sensory inputs in the superior colliculus.” Science, 221, 389–391.

• Ernst, M. O. & Banks, M. S. (2002). “Humans integrate visual and haptic information in a statistically optimal fashion.” Nature, 415, 429–433.

• Alais, D. & Burr, D. (2004). “The ventriloquist effect results from near-optimal bimodal integration.” Current Biology, 14, 257–262.

• Rao, R. P. N. & Ballard, D. H. (1999). “Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects.” Nature Neuroscience, 2, 79–87.

• Friston, K. (2010). “The free-energy principle: a unified brain theory?” Nature Reviews Neuroscience, 11, 127–138.

• Feldman, H. & Friston, K. J. (2010). “Attention, uncertainty, and free-energy.” Frontiers in Human Neuroscience, 4, 215.

• Sherman, S. M. & Guillery, R. W. (2006). Exploring the Thalamus and Its Role in Cortical Function. MIT Press.

• Saalmann, Y. B., Pinsk, M. A., Wang, L., Li, X. & Kastner, S. (2012). “The pulvinar regulates information transmission between cortical areas based on attention demands.” Science, 337, 753–756.

• Sadato, N. et al. (1996). “Activation of the primary visual cortex by Braille reading in blind subjects.” Nature, 380, 526–528.

• Bavelier, D. & Neville, H. J. (2002). “Cross-modal plasticity: where and how?” Nature Reviews Neuroscience, 3, 443–452.

• Medina, L. & Reiner, A. (2000). “Do birds possess homologues of mammalian primary visual, somatosensory and motor cortices?” Trends in Neurosciences, 23, 1–12.

• Caron, S. J. C., Ruta, V., Abbott, L. F. & Axel, R. (2013). “Random convergence of olfactory inputs in the Drosophila mushroom body.” Nature, 497, 113–117.

• Aso, Y. et al. (2014). “The neuronal architecture of the mushroom body provides a logic for associative learning.” eLife, 3, e04577.

• Bouzaiane, E., Trannoy, S., Scheunemann, L., Placais, P.-Y. & Preat, T. (2015). “Two independent mushroom body output circuits retrieve the six discrete components of Drosophila aversive memory.” Cell Reports, 11, 1280–1292.

• Czech-Damal, N. U. et al. (2012). “Electroreception in the Guiana dolphin (Sotalia guianensis).” Proc. R. Soc. B, 279, 663–668.

• Leitch, D. B. & Catania, K. C. (2012). “Structure, innervation and response properties of integumentary sensory organs in crocodilians.” Journal of Experimental Biology, 215, 4217–4230.

• Hartline, P. H., Kass, L. & Loop, M. S. (1978). “Merging of modalities in the optic tectum: infrared and visual integration in rattlesnakes.” Science, 199, 1225–1229.

• Craig, A. D. (2003). “Interoception: the sense of the physiological condition of the body.” Current Opinion in Neurobiology, 13, 500–505.

• Cryan, J. F. & Dinan, T. G. (2012). “Mind-altering microorganisms: the impact of the gut microbiota on brain and behaviour.” Nature Reviews Neuroscience, 13, 701–712.

• Seth, A. K. (2013). “Interoceptive inference, emotion, and the embodied self.” Trends in Cognitive Sciences, 17, 565–573.

• Crick, F. & Koch, C. (1990). “Towards a neurobiological theory of consciousness.” Seminars in the Neurosciences, 2, 263–275.

• Tononi, G. (2004). “An information integration theory of consciousness.” BMC Neuroscience, 5, 42.

• Dehaene, S. & Changeux, J.-P. (2011). “Experimental and theoretical approaches to conscious processing.” Neuron, 70, 200–227.

• Flesher, S. N. et al. (2016). “Intracortical microstimulation of human somatosensory cortex.” Science Translational Medicine, 8, 361ra141.

• Sahel, J.-A. et al. (2021). “Partial recovery of visual function in a blind patient after optogenetic therapy.” Nature Medicine, 27, 1223–1229.

• Thomson, E. E., Carra, R. & Nicolelis, M. A. L. (2013). “Perceiving invisible light through a somatosensory cortical prosthesis.” Nature Communications, 4, 1482.

• Vaswani, A. et al. (2017). “Attention is all you need.” NeurIPS.

• Radford, A. et al. (2021). “Learning transferable visual models from natural language supervision (CLIP).” ICML.

• Jaegle, A. et al. (2021). “Perceiver IO: a general architecture for structured inputs and outputs.” ICLR.

• Keller, G. B. & Mrsic-Flogel, T. D. (2018). “Predictive processing: a canonical cortical computation.” Neuron, 100, 424–435.

• Wallace, M. T. & Stein, B. E. (1997). “Development of multisensory neurons and multisensory integration in cat superior colliculus.” Journal of Neuroscience, 17, 2429–2444.

• Gori, M., Del Viva, M., Sandini, G. & Burr, D. C. (2008). “Young children do not integrate visual and haptic form information.” Current Biology, 18, 694–698.

• Hensch, T. K. (2005). “Critical period plasticity in local cortical circuits.” Nature Reviews Neuroscience, 6, 877–888.

• Stevenson, R. A. et al. (2014). “Multisensory temporal integration in autism spectrum disorders.” Journal of Neuroscience, 34, 691–697.

• Van de Cruys, S. et al. (2014). “Precise minds in uncertain worlds: predictive coding in autism.” Psychological Review, 121, 649–675.

• Sterzer, P. et al. (2018). “The predictive coding account of psychosis.” Biological Psychiatry, 84, 634–643.

• McGurk, H. & MacDonald, J. (1976). “Hearing lips and seeing voices.” Nature, 264, 746–748.

• Botvinick, M. & Cohen, J. (1998). “Rubber hands ‘feel’ touch that eyes see.” Nature, 391, 756.

• Shams, L., Kamitani, Y. & Shimojo, S. (2000). “Illusions: what you see is what you hear.” Nature, 408, 788.

• Tononi, G. (2008). “Consciousness as integrated information: a provisional manifesto.” Biological Bulletin, 215, 216–242.

• Kingma, D. P. & Welling, M. (2013). “Auto-encoding variational Bayes.” ICLR.

• Asher, J. E. et al. (2009). “A whole-genome scan and fine-mapping linkage study of auditory-visual synaesthesia.” American Journal of Human Genetics, 84, 279–285.

Share:X Reddit LinkedIn

← Module 7: Thermo & Hygroreception Back to Course Overview →