Part II: Systems Neuroscience | Chapter 4

Attention & Consciousness

Attention models, the binding problem, neural correlates of consciousness, Global Workspace Theory, and Integrated Information Theory

The Neural Basis of Awareness

Attention selects relevant information from the torrent of sensory input, while consciousness gives rise to subjective experience. These interrelated phenomena represent some of the deepest questions in neuroscience. Attention can operate without consciousness (subliminal priming), and consciousness can occur without focused attention (peripheral awareness), yet they share neural mechanisms and computational principles.

This chapter covers computational models of attention (gain modulation, normalization), the binding problem and its proposed solutions (synchrony, feature integration theory), neural correlates of consciousness (NCC), and the two leading theoretical frameworks: Global Workspace Theory and Integrated Information Theory.

1. Computational Models of Attention

Attention enhances the neural representation of selected stimuli while suppressing distractors. Neurophysiological recordings reveal several distinct mechanisms: gain modulation (multiplicative scaling of tuning curves), sharpening of tuning, noise reduction, and changes in noise correlations. The normalization model of attention provides a unified account of these diverse effects.

Derivation 1: The Normalization Model of Attention

Reynolds and Heeger (2009) proposed that attention acts through an "attention field" that modulates neural responses before divisive normalization. The response of neuron $i$ is:

$$r_i = \frac{(\mathbf{a} \cdot \mathbf{s})_i \cdot E_i}{\sigma^2 + \sum_j (\mathbf{a} \cdot \mathbf{s})_j \cdot w_{ij}}$$

where $\mathbf{s}$ is the stimulus drive, $\mathbf{a}$ is the attention field (a spatial or feature-based gain), $E_i$ is the excitatory drive for neuron $i$, $w_{ij}$ are normalization weights, and $\sigma^2$ is a semi-saturation constant. The key insight is that the same mechanism produces different phenomenology depending on the relative sizes of the attention field and stimulus:

• Broad attention field (larger than the stimulus): contrast gain, shifting the contrast-response function leftward
• Narrow attention field (smaller than the stimulus): response gain, multiplicatively scaling the entire response
• Matched sizes: intermediate effects combining both gain changes

This explains why different experiments report different attentional effects: the mechanism is the same, but the relative geometry of stimulus and attention field varies.
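The dependence on relative size can be checked numerically. The following is a minimal one-dimensional sketch, assuming Gaussian profiles for the stimulus, the attention field, and the normalization pool; the function names and every parameter value (widths, `att_gain`, `semi_sat`) are illustrative assumptions rather than the published implementation. It compares the attended-to-unattended response ratio of the neuron at the center at low and at high contrast.

```python
import numpy as np

def gaussian(x, width):
    """Unnormalized Gaussian profile with peak 1 at x = 0."""
    return np.exp(-x**2 / (2.0 * width**2))

def center_response(contrast, stim_width, att_width, att_gain,
                    pool_width=5.0, semi_sat=0.02):
    """Response of the neuron at x = 0 (all values are assumed, not fitted)."""
    x = np.linspace(-30.0, 30.0, 601)
    att_field = 1.0 + att_gain * gaussian(x, att_width)   # att_gain = 0: no attention
    excitatory = att_field * contrast * gaussian(x, stim_width)
    pool = gaussian(x, pool_width)
    pool /= pool.sum()                                    # pooling weights sum to 1
    suppressive = np.sum(pool * excitatory)               # attention enters the pool too
    return excitatory[len(x) // 2] / (suppressive + semi_sat)

def attention_ratio(stim_width, att_width, att_gain=1.0):
    """Attended / unattended response at low (0.001) and high (1.0) contrast."""
    return tuple(
        center_response(c, stim_width, att_width, att_gain)
        / center_response(c, stim_width, att_width, 0.0)
        for c in (0.001, 1.0)
    )

for label, stim_w, att_w in [("small stimulus, broad field ", 1.0, 20.0),
                             ("large stimulus, narrow field", 10.0, 1.0)]:
    low, high = attention_ratio(stim_w, att_w)
    print(f"{label}: ratio at low contrast {low:.2f}, at high contrast {high:.2f}")
```

In this sketch a broad field over a small stimulus boosts responses mainly at low contrast (the signature of contrast gain), whereas a narrow field inside a large stimulus keeps most of its boost at high contrast (predominantly response gain).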

2. The Binding Problem

How does the brain combine features processed in different cortical areas into unified percepts? A red moving ball activates color neurons in V4, motion neurons in V5/MT, and shape neurons in inferotemporal cortex. The binding problem asks how these distributed representations are linked to represent a single object.

Derivation 2: Temporal Binding via Neural Synchrony

The temporal correlation hypothesis (von der Malsburg, 1981; Singer and Gray, 1995) proposes that neurons representing features of the same object synchronize their firing in the gamma band (30–80 Hz). The coherence between two spike trains $x(t)$ and $y(t)$ at frequency $f$ is:

$$C_{xy}(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f) \cdot S_{yy}(f)}$$

where $S_{xy}(f)$ is the cross-spectral density and $S_{xx}(f)$, $S_{yy}(f)$ are the auto-spectral densities. To quantify binding, consider $N$ neurons organized into two groups (objects). Within-group synchrony for a group $G$ is:

$$\Phi_{\text{within}} = \frac{1}{|G|^2}\sum_{i,j \in G} C_{ij}(f_\gamma)$$

For binding to work, within-object coherence must significantly exceed between-object coherence. The required precision of synchronization is approximately $\Delta t < 1/(2f_\gamma)$, that is, half an oscillation period: about 6–17 ms across the 30–80 Hz gamma band (12.5 ms at 40 Hz). This temporal window is comparable to the window of spike-timing-dependent plasticity (STDP), suggesting that synchrony-based binding may also drive the formation of object representations through Hebbian learning.
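As a quick check of the half-period bound, the short sketch below evaluates $1/(2f_\gamma)$ at the edges and center of the gamma band. The helper name `sync_window_ms` is hypothetical and introduced only for illustration.

```python
def sync_window_ms(freq_hz):
    """Half an oscillation period, 1 / (2 f), expressed in milliseconds."""
    return 1000.0 / (2.0 * freq_hz)

for f in (30.0, 40.0, 80.0):
    print(f"{f:.0f} Hz -> {sync_window_ms(f):.2f} ms")
# 30 Hz -> 16.67 ms
# 40 Hz -> 12.50 ms
# 80 Hz -> 6.25 ms
```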

Derivation 3: Feature Integration Theory and the Saliency Map

Treisman's Feature Integration Theory (1980) proposes that features are first processed in parallel "feature maps," then bound through focal attention. A common computational implementation uses a saliency map. For feature dimension $k$ (color, orientation, etc.) and location $(x, y)$, the conspicuity map is:

$$S_k(x, y) = \sum_{c=1}^{C} |F_k^c(x, y) - \bar{F}_k^c|$$

where $F_k^c$ is the feature map at scale $c$ and $\bar{F}_k^c$ is its mean. The combined saliency map integrates across feature dimensions:

$$S(x, y) = \sum_{k=1}^{K} w_k \cdot \hat{S}_k(x, y)$$

where $\hat{S}_k$ is the normalized conspicuity map and $w_k$ are learned or task-dependent weights. A winner-take-all network selects the most salient location for the next fixation. Top-down attention modulates the weights $w_k$, implementing feature-based attention (e.g., biasing toward color when searching for a red target).
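The effect of the weights $w_k$ can be illustrated with a toy scene. This is a minimal sketch under stated assumptions: a single spatial scale ($C = 1$), peak normalization of each conspicuity map, and a plain `argmax` standing in for the winner-take-all network. The function names, the scene, and the weight values are all hypothetical.

```python
import numpy as np

def conspicuity(feature_map):
    """Absolute deviation from the map mean, rescaled so the peak equals 1."""
    contrast = np.abs(feature_map - feature_map.mean())
    peak = contrast.max()
    return contrast / peak if peak > 0 else contrast

def most_salient(feature_maps, weights):
    """Weighted sum of conspicuity maps, then argmax (stand-in for winner-take-all)."""
    saliency = sum(w * conspicuity(m) for w, m in zip(weights, feature_maps))
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    return int(row), int(col)

# Toy scene: a color singleton at (2, 3) and an orientation singleton at (7, 6)
color = np.zeros((10, 10))
color[2, 3] = 1.0
orientation = np.zeros((10, 10))
orientation[7, 6] = 1.0

print(most_salient([color, orientation], weights=[0.4, 0.6]))  # (7, 6)
print(most_salient([color, orientation], weights=[0.8, 0.2]))  # (2, 3)
```

Raising the color weight moves the selected location from the orientation singleton to the color singleton, which is the sense in which top-down modulation of $w_k$ implements feature-based attention.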

3. Neural Correlates of Consciousness

The neural correlate of consciousness (NCC) is defined as the minimal set of neuronal mechanisms jointly sufficient for a specific conscious percept (Koch et al., 2016). Key experimental paradigms include binocular rivalry, masking, and the no-report paradigm. Koch and colleagues argue that current evidence implicates a posterior cortical "hot zone" (parieto-occipital-temporal cortex) rather than prefrontal cortex as the content-specific NCC, although the contribution of prefrontal cortex remains debated.

Derivation 4: Global Workspace Theory (GWT)

Baars (1988) and Dehaene et al. (1998) proposed that consciousness arises when information is "broadcast" to a global workspace formed by long-range cortical connections. Mathematically, the workspace can be modeled as a dynamical system with an ignition threshold. The activity of workspace neuron $i$ is:

$$\tau \frac{dx_i}{dt} = -x_i + f\left(\sum_j W_{ij}^{\text{local}} x_j + \sum_k W_{ik}^{\text{global}} x_k + I_i^{\text{ext}}\right)$$

The system has two stable states: (1) a low-activity "subliminal" state where information remains in local modules, and (2) a high-activity "ignited" state where global broadcasting occurs. The transition happens when the input exceeds a threshold set by the global coupling strength:

$$I_{\text{crit}} = \frac{1 - W^{\text{local}} f'(0)}{W^{\text{global}} f'(0)}$$

This predicts the all-or-none nature of conscious access: stimuli either reach awareness (ignition) or remain subliminal, with no intermediate states. The model explains the P3b ERP component, the "late ignition" observed ~300 ms post-stimulus in conscious perception, and the role of prefrontal-parietal networks in broadcasting.
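The two-state picture can be illustrated with a one-variable reduction. The sketch below is a minimal example, assuming that local and global coupling are lumped into a single recurrent weight `w` and that $f$ is a logistic function; the gain, threshold, weight, and pulse timing are illustrative assumptions, not values from the published model. A transient input pulse either fails to move the unit out of the low state or switches it into a self-sustained high state that persists after the pulse ends.

```python
import numpy as np

def f(u, gain=10.0, theta=0.5):
    """Logistic activation function (gain and threshold are assumed values)."""
    return 1.0 / (1.0 + np.exp(-gain * (u - theta)))

def simulate(pulse, w=1.0, tau=10.0, dt=0.5, t_end=500.0, t_on=50.0, t_off=200.0):
    """Euler integration of tau dx/dt = -x + f(w x + I(t)); returns the final x.

    I(t) equals `pulse` for t_on <= t < t_off and 0 otherwise (times in ms).
    """
    x = 0.0
    for step in range(int(t_end / dt)):
        t = step * dt
        current = pulse if t_on <= t < t_off else 0.0
        x += dt / tau * (-x + f(w * x + current))
    return x

for pulse in (0.1, 0.4):
    final = simulate(pulse)
    state = "ignited" if final > 0.5 else "subliminal"
    print(f"pulse = {pulse:.1f}: activity 300 ms after pulse offset = {final:.3f} ({state})")
```

With these assumed parameters the weak pulse leaves the unit near its low fixed point, while the strong pulse leaves it near the high fixed point long after the input has been removed. Bistability here depends on the recurrent weight times the maximum slope of `f` exceeding 1.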

Derivation 5: Integrated Information Theory (IIT)

Tononi's IIT (2004, 2008) proposes that consciousness is identical to integrated information, $\Phi$. For a system of $N$ binary elements in state $\mathbf{x}$ with transition probability matrix $\mathbf{T}$:

$$\Phi = \min_{\text{partition } P} \left[ D_{\text{KL}}\left(p(\mathbf{x}_{t+1} | \mathbf{x}_t) \,\|\, \prod_k p(\mathbf{x}_{t+1}^k | \mathbf{x}_t^k)\right)\right]$$

where the minimum is taken over all possible bipartitions $P$ of the system, the product runs over the parts $k$ of the partition, and $D_{\text{KL}}$ is the Kullback-Leibler divergence. Intuitively, $\Phi$ measures how much the whole system's behavior exceeds what its parts can account for independently. For a simple 2-element system, a mutual-information form of this measure is:

$$\Phi_{\text{2-element}} = I(\mathbf{x}_{t+1}; \mathbf{x}_t) - \max_P \sum_k I(\mathbf{x}_{t+1}^k; \mathbf{x}_t^k)$$

IIT makes specific predictions: the cerebellum, despite having more neurons than the cerebrum, contributes little to consciousness because its largely feedforward architecture generates low $\Phi$. The thalamocortical system, with dense recurrent connectivity, generates high $\Phi$. Computing $\Phi$ exactly is computationally intractable for all but small systems, because the number of partitions to search grows super-exponentially with system size, but approximations based on spectral analysis of the connectivity matrix have been proposed.
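The mutual-information form for two elements can be evaluated exactly for deterministic update rules. The sketch below is a minimal example, assuming a uniform distribution over the four current states and using the simplified whole-minus-parts measure from the equation above, not the full IIT formalism. The helper names and the two update rules (`swap`, `copy`) are hypothetical.

```python
import itertools
import numpy as np

def mutual_information(pairs):
    """I(A; B) in bits from a list of equally likely (a, b) outcomes."""
    joint = {}
    for a, b in pairs:
        joint[(a, b)] = joint.get((a, b), 0.0) + 1.0 / len(pairs)
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * np.log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items())

def phi_two_elements(update):
    """Whole-system MI minus the summed MI of the two single-element parts.

    With two elements there is only one bipartition, so no search is needed.
    """
    states = list(itertools.product([0, 1], repeat=2))
    whole = mutual_information([(s, update(s)) for s in states])
    parts = sum(
        mutual_information([(s[k], update(s)[k]) for s in states]) for k in range(2)
    )
    return whole - parts

def swap(state):
    """Each element copies the other element's previous state."""
    return (state[1], state[0])

def copy(state):
    """Each element copies its own previous state (no interaction)."""
    return (state[0], state[1])

print(phi_two_elements(swap))  # 2.0 bits: all predictive information crosses the cut
print(phi_two_elements(copy))  # 0.0 bits: the parts explain everything
```

Both systems carry 2 bits of information from one time step to the next, but only the swap system carries it across the partition, so only it has nonzero $\Phi$ under this measure.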

4. Historical Development

• 1958: Broadbent proposes the filter model of attention, arguing that selection occurs early in processing.
• 1980: Treisman introduces Feature Integration Theory, explaining illusory conjunctions and the role of attention in binding.
• 1981: von der Malsburg proposes the temporal correlation hypothesis for binding via neural synchrony.
• 1988: Baars formulates Global Workspace Theory as a cognitive architecture for consciousness.
• 1990: Crick and Koch propose searching for neural correlates of consciousness (NCC), launching the modern science of consciousness.
• 1998: Dehaene, Kerszberg, and Changeux develop the neuronal Global Workspace model, predicting all-or-none ignition.
• 2004: Tononi introduces Integrated Information Theory (IIT), proposing $\Phi$ as a measure of consciousness.
• 2009: Reynolds and Heeger unify attentional effects under the normalization model of attention.

5. Applications

Disorders of Consciousness

IIT-inspired measures such as the perturbational complexity index (PCI) help distinguish between vegetative state, minimally conscious state, and locked-in syndrome, and can improve on behavioral assessment alone. TMS-EEG protocols measure cortical complexity at the bedside.

Anesthesia Monitoring

Consciousness theories guide the development of anesthesia depth monitors. Measures of cortical integration (effective connectivity, spectral complexity) track the loss and recovery of consciousness during general anesthesia.

Attention-Deficit Disorders

Normalization-based accounts suggest that ADHD may involve altered gain control, which could explain both distractibility and hyperfocus. Computational phenotyping decomposes attentional deficits into gain, normalization, and threshold components.

AI and Machine Consciousness

Attention mechanisms in transformers (scaled dot-product attention) are loosely inspired by selective attention in the brain. IIT provides criteria for evaluating whether artificial systems could be conscious, with implications for AI ethics and design.
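For comparison with the gain-based models earlier in the chapter, scaled dot-product attention computes $\mathrm{softmax}(QK^\top / \sqrt{d_k})\,V$. The following is a minimal NumPy sketch for a single head without masking or batching; the array shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for single-head attention without masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarity
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries of dimension 8
K = rng.normal(size=(5, 8))   # 5 keys of dimension 8
V = rng.normal(size=(5, 4))   # 5 values of dimension 4

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)           # (3, 4)
print(weights.sum(axis=-1))   # each row of weights sums to 1
```

The softmax divides each exponentiated score by the sum over all keys, so the attention weights are competitive in a way that is at least formally reminiscent of divisive normalization.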

6. Computational Exploration

Attention and Consciousness: Normalization, Binding, GWT, and IIT

Python


import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

print("=" * 72)
print("ATTENTION & CONSCIOUSNESS: MODELS AND MEASURES")
print("=" * 72)

np.random.seed(42)

# --------------------------------------------------
# 1. NORMALIZATION MODEL OF ATTENTION
# --------------------------------------------------
print()
print("1. NORMALIZATION MODEL OF ATTENTION")
print("-" * 55)

contrast = np.linspace(0, 1, 100)
sigma_norm = 0.1
R_max = 50.0
n_exp = 2.0

# Without attention
response_no_att = R_max * contrast**n_exp / (sigma_norm**n_exp + contrast**n_exp)

# Response gain (attention field narrow relative to the stimulus):
# multiplicative scaling of the whole contrast-response function
att_gain = 2.0
response_gain = att_gain * R_max * contrast**n_exp / (sigma_norm**n_exp + contrast**n_exp)

# Contrast gain (attention field broad relative to the stimulus):
# leftward shift of the semi-saturation contrast
c50_shift = 0.5
response_contrast = R_max * contrast**n_exp / ((sigma_norm * c50_shift)**n_exp + contrast**n_exp)

print("  Normalization model: R = R_max * c^n / (sigma^n + c^n)")
print("  R_max = {:.0f}, sigma = {:.2f}, n = {:.1f}".format(R_max, sigma_norm, n_exp))
print()
print("  At 50% contrast:")
print("    No attention: {:.1f} Hz".format(np.interp(0.5, contrast, response_no_att)))
print("    Response gain: {:.1f} Hz".format(np.interp(0.5, contrast, response_gain)))
print("    Contrast gain: {:.1f} Hz".format(np.interp(0.5, contrast, response_contrast)))

# --------------------------------------------------
# 2. GAMMA SYNCHRONY AND BINDING
# --------------------------------------------------
print()
print("2. GAMMA SYNCHRONY AND FEATURE BINDING")
print("-" * 55)

dt = 0.001
T_sync = 1.0
t_sync = np.arange(0, T_sync, dt)
n_t = len(t_sync)
freq_gamma = 40.0  # Hz

# Group 1: synchronized neurons (same object)
phase1_a = 2 * np.pi * freq_gamma * t_sync + 0.0
phase1_b = 2 * np.pi * freq_gamma * t_sync + 0.1  # small phase offset

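# Group 2: different object. A constant phase offset alone would leave the
# coherence at 1, so this oscillator gets an independent random-walk phase drift.
phase2 = 2 * np.pi * freq_gamma * t_sync + np.cumsum(0.2 * np.random.randn(n_t))

signal1a = np.sin(phase1_a) + 0.3 * np.random.randn(n_t)
signal1b = np.sin(phase1_b) + 0.3 * np.random.randn(n_t)
signal2 = np.sin(phase2) + 0.3 * np.random.randn(n_t)

# Band-limited magnitude-squared coherence. The complex cross-spectrum is
# averaged over non-overlapping segments (Welch-style) before taking its
# magnitude, so only a consistent phase relation yields high coherence.
def compute_coherence(x, y, fs, freq_target, bandwidth=5, n_seg=10):
    seg_len = len(x) // n_seg
    freqs = np.fft.rfftfreq(seg_len, 1 / fs)
    mask = (freqs >= freq_target - bandwidth) & (freqs <= freq_target + bandwidth)
    window = np.hanning(seg_len)
    Sxy = np.zeros(mask.sum(), dtype=complex)
    Sxx = np.zeros(mask.sum())
    Syy = np.zeros(mask.sum())
    for k in range(n_seg):
        X = np.fft.rfft(window * x[k * seg_len:(k + 1) * seg_len])[mask]
        Y = np.fft.rfft(window * y[k * seg_len:(k + 1) * seg_len])[mask]
        Sxy += X * np.conj(Y)
        Sxx += np.abs(X)**2
        Syy += np.abs(Y)**2
    denom = Sxx * Syy
    coh = np.abs(Sxy)**2 / np.where(denom > 0, denom, np.inf)
    return float(np.mean(coh))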

coh_within = compute_coherence(signal1a, signal1b, 1/dt, freq_gamma)
coh_between = compute_coherence(signal1a, signal2, 1/dt, freq_gamma)

print("  Gamma frequency: {} Hz".format(int(freq_gamma)))
print("  Within-object coherence: {:.4f}".format(coh_within))
print("  Between-object coherence: {:.4f}".format(coh_between))
print("  Binding ratio: {:.2f}".format(coh_within / max(coh_between, 0.001)))

# --------------------------------------------------
# 3. SALIENCY MAP
# --------------------------------------------------
print()
print("3. SALIENCY MAP (Feature Integration)")
print("-" * 55)

map_size = 50
# Create a simple scene with a pop-out target
orientation_map = np.zeros((map_size, map_size))
color_map = np.zeros((map_size, map_size))

# Background: vertical lines, green
orientation_map[:, :] = 0.2 * np.random.randn(map_size, map_size)
color_map[:, :] = 0.2 * np.random.randn(map_size, map_size)

# Target: horizontal line (different orientation), red (different color)
target_x, target_y = 35, 20
orientation_map[target_y-2:target_y+2, target_x-2:target_x+2] = 3.0
color_map[target_y-2:target_y+2, target_x-2:target_x+2] = 3.0

# Compute conspicuity maps
orient_saliency = np.abs(orientation_map - np.mean(orientation_map))
color_saliency = np.abs(color_map - np.mean(color_map))

# Combined saliency
saliency = 0.5 * orient_saliency / np.max(orient_saliency) + 0.5 * color_saliency / np.max(color_saliency)

# Find peak saliency location
peak_loc = np.unravel_index(np.argmax(saliency), saliency.shape)
print("  Map size: {}x{}".format(map_size, map_size))
print("  Target location: ({}, {})".format(target_x, target_y))
print("  Peak saliency location: ({}, {})".format(peak_loc[1], peak_loc[0]))
print("  Target detected: {}".format(
    abs(peak_loc[0] - target_y) < 3 and abs(peak_loc[1] - target_x) < 3))

# --------------------------------------------------
# 4. GLOBAL WORKSPACE IGNITION
# --------------------------------------------------
print()
print("4. GLOBAL WORKSPACE: IGNITION DYNAMICS")
print("-" * 55)

N_gw = 50
tau_gw = 10.0  # ms
w_local = 0.5
w_global = 0.3
dt_gw = 0.5
T_gw = 500.0
n_steps_gw = int(T_gw / dt_gw)

def gw_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-5.0 * (x - 0.5)))

# Test different input strengths
input_strengths = [0.2, 0.4, 0.6, 0.8]
gw_trajectories = {}

# Note: the maximum slope of gw_sigmoid (5/4) times the total coupling
# (w_local + w_global = 0.8) equals 1, so this parameter set has a single
# stable state and activity relaxes to baseline once the input ends.
# Self-sustained ignition requires stronger coupling or a steeper sigmoid.
for I_strength in input_strengths:
    x_gw = np.zeros((n_steps_gw, N_gw))
    x_gw[0, :] = 0.01

    for t in range(n_steps_gw - 1):
        # External input to first 10 neurons
        I_ext = np.zeros(N_gw)
        if t * dt_gw > 50 and t * dt_gw < 200:
            I_ext[:10] = I_strength

        for i in range(N_gw):
            # Local coupling: mean over a 5-neuron neighborhood; global: mean over all
            local_input = w_local * x_gw[t, max(0,i-2):min(N_gw,i+3)].mean()
            global_input = w_global * x_gw[t, :].mean()
            dx = (-x_gw[t, i] + gw_sigmoid(local_input + global_input + I_ext[i])) / tau_gw
            x_gw[t+1, i] = np.clip(x_gw[t, i] + dt_gw * dx, 0, 1)

    gw_trajectories[I_strength] = x_gw.mean(axis=1)

    final_act = x_gw[-1, :].mean()
    ignited = "YES (conscious)" if final_act > 0.3 else "NO (subliminal)"
    print("  I = {:.1f}: final activity = {:.3f} -> {}".format(I_strength, final_act, ignited))

# --------------------------------------------------
# 5. INTEGRATED INFORMATION (PHI) FOR SMALL SYSTEMS
# --------------------------------------------------
print()
print("5. INTEGRATED INFORMATION (Phi) FOR SIMPLE SYSTEMS")
print("-" * 55)

def compute_phi_simple(W, noise=0.1):
    """Compute approximate Phi for a small binary system."""
    N = W.shape[0]
    # Simulate and compute mutual information
    n_samples = 5000
    states = np.zeros((n_samples, N))
    states[0] = np.random.randint(0, 2, N).astype(float)

    for t in range(1, n_samples):
        h = W.dot(states[t-1]) + noise * np.random.randn(N)
        states[t] = (h > 0.5).astype(float)

    # Pairwise mutual information estimate from a 2-D histogram
    def mi_estimate(x, y):
        bins = 4
        hist_xy, _, _ = np.histogram2d(x, y, bins=bins)
        pxy = hist_xy / hist_xy.sum()
        px = pxy.sum(axis=1)
        py = pxy.sum(axis=0)
        mi = 0
        for i in range(bins):
            for j in range(bins):
                if pxy[i,j] > 0 and px[i] > 0 and py[j] > 0:
                    mi += pxy[i,j] * np.log2(pxy[i,j] / (px[i] * py[j]))
        return max(0, mi)

    # Total system MI: sum of pairwise I(x_i(t+1); x_j(t)) over all pairs
    total_mi = 0
    for i in range(N):
        for j in range(N):
            total_mi += mi_estimate(states[1:, i], states[:-1, j])
    total_mi /= N

    # Partitioned MI (single bipartition: split in half), within-part pairs only
    half = N // 2
    part_mi = 0
    for i in range(half):
        for j in range(half):
            part_mi += mi_estimate(states[1:, i], states[:-1, j])
    for i in range(half, N):
        for j in range(half, N):
            part_mi += mi_estimate(states[1:, i], states[:-1, j])
    part_mi /= N

    phi = max(0, total_mi - part_mi)
    return phi, total_mi

# Test different architectures
architectures = {
    'Feedforward chain': np.diag(np.ones(3), -1)[:4, :4] * 0.8,
    'Recurrent loop': np.array([[0, 0, 0, 0.8],
                                 [0.8, 0, 0, 0],
                                 [0, 0.8, 0, 0],
                                 [0, 0, 0.8, 0]]),
    'Fully connected': np.ones((4, 4)) * 0.3,
    'Disconnected': np.eye(4) * 0.5,
}

print()
for name, W in architectures.items():
    phi, mi = compute_phi_simple(W)
    print("  {}: Phi = {:.4f}, MI = {:.4f}".format(name, phi, mi))

# --------------------------------------------------
# PLOTTING
# --------------------------------------------------
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
fig.suptitle("Attention & Consciousness: Normalization, Binding, GWT, and IIT",
             fontsize=14, fontweight='bold', color='white')
fig.patch.set_facecolor('#0a0a0a')

for ax in axes.flat:
    ax.set_facecolor('#111111')
    ax.tick_params(colors='white', labelsize=8)
    ax.xaxis.label.set_color('white')
    ax.yaxis.label.set_color('white')
    ax.title.set_color('#f0abfc')
    for spine in ax.spines.values():
        spine.set_color('#333333')

# Panel 1: Normalization model
axes[0,0].plot(contrast * 100, response_no_att, color='white', lw=2, alpha=0.7, label='No attention')
axes[0,0].plot(contrast * 100, response_gain, color='#ec4899', lw=2, label='Response gain')
axes[0,0].plot(contrast * 100, response_contrast, color='#d946ef', lw=2, label='Contrast gain')
axes[0,0].set_xlabel('Contrast (%)')
axes[0,0].set_ylabel('Response (Hz)')
axes[0,0].set_title('Normalization Model of Attention')
axes[0,0].legend(fontsize=7, facecolor='#1a1a1a', edgecolor='#333', labelcolor='white')

# Panel 2: Gamma synchrony signals
t_plot = t_sync[:500]
axes[0,1].plot(t_plot * 1000, signal1a[:500], color='#ec4899', alpha=0.7, lw=1, label='Neuron 1a')
axes[0,1].plot(t_plot * 1000, signal1b[:500], color='#d946ef', alpha=0.7, lw=1, label='Neuron 1b (same obj)')
axes[0,1].plot(t_plot * 1000, signal2[:500], color='#60a5fa', alpha=0.7, lw=1, label='Neuron 2 (diff obj)')
axes[0,1].set_xlabel('Time (ms)')
axes[0,1].set_ylabel('Activity')
axes[0,1].set_title('Gamma Synchrony (40 Hz)')
axes[0,1].legend(fontsize=6, facecolor='#1a1a1a', edgecolor='#333', labelcolor='white')
axes[0,1].set_xlim(0, 100)

# Panel 3: Saliency map
im = axes[0,2].imshow(saliency, cmap='magma', aspect='equal')
axes[0,2].plot(target_x, target_y, 'o', color='#00ff00', markersize=10, markerfacecolor='none', lw=2)
axes[0,2].set_title('Saliency Map (target circled)')
axes[0,2].set_xlabel('X')
axes[0,2].set_ylabel('Y')

# Panel 4: Global workspace ignition
t_gw_axis = np.arange(n_steps_gw) * dt_gw
colors_gw = ['#555555', '#888888', '#d946ef', '#ec4899']
for k, I_str in enumerate(input_strengths):
    axes[1,0].plot(t_gw_axis, gw_trajectories[I_str], color=colors_gw[k], lw=1.5,
                   label='I={:.1f}'.format(I_str))
axes[1,0].axhline(0.3, color='#fbbf24', linestyle='--', alpha=0.5, label='Ignition threshold')
axes[1,0].axvspan(50, 200, alpha=0.1, color='white')
axes[1,0].set_xlabel('Time (ms)')
axes[1,0].set_ylabel('Mean workspace activity')
axes[1,0].set_title('Global Workspace Ignition')
axes[1,0].legend(fontsize=6, facecolor='#1a1a1a', edgecolor='#333', labelcolor='white')

# Panel 5: Phi for different architectures
arch_names = list(architectures.keys())
phi_vals = []
for name, W in architectures.items():
    phi, _ = compute_phi_simple(W)
    phi_vals.append(phi)
bars = axes[1,1].bar(range(len(arch_names)), phi_vals, color='#d946ef', alpha=0.8)
axes[1,1].set_xticks(range(len(arch_names)))
axes[1,1].set_xticklabels([n.replace(' ', '\n') for n in arch_names], fontsize=7)
axes[1,1].set_ylabel('Phi (integrated information)')
axes[1,1].set_title('IIT: Phi by Architecture')

# Panel 6: Attention effects comparison
att_types = ['No\nattention', 'Response\ngain', 'Contrast\ngain']
c50_vals = [np.interp(R_max/2, response_no_att, contrast*100),
            np.interp(R_max/2, response_gain, contrast*100),
            np.interp(R_max/2, response_contrast, contrast*100)]
max_resp = [np.max(response_no_att), np.max(response_gain), np.max(response_contrast)]

x_pos = np.arange(3)
bars1 = axes[1,2].bar(x_pos - 0.2, max_resp, 0.35, color='#ec4899', alpha=0.8, label='Max response')
axes[1,2].set_xticks(x_pos)
axes[1,2].set_xticklabels(att_types, fontsize=7)
axes[1,2].set_ylabel('Max firing rate (Hz)')
axes[1,2].set_title('Attention Effects Comparison')
axes[1,2].legend(fontsize=7, facecolor='#1a1a1a', edgecolor='#333', labelcolor='white')

plt.tight_layout()
plt.savefig('output.png', dpi=150, bbox_inches='tight', facecolor='#0a0a0a')
plt.close()
print()
print("[Plot saved: normalization model, gamma synchrony, saliency map,")
print(" GWT ignition, Phi comparison, and attention effects]")


Chapter Summary

• Normalization model: attention acts through gain modulation before divisive normalization, producing contrast or response gain depending on attention field size.
• Binding problem: features of the same object may be bound through gamma-band synchrony ($\sim$40 Hz) or serial attention-based integration.
• Saliency maps: bottom-up feature contrast and top-down task demands combine to guide spatial attention and eye movements.
• Global Workspace Theory: consciousness arises from all-or-none "ignition" that broadcasts information across cortical modules.
• Integrated Information Theory: consciousness is measured by $\Phi$, the integrated information generated by a system above and beyond its parts.
