10.4 Data Analysis

Modern oceanography generates massive datasets. Analysis techniques range from classical statistics to machine learning, combining observations with models to understand ocean processes.

Analysis Techniques

EOF/PCA

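Empirical Orthogonal Functions (EOFs), also known as Principal Component Analysis (PCA), extract the dominant patterns of variability from a space-time dataset. ENSO and the Pacific Decadal Oscillation (PDO) are identified this way.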

Spectral Analysis

Fourier transforms decompose a time series into its frequency components. The resulting power spectra identify periodicities such as tides, the seasonal cycle, and ENSO.
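As a minimal sketch of how a power spectrum picks out a periodicity, the example below computes a raw periodogram with NumPy's FFT. The monthly series, its amplitudes, and the helper name power_spectrum are illustrative assumptions, not part of the original notes; real analyses usually add windowing and spectral smoothing.

```python
import numpy as np

def power_spectrum(series, dt=1.0):
    """Return (frequencies, power) of a real, evenly sampled series via the FFT."""
    anomalies = series - series.mean()            # remove the mean so f = 0 does not dominate
    coeffs = np.fft.rfft(anomalies)               # one-sided FFT for real input
    freqs = np.fft.rfftfreq(series.size, d=dt)    # cycles per unit time
    power = np.abs(coeffs) ** 2 / series.size     # raw (unsmoothed) periodogram
    return freqs, power

# Hypothetical monthly series: annual cycle plus a weaker 48-month oscillation
rng = np.random.default_rng(0)
months = np.arange(480)
series = (1.0 * np.sin(2 * np.pi * months / 12)
          + 0.5 * np.sin(2 * np.pi * months / 48)
          + 0.2 * rng.standard_normal(months.size))

freqs, power = power_spectrum(series)
peak = np.argmax(power[1:]) + 1                   # skip the zero-frequency bin
dominant_period = 1.0 / freqs[peak]
print(f"Dominant period: {dominant_period:.1f} months")
```

The record length here is a whole number of cycles for both signals, so each peak falls in a single frequency bin; with real data the energy would leak into neighboring bins.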

Wavelet Analysis

Time-frequency analysis for non-stationary signals, whose dominant periods change over time. Example: tracking the evolution of El Niño.
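To illustrate what "time-frequency" means in practice, the sketch below computes wavelet power with a complex Morlet wavelet by direct convolution. The function name morlet_power, the choice omega0 = 6, the normalization, and the synthetic series are assumptions made for this example; it also assumes every wavelet is shorter than the series and ignores edge effects, which dedicated wavelet packages handle more carefully.

```python
import numpy as np

def morlet_power(series, periods, dt=1.0, omega0=6.0):
    """Wavelet power |W|^2 with shape (len(periods), len(series)).

    Direct-convolution sketch; assumes each wavelet is shorter than the series.
    """
    anomalies = series - series.mean()
    # Morlet scale-to-period conversion: period = fourier_factor * scale
    fourier_factor = 4 * np.pi / (omega0 + np.sqrt(2 + omega0 ** 2))
    power = np.empty((len(periods), series.size))
    for i, period in enumerate(periods):
        scale = period / fourier_factor
        half_width = int(np.ceil(4 * scale / dt))     # truncate the envelope at 4 scales
        tau = np.arange(-half_width, half_width + 1) * dt
        wavelet = (np.pi ** -0.25 * np.sqrt(dt / scale)
                   * np.exp(1j * omega0 * tau / scale - 0.5 * (tau / scale) ** 2))
        # W(t) = sum over t' of x(t') * conj(psi((t' - t) / scale)), as a convolution
        coeffs = np.convolve(anomalies, np.conj(wavelet)[::-1], mode="same")
        power[i] = np.abs(coeffs) ** 2
    return power

# Hypothetical non-stationary signal: 16-step period first, 48-step period later
rng = np.random.default_rng(1)
t = np.arange(600)
series = np.where(t < 300, np.sin(2 * np.pi * t / 16), np.sin(2 * np.pi * t / 48))
series = series + 0.1 * rng.standard_normal(t.size)

periods = np.array([8, 12, 16, 24, 32, 48])
power = morlet_power(series, periods)
print("Dominant period at t=150:", periods[power[:, 150].argmax()])
print("Dominant period at t=450:", periods[power[:, 450].argmax()])
```

A Fourier spectrum of the same series would show both periods but not when each one occurs; the wavelet power localizes them in time.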

Objective Analysis

Optimal interpolation maps sparse, irregularly spaced observations onto a regular grid, weighting each observation according to the error covariances.
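The following one-dimensional sketch shows the idea under simple assumptions: a constant background value, a Gaussian covariance model, and uncorrelated observation errors. The function names, the covariance parameters, and the four temperature observations are hypothetical choices for illustration.

```python
import numpy as np

def gaussian_cov(x1, x2, variance, length_scale):
    """Gaussian covariance between two sets of 1-D positions (an assumed model)."""
    separation = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (separation / length_scale) ** 2)

def optimal_interpolation(x_obs, y_obs, x_grid, background,
                          signal_var, length_scale, noise_var):
    """Map observations onto x_grid; returns (analysis, error variance)."""
    # Observation-observation covariance, with observation error on the diagonal
    C_oo = gaussian_cov(x_obs, x_obs, signal_var, length_scale)
    C_oo += noise_var * np.eye(x_obs.size)
    # Grid-observation covariance
    C_go = gaussian_cov(x_grid, x_obs, signal_var, length_scale)
    # Weights W = C_go @ inv(C_oo), computed with a linear solve (C_oo is symmetric)
    weights = np.linalg.solve(C_oo, C_go.T).T
    analysis = background + weights @ (y_obs - background)
    error_var = signal_var - np.sum(weights * C_go, axis=1)
    return analysis, error_var

# Hypothetical temperatures (deg C) at four positions (km) along a section
x_obs = np.array([10.0, 25.0, 30.0, 70.0])
y_obs = np.array([15.2, 16.1, 16.4, 13.8])
x_grid = np.linspace(0.0, 100.0, 101)

analysis, error_var = optimal_interpolation(
    x_obs, y_obs, x_grid, background=15.0,
    signal_var=1.0, length_scale=10.0, noise_var=0.01)
print(f"x = 10 km:  analysis {analysis[10]:.2f}, error variance {error_var[10]:.3f}")
print(f"x = 100 km: analysis {analysis[100]:.2f}, error variance {error_var[100]:.3f}")
```

Near an observation the analysis follows the data and the error variance is small; far from any observation the analysis relaxes to the background and the error variance approaches the assumed signal variance.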

Data Assimilation

\( \mathbf{x}_a = \mathbf{x}_b + \mathbf{K}(\mathbf{y} - H\mathbf{x}_b) \)

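Analysis = Background + Gain × Innovation, where the innovation \( \mathbf{y} - H\mathbf{x}_b \) is the difference between the observations and the background (model) state mapped to observation space by the observation operator \( H \).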
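A small worked example may make the update concrete. It assumes the standard optimal gain K = B Hᵀ (H B Hᵀ + R)⁻¹, where B and R are the background and observation error covariances; these two symbols, the function name analysis_update, and all numbers below are assumptions introduced for illustration.

```python
import numpy as np

def analysis_update(x_b, B, y, H, R):
    """One analysis step: returns (analysis state, gain, analysis error covariance)."""
    innovation = y - H @ x_b                 # observations minus background in obs space
    S = H @ B @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ B).T          # K = B H^T S^-1 (B and S are symmetric)
    x_a = x_b + K @ innovation
    A = (np.eye(x_b.size) - K @ H) @ B       # analysis error covariance
    return x_a, K, A

# Hypothetical two-variable state (temperature at two depths); only the first is observed
x_b = np.array([20.0, 10.0])                 # background
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])                   # background error covariance
H = np.array([[1.0, 0.0]])                   # observation operator
R = np.array([[0.25]])                       # observation error covariance
y = np.array([21.0])                         # observation

x_a, K, A = analysis_update(x_b, B, y, H, R)
print("Gain:", K.ravel())                    # [0.8 0.4]
print("Analysis:", x_a)                      # [20.8 10.4]
```

The unobserved second variable is also corrected, because the off-diagonal term in B spreads the innovation to correlated variables.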

3D-Var / 4D-Var

Variational methods that minimize a cost function measuring the misfit to both the background and the observations. Used operationally at weather centers.
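For reference, the 3D-Var cost function is commonly written in the following form. The symbols \( \mathbf{B} \) and \( \mathbf{R} \) (background and observation error covariances) are not defined in the text above and are introduced here as assumptions; the other symbols follow the analysis equation.

\[ J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x} - \mathbf{x}_b)^{T}\mathbf{B}^{-1}(\mathbf{x} - \mathbf{x}_b) + \tfrac{1}{2}(\mathbf{y} - H\mathbf{x})^{T}\mathbf{R}^{-1}(\mathbf{y} - H\mathbf{x}) \]

For a linear \( H \), setting the gradient to zero gives

\[ \mathbf{B}^{-1}(\mathbf{x}_a - \mathbf{x}_b) = H^{T}\mathbf{R}^{-1}(\mathbf{y} - H\mathbf{x}_a) \quad\Rightarrow\quad \mathbf{x}_a = \mathbf{x}_b + \mathbf{K}(\mathbf{y} - H\mathbf{x}_b), \qquad \mathbf{K} = \mathbf{B}H^{T}(H\mathbf{B}H^{T} + \mathbf{R})^{-1} \]

so the minimizer has the same form as the analysis equation. 4D-Var extends the observation term to observations distributed over a time window, with the model carrying the state from the start of the window to each observation time.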

Kalman Filter

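Sequential estimation that alternates forecast and update steps as observations arrive. Ensemble variants estimate the error covariance from a set of model runs, which provides uncertainty quantification.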
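The forecast/update cycle can be sketched with a scalar Kalman filter. This example assumes a random-walk model for the state and direct observations of it; the function name kalman_filter_1d, the noise variances q and r, and the synthetic observations are hypothetical. It is the classical filter, not an ensemble method.

```python
import numpy as np

def kalman_filter_1d(observations, x0, p0, q, r):
    """Scalar Kalman filter for x_k = x_(k-1) + w_k, observed as y_k = x_k + v_k.

    q is the model (process) noise variance, r the observation noise variance.
    Returns arrays of state estimates and their error variances.
    """
    x, p = x0, p0
    estimates, variances = [], []
    for y in observations:
        # Forecast step: a random walk keeps the mean and inflates the variance
        p = p + q
        # Update step: blend forecast and observation using the gain
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        estimates.append(x)
        variances.append(p)
    return np.array(estimates), np.array(variances)

# Hypothetical example: constant true temperature of 18.0 observed with noise
rng = np.random.default_rng(0)
truth = 18.0
observations = truth + 0.5 * rng.standard_normal(100)

estimates, variances = kalman_filter_1d(observations, x0=15.0, p0=4.0, q=0.01, r=0.25)
print(f"Final estimate {estimates[-1]:.2f}, error variance {variances[-1]:.3f}")
```

The error variance returned alongside each estimate is what makes the filter useful for uncertainty quantification: it shrinks as observations accumulate and levels off at a value set by q and r.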

Machine Learning in Oceanography

Neural Networks

Used for SST prediction, gap filling in observational records, and emulating physics.

Random Forests

Used for classification tasks such as water mass identification, and for ecosystem modeling.

Clustering

K-means and self-organizing maps (SOMs) group similar observations to identify ocean regimes and biogeochemical provinces.
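As a rough illustration of K-means, the sketch below clusters synthetic temperature-salinity samples into three groups. The three "water masses", their T-S values, and the function name kmeans are hypothetical. The farthest-first initialization is a simple deterministic choice for this sketch; random restarts or k-means++ are more common in practice.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with farthest-first initialization; returns (labels, centers)."""
    # Initialization: start from the first point, then repeatedly add the point
    # farthest from all centers chosen so far
    centers = [points[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dist_to_nearest.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its members
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical (temperature in deg C, salinity) samples from three water masses
rng = np.random.default_rng(0)
means = np.array([[22.0, 36.5], [12.0, 35.3], [2.0, 34.7]])
samples = np.vstack([m + rng.standard_normal((40, 2)) * [0.3, 0.05] for m in means])

# Standardize so temperature and salinity contribute comparably to distances
features = (samples - samples.mean(axis=0)) / samples.std(axis=0)
labels, centers = kmeans(features, k=3)
print("Cluster sizes:", np.bincount(labels))
```

Standardizing first matters because temperature spans tens of degrees while salinity varies by about one unit; without it, distances would be dominated by temperature.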

Deep Learning

Used for image analysis (for example, plankton imagery), super-resolution, and physics-informed neural networks (PINNs).

Python: EOF Analysis

#!/usr/bin/env python3
"""data_analysis.py - EOF analysis example"""
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg

def eof_analysis(data, n_modes=3):
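    """
    Empirical Orthogonal Function (EOF) analysis.

    data: 2D array (time x space)
    Returns the first n_modes spatial patterns (space x n_modes),
    their principal-component time series (time x n_modes),
    and the percentage of variance each mode explains.
    """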
    # Remove mean
    data_anom = data - data.mean(axis=0)

    # Covariance matrix
    cov = np.cov(data_anom.T)

    # Eigenvalue decomposition
    eigenvalues, eigenvectors = linalg.eigh(cov)

    # Sort by variance explained (descending)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Variance explained
    variance_explained = eigenvalues / eigenvalues.sum() * 100

    # Principal components (time series)
    pcs = data_anom @ eigenvectors

    return eigenvectors[:, :n_modes], pcs[:, :n_modes], variance_explained[:n_modes]

# Create synthetic SST data with ENSO-like pattern
np.random.seed(42)
nt, nx = 200, 50  # time, space
t = np.arange(nt)
x = np.arange(nx)

# Mode 1: ENSO-like (3-year period, i.e. 36 time steps if each step is one month)
mode1_spatial = np.sin(np.pi * x / nx)
mode1_temporal = np.sin(2 * np.pi * t / 36) + 0.3 * np.random.randn(nt)

# Mode 2: Annual cycle
mode2_spatial = np.cos(2 * np.pi * x / nx)
mode2_temporal = np.sin(2 * np.pi * t / 12)

# Combine
data = np.outer(mode1_temporal, mode1_spatial) * 2 + \
       np.outer(mode2_temporal, mode2_spatial) + \
       0.5 * np.random.randn(nt, nx)

# EOF analysis
eofs, pcs, variance = eof_analysis(data)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].plot(x, eofs[:, 0], 'b-', lw=2)
axes[0, 0].set_title(f'EOF1 ({variance[0]:.1f}%)')
axes[0, 0].set_xlabel('Space'); axes[0, 0].set_ylabel('Amplitude')

axes[0, 1].plot(t, pcs[:, 0], 'b-', lw=1)
axes[0, 1].set_title('PC1'); axes[0, 1].set_xlabel('Time')

axes[1, 0].plot(x, eofs[:, 1], 'r-', lw=2)
axes[1, 0].set_title(f'EOF2 ({variance[1]:.1f}%)')

axes[1, 1].plot(t, pcs[:, 1], 'r-', lw=1)
axes[1, 1].set_title('PC2')

plt.tight_layout()
plt.show()  # display the figure when run as a script