10.4 Data Analysis
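Modern oceanography generates massive datasets from satellites, autonomous floats, moorings, ships, and numerical models. Analysis techniques range from classical statistics to machine learning, and data assimilation combines observations with models to estimate the ocean state and understand ocean processes.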
Analysis Techniques
EOF/PCA
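Empirical Orthogonal Functions (EOFs), equivalent to principal component analysis (PCA), decompose a space-time field into orthogonal spatial patterns ranked by the fraction of variance they explain, each paired with a principal-component (PC) time series. The dominant patterns of ENSO and the Pacific Decadal Oscillation (PDO) are commonly identified this way.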
Spectral Analysis
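Fourier transforms decompose a time series into sinusoidal components. The power spectrum shows how variance is distributed across frequencies, revealing periodicities such as tides (e.g., the 12.42-hour M2 constituent), the seasonal cycle, and ENSO (roughly 2 to 7 years).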
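A minimal periodogram sketch using NumPy's FFT. The function name `power_spectrum` and the synthetic hourly sea-level record (an M2-like 12.42-hour tide plus noise) are illustrative assumptions; a real analysis would usually add windowing or segment averaging (e.g., Welch's method) to reduce the variance of the spectral estimate.

```python
import numpy as np

def power_spectrum(series, dt=1.0):
    """One-sided periodogram of a real, evenly sampled series (hypothetical helper)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # remove the mean so f = 0 carries no power
    n = x.size
    coeffs = np.fft.rfft(x)               # Fourier coefficients for f >= 0
    freqs = np.fft.rfftfreq(n, d=dt)      # frequencies in cycles per unit of dt
    power = np.abs(coeffs) ** 2 / n       # unnormalized periodogram
    return freqs, power

# Synthetic hourly sea level: M2-like tide plus white noise (assumed values)
rng = np.random.default_rng(0)
t = np.arange(24 * 60)                    # 60 days of hourly samples
eta = 1.0 * np.cos(2 * np.pi * t / 12.42) + 0.2 * rng.standard_normal(t.size)

freqs, power = power_spectrum(eta, dt=1.0)
peak = freqs[1:][np.argmax(power[1:])]    # skip the zero-frequency bin
print(f"dominant period: {1 / peak:.2f} h")
```

The frequency resolution is \( 1/(N\,\Delta t) \), here 1/1440 cycles per hour, so the peak falls in the bin nearest 1/12.42 h; longer records resolve closely spaced tidal constituents better.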
Wavelet Analysis
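Time-frequency analysis of non-stationary signals: wavelets show how the strength of each periodicity changes over time, for example the waxing and waning of El Niño variability from decade to decade.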
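A minimal sketch of a Morlet continuous wavelet transform computed with the FFT, following the frequency-domain formulation popularized by Torrence and Compo (1998). The function name `morlet_cwt`, the wavelet parameter \( \omega_0 = 6 \), and the synthetic monthly series (period switching from 12 to 48 months halfway) are assumptions for illustration. Because the FFT treats the series as periodic, values near the ends are contaminated; no cone of influence is computed here.

```python
import numpy as np

def morlet_cwt(x, dt, periods, w0=6.0):
    """Morlet continuous wavelet transform via FFT (hypothetical helper).

    x: evenly sampled series; dt: sample spacing; periods: Fourier periods to
    analyze (same units as dt). Returns complex coefficients (n_periods, n_time).
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    x_hat = np.fft.fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)          # angular frequencies
    # Morlet scale corresponding to each requested Fourier period
    scales = np.asarray(periods, dtype=float) * (w0 + np.sqrt(2 + w0**2)) / (4 * np.pi)
    coeffs = np.empty((scales.size, n), dtype=complex)
    for i, s in enumerate(scales):
        # Fourier transform of the scaled Morlet wavelet (zero for omega <= 0)
        psi_hat = np.pi**-0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        norm = np.sqrt(2 * np.pi * s / dt)               # unit-energy normalization
        # Convolution theorem: multiply in frequency space, transform back
        coeffs[i] = np.fft.ifft(x_hat * psi_hat * norm)
    return coeffs

# Synthetic monthly series whose period switches from 12 to 48 months halfway
t = np.arange(480)
x = np.where(t < 240, np.sin(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 48))
power = np.abs(morlet_cwt(x, dt=1.0, periods=[12.0, 48.0])) ** 2
early = power[:, 60:180].mean(axis=1)   # mean power away from the switch and edges
late = power[:, 300:420].mean(axis=1)
print("early (12, 48 mo):", early.round(2))
print("late  (12, 48 mo):", late.round(2))
```

A Fourier spectrum of the same series would show both peaks with no hint of when each was active; the wavelet power separates them in time.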
Objective Analysis
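Optimal interpolation maps sparse, irregularly spaced observations onto a regular grid, weighting each observation according to the assumed signal covariance and observation-error covariance, and also yields an error estimate at every grid point.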
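A minimal 1D optimal-interpolation sketch, assuming zero-mean anomalies, a Gaussian signal covariance with a chosen length scale, and uncorrelated observation errors. The helper names (`gaussian_cov`, `optimal_interpolation`) and all parameter values are hypothetical; real mapping schemes fit the covariance model to the data.

```python
import numpy as np

def gaussian_cov(a, b, length_scale, signal_var):
    """Assumed Gaussian covariance between 1D positions a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def optimal_interpolation(x_obs, y_obs, x_grid, length_scale, signal_var, noise_var):
    """Map observed anomalies onto x_grid; return estimates and error variances."""
    C_oo = gaussian_cov(x_obs, x_obs, length_scale, signal_var)   # obs-obs signal cov
    C_go = gaussian_cov(x_grid, x_obs, length_scale, signal_var)  # grid-obs signal cov
    A = C_oo + noise_var * np.eye(x_obs.size)                     # add obs-error cov
    W = np.linalg.solve(A, C_go.T).T                              # weights C_go A^-1
    estimate = W @ y_obs
    error_var = signal_var - np.sum(W * C_go, axis=1)             # expected error variance
    return estimate, error_var

# Sparse, irregular synthetic "observations" of a smooth anomaly field
x_obs = np.array([0.5, 1.5, 3.0, 4.0, 6.5, 7.0, 9.0])
y_obs = np.sin(x_obs)
x_grid = np.linspace(0.0, 10.0, 101)
est, err = optimal_interpolation(x_obs, y_obs, x_grid,
                                 length_scale=1.0, signal_var=1.0, noise_var=0.05)
```

The error variance is small next to observations and grows toward the full signal variance in data gaps, which is how objective analysis "accounts for" the error covariance.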
Data Assimilation
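\( \mathbf{x}_a = \mathbf{x}_b + \mathbf{K}(\mathbf{y} - \mathbf{H}\mathbf{x}_b) \)
Analysis = background + gain × innovation, where the innovation \( \mathbf{y} - \mathbf{H}\mathbf{x}_b \) is the misfit between the observations \( \mathbf{y} \) and the background (model forecast) \( \mathbf{x}_b \) mapped into observation space by the observation operator \( \mathbf{H} \). The gain \( \mathbf{K} \) weights that misfit according to the background and observation error covariances.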
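For a linear observation operator, the minimum-variance gain is \( \mathbf{K} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R})^{-1} \), where \( \mathbf{B} \) and \( \mathbf{R} \) are the background and observation error covariances. For a single directly observed variable this reduces to \( K = \sigma_b^2/(\sigma_b^2 + \sigma_o^2) \). The sketch below applies the update to a 3-point SST background; the helper name `analysis_update` and all numbers are illustrative assumptions.

```python
import numpy as np

def analysis_update(x_b, B, y, R, H):
    """x_a = x_b + K (y - H x_b), with K = B H^T (H B H^T + R)^-1."""
    innovation = y - H @ x_b                 # observations minus background
    S = H @ B @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ B).T          # gain (uses symmetry of S and B)
    x_a = x_b + K @ innovation
    A = (np.eye(x_b.size) - K @ H) @ B       # analysis error covariance
    return x_a, A

# Illustrative values: 3-point SST background (deg C), one observation at the middle
idx = np.arange(3)
B = 0.5 * np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2)   # background error cov
x_b = np.array([20.0, 20.5, 21.0])
H = np.array([[0.0, 1.0, 0.0]])              # observe the middle grid point
y = np.array([21.5])
R = np.array([[0.25]])                       # observation error variance
x_a, A = analysis_update(x_b, B, y, R, H)
print(x_a, np.diag(A))
```

The observed point moves two-thirds of the way toward the observation (\( 0.5/0.75 \)), its error variance drops from 0.5 to about 0.17, and neighboring points move less, in proportion to their background error correlation with it.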
3D-Var / 4D-Var
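Variational methods find the analysis by minimizing a cost function that penalizes departures from the background and from the observations, each weighted by its error covariance. 3D-Var treats the observations as valid at a single time; 4D-Var fits a model trajectory to all observations in a time window. 4D-Var is used operationally at major weather prediction centers such as ECMWF.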
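The 3D-Var cost function is \( J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b) + \tfrac{1}{2}(\mathbf{y}-\mathbf{H}\mathbf{x})^T\mathbf{R}^{-1}(\mathbf{y}-\mathbf{H}\mathbf{x}) \). A minimal sketch, assuming a linear \( \mathbf{H} \) and small illustrative matrices, minimizes it with SciPy's BFGS and checks that the result matches the gain-matrix analysis. The function names are hypothetical; operational systems use incremental formulations and adjoint models, which this does not show, and 4D-Var is not covered.

```python
import numpy as np
from scipy.optimize import minimize

def cost_3dvar(x, x_b, B_inv, y, R_inv, H):
    """J(x) = 1/2 (x-x_b)^T B^-1 (x-x_b) + 1/2 (y-Hx)^T R^-1 (y-Hx)."""
    db = x - x_b
    do = y - H @ x
    return 0.5 * db @ B_inv @ db + 0.5 * do @ R_inv @ do

def grad_3dvar(x, x_b, B_inv, y, R_inv, H):
    """Gradient of J: B^-1 (x - x_b) - H^T R^-1 (y - H x)."""
    return B_inv @ (x - x_b) - H.T @ R_inv @ (y - H @ x)

# Illustrative 3-point SST setup (assumed values)
idx = np.arange(3)
B = 0.5 * np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2)
x_b = np.array([20.0, 20.5, 21.0])
H = np.array([[0.0, 1.0, 0.0]])
y = np.array([21.5])
R = np.array([[0.25]])
args = (x_b, np.linalg.inv(B), y, np.linalg.inv(R), H)

# Start the minimization from the background state
res = minimize(cost_3dvar, x_b, args=args, jac=grad_3dvar, method="BFGS")

# For linear H the minimizer equals x_b + K (y - H x_b)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_closed = x_b + K @ (y - H @ x_b)
print(res.x, x_closed)
```

Minimizing instead of inverting matters in practice: with state vectors of order \( 10^8 \), \( \mathbf{K} \) can never be formed explicitly, but \( J \) and its gradient can be evaluated.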
Kalman Filter
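Sequential estimation: the state estimate and its error covariance are updated each time new observations arrive. Ensemble variants such as the ensemble Kalman filter (EnKF) estimate the error covariance from an ensemble of model runs, and the ensemble spread quantifies the uncertainty of the analysis.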
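A minimal sketch of one stochastic EnKF analysis step with perturbed observations. The forecast step (advancing each member with the model) is omitted, and the helper name `enkf_analysis`, the ensemble size, and all numbers are illustrative assumptions. With a large ensemble, the ensemble mean should approach the Kalman analysis and the spread its error estimate.

```python
import numpy as np

def enkf_analysis(ensemble, y, R, H, rng):
    """Stochastic EnKF analysis step (perturbed observations).

    ensemble: (n_state, n_members) forecast ensemble.
    """
    n_members = ensemble.shape[1]
    X = ensemble - ensemble.mean(axis=1, keepdims=True)   # ensemble anomalies
    P_f = X @ X.T / (n_members - 1)                        # sample forecast covariance
    S = H @ P_f @ H.T + R
    K = np.linalg.solve(S, H @ P_f).T                      # K = P_f H^T S^-1
    # Each member assimilates its own noisy copy of the observations
    obs_noise = rng.multivariate_normal(np.zeros(y.size), R, size=n_members).T
    return ensemble + K @ (y[:, None] + obs_noise - H @ ensemble)

rng = np.random.default_rng(0)
idx = np.arange(3)
B = 0.5 * np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2)  # assumed prior covariance
x_b = np.array([20.0, 20.5, 21.0])
prior = rng.multivariate_normal(x_b, B, size=5000).T           # (3, 5000) forecast ensemble
H = np.array([[0.0, 1.0, 0.0]])
y = np.array([21.5])
R = np.array([[0.25]])

posterior = enkf_analysis(prior, y, R, H, rng)
print("mean:", posterior.mean(axis=1).round(2))
print("spread (std):", posterior.std(axis=1, ddof=1).round(2))
```

Perturbing the observations keeps the analysis spread statistically consistent; without it the ensemble would systematically underestimate the analysis error.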
Machine Learning in Oceanography
Neural Networks
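Used for SST prediction, filling gaps in observational records (e.g., cloud-obscured satellite SST), and emulating physical processes or parameterizations at lower computational cost.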
Random Forests
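Ensembles of decision trees for classification and regression, e.g., identifying water masses from properties such as temperature and salinity, and ecosystem modeling.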
Clustering
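Unsupervised methods such as k-means and self-organizing maps (SOMs) group data with similar properties, e.g., to identify ocean regimes or delineate biogeochemical provinces.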
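A minimal k-means (Lloyd's algorithm) sketch in plain NumPy, applied to synthetic temperature-salinity samples from two idealized water masses. The water-mass values and the helper name `kmeans` are assumptions for illustration; SOMs are not shown. Features are standardized first because temperature and salinity have very different numerical ranges.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic T-S samples from two idealized water masses (assumed values)
rng = np.random.default_rng(1)
warm = np.column_stack([rng.normal(18.0, 0.5, 200), rng.normal(36.5, 0.05, 200)])
cold = np.column_stack([rng.normal(4.0, 0.5, 200), rng.normal(34.5, 0.05, 200)])
TS = np.vstack([warm, cold])                    # columns: temperature (deg C), salinity
truth = np.repeat([0, 1], 200)

# Standardize so temperature and salinity contribute on comparable scales
mu, sd = TS.mean(axis=0), TS.std(axis=0)
labels, centers_z = kmeans((TS - mu) / sd, k=2)
centers = centers_z * sd + mu                   # back to physical units
print(centers.round(2))
```

Real water masses overlap and mix, so in practice the number of clusters and the choice of variables (and their scaling) drive the result far more than in this idealized case.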
Deep Learning
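Convolutional networks for image analysis (e.g., automated plankton classification), super-resolution of coarse model or satellite fields, and physics-informed neural networks (PINNs), which add the governing equations to the training loss.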
Python: EOF Analysis
#!/usr/bin/env python3
"""data_analysis.py - EOF analysis example"""
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg


def eof_analysis(data, n_modes=3):
    """
    Empirical Orthogonal Function analysis.

    data: 2D array (time x space)
    Returns the leading n_modes EOFs (space x mode), their principal
    components (time x mode), and the percent variance explained by each.
    """
    # Remove the time mean at each location to get anomalies
    data_anom = data - data.mean(axis=0)
    # Spatial covariance matrix (space x space)
    cov = np.cov(data_anom.T)
    # Eigendecomposition of the symmetric covariance matrix
    # (eigh returns eigenvalues in ascending order)
    eigenvalues, eigenvectors = linalg.eigh(cov)
    # Sort by variance explained (descending)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # Percent of total variance explained by each mode
    variance_explained = eigenvalues / eigenvalues.sum() * 100
    # Principal components: anomalies projected onto the EOFs (time series)
    pcs = data_anom @ eigenvectors
    return eigenvectors[:, :n_modes], pcs[:, :n_modes], variance_explained[:n_modes]


# Create synthetic SST data with an ENSO-like pattern (monthly time steps)
np.random.seed(42)
nt, nx = 200, 50  # time (months), space (grid points)
t = np.arange(nt)
x = np.arange(nx)

# Mode 1: ENSO-like (36-month, i.e. 3-year, period) plus noise
mode1_spatial = np.sin(np.pi * x / nx)
mode1_temporal = np.sin(2 * np.pi * t / 36) + 0.3 * np.random.randn(nt)

# Mode 2: Annual cycle (12-month period)
mode2_spatial = np.cos(2 * np.pi * x / nx)
mode2_temporal = np.sin(2 * np.pi * t / 12)

# Combine the two modes and add white noise
data = (np.outer(mode1_temporal, mode1_spatial) * 2
        + np.outer(mode2_temporal, mode2_spatial)
        + 0.5 * np.random.randn(nt, nx))

# EOF analysis (the sign of each EOF/PC pair is arbitrary)
eofs, pcs, variance = eof_analysis(data)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes[0, 0].plot(x, eofs[:, 0], 'b-', lw=2)
axes[0, 0].set_title(f'EOF1 ({variance[0]:.1f}%)')
axes[0, 0].set_xlabel('Space'); axes[0, 0].set_ylabel('Amplitude')
axes[0, 1].plot(t, pcs[:, 0], 'b-', lw=1)
axes[0, 1].set_title('PC1'); axes[0, 1].set_xlabel('Time (months)')
axes[1, 0].plot(x, eofs[:, 1], 'r-', lw=2)
axes[1, 0].set_title(f'EOF2 ({variance[1]:.1f}%)')
axes[1, 0].set_xlabel('Space'); axes[1, 0].set_ylabel('Amplitude')
axes[1, 1].plot(t, pcs[:, 1], 'r-', lw=1)
axes[1, 1].set_title('PC2'); axes[1, 1].set_xlabel('Time (months)')
plt.tight_layout()
plt.show()
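For large grids, forming the space × space covariance matrix in `eof_analysis` becomes expensive. A common alternative takes the singular value decomposition (SVD) of the anomaly matrix directly: the right singular vectors are the EOFs, and the squared singular values divided by \( n_t - 1 \) are the covariance eigenvalues. A minimal sketch with a hypothetical helper `eof_svd`, cross-checked against the eigenvector route on random correlated data:

```python
import numpy as np

def eof_svd(data, n_modes=3):
    """EOFs from the SVD of the anomaly matrix (time x space)."""
    anom = data - data.mean(axis=0)
    U, s, Vt = np.linalg.svd(anom, full_matrices=False)
    eigenvalues = s**2 / (anom.shape[0] - 1)      # eigenvalues of the covariance matrix
    variance_explained = 100 * eigenvalues / eigenvalues.sum()
    eofs = Vt[:n_modes].T                          # spatial patterns (space x mode)
    pcs = U[:, :n_modes] * s[:n_modes]             # principal components (time x mode)
    return eofs, pcs, variance_explained[:n_modes]

# Cross-check against the covariance-eigenvector route on random correlated data
rng = np.random.default_rng(1)
data = rng.standard_normal((120, 30)) @ rng.standard_normal((30, 30))
eofs, pcs, var = eof_svd(data)

w, v = np.linalg.eigh(np.cov(data, rowvar=False))
w, v = w[::-1], v[:, ::-1]                         # descending order
print(np.allclose(var, 100 * w[:3] / w.sum()))
print(np.allclose(np.abs(np.sum(eofs * v[:, :3], axis=0)), 1.0))  # equal up to sign
```

Both routes give the same modes up to sign; the SVD avoids squaring the data matrix, which is also numerically better conditioned.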