Module 2 · Architecture
Equivariant Neural Network Potentials — NequIP
A neural network potential must satisfy two fundamental physical symmetries: invariance of the total energy under global rotations, reflections, and translations (the Euclidean group \(E(3)\)), and the corresponding equivariance of atomic forces, which rotate with the system but are unchanged by translation. Standard MLPs violate these symmetries unless they are approximately enforced by data augmentation, which generalises poorly. NequIP (Batzner et al., 2022) builds these symmetries directly into the architecture.
2.1 Geometric Tensors and Irreducible Representations
The key idea is to represent atom-centred features not as ordinary vectors but as geometric tensors — irreducible representations of the rotation group SO(3). A tensor of angular momentum order \(\ell\) (an \(\ell\)-irrep) transforms under rotation \(R \in SO(3)\) as:
\[ \mathbf{h}^{(\ell)} \xrightarrow{R} D^{(\ell)}(R)\, \mathbf{h}^{(\ell)} \tag{2.1}\]
where \(D^{(\ell)}(R)\) is the \((2\ell+1)\)-dimensional Wigner D-matrix of order \(\ell\). Adding a parity label \(p \in \{+1, -1\}\) extends these to irreps of the full orthogonal group O(3); features of different orders combine under the Clebsch–Gordan tensor product \(\otimes_{CG}\).
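A quick numerical check of Eq. (2.1) for the simplest non-scalar case: in the Cartesian basis, the Wigner D-matrix of an \(\ell = 1\) irrep is just the \(3\times 3\) rotation matrix itself. This is a toy illustration, not NequIP code; the bond vector and rotation below are arbitrary.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# An l=1 feature built from geometry: a unit bond direction.
r_ij = np.array([1.0, 2.0, -0.5])
h_l1 = r_ij / np.linalg.norm(r_ij)

R = rotation_z(0.7)

# Equivariance: the feature computed from rotated geometry equals
# D^(1)(R) applied to the feature from the original geometry,
# and for l=1 in the Cartesian basis D^(1)(R) = R.
h_from_rotated = (R @ r_ij) / np.linalg.norm(R @ r_ij)
h_rotated = R @ h_l1
assert np.allclose(h_from_rotated, h_rotated)
```

For \(\ell \geq 2\) the same identity holds with the corresponding \((2\ell+1)\)-dimensional D-matrix, which is no longer the rotation matrix itself.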
2.2 Message Passing with Clebsch–Gordan Products
The NequIP message from neighbour \(j\) to atom \(i\) at layer \(t\) is:
\[ m_{j \to i}^{(t)} = W\!\left(\|\mathbf{r}_{ij}\|\right) \left(\mathbf{h}_j^{(t)} \otimes_{CG} Y^{(\ell)}(\hat{\mathbf{r}}_{ij})\right) \tag{2.2}\]
where \(W(\|\mathbf{r}_{ij}\|)\) is a radial network (an MLP applied to a Bessel-function basis of the interatomic distance) that produces per-channel scalar weights, \(\mathbf{h}_j^{(t)}\) is the current feature tensor of atom \(j\), and \(Y^{(\ell)}(\hat{\mathbf{r}}_{ij})\) are spherical harmonics of the unit bond direction. The Clebsch–Gordan product couples angular-momentum channels by the triangle rule: \(|\ell_1 - \ell_2| \leq \ell_{\text{out}} \leq \ell_1 + \ell_2\).
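The triangle rule is easy to make concrete. The helper below is a hypothetical utility (not NequIP's API) that lists the output orders produced by coupling two irreps:

```python
def allowed_output_orders(l1: int, l2: int) -> list[int]:
    """Angular-momentum orders produced by an l1 (x) l2 Clebsch-Gordan
    product, per the triangle rule |l1 - l2| <= l_out <= l1 + l2."""
    return list(range(abs(l1 - l2), l1 + l2 + 1))

# Coupling a vector feature (l=1) with l=2 spherical harmonics
# yields l = 1, 2, 3 output channels.
print(allowed_output_orders(1, 2))  # -> [1, 2, 3]

# A scalar (l=0) coupled with anything just passes the order through.
print(allowed_output_orders(0, 2))  # -> [2]
```

In practice a maximum order (e.g. \(\ell \leq 2\), as in the table below) truncates this list to keep the network tractable.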
The atom feature update aggregates over neighbours:
\[ \mathbf{h}_i^{(t+1)} = \mathbf{h}_i^{(t)} + \sigma\!\left(\sum_{j \in \mathcal{N}(i)} m_{j \to i}^{(t)}\right) \tag{2.3}\]
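For scalar (\(\ell = 0\)) channels, Eqs. (2.2)–(2.3) reduce to a particularly simple form: the CG product becomes ordinary multiplication and \(Y^{(0)}\) is a constant, so the message is a radial weight times the neighbour feature. The sketch below illustrates this special case with placeholder shapes and a stand-in radial function; it is not the NequIP implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.normal(size=(4, 3))   # 4 atoms in 3D
h = rng.normal(size=(4, 8))           # l=0 features, 8 channels

def radial_weight(r, n_channels=8):
    """Stand-in for the Bessel-basis MLP W(||r_ij||): smooth per-channel decay."""
    return np.exp(-r) * np.ones(n_channels)

def update(positions, h):
    """One message-passing step: Eq. (2.2) messages (l=0 case), then the
    Eq. (2.3) residual update with a tanh nonlinearity."""
    n = len(positions)
    msg = np.zeros_like(h)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = np.linalg.norm(positions[j] - positions[i])
            msg[i] += radial_weight(r) * h[j]
    return h + np.tanh(msg)

h_next = update(positions, h)

# Invariance check: a global rotation of the positions leaves l=0
# features unchanged, because only distances enter the update.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
h_next_rot = update(positions @ Q.T, h)
assert np.allclose(h_next, h_next_rot)
```

The full model additionally carries \(\ell \geq 1\) channels, whose messages mix orders through the CG product and transform under rotation as in Eq. (2.1).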
The energy is a sum of invariant (\(\ell = 0\)) projections of the final atom features:
\[ E(\mathbf{R}) = \sum_i \varepsilon_i\!\left(\mathbf{h}_i^{(L)}\big|_{\ell=0}\right), \quad \mathbf{F}_i = -\nabla_{\mathbf{r}_i} E \tag{2.4}\]
Forces are computed by automatic differentiation — they are equivariant by construction since \(E\) is invariant.
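The invariant-energy, equivariant-force structure of Eq. (2.4) can be verified numerically with any distance-only energy. Below, a toy pair energy stands in for the network, and central finite differences stand in for automatic differentiation to keep the sketch dependency-free; neither is the actual NequIP machinery.

```python
import numpy as np

def energy(pos):
    """Invariant toy energy: sum of Morse-like pair terms over all pairs."""
    E = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(pos[i] - pos[j])
            E += (1.0 - np.exp(-(r - 1.0))) ** 2
    return E

def forces_fd(pos, eps=1e-6):
    """F_i = -dE/dr_i via central finite differences (autodiff stand-in)."""
    F = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        for k in range(3):
            p_plus, p_minus = pos.copy(), pos.copy()
            p_plus[i, k] += eps
            p_minus[i, k] -= eps
            F[i, k] = -(energy(p_plus) - energy(p_minus)) / (2 * eps)
    return F

rng = np.random.default_rng(1)
pos = rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix

# Invariant energy, equivariant forces: F(Q r) = Q F(r).
assert np.isclose(energy(pos @ Q.T), energy(pos))
assert np.allclose(forces_fd(pos @ Q.T), forces_fd(pos) @ Q.T, atol=1e-5)
```

The same two assertions hold for the real model: because \(E\) is built from invariant projections, its gradient automatically rotates with the system.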
2.3 Data Efficiency: Why Equivariance Matters
Equivariance imposes hard constraints on the parameter space, dramatically reducing the effective dimensionality of the learning problem. Empirical comparison on a fixed test set (small organic molecules with DFT-quality labels):
| Model | Architecture | Training configs | Energy MAE (meV/atom) | Force MAE (meV/Å) |
|---|---|---|---|---|
| SchNet | Invariant (distances) | 50,000 | 3.2 | 85 |
| DimeNet++ | Invariant (angles) | 10,000 | 2.1 | 60 |
| NequIP | Equivariant (ℓ≤2) | 1,000 | 0.8 | 15 |
| MAML-NequIP (this course) | Equivariant + meta-init | 50 | 1.5–3 | 20–35 |
Technical Subtlety
MAML inner-loop gradients must be computed through the Clebsch–Gordan layers. For exact (second-order) MAML, the Hessian of the CG product has a non-trivial tensor structure indexed by angular-momentum channels. In practice, either first-order MAML (FOMAML) or restricting the exact Hessian to the radial weights (which are ordinary scalars) makes this tractable.
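The first-order shortcut can be sketched in a few lines: the inner adaptation is an ordinary gradient step, and the outer update evaluates the gradient at the adapted parameters without differentiating through the inner step, so no Hessian of the CG layers is ever needed. The toy one-parameter least-squares tasks below are stand-ins for per-system force-matching losses.

```python
def loss_grad(theta, task_target):
    """Gradient of the per-task loss L(theta) = (theta - target)^2 / 2."""
    return theta - task_target

def fomaml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05):
    """One FOMAML meta-update over a batch of tasks."""
    outer_grad = 0.0
    for target in tasks:
        # Inner loop: one adaptation step on this task.
        theta_adapted = theta - inner_lr * loss_grad(theta, target)
        # First-order approximation: take the gradient at the adapted
        # parameters, ignoring d(theta_adapted)/d(theta) second-order terms.
        outer_grad += loss_grad(theta_adapted, target)
    return theta - outer_lr * outer_grad / len(tasks)

theta = 0.0
tasks = [1.0, 3.0, 5.0]   # hypothetical per-task optima
for _ in range(200):
    theta = fomaml_step(theta, tasks)

# The meta-initialisation converges toward the task mean (3.0 here),
# from which each task is reachable in a few inner steps.
assert abs(theta - 3.0) < 1e-3
```

In the MAML-NequIP setting, `theta` would be the network weights and each task a distinct chemical system; the first-order structure is identical.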