Module 2 · Architecture
Equivariant Neural Network Potentials — NequIP
A neural network potential must satisfy two fundamental physical symmetries: invariance of the total energy under global rotations, reflections, and translations (the Euclidean group \(E(3)\)), and the corresponding equivariance of atomic forces, which rotate with the system but are unchanged by translation. Standard MLPs violate these symmetries unless they are approximately enforced by data augmentation, which generalises poorly. NequIP (Batzner et al., 2022) builds these symmetries directly into the architecture.
2.1 Geometric Tensors and Irreducible Representations
The key idea is to represent atom-centred features not as ordinary vectors but as geometric tensors — irreducible representations of the rotation group SO(3). A tensor of angular momentum order \(\ell\) (an \(\ell\)-irrep) transforms under rotation \(R \in SO(3)\) as:
\[ \mathbf{h}^{(\ell)} \xrightarrow{R} D^{(\ell)}(R)\, \mathbf{h}^{(\ell)} \tag{2.1}\]
where \(D^{(\ell)}(R)\) is the \((2\ell+1)\)-dimensional Wigner D-matrix of order \(\ell\). Adding a parity label \(p \in \{+1, -1\}\) extends these to irreps of the full orthogonal group O(3); features of different orders combine under the Clebsch–Gordan tensor product \(\otimes_{CG}\).
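A quick numerical check of Eq. (2.1) for the simplest non-scalar case: in the Cartesian basis, the Wigner D-matrix of an \(\ell = 1\) irrep is just the \(3\times 3\) rotation matrix itself. This is a toy illustration, not NequIP code; the bond vector and rotation below are arbitrary.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# An l=1 feature built from geometry: a unit bond direction.
r_ij = np.array([1.0, 2.0, -0.5])
h_l1 = r_ij / np.linalg.norm(r_ij)

R = rotation_z(0.7)

# Equivariance: the feature computed from rotated geometry equals
# D^(1)(R) applied to the feature from the original geometry,
# and for l=1 in the Cartesian basis D^(1)(R) = R.
h_from_rotated = (R @ r_ij) / np.linalg.norm(R @ r_ij)
h_rotated = R @ h_l1
assert np.allclose(h_from_rotated, h_rotated)
```

For \(\ell \geq 2\) the same identity holds with the corresponding \((2\ell+1)\)-dimensional D-matrix, which is no longer the rotation matrix itself.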
2.2 Message Passing with Clebsch–Gordan Products
The NequIP message from neighbour \(j\) to atom \(i\) at layer \(t\) is:
\[ m_{j \to i}^{(t)} = W\!\left(\|\mathbf{r}_{ij}\|\right) \left(\mathbf{h}_j^{(t)} \otimes_{CG} Y^{(\ell)}(\hat{\mathbf{r}}_{ij})\right) \tag{2.2}\]
where \(W(\|\mathbf{r}_{ij}\|)\) is a radial network (an MLP applied to a Bessel-function basis of the interatomic distance) that produces per-channel scalar weights, \(\mathbf{h}_j^{(t)}\) is the current feature tensor of atom \(j\), and \(Y^{(\ell)}(\hat{\mathbf{r}}_{ij})\) are spherical harmonics of the unit bond direction. The Clebsch–Gordan product couples angular-momentum channels by the triangle rule: \(|\ell_1 - \ell_2| \leq \ell_{\text{out}} \leq \ell_1 + \ell_2\).
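The triangle rule is easy to make concrete. The helper below is a hypothetical utility (not NequIP's API) that lists the output orders produced by coupling two irreps:

```python
def allowed_output_orders(l1: int, l2: int) -> list[int]:
    """Angular-momentum orders produced by an l1 (x) l2 Clebsch-Gordan
    product, per the triangle rule |l1 - l2| <= l_out <= l1 + l2."""
    return list(range(abs(l1 - l2), l1 + l2 + 1))

# Coupling a vector feature (l=1) with l=2 spherical harmonics
# yields l = 1, 2, 3 output channels.
print(allowed_output_orders(1, 2))  # -> [1, 2, 3]

# A scalar (l=0) coupled with anything just passes the order through.
print(allowed_output_orders(0, 2))  # -> [2]
```

In practice a maximum order (e.g. \(\ell \leq 2\), as in the table below) truncates this list to keep the network tractable.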
The atom feature update aggregates over neighbours:
\[ \mathbf{h}_i^{(t+1)} = \mathbf{h}_i^{(t)} + \sigma\!\left(\sum_{j \in \mathcal{N}(i)} m_{j \to i}^{(t)}\right) \tag{2.3}\]
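For scalar (\(\ell = 0\)) channels, Eqs. (2.2)–(2.3) reduce to a particularly simple form: the CG product becomes ordinary multiplication and \(Y^{(0)}\) is a constant, so the message is a radial weight times the neighbour feature. The sketch below illustrates this special case with placeholder shapes and a stand-in radial function; it is not the NequIP implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.normal(size=(4, 3))   # 4 atoms in 3D
h = rng.normal(size=(4, 8))           # l=0 features, 8 channels

def radial_weight(r, n_channels=8):
    """Stand-in for the Bessel-basis MLP W(||r_ij||): smooth per-channel decay."""
    return np.exp(-r) * np.ones(n_channels)

def update(positions, h):
    """One message-passing step: Eq. (2.2) messages (l=0 case), then the
    Eq. (2.3) residual update with a tanh nonlinearity."""
    n = len(positions)
    msg = np.zeros_like(h)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = np.linalg.norm(positions[j] - positions[i])
            msg[i] += radial_weight(r) * h[j]
    return h + np.tanh(msg)

h_next = update(positions, h)

# Invariance check: a global rotation of the positions leaves l=0
# features unchanged, because only distances enter the update.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
h_next_rot = update(positions @ Q.T, h)
assert np.allclose(h_next, h_next_rot)
```

The full model additionally carries \(\ell \geq 1\) channels, whose messages mix orders through the CG product and transform under rotation as in Eq. (2.1).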
The energy is a sum of invariant (\(\ell = 0\)) projections of the final atom features:
\[ E(\mathbf{R}) = \sum_i \varepsilon_i\!\left(\mathbf{h}_i^{(L)}\big|_{\ell=0}\right), \quad \mathbf{F}_i = -\nabla_{\mathbf{r}_i} E \tag{2.4}\]
Forces are computed by automatic differentiation — they are equivariant by construction since \(E\) is invariant.
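The invariant-energy, equivariant-force structure of Eq. (2.4) can be verified numerically with any distance-only energy. Below, a toy pair energy stands in for the network, and central finite differences stand in for automatic differentiation to keep the sketch dependency-free; neither is the actual NequIP machinery.

```python
import numpy as np

def energy(pos):
    """Invariant toy energy: sum of Morse-like pair terms over all pairs."""
    E = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(pos[i] - pos[j])
            E += (1.0 - np.exp(-(r - 1.0))) ** 2
    return E

def forces_fd(pos, eps=1e-6):
    """F_i = -dE/dr_i via central finite differences (autodiff stand-in)."""
    F = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        for k in range(3):
            p_plus, p_minus = pos.copy(), pos.copy()
            p_plus[i, k] += eps
            p_minus[i, k] -= eps
            F[i, k] = -(energy(p_plus) - energy(p_minus)) / (2 * eps)
    return F

rng = np.random.default_rng(1)
pos = rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix

# Invariant energy, equivariant forces: F(Q r) = Q F(r).
assert np.isclose(energy(pos @ Q.T), energy(pos))
assert np.allclose(forces_fd(pos @ Q.T), forces_fd(pos) @ Q.T, atol=1e-5)
```

The same two assertions hold for the real model: because \(E\) is built from invariant projections, its gradient automatically rotates with the system.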
2.3 Data Efficiency: Why Equivariance Matters
Equivariance imposes hard constraints on the parameter space, dramatically reducing the effective dimensionality of the learning problem. Empirical comparison on a fixed test set (small organic molecules with DFT-quality labels):
| Model | Architecture | Training configs | Energy MAE (meV/atom) | Force MAE (meV/Å) |
|---|---|---|---|---|
| SchNet | Invariant (distances) | 50,000 | 3.2 | 85 |
| DimeNet++ | Invariant (angles) | 10,000 | 2.1 | 60 |
| NequIP | Equivariant (ℓ≤2) | 1,000 | 0.8 | 15 |
| MAML-NequIP (this course) | Equivariant + meta-init | 50 | 1.5–3 | 20–35 |
Technical Subtlety
MAML inner-loop gradients must be computed through the Clebsch–Gordan layers. For exact (second-order) MAML, the Hessian of the CG product has a non-trivial tensor structure indexed by angular-momentum channels. In practice, either first-order MAML (FOMAML) or restricting the exact Hessian to the radial weights (which are ordinary scalars) makes this tractable.
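The first-order shortcut can be sketched in a few lines: the inner adaptation is an ordinary gradient step, and the outer update evaluates the gradient at the adapted parameters without differentiating through the inner step, so no Hessian of the CG layers is ever needed. The toy one-parameter least-squares tasks below are stand-ins for per-system force-matching losses.

```python
def loss_grad(theta, task_target):
    """Gradient of the per-task loss L(theta) = (theta - target)^2 / 2."""
    return theta - task_target

def fomaml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05):
    """One FOMAML meta-update over a batch of tasks."""
    outer_grad = 0.0
    for target in tasks:
        # Inner loop: one adaptation step on this task.
        theta_adapted = theta - inner_lr * loss_grad(theta, target)
        # First-order approximation: take the gradient at the adapted
        # parameters, ignoring d(theta_adapted)/d(theta) second-order terms.
        outer_grad += loss_grad(theta_adapted, target)
    return theta - outer_lr * outer_grad / len(tasks)

theta = 0.0
tasks = [1.0, 3.0, 5.0]   # hypothetical per-task optima
for _ in range(200):
    theta = fomaml_step(theta, tasks)

# The meta-initialisation converges toward the task mean (3.0 here),
# from which each task is reachable in a few inner steps.
assert abs(theta - 3.0) < 1e-3
```

In the MAML-NequIP setting, `theta` would be the network weights and each task a distinct chemical system; the first-order structure is identical.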