ML for Molecular Dynamics
Neural network potentials, equivariant architectures, message passing, and ML-accelerated simulations
The Speed-Accuracy Trade-off in MD
Molecular dynamics (MD) simulations propagate Newton's equations of motion for a system of atoms. The computational bottleneck is evaluating the potential energy surface (PES): ab initio methods (DFT, coupled cluster) are accurate but scale as $\mathcal{O}(N^3)$ to $\mathcal{O}(N^7)$, limiting simulations to hundreds of atoms for nanoseconds. Classical force fields are fast ($\mathcal{O}(N)$) but lack the accuracy to capture bond breaking, charge transfer, or reactive chemistry.
Machine-learned interatomic potentials (MLIPs) bridge this gap: they are trained on ab initio data but evaluate at near-force-field speed, enabling million-atom simulations with quantum-mechanical accuracy.
1. Molecular Dynamics Fundamentals
In MD, we integrate Newton's equations for $N$ atoms with positions $\{r_1, \ldots, r_N\}$ and a potential energy function $E(\{r_i\})$:
Equations of Motion
$$m_i \ddot{r}_i = F_i = -\nabla_{r_i} E(\{r_j\})$$
The force $F_i$ on atom $i$ is the negative gradient of the total energy with respect to its position. The Velocity Verlet integrator updates positions and velocities with time step $\Delta t$:
$$r_i(t + \Delta t) = r_i(t) + v_i(t)\Delta t + \frac{F_i(t)}{2m_i}\Delta t^2$$
$$v_i(t + \Delta t) = v_i(t) + \frac{F_i(t) + F_i(t+\Delta t)}{2m_i}\Delta t$$
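The two update equations above translate directly into code. Below is a minimal NumPy sketch of a Velocity Verlet loop, applied to a 1D harmonic oscillator ($E = \frac{1}{2}kr^2$, a stand-in for a real PES) so that energy conservation can be checked; the function and parameter names are illustrative, not from any particular MD package.

```python
import numpy as np

def velocity_verlet(r, v, force, m, dt, n_steps):
    """Velocity Verlet: r += v*dt + F/(2m)*dt^2, then v += (F_old+F_new)/(2m)*dt."""
    F = force(r)
    traj = [r.copy()]
    for _ in range(n_steps):
        r = r + v * dt + F / (2 * m) * dt**2      # position update
        F_new = force(r)                           # forces at the new positions
        v = v + (F + F_new) / (2 * m) * dt         # velocity update (averaged forces)
        F = F_new
        traj.append(r.copy())
    return np.array(traj), v

# Toy system: 1D harmonic oscillator, E(r) = 0.5 k r^2, so F = -k r
k, m, dt = 1.0, 1.0, 0.01
force = lambda r: -k * r
traj, v_final = velocity_verlet(np.array([1.0]), np.array([0.0]), force, m, dt, 1000)

# Verlet is symplectic: total energy stays bounded near its initial value
E0 = 0.5 * k * 1.0**2
E_final = 0.5 * m * v_final[0]**2 + 0.5 * k * traj[-1, 0]**2
```

The same loop works unchanged when `force` is replaced by the negative gradient of a learned potential $\hat{E}_\theta$.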
The ML challenge is: given a dataset of atomic configurations and their energies/forces from quantum calculations, learn a function $\hat{E}_\theta(\{r_i, Z_i\})$ that predicts the energy (and forces via automatic differentiation) with near-DFT accuracy.
2. Symmetry Requirements for MLIPs
Physical potentials must respect fundamental symmetries. Any valid MLIP must be:
1. Invariant under translations:
$$E(\{r_i + t\}) = E(\{r_i\}) \quad \forall\, t \in \mathbb{R}^3$$
2. Invariant under rotations and reflections (O(3)):
$$E(\{R r_i\}) = E(\{r_i\}) \quad \forall\, R \in O(3)$$
3. Invariant under permutations of like atoms:
$$E(\{r_{\pi(i)}\}) = E(\{r_i\}) \quad \forall\, \pi \in S_N$$
4. Extensive (linear in system size):
$$E = \sum_{i=1}^{N} \varepsilon_i \quad \text{(decomposable into per-atom contributions)}$$
Forces must transform as vectors (equivariant under O(3)): $F_i(\{Rr_j\}) = R\, F_i(\{r_j\})$. This is automatic when forces are computed as $F_i = -\nabla_{r_i} E$, since the gradient of an invariant scalar is equivariant.
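This "invariant energy implies equivariant forces" property can be verified numerically. The sketch below uses a Lennard-Jones pair energy (an invariant function of distances only, standing in for any MLIP) and checks that rotating all atoms by an orthogonal $Q$ (O(3) includes reflections) leaves $E$ unchanged while the finite-difference forces rotate as $F(Qr) = Q\,F(r)$:

```python
import numpy as np

def lj_energy(r):
    """Total Lennard-Jones energy (eps = sigma = 1) for positions r of shape (N, 3)."""
    E = 0.0
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            d = np.linalg.norm(r[i] - r[j])
            E += 4 * (d**-12 - d**-6)
    return E

def forces(r, h=1e-5):
    """Forces as the negative central-difference gradient of the invariant energy."""
    F = np.zeros_like(r)
    for i in range(len(r)):
        for a in range(3):
            rp, rm = r.copy(), r.copy()
            rp[i, a] += h
            rm[i, a] -= h
            F[i, a] = -(lj_energy(rp) - lj_energy(rm)) / (2 * h)
    return F

r = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0], [0.0, 0.0, 1.3]])
Q, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(3, 3)))  # random orthogonal Q

E, E_rot = lj_energy(r), lj_energy(r @ Q.T)          # energy is invariant
F, F_rot = forces(r), forces(r @ Q.T)                 # forces are equivariant
```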
3. Neural Network Potentials (Behler-Parrinello)
The seminal approach of Behler & Parrinello (2007) decomposes the total energy into per-atom contributions, each computed by a feed-forward neural network acting on handcrafted symmetry functions (descriptors):
Atom-Centred Symmetry Functions
Radial symmetry functions (two-body):
$$G_i^{\text{rad}} = \sum_{j \neq i} e^{-\eta(r_{ij} - R_s)^2} f_c(r_{ij})$$
Angular symmetry functions (three-body):
$$G_i^{\text{ang}} = 2^{1-\zeta} \sum_{j,k \neq i} (1 + \lambda \cos\theta_{ijk})^\zeta \, e^{-\eta(r_{ij}^2 + r_{ik}^2 + r_{jk}^2)} f_c(r_{ij}) f_c(r_{ik}) f_c(r_{jk})$$
The cutoff function ensures smooth decay:
$$f_c(r) = \begin{cases} \frac{1}{2}\left[\cos\left(\frac{\pi r}{r_c}\right) + 1\right] & r \leq r_c \\ 0 & r > r_c \end{cases}$$
The descriptor vector $\mathbf{G}_i = (G_i^{(1)}, G_i^{(2)}, \ldots)$ is by construction invariant under rotation, translation, and permutation. The total energy is:
$$E = \sum_{i=1}^{N} \text{NN}_{Z_i}(\mathbf{G}_i)$$
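A minimal NumPy sketch of the radial symmetry function $G_i^{\text{rad}}$ with the cosine cutoff defined above (parameter values $\eta$, $R_s$, $r_c$ are arbitrary choices for illustration), including a numerical check that the descriptor is rotation-invariant:

```python
import numpy as np

def f_cut(r, r_c):
    """Smooth cosine cutoff: 0.5*(cos(pi r / r_c) + 1) for r <= r_c, else 0."""
    return np.where(r <= r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1), 0.0)

def radial_sf(positions, i, eta, R_s, r_c):
    """G_i^rad = sum_{j != i} exp(-eta (r_ij - R_s)^2) f_c(r_ij)."""
    r_ij = np.linalg.norm(positions - positions[i], axis=1)
    r_ij = np.delete(r_ij, i)                         # exclude the atom itself
    return np.sum(np.exp(-eta * (r_ij - R_s)**2) * f_cut(r_ij, r_c))

pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.5, 0.0]])
G = radial_sf(pos, i=0, eta=4.0, R_s=1.0, r_c=6.0)

# Invariance check: rotating all positions leaves the descriptor unchanged,
# since it depends on interatomic distances only
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
G_rot = radial_sf(pos @ R.T, i=0, eta=4.0, R_s=1.0, r_c=6.0)
```

In a full Behler-Parrinello network, many such functions with different $(\eta, R_s)$ pairs, plus angular terms, form the vector $\mathbf{G}_i$ fed to the per-element network $\text{NN}_{Z_i}$.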
4. Gaussian Approximation Potentials (GAP) & SOAP
GAP (Bartók et al., 2010) uses Gaussian process regression with the SOAP (Smooth Overlap of Atomic Positions) kernel to learn the PES.
SOAP Descriptor
The atomic neighbour density around atom $i$ is expanded in a basis of radial functions and spherical harmonics:
$$\rho_i(\mathbf{r}) = \sum_{j \in \text{neigh}(i)} g(|\mathbf{r} - \mathbf{r}_{ij}|) = \sum_{nlm} c_{nlm}^{(i)} R_n(r) Y_l^m(\hat{r})$$
The SOAP power spectrum (rotationally invariant) is:
$$p_{nn'l}^{(i)} = \pi\sqrt{\frac{8}{2l+1}} \sum_{m=-l}^{l} (c_{nlm}^{(i)})^* c_{n'lm}^{(i)}$$
The SOAP kernel between two environments is $k(i, j) = \left( \dfrac{\mathbf{p}^{(i)} \cdot \mathbf{p}^{(j)}}{|\mathbf{p}^{(i)}|\, |\mathbf{p}^{(j)}|} \right)^{\zeta}$, where the exponent $\zeta$ controls sensitivity.
GAP Prediction
Given training data $\{(\mathbf{p}^{(i)}, \varepsilon_i)\}$, the GAP prediction is:
$$\hat{\varepsilon}(\mathbf{p}^*) = \sum_{i=1}^{M} \alpha_i k(\mathbf{p}^*, \mathbf{p}^{(i)})$$
where $\alpha = (K + \sigma_n^2 I)^{-1} \varepsilon$ and $K_{ij} = k(\mathbf{p}^{(i)}, \mathbf{p}^{(j)})$. The GP also provides uncertainty estimates $\sigma^2(\mathbf{p}^*) = k(\mathbf{p}^*, \mathbf{p}^*) - \mathbf{k}^\top (K + \sigma_n^2 I)^{-1} \mathbf{k}$, where $\mathbf{k}$ is the vector of kernel values between the test environment and the training environments.
5. Message Passing Neural Networks
Instead of handcrafted descriptors, message-passing neural networks (MPNNs) learn the atomic environment representation end-to-end through iterative message passing on the molecular graph.
MPNN Framework (Gilmer et al., 2017)
Each atom $i$ has a feature vector $h_i^{(0)}$ (initialised from element type). At each message-passing layer $l$:
$$m_i^{(l)} = \sum_{j \in \mathcal{N}(i)} M^{(l)}(h_i^{(l)}, h_j^{(l)}, e_{ij}) \quad \text{(aggregate messages)}$$
$$h_i^{(l+1)} = U^{(l)}(h_i^{(l)}, m_i^{(l)}) \quad \text{(update features)}$$
After $L$ layers, the per-atom energy is $\varepsilon_i = R(h_i^{(L)})$, where $R$ is a readout network. The total energy $E = \sum_i \varepsilon_i$ is permutation-invariant by construction (sum over atoms).
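The framework can be sketched in plain NumPy. The message and update functions below are toy single-layer networks with random weights (not a trained model), the graph is fully connected, and the readout is a simple sum; the point is to verify that sum-aggregation plus a sum readout makes the total output permutation-invariant:

```python
import numpy as np

rng = np.random.default_rng(0)
F, N = 8, 5                                  # feature dimension, number of atoms
Z = np.array([1, 1, 6, 8, 1])                # element types (H, H, C, O, H)

# Initial features from a per-element embedding table (toy weights)
embed = {z: rng.normal(size=F) for z in np.unique(Z)}
h = np.stack([embed[z] for z in Z])

W_msg = rng.normal(size=(2 * F, F)) / np.sqrt(2 * F)   # toy message network
W_upd = rng.normal(size=(2 * F, F)) / np.sqrt(2 * F)   # toy update network

def mp_layer(h):
    """One layer: sum messages M(h_i, h_j) over neighbours, then update h_i."""
    msgs = np.zeros_like(h)
    for i in range(len(h)):
        for j in range(len(h)):
            if i != j:                        # fully connected molecular graph
                msgs[i] += np.tanh(np.concatenate([h[i], h[j]]) @ W_msg)
    return np.tanh(np.concatenate([h, msgs], axis=1) @ W_upd)

E = mp_layer(mp_layer(h)).sum()               # sum readout -> total "energy"

# Permutation invariance: relabelling the atoms leaves E unchanged
perm = np.array([2, 0, 3, 1, 4])
E_perm = mp_layer(mp_layer(h[perm])).sum()
```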
6. SchNet: Continuous-Filter Convolutions
SchNet Architecture (Schütt et al., 2017)
SchNet replaces discrete graph convolutions with continuous-filter convolutions that operate directly on interatomic distances. The key innovation is the filter-generating network:
$$W(r_{ij}) = \text{MLP}(\text{RBF}(r_{ij})) \in \mathbb{R}^{F}$$
where the RBF expansion uses Gaussian basis functions centred at different distances:
$$e_k(r_{ij}) = \exp\left(-\gamma_k (r_{ij} - \mu_k)^2\right), \quad k = 1, \ldots, K$$
The interaction update for atom $i$ in layer $l$ is:
$$h_i^{(l+1)} = h_i^{(l)} + \sum_{j \in \mathcal{N}(i)} h_j^{(l)} \odot W^{(l)}(r_{ij})$$
By depending only on scalar distances $r_{ij}$, SchNet is automatically invariant under translations and rotations. However, because its features are pure scalars, it cannot represent directional (tensorial) properties such as dipole vectors or polarisability tensors.
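A toy continuous-filter convolution, with a random two-layer MLP as the filter-generating network (illustrative weights, not SchNet's published architecture), confirms the invariance claim numerically: since the filters see only distances, rotating the geometry leaves the updated features unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
F, K = 16, 20                                 # feature dim, number of RBF centres

mu = np.linspace(0.0, 5.0, K)                 # RBF centres mu_k
gamma = 10.0                                  # shared width gamma_k

def rbf(r):
    """Gaussian RBF expansion of a scalar distance: e_k(r) = exp(-gamma (r - mu_k)^2)."""
    return np.exp(-gamma * (r - mu) ** 2)

W1 = rng.normal(size=(K, F)) / np.sqrt(K)     # toy filter-generating MLP
W2 = rng.normal(size=(F, F)) / np.sqrt(F)

def filter_net(r):
    return np.tanh(rbf(r) @ W1) @ W2          # W(r_ij) in R^F

def cfconv(h, positions):
    """Continuous-filter convolution: h_i <- h_i + sum_j h_j * W(r_ij)."""
    out = h.copy()
    for i in range(len(positions)):
        for j in range(len(positions)):
            if i != j:
                r_ij = np.linalg.norm(positions[i] - positions[j])
                out[i] = out[i] + h[j] * filter_net(r_ij)
    return out

pos = rng.normal(size=(4, 3))
h0 = rng.normal(size=(4, F))
h1 = cfconv(h0, pos)

# Invariance: an orthogonal transform of the geometry changes nothing
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
h1_rot = cfconv(h0, pos @ Q.T)
```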
7. Equivariant Neural Networks
While SchNet achieves invariance by working only with scalar distances, more expressive architectures encode equivariance: features transform predictably under rotations, enabling the network to represent vectorial and tensorial quantities.
Equivariance Under O(3)
A function $f$ mapping between feature spaces is equivariant under a group $G$ if:
$$f(D_{\text{in}}(g) \cdot x) = D_{\text{out}}(g) \cdot f(x) \quad \forall\, g \in G$$
where $D_{\text{in}}, D_{\text{out}}$ are representations of $G$. For O(3), features are decomposed into irreducible representations (irreps) labelled by angular momentum $l$:
- $l = 0$: scalars (1D, invariant) — energies, charges
- $l = 1$: vectors (3D) — forces, dipole moments
- $l = 2$: rank-2 traceless symmetric tensors (5D) — polarisability, stress
Under rotation $R$, an $l$-type feature transforms as $h^{(l)} \to D^{(l)}(R)\, h^{(l)}$, where $D^{(l)}$ is the $(2l+1) \times (2l+1)$ Wigner D-matrix.
Tensor Product in Equivariant Layers
The key operation is the tensor product of irreps, governed by Clebsch-Gordan coefficients:
$$(h^{(l_1)} \otimes h^{(l_2)})_m^{(l_3)} = \sum_{m_1, m_2} C_{l_1 m_1, l_2 m_2}^{l_3 m} h_{m_1}^{(l_1)} h_{m_2}^{(l_2)}$$
where $|l_1 - l_2| \leq l_3 \leq l_1 + l_2$ (triangle inequality). This allows combining features of different angular momenta while preserving equivariance. For example, multiplying two $l=1$ vectors produces $l=0$ (scalar: dot product), $l=1$ (vector: cross product), and $l=2$ (traceless symmetric tensor).
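The $l=1 \otimes l=1$ example can be checked directly in Cartesian form (a simpler basis than the spherical-harmonic one used by real equivariant libraries, but the decomposition is the same): the dot product is invariant, the cross product rotates as a vector under proper rotations, and the traceless symmetric part conjugates as a rank-2 tensor.

```python
import numpy as np

def tp_l1_l1(u, v):
    """Decompose the product of two l=1 vectors into l=0, l=1, l=2 parts."""
    s = np.dot(u, v)                                        # l=0: scalar
    w = np.cross(u, v)                                      # l=1: antisymmetric part
    T = np.outer(u, v)
    T2 = 0.5 * (T + T.T) - (np.trace(T) / 3) * np.eye(3)    # l=2: traceless symmetric
    return s, w, T2

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)

# Proper rotation (det = +1), since the cross product picks up det(Q) under reflections
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1

s, w, T2 = tp_l1_l1(u, v)
s_r, w_r, T2_r = tp_l1_l1(Q @ u, Q @ v)
# Expected: s invariant, w -> Q w, T2 -> Q T2 Q^T
```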
8. MACE: Multi-ACE Architecture
Atomic Cluster Expansion (ACE)
MACE (Batatia et al., 2022) builds on the Atomic Cluster Expansion framework, which systematically constructs many-body invariant descriptors. The key idea is the body-ordered expansion:
$$\varepsilon_i = \varepsilon_i^{(1)} + \varepsilon_i^{(2)} + \varepsilon_i^{(3)} + \cdots$$
where $\varepsilon_i^{(\nu)}$ is the $\nu$-body contribution. MACE achieves high body order efficiently by using symmetric contractions of equivariant message-passing features:
$$A_{i,zlm}^{(t)} = \sum_{j \in \mathcal{N}(i)} R_{nl}^{(t)}(r_{ij})\, Y_l^m(\hat{r}_{ij})\, h_{j,z}^{(t)}$$
The key advantage of MACE over earlier MPNNs is that each message is itself a many-body feature: with correlation order $\nu$, a single layer's symmetric contraction builds $(\nu+1)$-body messages (e.g. $\nu = 3$ gives 4-body), rather than raising the body order by only one per layer. As a result, just 2 layers capture the high body-order interactions sufficient for most chemical systems.
9. ML-Based Coarse-Graining
Coarse-graining (CG) reduces the number of degrees of freedom by grouping atoms into "beads." ML can learn the effective CG potential from atomistic simulations.
Force-Matching (Multiscale Coarse-Graining)
Given a mapping operator $M$ from atomistic to CG coordinates$R_I = M_I(\{r_i\})$, the CG force on bead $I$ is defined as the average atomistic force projected onto the CG space:
$$F_I^{\text{CG}} = \sum_{i \in \text{group}(I)} F_i^{\text{atom}}$$
The ML CG potential is trained by minimising:
$$\mathcal{L} = \sum_{I=1}^{N_{\text{CG}}} \left\|F_I^{\text{CG}} - \left(-\nabla_{R_I} U_\theta^{\text{CG}}\right)\right\|^2$$
This force-matching approach is variational: its optimal solution is the many-body potential of mean force of the CG coordinates. A complementary route, relative-entropy minimisation, instead fits the CG potential by minimising $S_{\text{rel}} = \int p_{\text{CG}}^{\text{atom}} \ln(p_{\text{CG}}^{\text{atom}} / p_\theta^{\text{CG}})\, dR$ between the mapped atomistic distribution and the model CG distribution.
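A minimal force-matching sketch on a single synthetic snapshot: atoms are grouped into beads, CG forces are the summed atomistic forces, and the CG potential is a deliberately simple one-parameter pair potential $U = \sum_{I<J} \theta / |R_I - R_J|$ (an illustrative stand-in for $U_\theta^{\text{CG}}$, chosen so the loss is quadratic in $\theta$ and solvable in closed form).

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, n_beads = 12, 3
groups = np.repeat(np.arange(n_beads), n_atoms // n_beads)   # 4 atoms per bead

# Synthetic atomistic snapshot: positions and forces (stand-ins for MD data)
r_atom = rng.normal(size=(n_atoms, 3))
F_atom = rng.normal(size=(n_atoms, 3))

# CG mapping: centroid positions (equal masses) and summed forces per bead
R_cg = np.stack([r_atom[groups == I].mean(axis=0) for I in range(n_beads)])
F_cg = np.stack([F_atom[groups == I].sum(axis=0) for I in range(n_beads)])

def cg_forces(R, theta):
    """Analytic forces of the toy CG potential U = sum_{I<J} theta / |R_I - R_J|."""
    F = np.zeros_like(R)
    for I in range(len(R)):
        for J in range(len(R)):
            if I != J:
                d = R[I] - R[J]
                F[I] += theta * d / np.linalg.norm(d) ** 3   # -grad of theta/r
    return F

def loss(theta):
    """Force-matching loss: squared residual between mapped and model forces."""
    return np.sum((F_cg - cg_forces(R_cg, theta)) ** 2)

# cg_forces is linear in theta, so the least-squares optimum is closed-form
G = cg_forces(R_cg, 1.0)
theta_star = np.sum(F_cg * G) / np.sum(G * G)
```

In practice $U_\theta^{\text{CG}}$ is a neural network and the loss is minimised by gradient descent over many snapshots, but the objective is the same.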
10. ML-Enhanced Sampling
Collective Variables from Autoencoders
Enhanced sampling methods (metadynamics, umbrella sampling) require collective variables (CVs) that capture the slow degrees of freedom. Autoencoders can learn CVs directly from MD trajectories:
$$\xi(x) = f_\phi(x) \in \mathbb{R}^d, \quad d = 1, 2$$
The encoder latent space defines a low-dimensional reaction coordinate. This can be combined with metadynamics by adding a bias potential $V_{\text{bias}}(\xi)$ in the learned CV space to accelerate sampling of rare events.
Recent approaches use time-lagged autoencoders (TAE) that maximise the autocorrelation of the latent variables, preferentially selecting slow modes: $\mathcal{L}_{\text{TAE}} = \mathcal{L}_{\text{recon}} - \lambda \sum_k \text{Corr}(z_k(t), z_k(t+\tau))$.
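The intuition behind the time-lagged term can be demonstrated without training a network: slow coordinates retain correlation across a lag $\tau$, fast ones do not. The sketch below compares lagged autocorrelations of two synthetic Ornstein-Uhlenbeck processes (a toy stand-in for latent trajectories; the relaxation rates are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 20000, 10

def ou_series(theta):
    """Discrete Ornstein-Uhlenbeck process; small theta -> slow relaxation."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = (1 - theta) * x[t - 1] + rng.normal(scale=0.1)
    return x

slow, fast = ou_series(0.01), ou_series(0.5)

def lagged_corr(z, tau):
    """Autocorrelation Corr(z(t), z(t + tau)) estimated from the trajectory."""
    return np.corrcoef(z[:-tau], z[tau:])[0, 1]

c_slow, c_fast = lagged_corr(slow, tau), lagged_corr(fast, tau)
# A TAE objective rewards latent coordinates like `slow`: c_slow >> c_fast
```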
Boltzmann Generators
Boltzmann generators are normalising flows trained to sample from the Boltzmann distribution $p(x) \propto e^{-\beta E(x)}$ by transforming a simple prior through invertible neural networks. Combined with importance-weight reweighting, this allows direct generation of equilibrium configurations without running long MD trajectories.
Active Learning for MLIPs
During MD with an MLIP, configurations where the model is uncertain (e.g., high GP variance or disagreement between ensemble members) are selected for expensive ab initio recalculation. The MLIP is then retrained, creating a self-improving simulation loop.
Chapter Summary
- MLIPs must respect translational, rotational, and permutational symmetries, plus extensivity.
- Behler-Parrinello NNPs use handcrafted symmetry functions as invariant descriptors for per-atom neural networks.
- GAP/SOAP uses Gaussian process regression with rotationally invariant power-spectrum descriptors.
- Message-passing NNs (SchNet, DimeNet, PaiNN) learn representations end-to-end through iterative neighbourhood aggregation.
- Equivariant networks use irreducible representations of O(3) and tensor products (Clebsch-Gordan coefficients) to build expressive architectures.
- MACE achieves high body order efficiently through symmetric contractions of equivariant features.
- Coarse-graining via force-matching learns effective potentials for reduced representations.
- Enhanced sampling benefits from ML-learned collective variables, Boltzmann generators, and active learning strategies.