Module 4 · Evolving Neural Networks

Neuroevolution & NEAT

Neuroevolution is the use of evolutionary algorithms to design neural networks — their weights, their topologies, or both. The classical workhorse is NEAT (Stanley & Miikkulainen 2002), which evolves both weights and topology while solving the hard problems of structural innovation and competing conventions. The modern frontier mixes neuroevolution with gradient-based training and quality-diversity methods to produce architectures that gradient descent alone cannot find.

1. Why Evolve Neural Networks?

  • Non-differentiable objectives. RL with sparse rewards, programs with discrete control flow, networks with non-differentiable activations.
  • Architecture search. Topology is a discrete search problem; gradient descent provides no gradient with respect to topology, but evolution searches discrete structures natively.
  • Escaping deceptive optima. Gradient descent on RL or behaviour-cloning losses falls into deceptive local optima. Evolutionary search with diversity preservation escapes them (see Module 5).
  • Massive parallelism. Each individual is evaluated independently and a generation needs only scalar fitnesses to be communicated, so neuroevolution parallelises across a cluster far more cheaply than gradient training, which must synchronise full gradients.

2. Direct Encodings

The genome contains every synaptic weight explicitly. For a fixed-topology network with \(W\) weights the genome is just \(\mathbf{w} \in \mathbb{R}^W\) and any continuous EA (CMA-ES, NES, OpenAI ES) applies. This is the dominant approach when the topology is known a priori and only the weights are evolved (e.g. control of a fixed RL policy network). It scales surprisingly far — OpenAI ES has trained networks of \(\sim 10^6\) weights with vanilla Gaussian mutation.
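A minimal sketch of direct-encoding weight evolution. The (1+λ)-style acceptance loop and the toy XOR fitness are illustrative choices for this module, not a specific published algorithm:

```python
import numpy as np

def evolve_weights(fitness, W, sigma=0.1, pop_size=64, n_gens=200, seed=0):
    """Direct encoding: the genome is the flat weight vector itself.

    `fitness` maps a weight vector in R^W to a scalar (higher is better).
    Simple (1+lambda) loop with vanilla Gaussian mutation.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.5, size=W)              # parent: every synapse explicit
    f_w = fitness(w)
    for _ in range(n_gens):
        offspring = w + sigma * rng.normal(size=(pop_size, W))  # Gaussian mutation
        scores = np.array([fitness(o) for o in offspring])
        i = int(np.argmax(scores))
        if scores[i] >= f_w:                      # elitist acceptance
            w, f_w = offspring[i], scores[i]
    return w

# Illustrative fitness: a fixed 2-4-1 tanh network on XOR (17 weights total)
def xor_fitness(w):
    W1, b1 = w[:8].reshape(2, 4), w[8:12]
    W2, b2 = w[12:16].reshape(4, 1), w[16]
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0.0, 1.0, 1.0, 0.0])
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-((h @ W2).ravel() + b2)))
    return -np.mean((out - y) ** 2)               # negative MSE

w_star = evolve_weights(xor_fitness, W=17)
```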

The drawback: as networks grow large, the search space grows commensurately. Direct encodings are not scalable in the strong sense — the algorithm has no prior over which weights might be redundant or related.

3. NEAT

NEAT (NeuroEvolution of Augmenting Topologies) came out of Kenneth Stanley’s PhD work under Risto Miikkulainen, published in 2002. It solved three long-standing problems simultaneously:

  1. Competing conventions. Two networks with identical function but different node orderings should not produce broken offspring under crossover. NEAT solves this with historical markings: every gene gets a global innovation number when it first appears, and crossover aligns genes by innovation number rather than position.
  2. Protecting innovation. A new structural mutation initially hurts fitness because the new connection has not yet been tuned, so naive selection eliminates innovators. NEAT solves this with speciation: individuals are clustered by genetic similarity (made precise after this list), and fitness is shared within species, giving newcomers a sheltered niche to optimise in.
  3. Minimal starting structure. NEAT begins every run with the smallest possible network (direct input–output connections, no hidden units) and grows complexity only when fitness pressure rewards it. This minimises the dimensionality of the search and keeps speciation meaningful.
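Speciation relies on a quantitative notion of genetic similarity. In the original paper this is the compatibility distance

\[ \delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3 \, \bar{W}, \]

where \(E\) and \(D\) are the numbers of excess and disjoint genes, \(\bar{W}\) is the mean weight difference of matching genes, \(N\) is the size of the larger genome, and \(c_1, c_2, c_3\) are tunable coefficients. Genomes within a threshold \(\delta_t\) of a species representative join that species, and each member’s fitness is divided by the species’ size (fitness sharing).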

Variation operators: weight perturbation, weight randomisation, add-connection, add-node (insert a new node by splitting an existing connection). Crossover aligns parents’ gene sequences by innovation number, with disjoint and excess genes inherited from the fitter parent.
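A sketch of how historical markings make crossover alignment trivial. These are not NEAT-Python’s actual data structures, and the sketch omits details such as the chance of re-enabling a disabled gene; the gene fields follow the paper’s connection genes:

```python
import random
from dataclasses import dataclass

@dataclass
class ConnGene:
    innovation: int   # global historical marking, assigned at first appearance
    in_node: int
    out_node: int
    weight: float
    enabled: bool = True

def crossover(fitter, weaker):
    """Align parent genomes by innovation number, as in NEAT.

    Matching genes are inherited at random from either parent; disjoint and
    excess genes come from the fitter parent only.
    """
    weaker_by_innov = {g.innovation: g for g in weaker}
    child = []
    for gene in fitter:                      # fitter parent drives the genome
        match = weaker_by_innov.get(gene.innovation)
        child.append(random.choice([gene, match]) if match else gene)
    return child
```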

4. Indirect Encodings: HyperNEAT and CPPNs

A biological genome does not specify every synapse explicitly; it specifies a developmental program that generates the network. Indirect encodings imitate this. NEAT’s best-known extension along these lines is HyperNEAT (Stanley, D’Ambrosio & Gauci 2009).

The genome is a small CPPN (compositional pattern-producing network) that takes the geometric coordinates of two neurons \((x_1, y_1, x_2, y_2)\) and outputs the synapse weight between them. The CPPN’s function set includes Gaussians, sines, and sigmoids: activation functions chosen to produce repeating, symmetric, smooth patterns directly. The phenotype network can have millions of synapses while the genome stays small. Symmetries and regularities are generated, not learned.
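A sketch of the substrate query, with a hand-written two-function CPPN standing in for an evolved one; the layer coordinates and the particular Gaussian–sine composition are illustrative assumptions:

```python
import numpy as np

def cppn(x1, y1, x2, y2):
    """Stand-in for an evolved CPPN: composes a Gaussian and a sine so the
    generated weight pattern comes out smooth, symmetric, and repeating."""
    d = np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)       # symmetry via distance
    return np.exp(-d ** 2) * np.sin(3.0 * (x1 + x2))   # repetition via sine

# Query the CPPN at every pair of substrate coordinates to 'paint' the
# phenotype's weight matrix: n^2 synapses from a tiny genome.
n = 32
coords = np.linspace(-1.0, 1.0, n)
weights = np.array([[cppn(x1, 0.0, x2, 1.0) for x2 in coords]  # inputs at y=0
                    for x1 in coords])                          # outputs at y=1
```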

ES-HyperNEAT (Risi & Stanley 2012) added evolved node placement, and Picbreeder (2008) showed CPPNs could be interactively evolved to produce recognisable images — an early hint of generative-model behaviour from indirect encodings.

5. Neuroevolution Meets Deep Learning

The 2017–2019 wave of work brought neuroevolution into the deep era:

  • Deep Neuroevolution (Such et al., Uber 2017): a vanilla GA, evolving the weights of \(\sim 4\)M-parameter ConvNets, matches deep RL baselines on Atari. The 1990s GA still works at deep-learning scale.
  • Population-Based Training (Jaderberg et al., DeepMind 2017): hybridise EA-style hyperparameter search with online gradient training. Each worker does SGD; periodically, weak workers exploit the parameters of strong workers and explore new hyperparameters (sketched after this list). Standard for large-scale RL training since.
  • Neural architecture search: DARTS and ENAS search by gradients and RL respectively, while AmoebaNet (Real et al., Google 2019) was the first evolutionary architecture search to beat human-designed ImageNet architectures.
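A sketch of PBT’s exploit/explore step under simplifying assumptions; the worker dict layout, the 20% cutoff, and the perturbation factors are illustrative, not DeepMind’s exact settings:

```python
import copy
import random

def pbt_step(workers, perturb=(0.8, 1.2)):
    """One exploit/explore round in the spirit of PBT.

    Each worker is a dict with 'weights', 'hypers' (e.g. {'lr': ...}),
    and a measured 'score' from its latest stretch of SGD.
    """
    ranked = sorted(workers, key=lambda w: w["score"])
    cutoff = max(1, len(workers) // 5)                   # bottom/top 20%
    for weak in ranked[:cutoff]:
        strong = random.choice(ranked[-cutoff:])
        weak["weights"] = copy.deepcopy(strong["weights"])   # exploit
        weak["hypers"] = {k: v * random.uniform(*perturb)    # explore
                          for k, v in strong["hypers"].items()}
    # ...each worker then resumes SGD with its (possibly new) weights/hypers
```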

The pattern: pure gradient methods exploit fast within a fixed structure; pure evolution explores well across structures; the hybrid combines exploitation with structural innovation.

6. Practical Toolkits

  • NEAT-Python — the reference Python implementation.
  • PyTorch-NEAT — CPPN/HyperNEAT support, GPU-friendly evaluation.
  • EvoTorch (NNAISENSE) — CMA-ES / NES on GPU at billion-parameter scale.
  • evosax — JAX-based ES library; jit/vmap-friendly, ideal for batched RL.
  • LEAP — library for evolutionary algorithms in Python (see landing-page video).
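As a starting point, a minimal NEAT-Python run in the pattern of the library’s standard XOR example; the config filename is a placeholder, and a NEAT-Python config file must exist at that path:

```python
import neat

def eval_genomes(genomes, config):
    # Fitness: how well each phenotype reproduces XOR (4.0 = perfect)
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        genome.fitness = 4.0
        for xi, xo in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
            genome.fitness -= (net.activate(xi)[0] - xo) ** 2

config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "neat-config.ini")      # placeholder config path
pop = neat.Population(config)
pop.add_reporter(neat.StdOutReporter(True))
winner = pop.run(eval_genomes, 100)          # evolve for up to 100 generations
```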