Module 3 · Evolving Programs
Genetic Programming
Genetic programming (GP) evolves programs — not parameters, not weights, but executable code. The genotype is the program itself, usually represented as an abstract syntax tree. Koza’s 1992 monograph established the field; thirty years later, GP remains the only truly general technique for discovering symbolic structure: equations, programs, controllers, circuits, neural-network architectures.
1. Tree Representation
A program is a typed tree drawn from a function set \(\mathcal{F}\) (e.g. \(\{+, -, \times, /, \sin, \cos, \exp, \log\}\)) and a terminal set \(\mathcal{T}\) (variables and constants). The tree \((+\;(\times\;x\;x)\;(\sin\;y))\) encodes the program \(x^2 + \sin y\).
Trees are usually generated by ramped half-and-half: half full trees and half grow trees, with depths uniformly distributed in \([d_\min, d_\max]\). Vanilla GP requires closure (every function must accept every value its children can produce); strongly typed GP (Montana 1995) drops the closure requirement by enforcing an explicit type system on operators instead.
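The generation scheme above can be sketched in a few lines. This is a minimal illustration with a made-up function/terminal set (the names `FUNCTIONS`, `TERMINALS`, and the 0.3 early-termination probability for grow are assumptions, not canonical values); trees are nested tuples of the form `(op, child, child)`.

```python
import random

# Hypothetical function and terminal sets for illustration.
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "sin": 1}   # name -> arity
TERMINALS = ["x", "y", 1.0]

def gen_tree(depth, method):
    """Generate a random expression tree as a nested tuple.

    'full': every branch extends to the full remaining depth.
    'grow': branches may terminate early at a random terminal.
    """
    if depth == 0 or (method == "grow" and random.random() < 0.3):
        return random.choice(TERMINALS)
    fn = random.choice(list(FUNCTIONS))
    return (fn,) + tuple(gen_tree(depth - 1, method) for _ in range(FUNCTIONS[fn]))

def ramped_half_and_half(pop_size, d_min=2, d_max=6):
    """Half 'full' trees, half 'grow' trees, depths uniform in [d_min, d_max]."""
    pop = []
    for i in range(pop_size):
        depth = random.randint(d_min, d_max)
        method = "full" if i % 2 == 0 else "grow"
        pop.append(gen_tree(depth, method))
    return pop
```

The "ramped" part is the uniform spread of depths; it seeds the population with trees of varied shapes and sizes, which matters because crossover can only recombine structure that is already present.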
Variations: linear GP (programs are register-machine instruction sequences), Cartesian GP (Miller 2000 — programs are directed acyclic graphs over a fixed grid of nodes), and grammatical evolution (Ryan, Collins, O'Neill 1998 — bit-strings decoded through a BNF grammar). Each addresses a different shortcoming of the tree representation.
2. Variation Operators
- Subtree crossover: pick a random subtree from each parent, swap them. The dominant operator in canonical Koza GP. With non-uniform node selection (90 % probability on internal nodes, 10 % on leaves) it strongly biases toward exchanging structure rather than single terminals.
- Subtree mutation: replace a randomly chosen subtree with a fresh randomly-grown subtree.
- Point mutation: replace a single node with another of the same arity (e.g. \(+\) becomes \(-\), \(\sin\) becomes \(\cos\)). Less disruptive; useful as a fine-tuning operator.
- Hoist mutation: replace the program with one of its own subtrees. Useful as a parsimony-pressure operator.
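Two of these operators can be sketched on nested-tuple trees. A minimal, illustrative implementation (the helper names and the `SAME_ARITY` table are assumptions, not a standard API); subtrees are addressed by paths of child indices:

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(p1, p2):
    """Swap a random subtree of p2 into a random position of p1 (one child shown)."""
    path1, _ = random.choice(list(subtrees(p1)))
    _, donor = random.choice(list(subtrees(p2)))
    return replace_at(p1, path1, donor)

# Illustrative same-arity replacement table for point mutation.
SAME_ARITY = {1: ["sin", "cos"], 2: ["+", "-", "*"]}

def point_mutation(tree):
    """Replace one random internal node with another operator of the same arity."""
    internal = [(p, t) for p, t in subtrees(tree) if isinstance(t, tuple)]
    if not internal:
        return tree
    path, node = random.choice(internal)
    new_op = random.choice(SAME_ARITY[len(node) - 1])
    return replace_at(tree, path, (new_op,) + node[1:])
```

Note that point mutation preserves tree size exactly, while subtree crossover does not — an asymmetry that feeds directly into the bloat dynamics discussed next.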
3. Bloat
The dominant failure mode of GP. Trees grow unboundedly large during evolution — often exponentially — without corresponding fitness gain. Three explanations (none singly sufficient):
- Replication accuracy: larger trees are more likely to survive crossover unchanged because the perturbation is locally smaller.
- Removal bias: the inactive parts of a tree (introns) grow because editing introns carries zero fitness cost, while editing active code is risky.
- Crossover bias: mathematical analysis (Poli & McPhee 2003) of the expected size distribution under standard crossover predicts a power-law drift toward large programs.
Mitigations: depth/size limits, parsimony pressure (\(\text{fitness} - \alpha\,|\text{tree}|\)), the Tarpeian method (probabilistically assign the worst fitness to larger-than-average individuals), operator equalisation, and multi-objective GP that explicitly trades accuracy against size. None is uniformly best; bloat control remains an active research topic.
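Linear parsimony pressure is the simplest of these to state in code. A minimal sketch on nested-tuple trees (the coefficient \(\alpha\) and the function names are illustrative, not standard):

```python
def tree_size(tree):
    """Count every node (functions and terminals) in a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])

def parsimony_fitness(raw_fitness, tree, alpha=0.01):
    """Linear parsimony pressure: fitness - alpha * |tree|.

    alpha is a tunable penalty weight (illustrative default); too large and
    evolution collapses to trivial programs, too small and bloat wins.
    """
    return raw_fitness - alpha * tree_size(tree)
```

The sensitivity to \(\alpha\) is exactly why none of the listed mitigations is uniformly best: the right trade-off between size and accuracy is problem-dependent.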
4. Automatically Defined Functions (ADFs)
Koza’s 1994 extension. The genome carries not a single program but a main program plus a small library of evolved subroutines that the main program may call. Each ADF has its own typed argument list and its own evolved body; the main and ADFs co-evolve. ADFs let GP discover modular abstractions, enabling problems an order of magnitude harder than flat GP can handle (parity-\(N\) for \(N \ge 8\), symbolic regression of compositional structure).
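The genome structure can be made concrete with a toy interpreter. This is a hedged sketch, not Koza's implementation: the representation (nested tuples, an `adfs` dictionary mapping names like `"ADF0"` to a body plus parameter list) is invented here for illustration.

```python
import math

def eval_tree(tree, env, adfs):
    """Evaluate a nested-tuple tree; names found in `adfs` call evolved subroutines."""
    if not isinstance(tree, tuple):
        return env.get(tree, tree)  # variable lookup, else a literal constant
    op, *args = tree
    vals = [eval_tree(a, env, adfs) for a in args]
    if op in adfs:
        body, params = adfs[op]    # each ADF has its own argument list
        return eval_tree(body, dict(zip(params, vals)), adfs)
    return {"+": lambda a, b: a + b,
            "*": lambda a, b: a * b,
            "sin": lambda a: math.sin(a)}[op](*vals)

# Genome: one ADF computing its argument squared, plus a main that calls it twice.
adfs = {"ADF0": (("*", "a0", "a0"), ["a0"])}
main = ("+", ("ADF0", "x"), ("ADF0", "y"))
```

Here `main` evaluates to \(x^2 + y^2\); the squaring logic lives once in the ADF rather than twice in the main tree, which is precisely the reuse that makes modular problems tractable.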
Subsequent extensions: automatically defined macros (Spector 1996) and automatically defined iterations and recursions (Koza 1999) — extensions that make GP genuinely Turing-complete and open the door to evolving algorithms, not just expressions.
5. Symbolic Regression
The flagship application: given input–output pairs \(\{(\mathbf{x}_i, y_i)\}\), find a closed-form expression \(\hat f(\mathbf{x})\) that minimises mean squared error. Unlike polynomial regression or neural networks, GP returns a human-readable formula. Industrial uses range from materials-science empirical laws to discovering conservation laws from raw simulation data.
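The fitness computation at the core of symbolic regression is just the MSE of an evaluated tree over the data. A minimal sketch (the evaluator and the toy candidate set are assumptions made for illustration; a real run would evolve the candidates rather than enumerate them):

```python
import math

def eval_tree(tree, env):
    """Evaluate a nested-tuple expression tree under a variable binding `env`."""
    if not isinstance(tree, tuple):
        return env.get(tree, tree)  # variable lookup, else a literal constant
    op, *args = tree
    vals = [eval_tree(a, env) for a in args]
    return {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
            "*": lambda a, b: a * b, "sin": lambda a: math.sin(a)}[op](*vals)

def mse(tree, data):
    """Mean squared error of a candidate expression over (x, y, target) triples."""
    return sum((eval_tree(tree, {"x": x, "y": y}) - t) ** 2
               for x, y, t in data) / len(data)

# Target law: x^2 + sin(y). Generate noiseless samples.
data = []
for i in range(20):
    xv, yv = i * 0.1, i * 0.2
    data.append((xv, yv, xv * xv + math.sin(yv)))

candidates = [("+", ("*", "x", "x"), ("sin", "y")),  # the true structure
              ("+", "x", "y"),
              ("*", "x", "y")]
best = min(candidates, key=lambda t: mse(t, data))
```

The payoff the section describes is visible here: `best` is not a weight vector but a tree a human can read back as \(x^2 + \sin y\).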
The leading modern systems — PySR (Cranmer 2023) and Eureqa (Schmidt & Lipson 2009) — couple GP with a Pareto-front search that simultaneously optimises accuracy and simplicity, and with a nonlinear-least-squares back-end that calibrates constants once the tree topology is fixed. PySR has successfully recovered physical laws (Hamiltonians, Lagrangians) from simulation data in cosmology and fluid dynamics.
The current frontier: integrating large language models as program-proposal priors (LM-guided GP, e.g. FunSearch, Romera-Paredes et al. 2024). LLMs propose candidate expressions; GP refines and selects.
6. Notable Real-World Successes
- NASA ST5 antenna (Lohn et al. 2006): a GP-evolved S-band antenna flew on three spacecraft. The wire shape is non-intuitive; no human engineer had proposed it.
- Analog circuit synthesis (Koza et al. 1999): GP rediscovered patented filters, amplifiers, and a cube-root computational circuit.
- Quantum-circuit synthesis (Spector et al. 1999): quantum algorithms for the database-search and Deutsch–Jozsa problems evolved from scratch.
- FunSearch (DeepMind 2024): a hybrid LLM + EA system discovered improvements to the cap-set lower bound and new online bin-packing heuristics — the first new-mathematics results obtained via LLM + EA.
7. The Schema Theorem for Trees
Poli’s 2001 schema theorem for tree-based GP gives a closed-form prediction of expected schema-frequency change under standard crossover. It is the GP analogue of Holland’s schema theorem, derived in a more rigorous setting using the hyperschema formalism. The qualitative conclusion is the same: schemata of above-average fitness, short defining length, and low order are amplified. The quantitative conclusion is more useful: for the first time, runtime-analysis-style proofs of GP convergence on simple problems (XOR, multiplexer) became possible.
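Schematically, the bound has the same shape as Holland’s original; writing \(m(H,t)\) for the number of instances of schema \(H\) at generation \(t\), \(f(H,t)\) for their mean fitness, \(\bar f(t)\) for the population mean fitness, and \(p_c\) for the crossover rate,

\[
\mathbb{E}[m(H, t+1)] \;\ge\; m(H,t)\,\frac{f(H,t)}{\bar f(t)}\,\bigl(1 - p_c\,D(H,t)\bigr),
\]

where \(D(H,t)\) is the probability that crossover disrupts an instance of \(H\). This is a schematic form for orientation only: Poli’s exact GP version replaces the single disruption term with hyperschema-based creation and transmission terms, which is what makes the tree-based statement exact rather than a lower bound on survival alone.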