Module 6

Phylogenetics & Molecular Evolution

Phylogenetic inference reconstructs evolutionary trees from molecular sequences. Four principal approaches trade off speed, accuracy, and model sophistication: distance methods such as neighbor joining (NJ), parsimony, maximum likelihood (IQ-TREE, RAxML), and Bayesian inference (BEAST2, MrBayes). This module covers distance corrections, substitution models, model selection, likelihood-based and Bayesian tree inference, molecular-clock dating with the coalescent, and ancestral sequence reconstruction.

1. Distance Corrections

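The raw p-distance (the proportion of sites at which two sequences differ) underestimates true divergence. Multiple substitutions can occur at the same site, and later changes can hide earlier ones. The Jukes-Cantor (1969) model, JC69, assumes equal base frequencies and equal substitution rates between all pairs of nucleotides: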

\[ d_{JC69} \;=\; -\tfrac{3}{4}\ln\!\Bigl(1 - \tfrac{4}{3}p\Bigr) \]
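As a worked example, take an observed p-distance of 0.30. The corrected JC69 distance is then about 0.38 expected substitutions per site. The correction is undefined once p reaches its saturation value of 3/4, because the logarithm's argument drops to zero.

\[ d_{JC69} \;=\; -\tfrac{3}{4}\ln\!\Bigl(1 - \tfrac{4}{3}\times 0.30\Bigr) \;=\; -\tfrac{3}{4}\ln(0.60) \;\approx\; -\tfrac{3}{4}\times(-0.5108) \;\approx\; 0.383 \]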

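The Kimura two-parameter model (K80, also written K2P) distinguishes transitions (A↔G, C↔T) from transversions. HKY85 adds unequal base frequencies to the transition/transversion distinction. GTR (general time-reversible) allows unequal base frequencies and a separate rate for each of the six nucleotide pairs. For proteins, empirical amino-acid replacement matrices such as JTT, WAG, and LG serve the same role. ModelTest and ModelFinder select the best-fit model using information criteria such as AIC or BIC.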
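For reference, the K80 correction works from two observed proportions: P, the transitional differences, and Q, the transversional differences. Setting P = p/3 and Q = 2p/3, the JC69 expectation, recovers d_{JC69}. The information criteria used for model selection penalize the maximized log-likelihood ln L by the number of free parameters k. For BIC, n is taken as the number of alignment sites. Lower values are preferred. BIC's ln n penalty exceeds AIC's penalty of 2 per parameter once n ≥ 8, so on real alignments BIC tends to favour simpler models.

\[ d_{K80} \;=\; -\tfrac{1}{2}\ln(1 - 2P - Q) \;-\; \tfrac{1}{4}\ln(1 - 2Q) \]

\[ \mathrm{AIC} \;=\; 2k - 2\ln L, \qquad \mathrm{BIC} \;=\; k\ln n - 2\ln L \]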

2. Maximum Likelihood & Bayesian

ML phylogenetics (Felsenstein 1981) scores a tree by the likelihood of the observed alignment under the substitution model. It then searches tree space with topological rearrangements (NNI, SPR, TBR); IQ-TREE adds stochastic perturbations to escape local optima. Bootstrap resampling of alignment columns gives a support value for each bipartition. Bayesian MCMC (MrBayes, BEAST2) samples from the joint posterior over trees and model parameters, yielding credible intervals for parameters and posterior probabilities for clades.
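The likelihood score can be computed with Felsenstein's pruning algorithm, sketched minimally below. The sketch makes these assumptions:

- The substitution model is JC69.
- The DNA columns contain no gaps or ambiguity codes.
- Branch lengths are fixed and measured in expected substitutions per site.
- A tree is written as nested Python lists of `(child, branch_length)` pairs with leaf names as strings. This representation is hypothetical and chosen for brevity.

The helper names are illustrative. JC69 is time-reversible, so the root can be placed on any branch without changing the likelihood (Felsenstein's pulley principle). Tree search, rate heterogeneity, and numerical scaling are not shown.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability of ending in state j after branch length t, starting from state i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partials(node, site):
    """Conditional likelihoods L(x) for each base x at `node` (pruning recursion).

    A node is a leaf name (str) or a list of (child, branch_length) pairs;
    `site` maps each leaf name to its observed base in one alignment column.
    """
    if isinstance(node, str):
        # Leaf: likelihood 1 for the observed base, 0 for the others.
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partials(child, site)
        for x in range(4):
            # Sum over the unobserved state y at the child end of the branch.
            result[x] *= sum(jc69_prob(x, y, t) * child_l[y] for y in range(4))
    return result

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods; JC69 root frequencies are uniform (1/4)."""
    names = list(alignment)
    total = 0.0
    for col in zip(*alignment.values()):
        site = dict(zip(names, col))
        root = partials(tree, site)
        total += math.log(sum(0.25 * lx for lx in root))
    return total

# Hypothetical toy alignment; only the third column (GGTT) is parsimony-informative.
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACTTAC", "D": "GCTTAC"}
ab_cd = [([("A", 0.1), ("B", 0.1)], 0.05), ([("C", 0.1), ("D", 0.1)], 0.05)]
ac_bd = [([("A", 0.1), ("C", 0.1)], 0.05), ([("B", 0.1), ("D", 0.1)], 0.05)]
print(log_likelihood(ab_cd, alignment), log_likelihood(ac_bd, alignment))
```

With identical branch lengths, the topology that groups A with B scores higher, because it explains the GGTT column with a single change on the internal branch.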

Simulation: Jukes-Cantor & K2P Corrections
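A minimal simulation of this kind is sketched below, under these assumptions:

- Sequences evolve along a single branch under K80, with a transition/transversion rate ratio `kappa`.
- The ancestor has uniform base frequencies.
- Branch length `d` is measured in expected substitutions per site.

Under K80 every base leaves its current state at the same total rate. The number of substitutions at a site is therefore Poisson(d), and each substitution is a transition with probability κ/(κ+2). The function names and the chosen values (κ = 4, 20,000 sites) are illustrative assumptions.

```python
import math
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}    # purine<->purine, pyrimidine<->pyrimidine
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}

def evolve(seq, d, kappa, rng):
    """Simulate K80 evolution of `seq` along a branch of length d; multiple hits are allowed."""
    out = []
    for base in seq:
        t = rng.expovariate(1.0)
        while t < d:  # each iteration is one substitution event at this site
            if rng.random() < kappa / (kappa + 2):
                base = TRANSITION[base]
            else:
                base = rng.choice(TRANSVERSIONS[base])
            t += rng.expovariate(1.0)
        out.append(base)
    return "".join(out)

def distances(s1, s2):
    """Return (p, JC69, K80) distances between two aligned, ungapped DNA sequences."""
    n = len(s1)
    diffs = [(a, b) for a, b in zip(s1, s2) if a != b]
    p = len(diffs) / n
    P = sum(TRANSITION[a] == b for a, b in diffs) / n  # proportion of transitional differences
    Q = p - P                                           # proportion of transversional differences
    jc = -0.75 * math.log(1 - 4 * p / 3)
    k80 = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return p, jc, k80

rng = random.Random(1)
ancestor = "".join(rng.choice("ACGT") for _ in range(20000))
for true_d in (0.1, 0.5, 1.0):
    descendant = evolve(ancestor, true_d, kappa=4.0, rng=rng)
    p, jc, k80 = distances(ancestor, descendant)
    print(f"true d={true_d:.2f}  p={p:.3f}  JC69={jc:.3f}  K80={k80:.3f}")
```

As the true distance grows, p falls increasingly short of it while the corrected estimates track it. With κ well above 1, the K80 estimate is typically closer than JC69 at larger distances, since JC69 assumes transitions make up only a third of changes.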


3. Molecular Clocks & Coalescent

A strict molecular clock assumes a constant substitution rate across all branches. Relaxed clocks (Thorne 1998, Drummond 2006) let rates vary among branches. Either kind can be calibrated with fossil or tip dates. Coalescent theory (Kingman 1982) links demographic history to genealogy: the effective population size \(N_e\) sets the timescale on which lineages coalesce. BEAST2 and similar programs combine molecular clocks, coalescent tree priors, and substitution models in a single Bayesian inference.
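A minimal sketch of Kingman's coalescent follows. It assumes a diploid Wright-Fisher population, which holds 2Ne gene copies, and counts time in generations. With k lineages, the waiting time to the next coalescence is then approximately exponential with rate k(k−1)/2 divided by 2Ne. The helper name, the sample size, Ne, and the mutation rate `mu` are hypothetical values for illustration. The last lines connect the genealogy to a strict clock: two sequences whose common ancestor lived T generations ago differ by about 2μT per site.

```python
import random

def coalescent_tmrca(n, ne, rng):
    """Simulated time (generations) to the most recent common ancestor of n gene copies
    from a diploid population of effective size ne, under Kingman's approximation."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / (2 * ne)   # C(k, 2) pairs, each coalescing at rate 1/(2 Ne)
        t += rng.expovariate(rate)
    return t

rng = random.Random(42)
n, ne = 10, 10_000
sims = [coalescent_tmrca(n, ne, rng) for _ in range(20_000)]
mean_tmrca = sum(sims) / len(sims)
expected = 4 * ne * (1 - 1 / n)   # E[T_MRCA] = 4 Ne (1 - 1/n) generations
print(f"simulated mean TMRCA = {mean_tmrca:.0f}, expected = {expected:.0f} generations")

# Strict clock: for n = 2, E[T_MRCA] = 2 Ne generations, so expected divergence = 2 * mu * 2 Ne.
mu = 1e-8  # hypothetical per-site, per-generation mutation rate
print(f"expected pairwise divergence: {2 * mu * 2 * ne:.1e} per site")
```

The pairwise value 4Neμ is the population mutation parameter θ. This is the sense in which \(N_e\) sets the timescale that a clock converts into sequence divergence.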

4. Ancestral Sequence Reconstruction

Given an alignment and a tree, ancestral sequence reconstruction (Yang 1995; implemented in PAML and HyPhy) computes the likelihood, or posterior probability, of each nucleotide or amino-acid state at internal nodes. Ancestral proteins inferred this way have been resurrected (synthesized and assayed in the laboratory), a striking closed-loop test of phylogenetic modelling. Liberles 2007 and Harms & Thornton 2014 review the approach.
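A minimal sketch of marginal reconstruction at a single node follows. The example is hypothetical and makes these assumptions:

- The model is JC69.
- The tree is a three-leaf star whose root joins every leaf directly.
- Branch lengths are known.
- There is one alignment column.

The posterior for root state x is proportional to π_x multiplied by the probability of each observed leaf base given x. On larger trees, the same per-state weights come from the pruning recursion. PAML and HyPhy also support codon and amino-acid models, which this sketch does not cover.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability of ending in base j after branch length t, starting from base i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def root_posterior(leaf_bases, branch_lengths):
    """Marginal posterior of the root state on a star tree: P(x | data) is proportional
    to pi_x * prod_i P(leaf_i | x, t_i), with uniform pi_x under JC69."""
    weights = {}
    for x in BASES:
        w = 0.25
        for b, t in zip(leaf_bases, branch_lengths):
            w *= jc69_prob(x, b, t)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

post = root_posterior("AAG", [0.1, 0.1, 0.3])
best = max(post, key=post.get)
print(best, {x: round(p, 3) for x, p in post.items()})
```

Here A is favoured because it matches the two leaves on short branches, while G matches only the leaf on the longest branch.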

Key References

• Felsenstein, J. (1981). “Evolutionary trees from DNA sequences: a maximum likelihood approach.” J. Mol. Evol., 17, 368–376.

• Nguyen, L. T. et al. (2015). “IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.” Mol. Biol. Evol., 32, 268–274.

• Bouckaert, R. et al. (2019). “BEAST 2.5.” PLOS Comput. Biol., 15, e1006650.

• Yang, Z. (2007). “PAML 4: phylogenetic analysis by maximum likelihood.” Mol. Biol. Evol., 24, 1586–1591.
