Imagine tracing a tree diagram backwards in time, from the tips all the way back to its root.
At each node the lineages coalesce, or merge, into one.
As we follow the nodes, all lineages eventually combine into one ancestral lineage, representing the most recent common ancestor (MRCA).
In 1982, Kingman showed that this process of merging lineages backwards in time can be described mathematically as a stochastic process termed the n-coalescent.
Kingman, Hudson, Tajima and many others have contributed to and expanded on coalescent theory, applying it to the study of genealogies and population history.
Today, coalescent models and trees are widely used in population genetics and phylogenetics to infer population demographic changes, reconstruct evolutionary histories and study genetic diversity across populations.
In this post we will keep it simple and focus on coalescent theory for haploid individuals.
Bugs in a box
Joe Felsenstein introduced the analogy of ‘bugs in a box’ to help visualise the n-coalescent.
Imagine a box full of hyperactive bugs. We place k bugs in the box, which move about randomly.
Occasionally two bugs collide, at which point one instantly eats the other.
Over time, the number of bugs decreases from k to k-1, k-2, and so on, until only a single bug remains.
In this analogy, collisions represent coalescent events. The probability of a collision is determined by how crowded the box is: the number of possible pairs of bugs, k(k-1)/2, and the size of the box (Ne).
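To play with the analogy, here is a rough sketch rather than Felsenstein's own formulation. It assumes discrete time steps: at each step every bug hops to a random cell of a box with `box_size` cells (our stand-in for Ne), and bugs that land on the same cell merge into one. The names `bugs_in_a_box` and `box_size` are made up for illustration. Note that, unlike the n-coalescent, this toy version lets more than two bugs merge in a single step when the box is small relative to the number of bugs.

```python
import random

def bugs_in_a_box(k, box_size, rng):
    """Return the number of bugs after each time step until one remains.

    Assumption: each bug hops to a uniformly random cell at every step,
    and all bugs sharing a cell merge into a single bug.
    """
    counts = [k]
    while k > 1:
        # The set keeps one entry per occupied cell, i.e. one surviving bug.
        occupied = {rng.randrange(box_size) for _ in range(k)}
        k = len(occupied)
        counts.append(k)
    return counts

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for box_size in (20, 200):
        counts = bugs_in_a_box(10, box_size, rng)
        print(f"box size {box_size}: {len(counts) - 1} steps until one bug is left")
```

Running it a few times with different seeds should show the pattern described above: bigger boxes tend to take more steps, and the last few bugs take the longest to find each other.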
Coalescent trees
The n-coalescent can be represented as a genealogical coalescent tree.
In a coalescent tree, nodes represent coalescent events (the points where lineages merge), and branch lengths correspond to the waiting times between these events.
The waiting time (T) is determined by the number of lineages (k) and the effective population size (Ne), where Ne is the number of individuals making a genetic contribution to the population.
Coalescent trees encode information about population history, or ‘population demographics’, through characteristic tree shapes.
The n-coalescent captures ancestry in a probabilistic way; think back to our ‘bugs in a box’ example.
When there are more lineages (k), waiting times are shorter because the probability of a coalescent event is higher.
If Ne is large, the probability of a coalescent event is lower, resulting in longer waiting times.
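We can check both effects with a small simulation. This is a minimal sketch that assumes the standard continuous-time approximation, in which the waiting time while k lineages remain is exponentially distributed with a rate equal to the number of pairs, k(k-1)/2, divided by Ne. The function names are our own.

```python
import random

def sample_waiting_time(k, ne, rng):
    """Draw one random waiting time (in generations) while k lineages remain."""
    pairs = k * (k - 1) / 2   # number of pairs that could coalesce
    rate = pairs / ne         # assumed: each pair coalesces at rate 1/Ne
    return rng.expovariate(rate)

def mean_waiting_time(k, ne, reps=2000, seed=1):
    """Average many simulated waiting times for a given k and Ne."""
    rng = random.Random(seed)
    return sum(sample_waiting_time(k, ne, rng) for _ in range(reps)) / reps

if __name__ == "__main__":
    for k, ne in [(2, 100), (10, 100), (10, 1000)]:
        mean = mean_waiting_time(k, ne)
        print(f"k={k:>2}, Ne={ne:>4}: mean wait ~ {mean:.1f} generations")
```

Going from k=2 to k=10 at the same Ne should shrink the average wait considerably, while increasing Ne tenfold at the same k should stretch it by roughly the same factor.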
Expected waiting times
If we know the number of lineages (k) and the effective population size (Ne), we can calculate the expected waiting time E[Tk] for each coalescent event.
The expected waiting time E[Tk] differs from the waiting time T mentioned earlier: T is a random quantity, so individual branch lengths fluctuate from one genealogy to the next, whereas E[Tk] is the average waiting time while k lineages remain.
Using the expected value gives us a typical branch length that scales with Ne.
We can calculate the expected waiting time E[Tk] using the following equation:
E[Tk] = 2Ne / (k(k-1))
Each coalescent event merges exactly two lineages, and with k lineages there are k(k-1)/2 possible pairs.
Each pair coalesces at a rate of 1/Ne per generation.
As any pair can coalesce, the total rate of coalescence is the sum over all pairs.
The expected waiting time until the next coalescent event is the inverse of this total rate, where the total rate is the number of possible pairs multiplied by the rate per pair:
k(k-1)/2 × 1/Ne = k(k-1) / (2Ne)
Once inverted, this gives the expected waiting time for each coalescent event: E[Tk] = 2Ne / (k(k-1))
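As a worked example, the formula can be turned into a couple of lines of Python. This is just an illustrative sketch: the function names and the values n=5 and Ne=1000 are our own choices, and times are in generations.

```python
def expected_waiting_time(k, ne):
    """Expected time (in generations) until k lineages become k - 1."""
    if k < 2:
        raise ValueError("need at least two lineages for a coalescent event")
    return 2 * ne / (k * (k - 1))

def expected_waiting_times(n, ne):
    """Expected waiting time for every interval, from k = n down to k = 2."""
    return {k: expected_waiting_time(k, ne) for k in range(n, 1, -1)}

if __name__ == "__main__":
    times = expected_waiting_times(n=5, ne=1000)
    for k, t in times.items():
        print(f"E[T{k}] = {t:.2f} generations")
    print(f"expected time to the MRCA = {sum(times.values()):.2f} generations")
```

For five lineages and Ne = 1000, this gives 100, 166.67, 333.33 and 1000 generations for k = 5, 4, 3 and 2, which sum to 1600 generations. The final interval, when only two lineages remain, accounts for more than half of the total.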
These expected waiting times will become our branch lengths in the coalescent tree, ultimately influencing tree shape.
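To see how waiting times turn into branch lengths, here is a minimal sketch that builds one random coalescent tree and writes it out in Newick format. It assumes exponentially distributed waiting times with rate k(k-1)/(2Ne), so each run draws random waiting times scattered around the expected values rather than the expected values themselves. The function name and the tip labels `s1`, `s2`, ... are hypothetical.

```python
import random

def simulate_coalescent_newick(n, ne, rng):
    """Simulate one coalescent tree for n lineages; return (newick, tmrca)."""
    # Each active lineage is (newick_text, height_of_its_topmost_node).
    lineages = [(f"s{i + 1}", 0.0) for i in range(n)]
    now = 0.0  # time before the present, in generations
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until the next merger while k lineages remain.
        now += rng.expovariate(k * (k - 1) / (2 * ne))
        # Any pair is equally likely to be the one that merges.
        i, j = sorted(rng.sample(range(k), 2))
        right = lineages.pop(j)  # pop the larger index first so i stays valid
        left = lineages.pop(i)
        merged = "({}:{:.2f},{}:{:.2f})".format(
            left[0], now - left[1], right[0], now - right[1]
        )
        lineages.append((merged, now))
    return lineages[0][0] + ";", now

if __name__ == "__main__":
    newick, tmrca = simulate_coalescent_newick(5, 1000, random.Random(42))
    print(newick)
    print(f"time to the MRCA: {tmrca:.1f} generations")
```

The printed string can be pasted into most tree viewers that read Newick. Repeating the simulation with different seeds gives a feel for how much individual trees vary around the typical shape.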
Why the coalescent is useful
By analysing the branch lengths (waiting times) and the number and distribution of nodes in a coalescent tree, researchers can estimate effective population sizes, detect bottlenecks or expansions and model changes in population structure over time.
What makes the ‘backwards in time’ coalescent approach so powerful is that it doesn’t require a large number of sampled gene copies: even a small sample can provide useful inferences about the entire population.
Insights from coalescent theory form the foundation of many modern population genetics and phylogenetics analyses, from estimating mutation rates to reconstructing population histories.