EMGM meeting 2023 and the hypercube

The meeting

Last month, whilst visiting collaborators and family in the UK, I attended the European Mathematical Genetics (EMGM 2023)meeting hosted by the University of Surrey.

The idea was to attend to broaden my ideas for our work on applying evolutionary methods from biology to galactic evolution.

I met many interesting people at the conference and at the pre-conference course (machine learning for geneticists with python).

I particularly enjoyed the talk given by Prof Elizabeth Thompson, a giant in the mathematical genetics field.

I was interested to discover Thompson had worked extensively on inference problems for evolutionary trees at the Cambridge Statistical Laboratory.

During her talk Prof Thompson pointed out a paper that had been receiving renewed interest, written by Kevin Donnelly titled ‘The Probability that Related Individuals Share Some Section of Genome Identical by Descent’ in the talk she briefly introduced hypercubes and presented an illustration.

I didn’t understand the concept of hypercubes at the time but the application interested me enough to find the paper later that day.

So, what is a hypercube?

A hypercube is the generalisation of a cube into more dimensions.

If you think this sounds a bit Marvel, well……it is. Our familiar Tesseract is a four dimensional hypercube. You can see the sketch of a 4 dimensional hypercube below.

What do hypercubes have to do with genetics?

Studying linkage of chromosomes in pedigree reconstruction can be complex and require significant computational power.

In Donnelly’s paper, a mathematical framework for studying linkage was presented that calculates the probability of two individuals sharing the same segment of a chromosome (identity by descent).

“This representation of the gene flow in a pedigree as a random walk on the vertices of a hypercube forms the basis of this paper. Genetic events, such as a relationship between two individuals being detectable, correspond to the random walk hitting a particular subset of vertices.” – Donnelly, 1983

Assuming no interference, Donnelly explains that the crossover processes in the pedigree may be viewed jointly as a continuous-time Markov random walk on the vertices of a hypercube, with the time parameter being map distance along the chromosome.

In this framework the states, which are the two parent chromosomes, are encoded 0 and 1.

The crossover process (i.e the moment where chromosome segments are passed on) may be considered as a continuous-time Markov random walk on these two states.

Therefore the event of two individuals having a segment of chromosome in common corresponds to the random walk hitting a particular set of vertices.

The state space is the set of possible values, for example in this case the state space would be S = {00, 01, 10, 11} (remember that the two parent chromosomes are encoded 0 and 1).

The hitting set is {00}, this ensures that all states in the state space can be reached eventually eg. starting from state 00, you can reach all other states {01, 10, 11} with positive probability in a finite number of steps.

The use of the hypercube comes into play with the symmetry of this situation, here we have a reflection in the state space for 01 and 10 (imagine a mirror between the two numbers).

If there is symmetry vertices can be partitioned into smaller vertices called orbits. This results in fewer vertices, where now you only need to keep track of the orbits, the orbits here being {00}, {01,10} and {11}.

Example from the paper illustrating the concept (note this is the first example showing a square grid) :

Later in the paper the author explores relationships with greater dimensions eg for cousin relationships where there are16 vertices of a six-dimensional hypercube.

In short, by using the geometry of hypercubes and making use of symmetry to partition the vertices into a much smaller number of orbits the computational cost of these pedigree calculations was significantly reduced.

Pretty cool huh!?

If you want to try out drawing a hypercube for fun, you can find the R code here: Hypercubes in R (getting started with programming in R)

Speaking of continuous time Markov processes, in the next post we will be introducing genetic substitution models!