get_coalescent#

Once we have created a population by specifying its transmission history, it is very simple to perform a Markov chain simulation of coalescence times by invoking get_coalescent.

!pip install coalestr
from coalestr import cs
# Specify the transmission history of a population
# Each inner list gives [duration, N, Q, X, M] for a period of history
my_history = [[1000, 30, 10, 0, 0], [1000, 10, 5, 0.5, 0]]

# Create the population
my_population = cs.Population(my_history)

# Obtain coalescence times
my_population.get_coalescent()
Observation time.    Events captured.   Mean coalescence time
                      beho      wiho        beho     wiho
        0             100.0     100.0        25.1     18.9

Mean coalescence time

The ‘beho’ or between-host column on the right tells us that two alleles sampled from different hosts have a mean coalescence time of 25.1 generations.

The ‘wiho’ or within-host column on the right tells us that two alleles sampled from the same host have a mean coalescence time of 18.9 generations.

Events captured

The ‘events captured’ columns tell us that 100% of coalescent events are captured for both the between-host and the within-host simulation. This is an important QC check because we want to be sure that our simulation runs for long enough to capture the vast majority of coalescent events.

Let us see what happens if we drastically reduce the duration of our population history.

my_history = [[30, 30, 10, 0, 0], [10, 10, 5, 0.5, 0]]

my_population = cs.Population(my_history)

my_population.get_coalescent()
Observation time.    Events captured.   Mean coalescence time
                      beho      wiho        beho     wiho
        0              68.6      77.2        10.7      8.4

Here we appear to have shorter coalescence times but the results are unreliable because we are capturing only 68.6% of coalescent events for the between-host simulation and 77.2% of coalescent events for the within-host simulation.

Observation times#

In the above examples we made a single observation of the population at observation time = 0. This means that we sampled two alleles at time = 0 and followed their lineages back in time until they coalesced. Since we are proceeding back in time, observation time 0 means the present, or the most recent generation in the population history.

If we want to see how the genetic diversity of the population changes over time, then we need to make observations at multiple time points. For example, we might make observations every 500 generations. We can do this by providing get_coalescent with a list of observation times, always remembering that we are counting in backwards time, so observation time = 500 means 500 generations before the present.

my_history = [[1000, 30, 10, 0, 0], [1000, 10, 5, 0.5, 0]]

my_observations = [0, 500, 1000, 1500]

my_population = cs.Population(my_history)

my_population.get_coalescent(observe = my_observations)
Observation time.    Events captured.   Mean coalescence time
                      beho      wiho        beho     wiho
        0             100.0     100.0        25.1     18.9
      500             100.0     100.0        25.1     18.9
     1000             100.0     100.0        39.0     10.0
     1500             100.0     100.0        39.0     10.0
# We can also give a range of observation times like this

my_observations = range(0, 2000, 500)

my_population.get_coalescent(observe = my_observations)
Observation time.    Events captured.   Mean coalescence time
                      beho      wiho        beho     wiho
        0             100.0     100.0        25.1     18.9
      500             100.0     100.0        25.1     18.9
     1000             100.0     100.0        39.0     10.0
     1500             100.0     100.0        39.0     10.0
# We can retrieve a list of observation times as a population attribute

my_population.observation_times
[0, 500, 1000, 1500]

Data objects#

get_coalescent generates various data objects that are used by get_diversity and other methods. For most purposes we can ignore these data objects, and simply focus on the reports that are automatically produced when we run get_coalescent or get_diversity. However if we want to take a deep dive into the results of the Markov chain simulation, we can retrieve these data objects as attributes of our population:

  • t_sim

  • report

  • beho_lineage

  • wiho_lineage

  • coalescent

t_sim gives the duration of the simulation. For a simple population without migration, t_sim = t_his which is the duration of the population history. For a subpopulation that is embedded in a metapopulation, where the population history of the metapopulation may be greater than that of the subpopulation, t_sim gives the duration of the metapopulation history.

report gives a summary of coalescence events for different observation times.

  • axis 0: observation_times[i]

  • axis 1: summary of coalescence events

    • 0: observation time

    • 1: beho: percentage of coalescent events captured

    • 2: wiho: percentage of coalescent events captured

    • 3: beho: expectation of coalescence time

    • 4: wiho: expectation of coalescence_time

    • 5: eco setpoint: proportion of coalescences to capture

    • 6: break time

Here is an example of how we can use report.

for i in range(len(my_population.observation_times)):
    
    print("At observation time {0:1.0f} the mean time to coalescence was\
    \n   {1:1.1f} for a between-host sample ({2:1.0f}% events captured)\
    \n   {3:1.1f} for a within-host sample ({4:1.0f}% events captured)"
        .format(
        my_population.report[i, 0],
        my_population.report[i, 3],
        my_population.report[i, 1],
        my_population.report[i, 4],
        my_population.report[i, 2]))
At observation time 0 the mean time to coalescence was    
   25.1 for a between-host sample (100% events captured)    
   18.9 for a within-host sample (100% events captured)
At observation time 500 the mean time to coalescence was    
   25.1 for a between-host sample (100% events captured)    
   18.9 for a within-host sample (100% events captured)
At observation time 1000 the mean time to coalescence was    
   39.0 for a between-host sample (100% events captured)    
   10.0 for a within-host sample (100% events captured)
At observation time 1500 the mean time to coalescence was    
   39.0 for a between-host sample (100% events captured)    
   10.0 for a within-host sample (100% events captured)

beho_lineage describes variation over time in the state of two lineages sampled from different hosts at observation time 0. This will allow us to estimate between-host levels of genetic diversity.

wiho_lineage describes variation over time in the state of two lineages sampled from the same host at observation time 0. This will allow us to estimate within-host levels of genetic diversity.

beho_lineage and wiho_lineage are both arrays with the following structure:

  • axis 0: backwards time

  • axis 1:

    • 0: probability that lineages are separated

    • 1: probability that lineages are cotransmitted

    • 2: probability that lineages are coalesced

    • 3: blank

    • 4: blank

    • 5: probability of coalescence occuring at this timepoint

    • 6: summation series for expectation of coalescence time

Let us see how beho_lineage and wiho_lineage can be used to plot the state of two lineages going back in time for my_population created above.

import matplotlib.pyplot as plt

b_separated = my_population.beho_lineage[:, 0]
b_cotransmitted = my_population.beho_lineage[:, 1]
b_coalesced = my_population.beho_lineage[:, 2]

w_separated = my_population.wiho_lineage[:, 0]
w_cotransmitted = my_population.wiho_lineage[:, 1]
w_coalesced = my_population.wiho_lineage[:, 2]

timescale = range(1, my_population.t_his + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14,5), sharex = True, sharey = True)

ax1.plot(timescale, b_separated, marker='', color='blue', linewidth = 2, label="Separated")
ax1.plot(timescale, b_cotransmitted, marker='', color='green', linewidth = 2, label="Cotransmitted")
ax1.plot(timescale, b_coalesced, marker='', color='red', linewidth = 2, label="Coalesced")

ax2.plot(timescale, w_separated, marker='', color='blue', linewidth = 2, label="Separated")
ax2.plot(timescale, w_cotransmitted, marker='', color='green', linewidth = 2, label="Cotransmitted")
ax2.plot(timescale, w_coalesced, marker='', color='red', linewidth = 2, label="Coalesced")

ax1.legend(title="BETWEEN-HOST",frameon=True, fontsize=12)
ax2.legend(title="WITHIN-HOST",frameon=True, fontsize=12)
ax1.set_xlabel("Generations before sample (offset +1)", fontsize=12)
ax2.set_xlabel("Generations before sample (offset +1)", fontsize=12)
ax1.set_ylabel("Probability", fontsize=12)
ax1.set_xscale('log',base=10)
ax1.set_xlim(1,my_population.t_his)
ax1.grid(visible=True, which='both', color='0.65', linestyle='-')
ax2.grid(visible=True, which='both', color='0.65', linestyle='-')

plt.show()
_images/a4b6bddad102b61b6e083b79e7fca88962f721e200696aedac7ca92a64e76a43.png

The population attribute coalescent describes the probability distribution of time to coalescence at different observation times. Recall that the population attribute observation_times is a list of observation times.

  • axis 0: observation_times[i]

  • axis 1: backwards time

  • axis 2: probability distribution of time to coalescence

    • 0: beho: probability of coalescence occurring at this timepoint

    • 1: wiho: probability of coalescence occurring at this timepoint

Let us see how this can be used to plot the probability distribution of coalescence times for observation time = 500.

obs_time = 500

i = my_population.observation_times.index(obs_time)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,4), sharex = True)

timescale = range(1, my_population.t_his - obs_time)
beho_coalescent = my_population.coalescent[i, obs_time + 1:, 0]
wiho_coalescent = my_population.coalescent[i, obs_time + 1:, 1]

ax1.plot(timescale, beho_coalescent, marker='', linewidth=2)
    
ax2.plot(timescale, wiho_coalescent, marker='', linewidth=2)
    
ax1.set_title("Between-host", position=(0.8, 0.3), fontsize=14)
ax2.set_title("Within-host", position=(0.8, 0.3), fontsize=14)
ax1.set_xlabel("Time to coalescence", fontsize=12)
ax2.set_xlabel("Time to coalescence", fontsize=12)
ax1.set_ylabel("Probability", fontsize=12)
ax1.set_xscale('log',base=10)

plt.show()
_images/c3afa3cc1e835b69330f510503da393490cb294cb590bea8bcff9a0414ef902e.png