Population Genetics

With the exception of identical twins, all humans are genetically unique. Population genetics investigates the causes and consequences of this genetic diversity. Structurally, the diversity often stems from mutations at single positions in the genome, which give rise to single nucleotide polymorphisms (SNPs). In the wake of the Human Genome Project, a large number of SNPs have been mapped [9].

How many SNPs do we expect to observe in a sample of genes? This can be computed by modeling the evolution of a population of $n$ genes, where each new generation of $n$ genes is drawn with replacement from the previous generation. Biologists know this scenario as the Wright-Fisher model of evolution. You can interactively simulate populations under this model here. Notice that each gene has a single ancestor. This contrasts with family histories, where each individual has two ancestors, a mother and a father. Nevertheless, each of our genes follows the single-parent genealogies described by the Wright-Fisher model. Under this model, the expected number of SNPs is given by the following simple equation [11]:

\begin{displaymath}
S=4N\mu\sum_{i=1}^{n-1}\frac{1}{i}, \qquad (1)
\end{displaymath}

where $N$ is the size of a diploid population, $n$ the sample size, and $\mu$ the mutation rate.
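As a rough illustration, equation (1) can be evaluated directly. The sketch below is a minimal example and not code from the cited work; the function names and the parameter values ($N = 10000$, $\mu = 10^{-5}$ per gene per generation, $n = 10$) are assumptions chosen only to make the arithmetic concrete.

```python
from fractions import Fraction


def harmonic(n):
    """Return sum_{i=1}^{n-1} 1/i as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n))


def expected_snps(N, mu, n):
    """Expected number of SNPs S = 4*N*mu * sum_{i=1}^{n-1} 1/i.

    N  : diploid population size
    mu : mutation rate per gene per generation
    n  : number of sampled genes
    """
    return 4 * N * mu * float(harmonic(n))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen for illustration only.
    print(expected_snps(N=10_000, mu=1e-5, n=10))
```

With these assumed values, $4N\mu = 0.4$ and the sum over $i = 1, \dots, 9$ is $7129/2520 \approx 2.83$, so about 1.13 SNPs are expected. Because the sum grows only logarithmically with $n$, sampling more genes adds new SNPs slowly.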

We have previously generalized equation (1) to alignments of arbitrary topology and examined the distribution of SNPs in humans [4]. In addition, we have analyzed the microevolutionary implications of localized SNP patterns in the model plant Arabidopsis thaliana [2].

The analysis of A. thaliana relied, like much of modern population genetics, on a data structure known as the coalescent. This describes a random genealogy of a sample of homologous genes [7,6]. We have recently used the coalescent to explore the effect of sampling on the frequency spectrum of nucleotide polymorphisms in expanding subdivided populations [].
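To make the idea of a random genealogy concrete, the following is a minimal sketch of the standard neutral coalescent for a single panmictic population of constant size. It is not the code used in the studies cited here. It assumes that time is measured in units of $2N$ generations, that $k$ lineages coalesce at rate $k(k-1)/2$, and that mutations fall on the genealogy as a Poisson process with rate $\theta/2$ per unit of branch length, where $\theta = 4N\mu$. The tree topology is omitted because the number of SNPs depends only on the total branch length. All function names are illustrative.

```python
import random


def simulate_tree_length(n, rng):
    """Total branch length of one random genealogy of n genes.

    Time is in units of 2N generations. While k lineages remain,
    the waiting time to the next coalescence is exponential with
    rate k*(k-1)/2, and all k branches grow by that amount.
    """
    total = 0.0
    for k in range(n, 1, -1):
        waiting_time = rng.expovariate(k * (k - 1) / 2)
        total += k * waiting_time
    return total


def poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate arrivals in [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_snp_count(n, theta, rng):
    """Number of SNPs on one genealogy, each mutation hitting a new site."""
    return poisson(theta / 2 * simulate_tree_length(n, rng), rng)


def mean_snp_count(n, theta, replicates, seed=1):
    """Average SNP count over independent simulated genealogies."""
    rng = random.Random(seed)
    total = sum(simulate_snp_count(n, theta, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    # Hypothetical values: 10 sampled genes, theta = 4*N*mu = 5.
    print(mean_snp_count(n=10, theta=5.0, replicates=20_000))
```

Averaged over many replicates, the simulated SNP count should approach the expectation in equation (1), which for the assumed values $n = 10$ and $\theta = 5$ is roughly 14.1.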

Bernhard Haubold 2016-04-14