How many SNPs do we expect to observe in a sample of n genes? This can
be computed by modeling the evolution of a population of N diploid
individuals carrying 2N genes: each new generation of 2N genes is
drawn with replacement from the previous generation. Such a scenario
is known among biologists as the Wright-Fisher model of evolution. You
can simulate populations under this model interactively. Notice that
each gene has a single parent in the preceding generation. This
contrasts with family histories, where each individual has two
parents, a mother and a father. Our genes, however, follow the
single-parent genealogies described by the Wright-Fisher model. Under
this model, the expected number of SNPs is given by the following
simple equation [13]:
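
As a minimal sketch, assume equation (1) is the classical neutral result for the expected number of segregating sites, E[S] = theta * (1 + 1/2 + ... + 1/(n-1)), where n is the number of sampled genes, theta = 4 * N * mu, N is the number of diploid individuals, and mu is the mutation rate per gene per generation. The Python below illustrates one Wright-Fisher resampling step and evaluates that sum; the function names are hypothetical and chosen only for this illustration.

```python
import random


def wright_fisher_step(parents, rng):
    """Draw the next generation of genes with replacement.

    Each offspring gene copies one parent gene chosen uniformly at
    random, so every gene has exactly one parent.
    """
    return [rng.choice(parents) for _ in parents]


def expected_snps(n, theta):
    """Expected number of SNPs in a sample of n genes.

    Assumes the classical neutral result theta * sum_{i=1}^{n-1} 1/i,
    with theta = 4 * N * mu.
    """
    return theta * sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    rng = random.Random(42)
    # Label the 2N = 20 genes of the founding generation by index.
    genes = list(range(20))
    for _ in range(10):
        genes = wright_fisher_step(genes, rng)
    # Founding genes that still have descendants after 10 generations.
    print(sorted(set(genes)))
    print(expected_snps(n=10, theta=5.0))
```

Repeating the resampling step shows how the number of founding genes with surviving descendants shrinks over time, which is the single-parent genealogy described above.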
We have previously generalized equation (1) to alignments of arbitrary topology and examined the distribution of SNPs in humans [5]. In addition, we have analyzed the microevolutionary implications of localized SNP patterns in the model plant Arabidopsis thaliana [4].
The analysis of A. thaliana relied, like much of modern population genetics, on a data structure known as the coalescent. This describes a random genealogy of a sample of homologous genes [7,6]. We have used the coalescent to explore the effect of sampling on the frequency spectrum of nucleotide polymorphisms in expanding subdivided populations [10] and to quantify historical population size changes [9].
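
To make the idea of a random genealogy concrete, the following Python sketch simulates the standard coalescent for a sample of n genes under the usual textbook assumptions (constant population size, no subdivision, time measured in units of 2N generations). It is not the software used in the cited studies, and the name simulate_coalescent is hypothetical. While k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2, and a random pair of lineages then merges into a common ancestor.

```python
import random


def simulate_coalescent(n, rng):
    """Simulate one random genealogy of n sampled genes.

    Returns (merges, total_length). Each entry of merges is
    (child_a, child_b, ancestor, time); total_length is the summed
    length of all branches. Time is in units of 2N generations.
    """
    lineages = list(range(n))  # the sampled genes are labeled 0..n-1
    next_label = n             # labels for internal (ancestral) nodes
    time = 0.0
    total_length = 0.0
    merges = []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until any of the k(k-1)/2 pairs coalesces.
        wait = rng.expovariate(k * (k - 1) / 2.0)
        time += wait
        total_length += k * wait
        # Choose the pair of lineages that finds a common ancestor.
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        merges.append((a, b, next_label, time))
        next_label += 1
    return merges, total_length


if __name__ == "__main__":
    rng = random.Random(1)
    merges, total_length = simulate_coalescent(5, rng)
    for a, b, ancestor, t in merges:
        print(f"{a} and {b} coalesce into {ancestor} at time {t:.3f}")
    print(f"total branch length: {total_length:.3f}")
```

Under these assumptions the mean total branch length is 2 * (1 + 1/2 + ... + 1/(n-1)), so placing mutations on the branches at rate theta/2 per unit length gives an expected SNP count of theta * (1 + 1/2 + ... + 1/(n-1)).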