Parentage Analysis

Methods of Parentage Analysis in Natural Populations

Using genetic markers, maternity or paternity can be estimated in two ways: by exclusion, which rejects as parents all adults whose genotypes are incompatible with the juveniles under consideration, or by comparing the probabilities of alternative parent-offspring relationships.

Distribution of Larvae, by Parent, Along a Section of Stream

Let's use a hypothetical example where you have sampled and genotyped 4 adult males and 4 adult females and you wish to determine the parents of the offspring sampled in the stream.  Each adult is homozygous for a unique allele, which means that it transmits only 1 color ‘allele’ to its offspring.  In this example, we can easily exclude all but a single male and a single female as the true parents of each offspring.  In reality, potential parents can be either homozygous (2 copies of the same allele) or heterozygous (1 copy of each of 2 different alleles). With data on parentage you can determine levels of relatedness among full-siblings [sharing both parents; relatedness (rxy=0.5)], half-siblings [sharing one parent; relatedness (rxy=0.25)] and unrelated individuals [sharing no parents; relatedness (rxy=0.0)]. You can also determine the distance (dij) between related and unrelated offspring.
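
As a minimal sketch of how parentage assignments translate into pairwise relatedness, the Python below assigns rxy from the number of shared parents. The offspring and parent labels (O1, F1, M1, and so on) are hypothetical, and the values assume that the parents themselves are unrelated and not inbred.

```python
from itertools import combinations

def relatedness(parents_x, parents_y):
    """Expected rxy from shared parents: 2 shared -> 0.5, 1 -> 0.25, 0 -> 0.0.

    Assumes the parents themselves are unrelated and not inbred.
    """
    shared = len(set(parents_x) & set(parents_y))
    return 0.25 * shared

# Hypothetical parentage assignments: offspring -> (mother, father)
parentage = {
    "O1": ("F1", "M1"),
    "O2": ("F1", "M1"),
    "O3": ("F1", "M2"),
    "O4": ("F2", "M3"),
}

pairwise = {
    (x, y): relatedness(parentage[x], parentage[y])
    for x, y in combinations(parentage, 2)
}

for pair, r in pairwise.items():
    print(pair, r)
```

Here O1 and O2 are full-siblings, O1 and O3 are half-siblings through the shared mother, and O4 is unrelated to the others.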

Parent Map Diagram

Use of Parentage Analysis to Estimate Reproductive Success

In this hypothetical example, a number of fish are genotyped at a single locus.  Individuals are scored as either ‘homozygous’ or ‘heterozygous’ based on whether they have 2 copies of the same allele (one band) or have copies of 2 different alleles (2 bands).  The putative mother is homozygous for the ‘D’ allele (DD genotype).  We wish to determine whether there are males whose genotypes are consistent with being the father of offspring 1 and 2 (O1 and O2, both with genotypes CD).  The 2 candidate males are male 1 and male 2, who have genotypes AB and CC, respectively.  If the female is the true mother, she must have transmitted a ‘D’ allele, so the father must have contributed the ‘C’ allele.  Male 1 (genotype AB) carries no ‘C’ allele and can be excluded as the father.  Male 2 has 2 copies of the ‘C’ allele (he is homozygous for the ‘C’ allele) and thus has a probability of 1.0 of transmitting the ‘C’ allele to his offspring, so he cannot be excluded.  The diagram also shows other genotyped individuals.
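
The single-locus exclusion just described can be sketched in a few lines of Python. The genotypes are the ones given in the text (mother DD, offspring CD, male 1 AB, male 2 CC); the function name `compatible` is an illustrative choice, not part of any parentage program.

```python
from itertools import product

def compatible(offspring, mother, father):
    """True if the offspring genotype can be formed from one maternal
    allele and one paternal allele (Mendelian inheritance, no mutation
    or scoring error)."""
    target = sorted(offspring)
    return any(sorted((m, f)) == target for m, f in product(mother, father))

mother = ("D", "D")
offspring = ("C", "D")
males = {"male 1": ("A", "B"), "male 2": ("C", "C")}

non_excluded = [name for name, genotype in males.items()
                if compatible(offspring, mother, genotype)]
print(non_excluded)  # ['male 2']
```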

Parentage Analysis Diagram

Principles of Parental Exclusion

  • Based on Mendelian rules of inheritance
  • Uses incompatibilities between parents and offspring to reject particular parent-offspring combinations.


Parental Exclusion Diagram


  • Impractical if the pool of candidate parents becomes large due to the large number of loci needed to yield a single non-excluded parent.
  • Many exclusion programs can allow the user to specify the number of mismatches necessary for an exclusion to be considered valid, making the method more robust to the difficulties imposed by mutations or scoring errors.
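
A minimal sketch of exclusion with a mismatch allowance is shown below. It treats a locus as a mismatch when the offspring and the candidate parent share no allele, and it retains candidates whose mismatch count does not exceed a user-chosen threshold. The three-locus genotypes, the candidate names, and the `max_mismatches` parameter are all hypothetical and do not reproduce any particular exclusion program.

```python
def count_mismatches(offspring, candidate):
    """Number of loci at which offspring and candidate share no allele."""
    return sum(1 for o, c in zip(offspring, candidate) if not set(o) & set(c))

def non_excluded(offspring, candidates, max_mismatches=0):
    """Names of candidates with at most `max_mismatches` mismatching loci."""
    return [name for name, genotypes in candidates.items()
            if count_mismatches(offspring, genotypes) <= max_mismatches]

# Hypothetical multilocus genotypes: one (allele, allele) tuple per locus
offspring = [("A", "B"), ("C", "D"), ("E", "E")]
candidates = {
    "cand1": [("A", "A"), ("D", "F"), ("E", "G")],  # shares an allele at every locus
    "cand2": [("A", "C"), ("E", "F"), ("E", "E")],  # mismatch at locus 2 only
    "cand3": [("C", "C"), ("A", "B"), ("F", "G")],  # mismatch at all 3 loci
}

strict = non_excluded(offspring, candidates)                      # no mismatches allowed
tolerant = non_excluded(offspring, candidates, max_mismatches=1)  # allow 1 mismatch
print(strict, tolerant)
```

Allowing one mismatch keeps cand2 in the candidate pool, which is the intended protection against a single mutation or scoring error, at the cost of less exclusion power.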



Qualitative Inferences

Qualitative Inferences Diagram
  • Assume female F is the true mother.
  • If the offspring array was produced by a single male, we can predict the male’s genotype (BC).
  • If males 1-4 are the only potential males in the population and we assume a single male parent, this is evidence of gene flow.
  • If we only had a sample of one offspring (e.g. 1), our ability to infer the male’s genotype would be reduced.
  • Given the progeny array there is some probability of multiple paternity.
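
The reasoning in this list can be sketched as follows: subtract the maternal contribution from each offspring to find the paternal alleles, then note that a single diploid father can contribute at most 2 different alleles at a locus. The genotypes below are hypothetical (the diagram's genotypes are not reproduced here), and the function names are illustrative.

```python
from itertools import product
from math import ceil

def paternal_options(offspring, mother):
    """Alleles the father could have contributed, given the assumed mother."""
    a, b = offspring
    options = set()
    if b in mother:      # mother gave b, so the father gave a
        options.add(a)
    if a in mother:      # mother gave a, so the father gave b
        options.add(b)
    return options

def min_paternal_alleles(progeny, mother):
    """Smallest set of paternal alleles that explains the whole progeny array."""
    per_offspring = [paternal_options(o, mother) for o in progeny]
    if any(not opts for opts in per_offspring):
        raise ValueError("an offspring is incompatible with the assumed mother")
    return min((set(choice) for choice in product(*per_offspring)), key=len)

mother = ("A", "A")  # hypothetical genotype for the assumed true mother

# Array 1: two paternal alleles, consistent with a single BC father
array_1 = [("A", "B"), ("A", "C"), ("B", "A"), ("A", "C")]
alleles_1 = min_paternal_alleles(array_1, mother)
min_fathers_1 = ceil(len(alleles_1) / 2)

# Array 2: three paternal alleles, so at least 2 fathers are required
array_2 = [("A", "B"), ("A", "C"), ("A", "D")]
alleles_2 = min_paternal_alleles(array_2, mother)
min_fathers_2 = ceil(len(alleles_2) / 2)

print(alleles_1, min_fathers_1, alleles_2, min_fathers_2)
```

This gives only a lower bound on the number of fathers at one locus. Two paternal alleles are consistent with one father but do not rule out multiple paternity.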


Descriptive Statistics Needed for Parentage Estimation

The “Hardy-Weinberg” principle in genetics states that in a large and randomly mating population, the frequency of genotypes can be estimated based on the frequency of alleles in the population.

Genotype Frequency Table
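
As a small illustration of the Hardy-Weinberg calculation, the expected frequency of a homozygote is the square of its allele frequency (p²) and the expected frequency of a heterozygote is twice the product of its two allele frequencies (2pq). The allele frequencies below are hypothetical and are not taken from the table.

```python
from itertools import combinations_with_replacement

def hw_genotype_frequency(genotype, allele_freqs):
    """Expected Hardy-Weinberg frequency of a diploid genotype."""
    a, b = genotype
    p, q = allele_freqs[a], allele_freqs[b]
    return p * p if a == b else 2 * p * q

# Hypothetical allele frequencies at one locus (must sum to 1)
allele_freqs = {"A": 0.5, "B": 0.3, "C": 0.2}

expected = {
    genotype: hw_genotype_frequency(genotype, allele_freqs)
    for genotype in combinations_with_replacement(allele_freqs, 2)
}
total = sum(expected.values())  # the six genotype frequencies sum to 1

for genotype, freq in expected.items():
    print("".join(genotype), round(freq, 4))
```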

Estimating the Likelihood of Paternity Given Non-Exclusion

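The probabilities of these triplets (offspring B and candidate parents C and D) will be denoted as P(gB,gC,gD|R), where the relationship R is one of 3 possibilities: UU (neither candidate is a parent of B), QU (candidate D is a parent of B and candidate C is unrelated), or QQ (candidates C and D are both parents of B).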

P(gB,gC,gD|UU) = P(gB)*P(gC)*P(gD)

P(gB,gC,gD|QU) = P(offspring gB|parent gD)*P(gC)*P(gD)

or  T(gB|gD,–)*P(gC)*P(gD)

P(gB,gC,gD|QQ) = P(offspring gB|parents gC,gD)*P(gC)*P(gD)

or  T(gB|gC,gD)*P(gC)*P(gD)

Which relationship is more likely given the data [P(R|data)]?  Compare the relationships using LOD scores.

P(gi) is the expected frequency of the ith genotype (under Hardy-Weinberg) and
T denotes the transmission probabilities from putative parents to offspring
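
The three triplet probabilities can be sketched in Python as below. This is an illustration under stated assumptions, not the calculation from the diagram: the allele names (A1, A2, A3), their frequencies, and the three genotypes are hypothetical, and T(gB|gD,–) is computed by assuming the unsampled parent's allele is drawn at random from the population allele frequencies.

```python
from itertools import product

def hw(genotype, freqs):
    """Expected Hardy-Weinberg genotype frequency P(g)."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def t_two_parents(offspring, mom, dad):
    """T(gB|gC,gD): each parent transmits either of its alleles with prob. 0.5."""
    target = sorted(offspring)
    return sum(0.25 for m, d in product(mom, dad) if sorted((m, d)) == target)

def t_one_parent(offspring, parent, freqs):
    """T(gB|gD,-): one allele from the known parent, the other drawn
    from the population allele frequencies."""
    target = sorted(offspring)
    return sum(0.5 * f
               for p_allele in parent
               for other, f in freqs.items()
               if sorted((p_allele, other)) == target)

# Hypothetical allele frequencies and genotypes
freqs = {"A1": 0.5, "A2": 0.3, "A3": 0.2}
g_b = ("A1", "A2")  # offspring B
g_c = ("A1", "A1")  # candidate mother C
g_d = ("A2", "A3")  # candidate father D

likelihoods = {
    "UU": hw(g_b, freqs) * hw(g_c, freqs) * hw(g_d, freqs),
    "QU": t_one_parent(g_b, g_d, freqs) * hw(g_c, freqs) * hw(g_d, freqs),
    "QQ": t_two_parents(g_b, g_c, g_d) * hw(g_c, freqs) * hw(g_d, freqs),
}
best = max(likelihoods, key=likelihoods.get)
print(likelihoods, best)
```

With these hypothetical values QQ has the highest probability, because the two candidates together produce the offspring genotype half of the time, which is more often than that genotype is expected by chance.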

Let's work an example.  Here we have a lovely candidate mom (C), a candidate dad (D), and a larva (B) that were all sampled in the river.  We genotype each individual.  We have estimates of the frequency of each allele, and using Hardy-Weinberg principles we can estimate the expected frequency of observing each genotype in the population.  We can then estimate the likelihood of observing the 3 genotypes under each proposed relationship category: QQ (both potential mom C and potential dad D are the parents of larva B) versus the alternative category UU (these three individuals are just 3 random draws from the population).  The best-supported relationship category is determined based on the ratio of the probabilities, or LOD score.  In this example, relationship category QQ is about 3.5 times more likely than UU.

Relationship Calculation Diagram


Assignment Statistics – LOD Scores

  • Likelihood
    – T(gB|gC,gD)*P(gC)*P(gD)
    – Where T is the probability of allele transmission; and gB, gC, and gD are the genotypes of offspring individual B and candidate parents C and D
  • Likelihood ratio
    – L(H1,H2|D) = P(D|H1)/P(D|H2)
    – Where H1 is the hypothesis that the candidate parental pair is the true parental pair and H2 is the hypothesis that another candidate parental pair is the true parental pair and D denotes the data in the form of offspring and parental genotypes
  • LOD scores (logarithm of odds) are used in instances where there is more than one possible relationship, in order to show which is more likely: LOD = log_e[P(D|H1)/P(D|H2)]
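
A minimal sketch of the LOD calculation follows. The two likelihood values are hypothetical placeholders for P(D|H1) and P(D|H2); the last line uses the likelihood ratio of about 3.5 quoted in the worked example above.

```python
import math

def lod(likelihood_h1, likelihood_h2):
    """Natural-log likelihood ratio: LOD = log_e[P(D|H1) / P(D|H2)]."""
    return math.log(likelihood_h1 / likelihood_h2)

# Hypothetical likelihoods for two competing hypotheses
p_d_h1 = 0.015
p_d_h2 = 0.009
score = lod(p_d_h1, p_d_h2)

# A likelihood ratio of 3.5 corresponds to a LOD of about 1.25
score_from_ratio = math.log(3.5)

print(round(score, 3), round(score_from_ratio, 3))
```

A positive LOD favors H1, a negative LOD favors H2, and a LOD of 0 means the data support both hypotheses equally.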