Constrained Single-Trait Phenotypic Selection#

In the Single-Trait Phenotypic Selection example, we conducted a simulation where we selected the top individuals based on their estimated breeding values. Unfortunately, this “greedy” selection strategy quickly depletes genetic diversity, leading to poorer genetic gains in the long run. One method to combat the depletion of genetic diversity is to add an inbreeding constraint. Instead of selecting the top individuals, we select the top individuals given that their mean relatedness does not exceed a specific value. In the example below, we perform both constrained and unconstrained selection and compare their long-term effects using simulations.

Simulation Preliminaries#

Loading Required Modules and Seeding the global PRNG#

To begin, we import the various modules we will be used into the Python namespace. We also set the seed for our simulation so that we can replicate the results at a later time.

# import libraries
from numbers import Real
import numpy
import pandas
import pybrops
from matplotlib import pyplot
from pybrops.breed.prot.bv.MeanPhenotypicBreedingValue import MeanPhenotypicBreedingValue
from pybrops.breed.prot.mate.TwoWayCross import TwoWayCross
from pybrops.breed.prot.mate.TwoWayDHCross import TwoWayDHCross
from pybrops.breed.prot.pt.G_E_Phenotyping import G_E_Phenotyping
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueSubsetSelection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionSubsetSelection
from pybrops.model.gmod.DenseAdditiveLinearGenomicModel import DenseAdditiveLinearGenomicModel
from pybrops.opt.algo.SortingSubsetOptimizationAlgorithm import SortingSubsetOptimizationAlgorithm
from pybrops.opt.algo.SteepestDescentSubsetHillClimber import SteepestDescentSubsetHillClimber
from pybrops.popgen.bvmat.DenseBreedingValueMatrix import DenseBreedingValueMatrix
from pybrops.popgen.cmat.fcty.DenseMolecularCoancestryMatrixFactory import DenseMolecularCoancestryMatrixFactory
from pybrops.popgen.gmap.HaldaneMapFunction import HaldaneMapFunction
from pybrops.popgen.gmap.StandardGeneticMap import StandardGeneticMap
from pybrops.popgen.gmat.DensePhasedGenotypeMatrix import DensePhasedGenotypeMatrix

# seed python random and numpy random
pybrops.core.random.prng.seed(52347529)

Simulation Parameters#

Next, we define a couple of simulation parameter constants.

nfndr = 40      # number of founder individuals
nqtl = 1000     # number of QTL
qlen = 6        # length of the queue
ncross = 20     # number of cross configurations
nparent = 2     # number of parents per cross configuration
nmating = 1     # number of times to perform cross configuration
nprogeny = 80   # number of progenies per cross attempt
nrandmate = 20  # number of random intermatings
nsimul = 60     # number of simulation generations

Loading Genetic Map Data from a Text File#

Then, we load genetic map data from a CSV-like file. In this example, we are using the US NAM genetic map constructed by McMullen et al. in 2009. The code below demonstrates how these data are read into a StandardGeneticMap object using the from_csv class method.

# read genetic map
gmap = StandardGeneticMap.from_csv(
    "McMullen_2009_US_NAM.gmap",
    vrnt_chrgrp_col     = "chr",
    vrnt_phypos_col     = "pos",
    vrnt_genpos_col     = "cM",
    vrnt_genpos_units   = "cM",
    auto_group          = True,
    auto_build_spline   = True,
    sep                 = "\t",
    header              = 0,
)

Creating a Genetic Map Function#

After loading our genetic map data, we want to create a genetic map function object which will be used to calculate recombination probabilities for our simulations. We create a simple Haldane genetic map function using the HaldaneMapFunction class.

# use Haldane map function to calculate crossover probabilities
gmapfn = HaldaneMapFunction()

Loading Genome Data from a VCF File#

Next, we load phased genetic markers from a VCF file. In this example, we are using a subset of genetic markers from the Wisconsin Maize Diversity Panel, which is composed of 942 individuals. 2000 SNPs with a minor allele frequency greater than 0.2 have been randomly selected to keep the dataset small. From the original 942 individuals and 2000 SNPs, we randomly select 40 founders and 1000 SNPs to create a founding synthetic population.

# read phased genetic markers from a vcf file
fndr_pgmat = DensePhasedGenotypeMatrix.from_vcf("widiv_2000SNPs.vcf.gz", auto_group_vrnt=False)

# randomly select ``nfndr`` from which to create a founding population
ix = numpy.random.choice(fndr_pgmat.ntaxa, nfndr, replace = False)
fndr_pgmat = fndr_pgmat.select_taxa(ix)

# randomly select ``nqtl`` markers from founders
ix = numpy.random.choice(fndr_pgmat.nvrnt, nqtl, replace = False)
fndr_pgmat = fndr_pgmat.select_vrnt(ix)

# sort and group variants
fndr_pgmat.group_vrnt()

After loading the genetic markers, we interpolate the genetic map positions and the sequential marker crossover probabilities using the interp_xoprob method. We interpolate using the genetic map and genetic map function we have just created.

# interpolate genetic map positions
fndr_pgmat.interp_xoprob(gmap, gmapfn)

Constructing a Single-Trait Genomic Model#

Next, we want to construct a true genomic model that will model a single trait with a strictly additive genetic architecture. We construct a DenseAdditiveLinearGenomicModel object to do this.

# model intercepts: (1,ntrait)
beta = numpy.array([[0.0]], dtype = float)

# marker effects: (nvrnt,1)
mkreffect = numpy.random.normal(
    loc = 0.0,
    scale = 0.01,
    size = (fndr_pgmat.nvrnt,1)
)

# trait names: (ntrait,)
trait = numpy.array(["Syn1"], dtype = object)

# create an additive linear genomic model to model traits
algmod = DenseAdditiveLinearGenomicModel(
    beta        = beta,                 # model intercepts
    u_misc      = None,                 # miscellaneous random effects
    u_a         = mkreffect,            # random marker effects
    trait       = trait,                # trait names
    model_name  = "synthetic_model",    # name of the model
    hyperparams = None                  # model parameters
)

Build Founder Populations & Run Breeding Program Burn-In#

In this next series of steps, we’ll create a founder population by randomly intermating our 40 selected founders for 20 generations. Following random intermating, we’ll create 6 cohorts of doubled haploid individuals, each cohort representing progenies produced in a single year. In this simulation, the oldest 3 cohorts will be aggregated into a "main" population to serve as selection candidates. The youngest 3 cohorts will be “in the pipeline” and will represent doubled haploid progenies which are being made. From the "main" population, the top 5% of individuals from each family will be selected to form a "cand" population. It is from this "cand" population that parents will be selected. The burn-in segment of the breeding program continues until the mean expected heterozygosity decreases to just under 0.3. After this point is reached, the simulated populations are ready for use in the main simulation.

Randomly Intermate for `nrandmate` Generations#

To start our random mating scheme, we first randomly pair up our 40 founders and mate them to create hybrids. Each mating event creates 80 progenies. This creates a population of size 1600.

# create 2-way cross object
mate2way = TwoWayCross()

# randomly select and pair founders
xconfig = numpy.random.choice(nfndr, nfndr, replace = False)
xconfig = xconfig.reshape(nfndr // 2, 2)

# randomly intermate ``nfndr`` founders to create initial hybrids
fndr_pgmat = mate2way.mate(
    pgmat = fndr_pgmat,
    xconfig = xconfig,
    nmating = nmating,
    nprogeny = nprogeny,
)

Then, we take our hybrid progenies, randomly pair them up with each other, and mate them. Each mating event produces 1 progeny. We randomly intermate progenies for 20 generations to create a founding population.

# randomly intermate for ``nrandmate`` generations
# each individual in the population is randomly mated with another individual
# and creates a single progeny so that the population size is held constant
for gen in range(1,nrandmate+1):
    # randomly select and pair ``ntaxa`` parents
    ntaxa = fndr_pgmat.ntaxa
    xconfig = numpy.empty((ntaxa,2), dtype = int)
    xconfig[:,0] = numpy.random.choice(ntaxa, ntaxa, replace = False)
    xconfig[:,1] = numpy.random.choice(ntaxa, ntaxa, replace = False)
    # randomly intermate ``ntaxa`` parents
    fndr_pgmat = mate2way.mate(
        pgmat = fndr_pgmat,
        xconfig = xconfig,
        nmating = 1,
        nprogeny = 1,
    )
    print("Random Intermating:", gen)

Create Breeding Protocols for Burn-In#

Next, we make a 2-way DH cross protocol for our burn-in stage.

# create a 2-way DH cross object, use the counters from the 2-way cross object
mate2waydh = TwoWayDHCross(
    progeny_counter = mate2way.progeny_counter,
    family_counter  = mate2way.family_counter,
)

We also create a genotyping protocol that converts phased genotypes to unphased genotypes.

# create a genotyping protocol
gtprot = DenseUnphasedGenotyping()

To simulate phenotypes, we create a simple $G + E$ phenotyping protocol that phenotypes individuals in 4 locations, 1 replication each location. We use the founding population we have just created with random mating to set the narrow sense heritability at the single plot level to 0.4. With 4 replications, this increases the heritability to approximately 0.7 to begin.

# create a phenotyping protocol
ptprot = G_E_Phenotyping(algmod, 4, 1)
ptprot.set_h2(0.4, fndr_pgmat)

Next, we create a breeding value estimation protocol that simply uses means to estimate breeding value.

# create a breeding value estimation protocol
bvprot = MeanPhenotypicBreedingValue("taxa", "taxa_grp", trait)

Create a Within-Family Selection Function#

Next, we create a custom function to manually select individuals within families based on their breeding values. This function selects the top nindiv individuals within each family and outputs a set of indices corresponding to these individuals.

# define function to do within family selection based on yield
def within_family_selection(bvmat: DenseBreedingValueMatrix, nindiv: int) -> numpy.ndarray:
    order = numpy.arange(bvmat.ntaxa)
    value = bvmat.mat[:,0] # get yield breeding values
    indices = []
    groups = numpy.unique(bvmat.taxa_grp)
    for group in groups:
        mask = bvmat.taxa_grp == group
        tmp_order = order[mask]
        tmp_value = value[mask]
        value_argsort = tmp_value.argsort()
        ix = value_argsort[::-1][:nindiv]
        indices.append(tmp_order[ix])
    indices = numpy.concatenate(indices)
    return indices

Create Cohort Structure#

In the next step, we create a cohort structure as described at the beginning of this section. We utilize dictionaries to keep track of our cohorts and populations.

# create a dictionary to store founder individuals
fndr_genome = {"cand":None,   "main":None,   "queue":[]}
fndr_geno   = {"cand":None,   "main":None,   "queue":[]}
fndr_pheno  = {"cand":None,   "main":None}
fndr_bval   = {"cand":None,   "main":None}
fndr_gmod   = {"cand":algmod, "main":algmod, "true":algmod}

In the code block below, we define a helper function to help us create initial cohorts of individuals from our starting randomly mated population.

# define a helper function to help make cohorts of individuals
def cohort(
        mate2waydh: MatingProtocol,
        pgmat: DensePhasedGenotypeMatrix,
        ncross: int,
        nparent: int,
        nmating: int,
        nprogeny: int
    ) -> DensePhasedGenotypeMatrix:
    # sample indicies of individuals and reshape for input into mating protocol
    xconfix = numpy.random.choice(pgmat.ntaxa, ncross * nparent, replace = False)
    xconfig = xconfix.reshape(ncross, nparent)
    # mate individuals
    out = mate2waydh.mate(pgmat, xconfig, nmating, nprogeny)
    return out

Next, we fill our cohort queue and construct the "main" and "cand" populations from the queue.

# fill queue with cohort genomes derived from randomly mating the founders
fndr_genome["queue"] = [cohort(mate2waydh, fndr_pgmat, ncross, nparent, nmating, nprogeny) for _ in range(qlen)]

# construct the main population genomes from the first three cohorts in the queue
fndr_genome["main"] = DensePhasedGenotypeMatrix.concat_taxa(fndr_genome["queue"][0:3])

# genotype individuals to fill the genotyping queue
fndr_geno["queue"] = [gtprot.genotype(genome) for genome in fndr_genome["queue"]]

# construct the main population genotypes from the first three cohorts in the queue
fndr_geno["main"] = DenseGenotypeMatrix.concat_taxa(fndr_geno["queue"][0:3])

# phenotype the main population
fndr_pheno["main"] = ptprot.phenotype(fndr_genome["main"])

# calculate breeding values for the main population
fndr_bval["main"] = bvprot.estimate(fndr_pheno["main"], fndr_geno["main"])

# calculate indices for within family selection to get parental candidates
ix = within_family_selection(fndr_bval["main"], 4) # select top 5%

# select parental candidates
fndr_genome["cand"] = fndr_genome["main"].select_taxa(ix)
fndr_geno["cand"]   = fndr_geno["main"].select_taxa(ix)
fndr_bval["cand"]   = fndr_bval["main"].select_taxa(ix)

Create a Burn-In Selection Protocol Object#

Next, we create a selection protocol for our burn-in loop. This selects individuals with the best breeding values for our synthetic trait.

# use a hillclimber for the single-objective optimization algorithm
soalgo = SortingSubsetOptimizationAlgorithm()

# create a selection protocol that selects based on EBVs with an inbreeding constraint
burnin_selprot = EstimatedBreedingValueSubsetSelection(
    ntrait       = 1,            # number of expected traits
    unscale      = True,         # unscale breeding values to human-readable format
    ncross       = 20,           # number of cross configurations
    nparent      = 2,            # number of parents per cross configuration
    nmating      = 1,            # number of matings per cross configuration
    nprogeny     = 80,           # number of progeny per mating event
    nobj         = 1,            # number of objectives == 1 == yield
    soalgo       = soalgo,       # use sorting algorithm to solve single-objective problem
)

Running a Population Burn-in until MEH is slightly less than 0.30#

Finally, we perform selection on our populations until the mean expected heterozygosity of the "main" population reaches slightly less than 0.30. This in effect sets the starting genetic diversity for our simulations so that they are not as dependent on the set of founders that were initially randomly selected.

i = 0
while fndr_genome["main"].meh() > 0.30:
    # parental selection: select parents from parental candidates
    selcfg = burnin_selprot.select(
        pgmat = fndr_genome["cand"],
        gmat  = fndr_geno["cand"],
        ptdf  = fndr_pheno["cand"],
        bvmat = fndr_bval["cand"],
        gpmod = fndr_gmod["cand"],
        t_cur = 0,
        t_max = 0,
    )
    # mate: create new genomes; discard oldest cohort; concat new main population
    new_genome = mate2waydh.mate(
        pgmat    = selcfg.pgmat,
        xconfig  = selcfg.xconfig,
        nmating  = selcfg.nmating,
        nprogeny = selcfg.nprogeny,
    )
    fndr_genome["queue"].append(new_genome)
    discard = fndr_genome["queue"].pop(0)
    fndr_genome["main"] = DensePhasedGenotypeMatrix.concat_taxa(fndr_genome["queue"][0:3])
    # evaluate: genotype new genomes; discard oldest cohort; concat new main population
    new_geno = gtprot.genotype(new_genome)
    fndr_geno["queue"].append(new_geno)
    discard = fndr_geno["queue"].pop(0)
    fndr_geno["main"] = DenseGenotypeMatrix.concat_taxa(fndr_geno["queue"][0:3])
    # evaluate: phenotype main population
    fndr_pheno["main"] = ptprot.phenotype(fndr_genome["main"])
    # evaluate: calculate breeding values for the main population
    fndr_bval["main"] = bvprot.estimate(fndr_pheno["main"], fndr_geno["main"])
    # survivor selection: select parental candidate indices from main population
    ix = within_family_selection(fndr_bval["main"], 4) # select top 5%
    # survivor selection: select parental candidates from main population
    fndr_genome["cand"] = fndr_genome["main"].select_taxa(ix)
    fndr_geno["cand"] = fndr_geno["main"].select_taxa(ix)
    fndr_bval["cand"] = fndr_bval["main"].select_taxa(ix)
    print("Burn-in:", i+1)
    i += 1

print("Starting MEH:", fndr_genome["main"].meh())

Simulation Setup#

In the sections above, we created the starting point for our simulations. In these next sections, we’ll test two selection strategies: one constrained by inbreeding, and another unconstrained by inbreeding. We’ll plot and compare the results of these simulations.

Create a Constrained Selection Protocol Object#

First, we’ll define a selection protocol where individuals are selected on their breeding values subject to an inbreeding constraint specified by the user. We’ll use the OptimalContributionSubsetSelection class to accomplish this task and provide objective and inequality constraint violation transformation functions to create such a scenario.

# create a dense molecular coancestry matrix factory
cmatfcty = DenseMolecularCoancestryMatrixFactory()

# define an objective transformation function
def obj_trans(
        decnvec: numpy.ndarray,
        latentvec: numpy.ndarray,
        **kwargs: dict
    ) -> numpy.ndarray:
    """
    Receive an incoming vector of [MGR,BV1,...,BVn] and transform it to
    [BV1,...,BVn].

    Where::

        - MGR is the mean genomic relationship (kinship; in range [0,1]).
        - BVn is the nth mean breeding value for the subset.

    Parameters
    ----------
    decnvec : numpy.ndarray
        A decision space vector of shape (ndecn,)
    latentvec : numpy.ndarray
        A latent space function vector of shape (1+ntrait,)

    Returns
    -------
    out : numpy.ndarray
        A vector of shape (ntrait,).
    """
    # extract trait(s) as objective(s)
    return latentvec[1:]

# define an inequality constraint violation function
def ineqcv_trans(
        decnvec: numpy.ndarray,
        latentvec: numpy.ndarray,
        maxinb: Real,
        **kwargs: dict
    ) -> numpy.ndarray:
    """
    A custom inequality constraint violation function.

    Parameters
    ----------
    decnvec : numpy.ndarray
        A decision space vector of shape (ndecn,)
    latentvec : numpy.ndarray
        A latent space function vector of shape (1+ntrait,)
    minvec : numpy.ndarray
        Vector of minimum values for which the latent vector can take.

    Returns
    -------
    out : numpy.ndarray
        An inequality constraint violation vector of shape (1,).
    """
    # calculate constraint violation for inbreeding
    out = numpy.array([max(latentvec[0] - maxinb, 0.0)], dtype = float)
    # return inequality constraint violation array
    return out

# use a hillclimber for the single-objective optimization algorithm
soalgo = SteepestDescentSubsetHillClimber()

# create a selection protocol that selects based on EBVs with an inbreeding constraint
const_selprot = OptimalContributionSubsetSelection(
    ntrait       = 1,            # number of expected traits
    cmatfcty     = cmatfcty,     # coancestry/kinship matrix factory
    unscale      = True,         # unscale breeding values to human-readable format
    ncross       = 20,           # number of cross configurations
    nparent      = 2,            # number of parents per cross configuration
    nmating      = 1,            # number of matings per cross configuration
    nprogeny     = 80,           # number of progeny per mating event
    nobj         = 1,            # number of objectives == ntrait
    obj_trans    = obj_trans,    # latent vector transformation to create objective function
    nineqcv      = 1,            # number of inequality constraint violations
    ineqcv_trans = ineqcv_trans, # latent vector transformation to create inequality constraints
    ineqcv_trans_kwargs = {      # keyword arguments
        "maxinb": 1.0
    },
    soalgo       = soalgo,       # use hillclimber to solve single-objective problem
)

Create an Unconstrained Selection Protocol Object#

Second, we’ll define a selection protocol where individuals are selected solely on their breeding values. We’ll use the EstimatedBreedingValueSubsetSelection to accomplish this task.

# use a sorting algorithm for the single-objective optimization algorithm
soalgo = SortingSubsetOptimizationAlgorithm()

# create a selection protocol that selects based on EBVs with an inbreeding constraint
unconst_selprot = EstimatedBreedingValueSubsetSelection(
    ntrait       = 1,            # number of expected traits
    unscale      = True,         # unscale breeding values to human-readable format
    ncross       = 20,           # number of cross configurations
    nparent      = 2,            # number of parents per cross configuration
    nmating      = 1,            # number of matings per cross configuration
    nprogeny     = 80,           # number of progeny per mating event
    nobj         = 1,            # number of objectives == ntrait
    soalgo       = soalgo,       # use sorting algorithm to solve single-objective problem
)

Make a Statistics Recording Helper Function#

To assist in record keeping, we’ll create a function to help us record simulation metrics and store them into a dictionary.

# make recording helper function
def record(lbook: dict, gen: int, genome: dict, geno: dict, pheno: dict, bval: dict, gmod: dict) -> None:
    lbook["gen"].append(gen)
    lbook["main_meh"].append(genome["main"].meh())
    ################### main true lower selection limits ###################
    tmp = gmod["true"].lsl(genome["main"], unscale = True)
    lbook["main_Syn1_lsl"].append(tmp[0])
    ################### main true lower selection limits ###################
    tmp = gmod["true"].usl(genome["main"], unscale = True)
    lbook["main_Syn1_usl"].append(tmp[0])
    ###################### main true breeding values #######################
    tbv = gmod["true"].gebv(genome["main"])
    ################## main true breeding value minimums ###################
    tmp = tbv.tmin(unscale = True)
    lbook["main_Syn1_tbv_min"].append(tmp[0])
    #################### main true breeding value means ####################
    tmp = tbv.tmean(unscale = True)
    lbook["main_Syn1_tbv_mean"].append(tmp[0])
    ################## main true breeding value maximums ###################
    tmp = tbv.tmax(unscale = True)
    lbook["main_Syn1_tbv_max"].append(tmp[0])
    ############# main true breeding value standard deviations #############
    tmp = tbv.tstd(unscale = True)
    lbook["main_Syn1_tbv_std"].append(tmp[0])
    ##################### main true genetic variances ######################
    tmp = gmod["true"].var_A(genome["main"])
    lbook["main_Syn1_tbv_var_A"].append(tmp[0])
    ###################### main true genic variances #######################
    tmp = gmod["true"].var_a(genome["main"])
    lbook["main_Syn1_tbv_var_a"].append(tmp[0])
    #################### main estimated breeding values ####################
    ebv = bval["main"]
    ################ main estimated breeding value minimums ################
    tmp = ebv.tmin(unscale = True)
    lbook["main_Syn1_ebv_min"].append(tmp[0])
    #################### main true breeding value means ####################
    tmp = ebv.tmean(unscale = True)
    lbook["main_Syn1_ebv_mean"].append(tmp[0])
    ################## main true breeding value maximums ###################
    tmp = ebv.tmax(unscale = True)
    lbook["main_Syn1_ebv_max"].append(tmp[0])
    ############# main true breeding value standard deviations #############
    tmp = ebv.tstd(unscale = True)
    lbook["main_Syn1_ebv_std"].append(tmp[0])

Simulate Constrained Phenotypic Selection for 60 Generations#

The next subsections detail simulations in the constrained scenario.

Copy Founders#

First we’ll copy our founder populations so that we don’t modify them and can use them for repeated simulations.

# deep copy founder populations, bvals, etc. so we can replicate if needed
simul_genome = copy.deepcopy(fndr_genome)
simul_geno   = copy.deepcopy(fndr_geno)
simul_pheno  = copy.deepcopy(fndr_pheno)
simul_bval   = copy.deepcopy(fndr_bval)
simul_gmod   = copy.deepcopy(fndr_gmod)

Rudimentary Logbooks#

Then, we’ll make a dictionary logbook to store simulation metrics.

# make a dictionary logbook
const_lbook = {
    "gen"                   : [],
    "main_meh"              : [],
    "main_Syn1_lsl"         : [],
    "main_Syn1_usl"         : [],
    "main_Syn1_tbv_min"     : [],
    "main_Syn1_tbv_mean"    : [],
    "main_Syn1_tbv_max"     : [],
    "main_Syn1_tbv_std"     : [],
    "main_Syn1_tbv_var_A"   : [],
    "main_Syn1_tbv_var_a"   : [],
    "main_Syn1_ebv_min"     : [],
    "main_Syn1_ebv_mean"    : [],
    "main_Syn1_ebv_max"     : [],
    "main_Syn1_ebv_std"     : [],
}

Simulation Main Loop#

Next, we’ll simulate our breeding program for 60 generations. We’ll start at an inbreeding constraint of 0.84 or less to begin and gradually increase the maximum allowed inbreeding to 1.0 (complete homozygosity; zero genetic diversity) by the end of the simulation.

# record initial statistics
record(const_lbook, 0, simul_genome, simul_geno, simul_pheno, simul_bval, simul_gmod)

# create constraint change over time
# 0.84 manually determined from Pareto frontier examination
maxinb = numpy.linspace(0.84, 1, nsimul+1)

# main simulation loop
for gen in range(1,nsimul+1):
    # set the inbreeding constraint
    const_selprot.ineqcv_trans_kwargs["maxinb"] = maxinb[gen]
    # parental selection: select parents from parental candidates
    selcfg = const_selprot.select(
        pgmat = simul_genome["cand"],
        gmat  = simul_geno["cand"],
        ptdf  = simul_pheno["cand"],
        bvmat = simul_bval["cand"],
        gpmod = simul_gmod["cand"],
        t_cur = 0,
        t_max = 0,
    )
    # mate: create new genomes; discard oldest cohort; concat new main population
    new_genome = mate2waydh.mate(
        pgmat    = selcfg.pgmat,
        xconfig  = selcfg.xconfig,
        nmating  = selcfg.nmating,
        nprogeny = selcfg.nprogeny,
    )
    simul_genome["queue"].append(new_genome)
    discard = simul_genome["queue"].pop(0)
    simul_genome["main"] = DensePhasedGenotypeMatrix.concat_taxa(simul_genome["queue"][0:3])
    # evaluate: genotype new genomes; discard oldest cohort; concat new main population
    new_geno = gtprot.genotype(new_genome)
    simul_geno["queue"].append(new_geno)
    discard = simul_geno["queue"].pop(0)
    simul_geno["main"] = DenseGenotypeMatrix.concat_taxa(simul_geno["queue"][0:3])
    # evaluate: phenotype main population
    simul_pheno["main"] = ptprot.phenotype(simul_genome["main"])
    # evaluate: calculate breeding values for the main population
    simul_bval["main"] = bvprot.estimate(simul_pheno["main"], simul_geno["main"])
    # survivor selection: select parental candidate indices from main population
    ix = within_family_selection(simul_bval["main"], 4) # select top 5%
    # survivor selection: select parental candidates from main population
    simul_genome["cand"] = simul_genome["main"].select_taxa(ix)
    simul_geno["cand"] = simul_geno["main"].select_taxa(ix)
    simul_bval["cand"] = simul_bval["main"].select_taxa(ix)
    # record statistics
    record(const_lbook, gen, simul_genome, simul_geno, simul_pheno, simul_bval, simul_gmod)
    print("Generation:", gen)

Saving Results to a File#

Finally, we’ll save the results of our simulation to a CSV file.

# create output dataframe and save
const_lbook_df = pandas.DataFrame(const_lbook)
const_lbook_df.to_csv("const_lbook.csv", sep = ",", index = False)

Simulate Unconstrained Phenotypic Selection for 60 Generations#

The next subsections detail simulations in the unconstrained scenario.

Copy Founders#

As before, we’ll copy our founder populations so that we don’t modify them and can use them for repeated simulations.

# deep copy founder populations, bvals, etc. so we can replicate if needed
simul_genome = copy.deepcopy(fndr_genome)
simul_geno   = copy.deepcopy(fndr_geno)
simul_pheno  = copy.deepcopy(fndr_pheno)
simul_bval   = copy.deepcopy(fndr_bval)
simul_gmod   = copy.deepcopy(fndr_gmod)

Rudimentary Logbooks#

Then, we’ll make a dictionary logbook to store simulation metrics.

# make a dictionary logbook
unconst_lbook = {
    "gen"                   : [],
    "main_meh"              : [],
    "main_Syn1_lsl"         : [],
    "main_Syn1_usl"         : [],
    "main_Syn1_tbv_min"     : [],
    "main_Syn1_tbv_mean"    : [],
    "main_Syn1_tbv_max"     : [],
    "main_Syn1_tbv_std"     : [],
    "main_Syn1_tbv_var_A"   : [],
    "main_Syn1_tbv_var_a"   : [],
    "main_Syn1_ebv_min"     : [],
    "main_Syn1_ebv_mean"    : [],
    "main_Syn1_ebv_max"     : [],
    "main_Syn1_ebv_std"     : [],
}

Simulation Main Loop#

Next, we’ll simulate our breeding program for 60 generations.

# record initial statistics
record(unconst_lbook, 0, simul_genome, simul_geno, simul_pheno, simul_bval, simul_gmod)

# main simulation loop
for gen in range(1,nsimul+1):
    # parental selection: select parents from parental candidates
    selcfg = unconst_selprot.select(
        pgmat = simul_genome["cand"],
        gmat  = simul_geno["cand"],
        ptdf  = simul_pheno["cand"],
        bvmat = simul_bval["cand"],
        gpmod = simul_gmod["cand"],
        t_cur = 0,
        t_max = 0,
    )
    # mate: create new genomes; discard oldest cohort; concat new main population
    new_genome = mate2waydh.mate(
        pgmat    = selcfg.pgmat,
        xconfig  = selcfg.xconfig,
        nmating  = selcfg.nmating,
        nprogeny = selcfg.nprogeny,
    )
    simul_genome["queue"].append(new_genome)
    discard = simul_genome["queue"].pop(0)
    simul_genome["main"] = DensePhasedGenotypeMatrix.concat_taxa(simul_genome["queue"][0:3])
    # evaluate: genotype new genomes; discard oldest cohort; concat new main population
    new_geno = gtprot.genotype(new_genome)
    simul_geno["queue"].append(new_geno)
    discard = simul_geno["queue"].pop(0)
    simul_geno["main"] = DenseGenotypeMatrix.concat_taxa(simul_geno["queue"][0:3])
    # evaluate: phenotype main population
    simul_pheno["main"] = ptprot.phenotype(simul_genome["main"])
    # evaluate: calculate breeding values for the main population
    simul_bval["main"] = bvprot.estimate(simul_pheno["main"], simul_geno["main"])
    # survivor selection: select parental candidate indices from main population
    ix = within_family_selection(simul_bval["main"], 4) # select top 5%
    # survivor selection: select parental candidates from main population
    simul_genome["cand"] = simul_genome["main"].select_taxa(ix)
    simul_geno["cand"] = simul_geno["main"].select_taxa(ix)
    simul_bval["cand"] = simul_bval["main"].select_taxa(ix)
    # record statistics
    record(unconst_lbook, gen, simul_genome, simul_geno, simul_pheno, simul_bval, simul_gmod)
    print("Generation:", gen)

Saving Results to a File#

Finally, we’ll save the results of our simulation to a CSV file.

# create output dataframe and save
unconst_lbook_df = pandas.DataFrame(unconst_lbook)
unconst_lbook_df.to_csv("unconst_lbook.csv", sep = ",", index = False)

Visualizing Breeding Program Simulation Results with `matplotlib`#

Visualizing True Breeding Values (TBVs)#

Using the results, we’ll visualize the population mean true breeding values for the constrained and unconstrained scenarios. Furthermore, we’ll plot the upper and lower selection limits to depict the narrowing of the genetic diverity over time.

# create static figure
fig = pyplot.figure()
ax = pyplot.axes()
ax.plot(const_lbook_df["gen"],   const_lbook_df["main_Syn1_tbv_mean"],   '-b',  label = "Const. Sel.: Mean Pop. TBV")
ax.plot(const_lbook_df["gen"],   const_lbook_df["main_Syn1_lsl"],   ':b',  label = "Const. Sel.: LSL")
ax.plot(const_lbook_df["gen"],   const_lbook_df["main_Syn1_usl"],   '--b', label = "Const. Sel.: USL")
ax.plot(unconst_lbook_df["gen"], unconst_lbook_df["main_Syn1_tbv_mean"], '-r',  label = "Unconst. Sel.: Mean Pop. TBV")
ax.plot(unconst_lbook_df["gen"], unconst_lbook_df["main_Syn1_lsl"], ':r',  label = "Unconst. Sel.: LSL")
ax.plot(unconst_lbook_df["gen"], unconst_lbook_df["main_Syn1_usl"], '--r', label = "Unconst. Sel.: USL")
ax.set_title("Single-Trait Recurrent Phenotypic Selection")
ax.set_xlabel("Generation")
ax.set_ylabel("Synthetic Trait Breeding Value")
ax.legend()
pyplot.savefig("constrained_single_trait_phenotypic_selection_true_breeding_values.png", dpi = 300)
pyplot.close(fig)