Selection Protocols#

Class Family Overview#

The SelectionProtocol family of classes is the most expansive of the breeding protocol families. Selection protocols are responsible for creating optimization problem definitions for selection tasks, optimizing said optimization problems, and selecting sets of individuals based on the results of optimizations.

Selection protocols within the SelectionProtocol family may represent non-mating or mating selection strategies. Non-mating selection strategies are selection strategies where the pairing of individuals for mating does not matter. Mating selection strategies are selection strategies where the pairing of individuals for mating does matter.

Summary of Selection Protocol Classes#

The SelectionProtocol family of classes can be found in the pybrops.breed.prot.sel module. This family of classes can be broken into two major groups: abstract classes and concrete classes. The followed two subsections detail these groups.

Abstract selection protocol classes#

The SelectionProtocol abstract class defines the interface for all selection protocols (both non-mating and mating strategies). Derived from this interface is a MateSelectionProtocol abstract class which defines an additional interface for selection strategies where the pairing of individuals for mating does matter.

Each major abstract class has binary, integer, real, and subset subtypes, representing selection protocols where the optimization problem is defined for binary, integer, real, and subset search spaces, respectively.

All interfaces are summarized below.

Summary of abstract classes in the pybrops.breed.prot.sel module#

Class Name(s)

Class Type(s)

Class Description

SelectionProtocol

Abstract

Interface for all selection protocol classes.

BinarySelectionProtocol
IntegerSelectionProtocol
RealSelectionProtocol
SubsetSelectionProtocol

Abstract

Interface for all selection protocol classes for binary, integer, real, and subset search spaces, respectively.

MateSelectionProtocol

Abstract

Interface for all mate selection protocol classes.

BinaryMateSelectionProtocol
IntegerMateSelectionProtocol
RealMateSelectionProtocol
SubsetMateSelectionProtocol

Abstract

Interface for all mate selection protocol classes for binary, integer, real, and subset search spaces, respectively.

Concrete selection protocol classes#

PyBrOpS has an extensive collection of selection protocols, each defining a different selection strategy. A summary of the available implemented selection protocols is in the table below.

Summary of concrete classes in the pybrops.breed.prot.sel module#

Class Name(s)

Class Type(s)

Class Description

EstimatedBreedingValueBinarySelection
EstimatedBreedingValueIntegerSelection
EstimatedBreedingValueRealSelection
EstimatedBreedingValueSubsetSelection

Concrete

Concrete classes for selection on estimated breeding values in binary, integer, real, and subset search spaces, respectively.

ExpectedMaximumBreedingValueBinarySelection
ExpectedMaximumBreedingValueIntegerSelection
ExpectedMaximumBreedingValueRealSelection
ExpectedMaximumBreedingValueSubsetSelection

Concrete

Concrete classes for selection on expected maximum breeding values in binary, integer, real, and subset search spaces, respectively.

FamilyEstimatedBreedingValueBinarySelection
FamilyEstimatedBreedingValueIntegerSelection
FamilyEstimatedBreedingValueRealSelection
FamilyEstimatedBreedingValueSubsetSelection

Concrete

Concrete classes for selection on within-family estimated breeding values in binary, integer, real, and subset search spaces, respectively.

GenomicEstimatedBreedingValueBinarySelection
GenomicEstimatedBreedingValueIntegerSelection
GenomicEstimatedBreedingValueRealSelection
GenomicEstimatedBreedingValueSubsetSelection

Concrete

Concrete classes for selection on genomic estimated breeding values in binary, integer, real, and subset search spaces, respectively.

GenotypeBuilderSubsetSelection

Concrete

Concrete classes for selection using the genotype builder strategy in subset search spaces.

OptimalContributionBinarySelection
OptimalContributionIntegerSelection
OptimalContributionRealSelection
OptimalContributionSubsetSelection

Concrete

Concrete classes for selection using optimal contribution selection in binary, integer, real, and subset search spaces, respectively.

OptimalHaploidValueBinarySelection
OptimalHaploidValueIntegerSelection
OptimalHaploidValueRealSelection
OptimalHaploidValueSubsetSelection

Concrete

Concrete classes for selection on optimal haploid value values in binary, integer, real, and subset search spaces, respectively.

OptimalPopulationValueSubsetSelection

Concrete

Concrete classes for selection on optimal population values in subset search spaces.

RandomBinarySelection
RandomIntegerSelection
RandomRealSelection
RandomSubsetSelection

Concrete

Concrete classes for selection on random values in binary, integer, real, and subset search spaces, respectively.

UsefulnessCriterionBinarySelection
UsefulnessCriterionIntegerSelection
UsefulnessCriterionRealSelection
UsefulnessCriterionSubsetSelection

Concrete

Concrete classes for selection on usefulness criterion values in binary, integer, real, and subset search spaces, respectively.

WeightedGenomicBinarySelection
WeightedGenomicIntegerSelection
WeightedGenomicRealSelection
WeightedGenomicSubsetSelection

Concrete

Concrete classes for selection on weighted genomic estimated breeding values in binary, integer, real, and subset search spaces, respectively.

Loading Class Modules#

Abstract interface classes#

# interface for all selection protocols
from pybrops.breed.prot.sel.SelectionProtocol import SelectionProtocol
from pybrops.breed.prot.sel.MateSelectionProtocol import MateSelectionProtocol

# interfaces for all individual selection protocols
from pybrops.breed.prot.sel.BinarySelectionProtocol import BinarySelectionProtocol
from pybrops.breed.prot.sel.IntegerSelectionProtocol import IntegerSelectionProtocol
from pybrops.breed.prot.sel.RealSelectionProtocol import RealSelectionProtocol
from pybrops.breed.prot.sel.SubsetSelectionProtocol import SubsetSelectionProtocol

# interfaces for all mate selection protocols
from pybrops.breed.prot.sel.BinaryMateSelectionProtocol import BinaryMateSelectionProtocol
from pybrops.breed.prot.sel.IntegerMateSelectionProtocol import IntegerMateSelectionProtocol
from pybrops.breed.prot.sel.RealMateSelectionProtocol import RealMateSelectionProtocol
from pybrops.breed.prot.sel.SubsetMateSelectionProtocol import SubsetMateSelectionProtocol

Concrete implementation classes#

# EBV selection
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueBinarySelection
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueIntegerSelection
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueRealSelection
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueSubsetSelection

# EMBV selection
from pybrops.breed.prot.sel.ExpectedMaximumBreedingValueSelection import ExpectedMaximumBreedingValueBinarySelection
from pybrops.breed.prot.sel.ExpectedMaximumBreedingValueSelection import ExpectedMaximumBreedingValueIntegerSelection
from pybrops.breed.prot.sel.ExpectedMaximumBreedingValueSelection import ExpectedMaximumBreedingValueRealSelection
from pybrops.breed.prot.sel.ExpectedMaximumBreedingValueSelection import ExpectedMaximumBreedingValueSubsetSelection

# within family EBV selection
from pybrops.breed.prot.sel.FamilyEstimatedBreedingValueSelection import FamilyEstimatedBreedingValueBinarySelection
from pybrops.breed.prot.sel.FamilyEstimatedBreedingValueSelection import FamilyEstimatedBreedingValueIntegerSelection
from pybrops.breed.prot.sel.FamilyEstimatedBreedingValueSelection import FamilyEstimatedBreedingValueRealSelection
from pybrops.breed.prot.sel.FamilyEstimatedBreedingValueSelection import FamilyEstimatedBreedingValueSubsetSelection

# GEBV selection
from pybrops.breed.prot.sel.GenomicEstimatedBreedingValueSelection import GenomicEstimatedBreedingValueBinarySelection
from pybrops.breed.prot.sel.GenomicEstimatedBreedingValueSelection import GenomicEstimatedBreedingValueIntegerSelection
from pybrops.breed.prot.sel.GenomicEstimatedBreedingValueSelection import GenomicEstimatedBreedingValueRealSelection
from pybrops.breed.prot.sel.GenomicEstimatedBreedingValueSelection import GenomicEstimatedBreedingValueSubsetSelection

# GB selection
from pybrops.breed.prot.sel.GenotypeBuilderSelection import GenotypeBuilderSubsetSelection

# optimal contribution selection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionBinarySelection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionIntegerSelection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionRealSelection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionSubsetSelection

# OHV selection
from pybrops.breed.prot.sel.OptimalHaploidValueSelection import OptimalHaploidValueBinarySelection
from pybrops.breed.prot.sel.OptimalHaploidValueSelection import OptimalHaploidValueIntegerSelection
from pybrops.breed.prot.sel.OptimalHaploidValueSelection import OptimalHaploidValueRealSelection
from pybrops.breed.prot.sel.OptimalHaploidValueSelection import OptimalHaploidValueSubsetSelection

# OPV selection
from pybrops.breed.prot.sel.OptimalPopulationValueSelection import OptimalPopulationValueSubsetSelection

# random selection
from pybrops.breed.prot.sel.RandomSelection import RandomBinarySelection
from pybrops.breed.prot.sel.RandomSelection import RandomIntegerSelection
from pybrops.breed.prot.sel.RandomSelection import RandomRealSelection
from pybrops.breed.prot.sel.RandomSelection import RandomSubsetSelection

# UC selection
from pybrops.breed.prot.sel.UsefulnessCriterionSelection import UsefulnessCriterionBinarySelection
from pybrops.breed.prot.sel.UsefulnessCriterionSelection import UsefulnessCriterionIntegerSelection
from pybrops.breed.prot.sel.UsefulnessCriterionSelection import UsefulnessCriterionRealSelection
from pybrops.breed.prot.sel.UsefulnessCriterionSelection import UsefulnessCriterionSubsetSelection

# weighted GEBV selection
from pybrops.breed.prot.sel.WeightedGenomicSelection import WeightedGenomicBinarySelection
from pybrops.breed.prot.sel.WeightedGenomicSelection import WeightedGenomicIntegerSelection
from pybrops.breed.prot.sel.WeightedGenomicSelection import WeightedGenomicRealSelection
from pybrops.breed.prot.sel.WeightedGenomicSelection import WeightedGenomicSubsetSelection

Selection Protocol Properties#

Selection protocols share numerous properties with each other, as defined in the SelectionProtocol interface. These properties can be grouped into three categories: general properties, optimization problem properties, and optimization algorithm properties. These property groupings are summarized in the next three subsections.

General properties#

General properties for a selection protocol describe the number of crosses, parents, matings, and progenies for a selection protocol. These properties are essential for how selection configurations are created.

Summary of SelectionProtocol general properties#

Property

Description

nselindiv

Number of selected individuals.

ncross

Number of cross configurations to consider.

nparent

Number of parents per cross configuration.

nmating

Number of matings per cross configuration.

nprogeny

Number of progeny to derive from each mating event.

Optimization problem properties#

Optimization problem properties help define how optimization problems are constructed for later optimization usage.

Summary of SelectionProtocol optimization problem properties#

Property

Description

nobj

Number of optimization objectives.

obj_wt

Objective function weights.

obj_trans

Function which transforms outputs from latentfn to objective function values.

obj_trans_kwargs

Keyword arguments for the latent space to objective space transformation function.

nineqcv

Number of inequality constraint violation functions.

ineqcv_wt

Inequality constraint violation function weights.

ineqcv_trans

Function which transforms outputs from latentfn to inequality constraint violation values.

ineqcv_trans_kwargs

Keyword arguments for the latent space to inequality constraint violation transformation function.

neqcv

Number of equality constraint violations.

eqcv_wt

Equality constraint violation function weights.

eqcv_trans

Function which transforms outputs from latentfn to equality constraint violation values.

eqcv_trans_kwargs

Keyword arguments for the latent space to equality constraint violation transformation function.

ndset_wt

Nondominated set weights.

ndset_trans

Nondominated set transformation function.

ndset_trans_kwargs

Nondominated set transformation function keyword arguments.

Optimization algorithm properties#

Optimization algorithm properties store information required to perform optimizations. This includes random number generator sources and single- and multi-objective optimization algorithms.

Summary of SelectionProtocol optimization algorithm properties#

Property

Description

rng

Random number generation source for optimization.

soalgo

Single-objective optimization algorithm.

moalgo

Multi-objective opimization algorithm.

Creating Selection Protocol Classes#

Creating selection protocols is accomplished using the constructor for a given SelectionProtocol class. The code below demonstrates the construction of a selection protocol object for unconstrained selection based on GEBVs in a subset search space.

# create standard GEBV selection protocol in subset decision space
selprot = GenomicEstimatedBreedingValueSubsetSelection(
    ntrait = 2,
    ncross = 10,
    nparent = 2,
    nmating = 1,
    nprogeny = 40,
    nobj = 2,
)

Generating Selection Problems for Optimization#

Selection problems can be generated from a selection protocol object using the problem method. The following examples demonstrate how to generate unconstrained and constrained selection problems for selection based on GEBVs.

Setup for unconstrained optimization#

Most constructors assume unconstrained optimization by default. To create an unconstrained selection problem, first construct an unconstrained selection protocol object using the selection protocol’s constructor. In the example below, a selection protocol based on GEBVs is created. Here, there are two traits and two corresponding objectives for each of the traits. We manually specify the number of objectives to be 2, making this selection protocol multi-objective in nature.

# create the selection protocol for 10 two-way crosses
selprot_unconstrained = GenomicEstimatedBreedingValueSubsetSelection(
    ntrait = 2,
    ncross = 10, # ten crosses total
    nparent = 2, # two-way
    nmating = 1,
    nprogeny = 40,
    nobj = 2,
)

Setup for constrained optimization#

For a constrained optimization, suppose we wish to minimize the negated sum of GEBVs (equivalent to maximizing the sum of GEBVs) for our first trait, while using our second trait as a constraint. For the constraint on the second trait, suppose that we wish to identify solutions with a negated sum of GEBVs greater than or equal to -1 (equivalent to having a maximum sum of GEBVs at 1). To accomplish this task, we define obj_trans and ineqcv_trans functions and pass them as arguments in our GenomicEstimatedBreedingValueSubsetSelection constructor. Since our obj_trans function converts two latent trait GEBV values to a single objective, we manually specify the number of objectives to be 1, making the selection protocol single-objective in nature. The code below demonstrates this.

# suppose we desire to minimize the negated GEBV of our first trait,
# using our second trait as a constraint

# define an objective transformation function
def obj_trans(
        decnvec: numpy.ndarray,
        latentvec: numpy.ndarray,
        maskvec: numpy.ndarray,
        **kwargs: dict
    ) -> numpy.ndarray:
    """
    A custom objective transformation function.

    Parameters
    ----------
    decnvec : numpy.ndarray
        A decision space vector of shape (ndecn,)
    latentvec : numpy.ndarray
        A latent space function vector of shape (ntrait,)
    maskvec : numpy.ndarray
        A mask vector of shape (ntrait,)

    Returns
    -------
    out : numpy.ndarray
        A vector of shape (sum(maskvec),).
    """
    # extract trait(s) as objective(s)
    return latentvec[maskvec]

# define an inequality constraint violation function
def ineqcv_trans(
        decnvec: numpy.ndarray,
        latentvec: numpy.ndarray,
        minvec: numpy.ndarray,
        **kwargs: dict
    ) -> numpy.ndarray:
    """
    A custom inequality constraint violation function.

    Parameters
    ----------
    decnvec : numpy.ndarray
        A decision space vector of shape (ndecn,)
    latentvec : numpy.ndarray
        A latent space function vector of shape (ntrait,)
    minvec : numpy.ndarray
        Vector of minimum values for which the latent vector can take.

    Returns
    -------
    out : numpy.ndarray
        An inequality constraint violation vector of shape (ntrait,).
    """
    # calculate constraint violations
    out = minvec - latentvec
    # where constraint violation is negative (no constraint violation), set to zero
    out[out < 0] = 0
    # return inequality constraint violation array
    return out

# for constrained selection, make keyword arguments for our custom
# transformation functions
obj_trans_kwargs = {
    # we want to select the first trait
    "maskvec": numpy.array([True, False], dtype=bool)
}
ineqcv_trans_kwargs = {
    # we don't care about the first trait's minimum value (negated maximum), so set to -Inf
    # we do care about the second trait's minimum value (negated maximum), so set to -1.0
    "minvec": numpy.array([-numpy.inf, -1.0], dtype=float)
}

# create the constrained selection protocol for 10 two-way crosses
selprot_constrained = GenomicEstimatedBreedingValueSubsetSelection(
    ntrait = 2,
    ncross = 10, # ten crosses total
    nparent = 2, # two-way
    nmating = 1,
    nprogeny = 40,
    nobj = 1, # one since sum(maskvec) == 1
    obj_trans = obj_trans,
    obj_trans_kwargs = obj_trans_kwargs,
    nineqcv = 2,
    ineqcv_trans = ineqcv_trans,
    ineqcv_trans_kwargs = ineqcv_trans_kwargs
)

Generating the selection problem#

After constructing our unconstrained and constrained selection protocols, we can generate selection problems from them. To create selection problems, use the problem method as demonstrated in the code block below.

#
# Creating a genomic model
#

# model parameters
nfixed = 1      # number of fixed effects
ntrait = 2      # number of traits
nmisc = 0       # number of miscellaneous random effects
nadditive = 50  # number of additive marker effects

# create dummy values
beta = numpy.random.random((nfixed,ntrait))
u_misc = numpy.random.random((nmisc,ntrait))
u_a = numpy.random.random((nadditive,ntrait))
trait = numpy.array(["Trait"+str(i+1).zfill(2) for i in range(ntrait)], dtype = object)

# create additive linear genomic model
algmod = DenseAdditiveLinearGenomicModel(
    beta = beta,
    u_misc = u_misc,
    u_a = u_a,
    trait = trait,
    model_name = "example",
    params = None
)

#
# Construct random genomes
#

# shape parameters for random genomes
ntaxa = 100
nvrnt = nadditive
ngroup = 20
nchrom = 10
nphase = 2

# create random genotypes
mat = numpy.random.randint(0, 2, size = (nphase,ntaxa,nvrnt)).astype("int8")

# create taxa names
taxa = numpy.array(["taxon"+str(i+1).zfill(3) for i in range(ntaxa)], dtype = object)

# create taxa groups
taxa_grp = numpy.random.randint(1, ngroup+1, ntaxa)
taxa_grp.sort()

# create marker variant chromsome assignments
vrnt_chrgrp = numpy.random.randint(1, nchrom+1, nvrnt)
vrnt_chrgrp.sort()

# create marker physical positions
vrnt_phypos = numpy.random.choice(1000000, size = nvrnt, replace = False)
vrnt_phypos.sort()

# create marker variant names
vrnt_name = numpy.array(["SNP"+str(i+1).zfill(4) for i in range(nvrnt)], dtype = object)

# create a phased genotype matrix from scratch using NumPy arrays
pgmat = DensePhasedGenotypeMatrix(
    mat = mat,
    taxa = taxa,
    taxa_grp = taxa_grp,
    vrnt_chrgrp = vrnt_chrgrp,
    vrnt_phypos = vrnt_phypos,
    vrnt_name = vrnt_name,
    ploidy = nphase
)

# create a genotype matrix from scratch using NumPy arrays
gmat = DenseGenotypeMatrix(
    mat = mat.sum(0, dtype="int8"),
    taxa = taxa,
    taxa_grp = taxa_grp,
    vrnt_chrgrp = vrnt_chrgrp,
    vrnt_phypos = vrnt_phypos,
    vrnt_name = vrnt_name,
    ploidy = nphase
)

# generate an unconstrained GEBV subset selection problem
# for this selection protocol type, we only need the genotype matrix and a genomic prediction model
prob_unconstrained = selprot_unconstrained.problem(
    pgmat = None,
    gmat = gmat,
    ptdf = None,
    bvmat = None,
    gpmod = algmod,
    t_cur = None,
    t_max = None,
)

# generate a constrained GEBV subset selection problem
# for this selection protocol type, we only need the genotype matrix and a genomic prediction model
prob_constrained = selprot_constrained.problem(
    pgmat = None,
    gmat = gmat,
    ptdf = None,
    bvmat = None,
    gpmod = algmod,
    t_cur = None,
    t_max = None,
)

After creating unconstrained and constrained selection problems, we can test their evaluation on a random solution, as demonstrated in the code below.

# generate a random solution to test
soln = numpy.random.choice(prob_constrained.decn_space, prob_constrained.ndecn)

# evaluate the solution in the unconstrained problem
eval_unconstrained = prob_unconstrained.evalfn(soln)

# evaluate the solution in the constrained problem
eval_constrained = prob_constrained.evalfn(soln)

Single-Objective Optimization#

Since our constrained selection protocol above is also single-objective in nature, we can use the sosolve method to optimize. The sosolve method internally creates a selection problem and uses the algorithm defined in the soalgo property to optimize this selection problem. The results of the optimization are returned as a selection solution. The code below demonstrates single-objective optimization.

# perform single-objective optimization using
soln_constrained = selprot_constrained.sosolve(
    pgmat = None,
    gmat = gmat,
    ptdf = None,
    bvmat = None,
    gpmod = algmod,
    t_cur = None,
    t_max = None,
)

# examine the solution decision vector(s)
soln_constrained.soln_decn

# examine the solution objective function vector(s)
soln_constrained.soln_obj

# examine the solution inequality constraint violation vector(s)
soln_constrained.soln_ineqcv

# examine the soltuion equality constraint violation vector(s)
soln_constrained.soln_eqcv

Multi-Objective Optimization#

Since our unconstrained selection protocol above is multi-objective in nature, we can use the mosolve method to optimize for both objectives. The mosolve method internally creates a selection problem and uses the algorithm defined in the soalgo property to optimize this problem. The results of the optimization are returned. The code below demonstrates multi-objective optimization.

# perform single-objective optimization using
soln_unconstrained = selprot_unconstrained.mosolve(
    pgmat = None,
    gmat = gmat,
    ptdf = None,
    bvmat = None,
    gpmod = algmod,
    t_cur = None,
    t_max = None,
)

# examine the solution decision vector(s)
soln_unconstrained.soln_decn

# examine the solution objective function vector(s)
soln_unconstrained.soln_obj

# examine the solution inequality constraint violation vector(s)
soln_unconstrained.soln_ineqcv

# examine the soltuion equality constraint violation vector(s)
soln_unconstrained.soln_eqcv

Selection#

Selection of individuals or mating pairs (depending on the selection protocol type) can be accomplished using the select method. Internally, this method constructs an optimization problem from inputs, optimizes the problem using an appropriate algorithm, and selects individuals, returning a selection configuration object. Cross configurations can be extracted from the returned selection configuration object. If a selection protocol is multi-objective in nature, a solution from the non-dominated set is selected using the ndset_trans function, which scores solutions based on a user-defined method. If no ndset_trans function is provided, a default function is used which scores non-dominated solutions on their distance from a vector weighing each objective equally, invariant of the scales of the objectives. The code below demonstrates how to use the select method.

# Select individuals for the constrained problem formulation. In this selection,
# the best solution from a single-objective optimization is selected to
# determine the selection.
selcfg_constrained = selprot_constrained.select(
    pgmat = pgmat,
    gmat = gmat,
    ptdf = None,
    bvmat = None,
    gpmod = algmod,
    t_cur = None,
    t_max = None,
)

# view cross configuration from the selection
selcfg_constrained.xconfig

# Select individuals for the unconstrained problem formulation. In this
# selection, a Pareto frontier is estimated and a point selected from the
# frontier which is closest to a 1 vector projection in invariant space.
# The vector determining the selection configuration may be customized by
# changing how the selection protocol is constructed.
selcfg_unconstrained = selprot_unconstrained.select(
    pgmat = pgmat,
    gmat = gmat,
    ptdf = None,
    bvmat = None,
    gpmod = algmod,
    t_cur = None,
    t_max = None,
)

# view cross configuration from the selection
selcfg_unconstrained.xconfig