Selection Protocols#
Class Family Overview#
The SelectionProtocol
family of classes is the most expansive of the breeding protocol families. Selection protocols are responsible for creating optimization problem definitions for selection tasks, optimizing said optimization problems, and selecting sets of individuals based on the results of optimizations.
Selection protocols within the SelectionProtocol
family may represent non-mating or mating selection strategies. Non-mating selection strategies are selection strategies where the pairing of individuals for mating does not matter. Mating selection strategies are selection strategies where the pairing of individuals for mating does matter.
Summary of Selection Protocol Classes#
The SelectionProtocol
family of classes can be found in the pybrops.breed.prot.sel
module. This family of classes can be broken into two major groups: abstract classes and concrete classes. The followed two subsections detail these groups.
Abstract selection protocol classes#
The SelectionProtocol
abstract class defines the interface for all selection protocols (both non-mating and mating strategies). Derived from this interface is a MateSelectionProtocol
abstract class which defines an additional interface for selection strategies where the pairing of individuals for mating does matter.
Each major abstract class has binary, integer, real, and subset subtypes, representing selection protocols where the optimization problem is defined for binary, integer, real, and subset search spaces, respectively.
All interfaces are summarized below.
Class Name(s) |
Class Type(s) |
Class Description |
---|---|---|
SelectionProtocol |
Abstract |
Interface for all selection protocol classes. |
BinarySelectionProtocol IntegerSelectionProtocol RealSelectionProtocol SubsetSelectionProtocol |
Abstract |
Interface for all selection protocol classes for binary, integer, real, and subset search spaces, respectively. |
MateSelectionProtocol |
Abstract |
Interface for all mate selection protocol classes. |
BinaryMateSelectionProtocol IntegerMateSelectionProtocol RealMateSelectionProtocol SubsetMateSelectionProtocol |
Abstract |
Interface for all mate selection protocol classes for binary, integer, real, and subset search spaces, respectively. |
Concrete selection protocol classes#
PyBrOpS has an extensive collection of selection protocols, each defining a different selection strategy. A summary of the available implemented selection protocols is in the table below.
Class Name(s) |
Class Type(s) |
Class Description |
---|---|---|
EstimatedBreedingValueBinarySelection EstimatedBreedingValueIntegerSelection EstimatedBreedingValueRealSelection EstimatedBreedingValueSubsetSelection |
Concrete |
Concrete classes for selection on estimated breeding values in binary, integer, real, and subset search spaces, respectively. |
ExpectedMaximumBreedingValueBinarySelection ExpectedMaximumBreedingValueIntegerSelection ExpectedMaximumBreedingValueRealSelection ExpectedMaximumBreedingValueSubsetSelection |
Concrete |
Concrete classes for selection on expected maximum breeding values in binary, integer, real, and subset search spaces, respectively. |
FamilyEstimatedBreedingValueBinarySelection FamilyEstimatedBreedingValueIntegerSelection FamilyEstimatedBreedingValueRealSelection FamilyEstimatedBreedingValueSubsetSelection |
Concrete |
Concrete classes for selection on within-family estimated breeding values in binary, integer, real, and subset search spaces, respectively. |
GenomicEstimatedBreedingValueBinarySelection GenomicEstimatedBreedingValueIntegerSelection GenomicEstimatedBreedingValueRealSelection GenomicEstimatedBreedingValueSubsetSelection |
Concrete |
Concrete classes for selection on genomic estimated breeding values in binary, integer, real, and subset search spaces, respectively. |
GenotypeBuilderSubsetSelection |
Concrete |
Concrete classes for selection using the genotype builder strategy in subset search spaces. |
OptimalContributionBinarySelection OptimalContributionIntegerSelection OptimalContributionRealSelection OptimalContributionSubsetSelection |
Concrete |
Concrete classes for selection using optimal contribution selection in binary, integer, real, and subset search spaces, respectively. |
OptimalHaploidValueBinarySelection OptimalHaploidValueIntegerSelection OptimalHaploidValueRealSelection OptimalHaploidValueSubsetSelection |
Concrete |
Concrete classes for selection on optimal haploid value values in binary, integer, real, and subset search spaces, respectively. |
OptimalPopulationValueSubsetSelection |
Concrete |
Concrete classes for selection on optimal population values in subset search spaces. |
RandomBinarySelection RandomIntegerSelection RandomRealSelection RandomSubsetSelection |
Concrete |
Concrete classes for selection on random values in binary, integer, real, and subset search spaces, respectively. |
UsefulnessCriterionBinarySelection UsefulnessCriterionIntegerSelection UsefulnessCriterionRealSelection UsefulnessCriterionSubsetSelection |
Concrete |
Concrete classes for selection on usefulness criterion values in binary, integer, real, and subset search spaces, respectively. |
WeightedGenomicBinarySelection WeightedGenomicIntegerSelection WeightedGenomicRealSelection WeightedGenomicSubsetSelection |
Concrete |
Concrete classes for selection on weighted genomic estimated breeding values in binary, integer, real, and subset search spaces, respectively. |
Loading Class Modules#
Abstract interface classes#
# interface for all selection protocols
from pybrops.breed.prot.sel.SelectionProtocol import SelectionProtocol
from pybrops.breed.prot.sel.MateSelectionProtocol import MateSelectionProtocol
# interfaces for all individual selection protocols
from pybrops.breed.prot.sel.BinarySelectionProtocol import BinarySelectionProtocol
from pybrops.breed.prot.sel.IntegerSelectionProtocol import IntegerSelectionProtocol
from pybrops.breed.prot.sel.RealSelectionProtocol import RealSelectionProtocol
from pybrops.breed.prot.sel.SubsetSelectionProtocol import SubsetSelectionProtocol
# interfaces for all mate selection protocols
from pybrops.breed.prot.sel.BinaryMateSelectionProtocol import BinaryMateSelectionProtocol
from pybrops.breed.prot.sel.IntegerMateSelectionProtocol import IntegerMateSelectionProtocol
from pybrops.breed.prot.sel.RealMateSelectionProtocol import RealMateSelectionProtocol
from pybrops.breed.prot.sel.SubsetMateSelectionProtocol import SubsetMateSelectionProtocol
Concrete implementation classes#
# EBV selection
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueBinarySelection
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueIntegerSelection
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueRealSelection
from pybrops.breed.prot.sel.EstimatedBreedingValueSelection import EstimatedBreedingValueSubsetSelection
# EMBV selection
from pybrops.breed.prot.sel.ExpectedMaximumBreedingValueSelection import ExpectedMaximumBreedingValueBinarySelection
from pybrops.breed.prot.sel.ExpectedMaximumBreedingValueSelection import ExpectedMaximumBreedingValueIntegerSelection
from pybrops.breed.prot.sel.ExpectedMaximumBreedingValueSelection import ExpectedMaximumBreedingValueRealSelection
from pybrops.breed.prot.sel.ExpectedMaximumBreedingValueSelection import ExpectedMaximumBreedingValueSubsetSelection
# within family EBV selection
from pybrops.breed.prot.sel.FamilyEstimatedBreedingValueSelection import FamilyEstimatedBreedingValueBinarySelection
from pybrops.breed.prot.sel.FamilyEstimatedBreedingValueSelection import FamilyEstimatedBreedingValueIntegerSelection
from pybrops.breed.prot.sel.FamilyEstimatedBreedingValueSelection import FamilyEstimatedBreedingValueRealSelection
from pybrops.breed.prot.sel.FamilyEstimatedBreedingValueSelection import FamilyEstimatedBreedingValueSubsetSelection
# GEBV selection
from pybrops.breed.prot.sel.GenomicEstimatedBreedingValueSelection import GenomicEstimatedBreedingValueBinarySelection
from pybrops.breed.prot.sel.GenomicEstimatedBreedingValueSelection import GenomicEstimatedBreedingValueIntegerSelection
from pybrops.breed.prot.sel.GenomicEstimatedBreedingValueSelection import GenomicEstimatedBreedingValueRealSelection
from pybrops.breed.prot.sel.GenomicEstimatedBreedingValueSelection import GenomicEstimatedBreedingValueSubsetSelection
# GB selection
from pybrops.breed.prot.sel.GenotypeBuilderSelection import GenotypeBuilderSubsetSelection
# optimal contribution selection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionBinarySelection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionIntegerSelection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionRealSelection
from pybrops.breed.prot.sel.OptimalContributionSelection import OptimalContributionSubsetSelection
# OHV selection
from pybrops.breed.prot.sel.OptimalHaploidValueSelection import OptimalHaploidValueBinarySelection
from pybrops.breed.prot.sel.OptimalHaploidValueSelection import OptimalHaploidValueIntegerSelection
from pybrops.breed.prot.sel.OptimalHaploidValueSelection import OptimalHaploidValueRealSelection
from pybrops.breed.prot.sel.OptimalHaploidValueSelection import OptimalHaploidValueSubsetSelection
# OPV selection
from pybrops.breed.prot.sel.OptimalPopulationValueSelection import OptimalPopulationValueSubsetSelection
# random selection
from pybrops.breed.prot.sel.RandomSelection import RandomBinarySelection
from pybrops.breed.prot.sel.RandomSelection import RandomIntegerSelection
from pybrops.breed.prot.sel.RandomSelection import RandomRealSelection
from pybrops.breed.prot.sel.RandomSelection import RandomSubsetSelection
# UC selection
from pybrops.breed.prot.sel.UsefulnessCriterionSelection import UsefulnessCriterionBinarySelection
from pybrops.breed.prot.sel.UsefulnessCriterionSelection import UsefulnessCriterionIntegerSelection
from pybrops.breed.prot.sel.UsefulnessCriterionSelection import UsefulnessCriterionRealSelection
from pybrops.breed.prot.sel.UsefulnessCriterionSelection import UsefulnessCriterionSubsetSelection
# weighted GEBV selection
from pybrops.breed.prot.sel.WeightedGenomicSelection import WeightedGenomicBinarySelection
from pybrops.breed.prot.sel.WeightedGenomicSelection import WeightedGenomicIntegerSelection
from pybrops.breed.prot.sel.WeightedGenomicSelection import WeightedGenomicRealSelection
from pybrops.breed.prot.sel.WeightedGenomicSelection import WeightedGenomicSubsetSelection
Selection Protocol Properties#
Selection protocols share numerous properties with each other, as defined in the SelectionProtocol
interface. These properties can be grouped into three categories: general properties, optimization problem properties, and optimization algorithm properties. These property groupings are summarized in the next three subsections.
General properties#
General properties for a selection protocol describe the number of crosses, parents, matings, and progenies for a selection protocol. These properties are essential for how selection configurations are created.
Property |
Description |
---|---|
|
Number of selected individuals. |
|
Number of cross configurations to consider. |
|
Number of parents per cross configuration. |
|
Number of matings per cross configuration. |
|
Number of progeny to derive from each mating event. |
Optimization problem properties#
Optimization problem properties help define how optimization problems are constructed for later optimization usage.
Property |
Description |
---|---|
|
Number of optimization objectives. |
|
Objective function weights. |
|
Function which transforms outputs from |
|
Keyword arguments for the latent space to objective space transformation function. |
|
Number of inequality constraint violation functions. |
|
Inequality constraint violation function weights. |
|
Function which transforms outputs from |
|
Keyword arguments for the latent space to inequality constraint violation transformation function. |
|
Number of equality constraint violations. |
|
Equality constraint violation function weights. |
|
Function which transforms outputs from |
|
Keyword arguments for the latent space to equality constraint violation transformation function. |
|
Nondominated set weights. |
|
Nondominated set transformation function. |
|
Nondominated set transformation function keyword arguments. |
Optimization algorithm properties#
Optimization algorithm properties store information required to perform optimizations. This includes random number generator sources and single- and multi-objective optimization algorithms.
Property |
Description |
---|---|
|
Random number generation source for optimization. |
|
Single-objective optimization algorithm. |
|
Multi-objective opimization algorithm. |
Creating Selection Protocol Classes#
Creating selection protocols is accomplished using the constructor for a given SelectionProtocol
class. The code below demonstrates the construction of a selection protocol object for unconstrained selection based on GEBVs in a subset search space.
# create standard GEBV selection protocol in subset decision space
selprot = GenomicEstimatedBreedingValueSubsetSelection(
ntrait = 2,
ncross = 10,
nparent = 2,
nmating = 1,
nprogeny = 40,
nobj = 2,
)
Generating Selection Problems for Optimization#
Selection problems can be generated from a selection protocol object using the problem
method. The following examples demonstrate how to generate unconstrained and constrained selection problems for selection based on GEBVs.
Setup for unconstrained optimization#
Most constructors assume unconstrained optimization by default. To create an unconstrained selection problem, first construct an unconstrained selection protocol object using the selection protocol’s constructor. In the example below, a selection protocol based on GEBVs is created. Here, there are two traits and two corresponding objectives for each of the traits. We manually specify the number of objectives to be 2, making this selection protocol multi-objective in nature.
# create the selection protocol for 10 two-way crosses
selprot_unconstrained = GenomicEstimatedBreedingValueSubsetSelection(
ntrait = 2,
ncross = 10, # ten crosses total
nparent = 2, # two-way
nmating = 1,
nprogeny = 40,
nobj = 2,
)
Setup for constrained optimization#
For a constrained optimization, suppose we wish to minimize the negated sum of GEBVs (equivalent to maximizing the sum of GEBVs) for our first trait, while using our second trait as a constraint. For the constraint on the second trait, suppose that we wish to identify solutions with a negated sum of GEBVs greater than or equal to -1 (equivalent to having a maximum sum of GEBVs at 1). To accomplish this task, we define obj_trans
and ineqcv_trans
functions and pass them as arguments in our GenomicEstimatedBreedingValueSubsetSelection
constructor. Since our obj_trans
function converts two latent trait GEBV values to a single objective, we manually specify the number of objectives to be 1, making the selection protocol single-objective in nature. The code below demonstrates this.
# suppose we desire to minimize the negated GEBV of our first trait,
# using our second trait as a constraint
# define an objective transformation function
def obj_trans(
decnvec: numpy.ndarray,
latentvec: numpy.ndarray,
maskvec: numpy.ndarray,
**kwargs: dict
) -> numpy.ndarray:
"""
A custom objective transformation function.
Parameters
----------
decnvec : numpy.ndarray
A decision space vector of shape (ndecn,)
latentvec : numpy.ndarray
A latent space function vector of shape (ntrait,)
maskvec : numpy.ndarray
A mask vector of shape (ntrait,)
Returns
-------
out : numpy.ndarray
A vector of shape (sum(maskvec),).
"""
# extract trait(s) as objective(s)
return latentvec[maskvec]
# define an inequality constraint violation function
def ineqcv_trans(
decnvec: numpy.ndarray,
latentvec: numpy.ndarray,
minvec: numpy.ndarray,
**kwargs: dict
) -> numpy.ndarray:
"""
A custom inequality constraint violation function.
Parameters
----------
decnvec : numpy.ndarray
A decision space vector of shape (ndecn,)
latentvec : numpy.ndarray
A latent space function vector of shape (ntrait,)
minvec : numpy.ndarray
Vector of minimum values for which the latent vector can take.
Returns
-------
out : numpy.ndarray
An inequality constraint violation vector of shape (ntrait,).
"""
# calculate constraint violations
out = minvec - latentvec
# where constraint violation is negative (no constraint violation), set to zero
out[out < 0] = 0
# return inequality constraint violation array
return out
# for constrained selection, make keyword arguments for our custom
# transformation functions
obj_trans_kwargs = {
# we want to select the first trait
"maskvec": numpy.array([True, False], dtype=bool)
}
ineqcv_trans_kwargs = {
# we don't care about the first trait's minimum value (negated maximum), so set to -Inf
# we do care about the second trait's minimum value (negated maximum), so set to -1.0
"minvec": numpy.array([-numpy.inf, -1.0], dtype=float)
}
# create the constrained selection protocol for 10 two-way crosses
selprot_constrained = GenomicEstimatedBreedingValueSubsetSelection(
ntrait = 2,
ncross = 10, # ten crosses total
nparent = 2, # two-way
nmating = 1,
nprogeny = 40,
nobj = 1, # one since sum(maskvec) == 1
obj_trans = obj_trans,
obj_trans_kwargs = obj_trans_kwargs,
nineqcv = 2,
ineqcv_trans = ineqcv_trans,
ineqcv_trans_kwargs = ineqcv_trans_kwargs
)
Generating the selection problem#
After constructing our unconstrained and constrained selection protocols, we can generate selection problems from them. To create selection problems, use the problem
method as demonstrated in the code block below.
#
# Creating a genomic model
#
# model parameters
nfixed = 1 # number of fixed effects
ntrait = 2 # number of traits
nmisc = 0 # number of miscellaneous random effects
nadditive = 50 # number of additive marker effects
# create dummy values
beta = numpy.random.random((nfixed,ntrait))
u_misc = numpy.random.random((nmisc,ntrait))
u_a = numpy.random.random((nadditive,ntrait))
trait = numpy.array(["Trait"+str(i+1).zfill(2) for i in range(ntrait)], dtype = object)
# create additive linear genomic model
algmod = DenseAdditiveLinearGenomicModel(
beta = beta,
u_misc = u_misc,
u_a = u_a,
trait = trait,
model_name = "example",
params = None
)
#
# Construct random genomes
#
# shape parameters for random genomes
ntaxa = 100
nvrnt = nadditive
ngroup = 20
nchrom = 10
nphase = 2
# create random genotypes
mat = numpy.random.randint(0, 2, size = (nphase,ntaxa,nvrnt)).astype("int8")
# create taxa names
taxa = numpy.array(["taxon"+str(i+1).zfill(3) for i in range(ntaxa)], dtype = object)
# create taxa groups
taxa_grp = numpy.random.randint(1, ngroup+1, ntaxa)
taxa_grp.sort()
# create marker variant chromsome assignments
vrnt_chrgrp = numpy.random.randint(1, nchrom+1, nvrnt)
vrnt_chrgrp.sort()
# create marker physical positions
vrnt_phypos = numpy.random.choice(1000000, size = nvrnt, replace = False)
vrnt_phypos.sort()
# create marker variant names
vrnt_name = numpy.array(["SNP"+str(i+1).zfill(4) for i in range(nvrnt)], dtype = object)
# create a phased genotype matrix from scratch using NumPy arrays
pgmat = DensePhasedGenotypeMatrix(
mat = mat,
taxa = taxa,
taxa_grp = taxa_grp,
vrnt_chrgrp = vrnt_chrgrp,
vrnt_phypos = vrnt_phypos,
vrnt_name = vrnt_name,
ploidy = nphase
)
# create a genotype matrix from scratch using NumPy arrays
gmat = DenseGenotypeMatrix(
mat = mat.sum(0, dtype="int8"),
taxa = taxa,
taxa_grp = taxa_grp,
vrnt_chrgrp = vrnt_chrgrp,
vrnt_phypos = vrnt_phypos,
vrnt_name = vrnt_name,
ploidy = nphase
)
# generate an unconstrained GEBV subset selection problem
# for this selection protocol type, we only need the genotype matrix and a genomic prediction model
prob_unconstrained = selprot_unconstrained.problem(
pgmat = None,
gmat = gmat,
ptdf = None,
bvmat = None,
gpmod = algmod,
t_cur = None,
t_max = None,
)
# generate a constrained GEBV subset selection problem
# for this selection protocol type, we only need the genotype matrix and a genomic prediction model
prob_constrained = selprot_constrained.problem(
pgmat = None,
gmat = gmat,
ptdf = None,
bvmat = None,
gpmod = algmod,
t_cur = None,
t_max = None,
)
After creating unconstrained and constrained selection problems, we can test their evaluation on a random solution, as demonstrated in the code below.
# generate a random solution to test
soln = numpy.random.choice(prob_constrained.decn_space, prob_constrained.ndecn)
# evaluate the solution in the unconstrained problem
eval_unconstrained = prob_unconstrained.evalfn(soln)
# evaluate the solution in the constrained problem
eval_constrained = prob_constrained.evalfn(soln)
Single-Objective Optimization#
Since our constrained selection protocol above is also single-objective in nature, we can use the sosolve
method to optimize. The sosolve
method internally creates a selection problem and uses the algorithm defined in the soalgo
property to optimize this selection problem. The results of the optimization are returned as a selection solution. The code below demonstrates single-objective optimization.
# perform single-objective optimization using
soln_constrained = selprot_constrained.sosolve(
pgmat = None,
gmat = gmat,
ptdf = None,
bvmat = None,
gpmod = algmod,
t_cur = None,
t_max = None,
)
# examine the solution decision vector(s)
soln_constrained.soln_decn
# examine the solution objective function vector(s)
soln_constrained.soln_obj
# examine the solution inequality constraint violation vector(s)
soln_constrained.soln_ineqcv
# examine the soltuion equality constraint violation vector(s)
soln_constrained.soln_eqcv
Multi-Objective Optimization#
Since our unconstrained selection protocol above is multi-objective in nature, we can use the mosolve
method to optimize for both objectives. The mosolve
method internally creates a selection problem and uses the algorithm defined in the soalgo
property to optimize this problem. The results of the optimization are returned. The code below demonstrates multi-objective optimization.
# perform single-objective optimization using
soln_unconstrained = selprot_unconstrained.mosolve(
pgmat = None,
gmat = gmat,
ptdf = None,
bvmat = None,
gpmod = algmod,
t_cur = None,
t_max = None,
)
# examine the solution decision vector(s)
soln_unconstrained.soln_decn
# examine the solution objective function vector(s)
soln_unconstrained.soln_obj
# examine the solution inequality constraint violation vector(s)
soln_unconstrained.soln_ineqcv
# examine the soltuion equality constraint violation vector(s)
soln_unconstrained.soln_eqcv
Selection#
Selection of individuals or mating pairs (depending on the selection protocol type) can be accomplished using the select
method. Internally, this method constructs an optimization problem from inputs, optimizes the problem using an appropriate algorithm, and selects individuals, returning a selection configuration object. Cross configurations can be extracted from the returned selection configuration object. If a selection protocol is multi-objective in nature, a solution from the non-dominated set is selected using the ndset_trans
function, which scores solutions based on a user-defined method. If no ndset_trans
function is provided, a default function is used which scores non-dominated solutions on their distance from a vector weighing each objective equally, invariant of the scales of the objectives. The code below demonstrates how to use the select
method.
# Select individuals for the constrained problem formulation. In this selection,
# the best solution from a single-objective optimization is selected to
# determine the selection.
selcfg_constrained = selprot_constrained.select(
pgmat = pgmat,
gmat = gmat,
ptdf = None,
bvmat = None,
gpmod = algmod,
t_cur = None,
t_max = None,
)
# view cross configuration from the selection
selcfg_constrained.xconfig
# Select individuals for the unconstrained problem formulation. In this
# selection, a Pareto frontier is estimated and a point selected from the
# frontier which is closest to a 1 vector projection in invariant space.
# The vector determining the selection configuration may be customized by
# changing how the selection protocol is constructed.
selcfg_unconstrained = selprot_unconstrained.select(
pgmat = pgmat,
gmat = gmat,
ptdf = None,
bvmat = None,
gpmod = algmod,
t_cur = None,
t_max = None,
)
# view cross configuration from the selection
selcfg_unconstrained.xconfig