MultiObjectiveGenomicMating
- class pybrops.breed.prot.sel.UnconstrainedMultiObjectiveGenomicMating.MultiObjectiveGenomicMating(nconfig, nparent, ncross, nprogeny, vmatfcty, nself, gmapfn, weight=<function weight_absolute>, target=<function target_positive>, unique_parents=True, mem=1024, method='single', objfn_trans=None, objfn_trans_kwargs=None, objfn_wt=1.0, ndset_trans=None, ndset_trans_kwargs=None, ndset_wt=1.0, rng=None, soalgo=None, moalgo=None, **kwargs)
Bases: UnconstrainedSelectionProtocol
Class implementing selection protocols for multi-objective genomic mating.
# TODO: add formulae for methodology.
Constructor for MultiObjectiveGenomicMating class.
- Parameters:
nconfig (int) –
Number of cross configurations to consider.
Examples:
20 two-way crosses would be:
nconfig = 20
20 three-way crosses would be:
nconfig = 20
nparent (int) –
Number of parents per configuration.
Examples:
20 two-way crosses would be:
nparent = 2
20 three-way crosses would be:
nparent = 3
ncross (int) – Number of crosses per configuration.
nprogeny (int) – Number of progeny to derive from each cross.
vmatfcty (GeneticVarianceMatrixFactory) – Factory used to construct additive genetic variance matrices.
nself (int or Real) –
Used for 'vmatfcty' matrix construction. Number of selfing generations after the cross pattern before 'nprogeny' individuals are simulated.
Example – Description
nself = 0 – Derive gametes from F1
nself = 1 – Derive gametes from F2
nself = 2 – Derive gametes from F3
... – etc.
nself = inf – Derive gametes from SSD (single-seed descent)
gmapfn (GeneticMapFunction) – Used for 'vmatfcty' matrix construction. GeneticMapFunction used to estimate the covariance induced by recombination.
mem (int, default = 1024) –
Used for 'vmatfcty' matrix construction. Memory chunk size to use during matrix operations. If None, then memory chunk size is not limited.
WARNING: Setting mem = None might result in memory allocation errors! For reference, mem = 1024 refers to a matrix of size 1024x1024, which needs about 8.4 MB (8 MiB) of storage when stored as 64-bit floats. Memory use grows quadratically with the chunk size: \(O(n^2)\).
unique_parents (bool, default = True) – Whether to force unique parents. If True, all parents in the mating configuration must be unique. If False, non-unique parents are allowed; in this scenario, self-fertilization is considered a viable option.
method (str) –
Method of selecting parents.
Method – Description
"single" – MOGM is transformed to a single objective and optimization is done on the transformed function. This is done using the objfn_trans function provided:
    optimize : objfn_trans(MOGM)
"pareto" – MOGM is transformed by a transformation function, but NOT reduced to a single objective. The Pareto frontier for this transformed function is mapped using a multi-objective genetic algorithm. Objectives are scaled to \([0,1]\) and a vector orthogonal to the hyperplane defined by the extremes of the front is drawn starting at the point defined by ndset_trans. The closest point on the Pareto frontier to the orthogonal vector is selected.
target (str or numpy.ndarray) –
If target is a string, check its value and follow these rules:
Value – Description
"positive" – Select alleles with the most positive effect.
"negative" – Select alleles with the most negative effect.
"stabilizing" – Set target allele frequency to 0.5.
numpy.ndarray – Use frequency values in target as is.
weight (str or numpy.ndarray) –
If weight is a string, check its value and follow these rules:
Value – Description
"magnitude" – Assign weights using the magnitudes of regression coefficients.
"equal" – Assign weights equally.
objfn_trans (function, callable) –
Function to transform the MOGM function. If method = "single", this function must return a scalar. If method = "pareto", this function must return a numpy.ndarray.
Function definition:
    objfn_trans(obj, **kwargs: dict):
        Parameters
            obj : scalar, numpy.ndarray
                Objective scalar or vector to be transformed
            kwargs : dict
                Additional keyword arguments
        Returns
            out : scalar, numpy.ndarray
                Transformed objective scalar or vector.
objfn_trans_kwargs (dict) – Dictionary of keyword arguments to be passed to ‘objfn_trans’.
objfn_wt (float, numpy.ndarray) –
Weight applied to transformed objective function. Indicates whether a function is maximizing or minimizing:
1.0 for maximizing function. -1.0 for minimizing function.
ndset_trans (function, callable) –
Function to transform nondominated points along the Pareto frontier into a single score for each point.
Function definition:
    ndset_trans(ndset, **kwargs: dict):
        Parameters
            ndset : numpy.ndarray
                Array of shape (j,o) containing nondominated points.
                Where 'j' is the number of nondominated points and 'o' is the number of objectives.
            kwargs : dict
                Additional keyword arguments.
        Returns
            out : numpy.ndarray
                Array of shape (j,) containing transformed Pareto frontier points.
ndset_trans_kwargs (dict) – Dictionary of keyword arguments to be passed to ‘ndset_trans’.
ndset_wt (float) –
Weight applied to transformed nondominated points along Pareto frontier. Indicates whether a function is maximizing or minimizing.
1.0 for maximizing function. -1.0 for minimizing function.
soalgo (OptimizationAlgorithm) –
Single-objective optimization algorithm to optimize the objective function. If None, use a SteepestAscentSetHillClimber with the following parameters:
    soalgo = SteepestAscentSetHillClimber(
        rng = self.rng  # PRNG source
    )
moalgo (OptimizationAlgorithm) –
Multi-objective optimization algorithm to optimize the objective functions. If None, use a NSGA2SetGeneticAlgorithm with the following parameters:
    moalgo = NSGA2SetGeneticAlgorithm(
        ngen = 250,     # number of generations to evolve
        mu = 100,       # number of parents in population
        lamb = 100,     # number of progeny to produce
        M = 1.5,        # algorithm crossover genetic map length
        rng = self.rng  # PRNG source
    )
rng (numpy.random.Generator or None) – A random number generator source. Used for optimization algorithms. If rng is None, use the pybrops.core.random module (NOT THREAD SAFE!).
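The objfn_trans and ndset_trans parameters accept user-supplied callables with the signatures documented above. The sketch below shows one possible pair of such functions; the names objfn_trans_weighted_sum and ndset_trans_ideal_distance, the "wt" keyword, and the ideal-point scoring rule are illustrative assumptions, not functions shipped with pybrops.

```python
import numpy


def objfn_trans_weighted_sum(obj, **kwargs):
    """Hypothetical objfn_trans for method = "single".

    Collapses an MOGM objective vector into one scalar using an optional
    weight vector passed through objfn_trans_kwargs under the (assumed) key "wt".
    """
    obj = numpy.asarray(obj, dtype=float)
    wt = numpy.asarray(kwargs.get("wt", numpy.ones_like(obj)), dtype=float)
    return float(numpy.dot(obj, wt))


def ndset_trans_ideal_distance(ndset, **kwargs):
    """Hypothetical ndset_trans: one score per nondominated point.

    Each objective is min-max scaled to [0, 1] across the front, and the score
    is the Euclidean distance to the scaled ideal point (all zeros), assuming
    every objective is minimized. Smaller scores are better.
    """
    ndset = numpy.asarray(ndset, dtype=float)      # shape (j, o)
    lo = ndset.min(axis=0)
    hi = ndset.max(axis=0)
    span = numpy.where(hi > lo, hi - lo, 1.0)      # avoid division by zero
    scaled = (ndset - lo) / span
    return numpy.linalg.norm(scaled, axis=1)       # shape (j,)


print(objfn_trans_weighted_sum([1.0, 2.0, 3.0], wt=[1.0, 0.5, 0.0]))  # prints 2.0
front = numpy.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
print(ndset_trans_ideal_distance(front))
```

Under these assumptions, the first function could be passed as objfn_trans (with, for example, objfn_trans_kwargs = {"wt": ...}) when method = "single". Because smaller ideal-point distances are better, the second would pair with ndset_wt = -1.0, the value documented for a minimizing function.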
Methods
objfn – Return a selection objective function for the provided datasets.
objfn_static – Multi-objective genomic mating objective function.
objfn_vec – Return a vectorized selection objective function for the provided datasets.
objfn_vec_static – A vectorized multi-objective genomic mating objective function.
pareto – Calculate a Pareto frontier for objectives.
select – Select individuals for breeding.
Attributes
Get data for property gmapfn.
Get data for property mem.
Get data for property method.
Get data for property moalgo.
Get data for property nconfig.
Get data for property ncross.
Get data for property ndset_trans.
Get data for property ndset_trans_kwargs.
Get data for property ndset_wt.
Get data for property nparent.
Get data for property nprogeny.
Get data for property nself.
Get data for property objfn_trans.
Get data for property objfn_trans_kwargs.
Get data for property objfn_wt.
Get data for property rng.
Get data for property soalgo.
Get data for property target.
Get data for property unique_parents.
Get data for property vmatfcty.
Get data for property weight.
- property gmapfn: GeneticMapFunction
Get data for property gmapfn.
- property mem: int
Get data for property mem.
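The storage figure quoted for the mem constructor parameter can be checked with a line of arithmetic: a mem x mem chunk holds mem * mem elements, and a 64-bit float takes 8 bytes. The helper chunk_bytes below is hypothetical and assumes float64 elements; it only illustrates the quadratic growth.

```python
def chunk_bytes(mem: int, itemsize: int = 8) -> int:
    """Bytes needed for one mem x mem chunk of itemsize-byte elements.

    itemsize = 8 assumes float64 storage (an assumption, not confirmed by pybrops).
    """
    return mem * mem * itemsize


for mem in (256, 512, 1024, 2048):
    nbytes = chunk_bytes(mem)
    # decimal megabytes and binary mebibytes side by side
    print(f"mem = {mem:5d}: {nbytes / 1e6:6.1f} MB ({nbytes / 2**20:.1f} MiB)")
```

Doubling mem quadruples the memory per chunk, which is why leaving mem = None on a large candidate set can exhaust memory.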
- property method: str
Get data for property method.
- property moalgo: UnconstrainedOptimizationAlgorithm
Get data for property moalgo.
- property nconfig: int
Get data for property nconfig.
- property ncross: int
Get data for property ncross.
- property ndset_trans: Callable | None
Get data for property ndset_trans.
- property ndset_trans_kwargs: dict
Get data for property ndset_trans_kwargs.
- property ndset_wt: ndarray
Get data for property ndset_wt.
- property nparent: int
Get data for property nparent.
- property nprogeny: int
Get data for property nprogeny.
- property nself: int | Real
Get data for property nself.
- objfn(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, **kwargs)
Return a selection objective function for the provided datasets.
- Parameters:
pgmat (PhasedGenotypeMatrix) – Phased genotype matrix.
gmat (GenotypeMatrix) – Input genotype matrix.
ptdf (pandas.DataFrame) – Not used by this function.
bvmat (BreedingValueMatrix) – Not used by this function.
gpmod (AdditiveLinearGenomicModel) – Linear genomic prediction model.
- Returns:
outfn – A selection objective function for the specified problem.
- Return type:
function
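objfn returns a function that already has the datasets bound to it, so an optimizer only needs to supply a candidate selection. A minimal sketch of that pattern using functools.partial follows; pafd_static and make_objfn are hypothetical stand-ins, and for brevity sel indexes individuals directly rather than crosses in an xmap.

```python
from functools import partial

import numpy


def pafd_static(sel, mat, ploidy, tfreq, mkrwt):
    """Hypothetical static objective: weighted |p_x - p_t| over the selected rows."""
    pfreq = mat[sel].sum(axis=0) / (ploidy * len(sel))  # selection allele frequency, shape (p,)
    return float(numpy.dot(mkrwt, numpy.abs(pfreq - tfreq)))


def make_objfn(mat, ploidy, tfreq, mkrwt):
    """Bind the datasets once; the returned function only takes `sel`."""
    return partial(pafd_static, mat=mat, ploidy=ploidy, tfreq=tfreq, mkrwt=mkrwt)


mat = numpy.array([[0, 2, 1, 0], [2, 2, 1, 1], [0, 1, 0, 2]])  # (n = 3, p = 4)
outfn = make_objfn(mat, ploidy=2, tfreq=numpy.array([0.2, 0.6, 0.7, 0.5]), mkrwt=numpy.ones(4))
print(outfn(numpy.array([0, 1])))
```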
- static objfn_static(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat, trans, kwargs)
Multi-objective genomic mating objective function.
The goal is to minimize all objectives for this function.
This is a bare bones function. Minimal error checking is done.
Objectives: \(F(\textbf{x})\)
\[F(\textbf{x}) = {[f^{\textup{PAU}}(\textbf{x}), f^{\textup{PAFD}}(\textbf{x}), f^{\textup{SPstdA}}(\textbf{x})]}'\]
Population Allele Unavailability (PAU): \(f^{\textup{PAU}}(\textbf{x})\)
Formal PAU definition:
\[f^{\textup{PAU}}(\textbf{x}) = \textbf{w} \cdot \textbf{u}\]
Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency. From the selection allele frequencies and the target allele frequencies tfreq, determine if the target frequencies can be attained after unlimited generations of selection. If the target allele frequency at a locus cannot be attained, score the locus as 1; otherwise, score it as 0. Store these scores in a binary score vector \(\textbf{u}\). Take the dot product between the binary score vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAU}}(\textbf{x})\) and return the result.
Population Allele Frequency Distance (PAFD): \(f^{\textup{PAFD}}(\textbf{x})\)
Formal PAFD definition:
\[f^{\textup{PAFD}}(\textbf{x}) = \textbf{w} \cdot \left | \textbf{p}_{x} - \textbf{p}_{t} \right |\]
Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency \(\textbf{p}_{x}\). From the selection allele frequencies and the target allele frequencies \(\textbf{p}_{t} =\) tfreq, calculate the absolute value of the difference between the two vectors. Finally, take the dot product between the difference vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAFD}}(\textbf{x})\) and return the result.
Sum of Progeny Standard Deviations of Additive Variance (SPstdA): \(f^{\textup{SPstdA}}(\textbf{x})\)
Formal SPstdA definition:
\[f^{\textup{SPstdA}}(\textbf{x}) = \sum_{c \in S} \sigma_{A,c}\]
Given a progeny variance matrix \(\Sigma_{A} =\) vmat and a selection indices vector \(\textbf{x} =\) sel, take the sum of the square roots of the progeny variances \(\sigma_{A,c} = \sqrt{\Sigma_{A,c}}\) over each cross \(c\) in the set \(S\) of crosses selected by \(\textbf{x}\).
- Parameters:
sel (numpy.ndarray) –
A cross selection indices vector of shape (k,).
Where:
k is the number of crosses to select.
Each index indicates which cross specified by xmap to select.
xmap (numpy.ndarray) –
A cross selection index map array of shape (s,d).
Where:
s is the size of the sample space (number of cross combinations for d parents).
d is the number of parents.
mat (numpy.ndarray) –
A genotype matrix of shape (n,p) representing only biallelic loci. One of the two alleles at a locus is coded using a 1. The other allele is coded as a 0. mat holds the counts of the allele coded by 1.
Where:
n is the number of individuals.
p is the number of markers.
Example:
    # matrix of shape (n = 3, p = 4)
    mat = numpy.array([[0,2,1,0], [2,2,1,1], [0,1,0,2]])
ploidy (int) – Number of phases that the genotype matrix mat represents.
tfreq (floating, numpy.ndarray) –
A target allele frequency matrix of shape (p,t).
Where:
p is the number of markers.
t is the number of traits.
Example:
    tfreq = numpy.array([0.2, 0.6, 0.7, 0.5])
mkrwt (numpy.ndarray) –
A marker weight coefficients matrix of shape (p,t).
Where:
p is the number of markers.
t is the number of traits.
Remarks:
All values in mkrwt must be non-negative.
vmat (numpy.ndarray, Matrix) –
A variance matrix of shape (n,...,n,t). Can be a numpy.ndarray or a Matrix of some sort. Must have the [] operator to access elements of the matrix.
Where:
n is the number of parental candidates.
t is the number of traits.
(n,...,n,t) is a tuple of length d + 1.
d is the number of parents for a cross.
trans (function or callable) –
A transformation operator to alter the output. Function must adhere to the following standard:
Must accept a single numpy.ndarray argument.
Must return a single object, whether scalar or numpy.ndarray.
kwargs (dict) – Dictionary of keyword arguments to pass to the trans function.
- Returns:
mogm – A MOGM score matrix of shape (t + t + t,) if trans is None. Otherwise, of shape specified by trans.
Where:
t is the number of traits.
Matrix element ordering for un-transformed MOGM score matrix:
The first set of t elements in the mogm output correspond to the t PAU outputs for each trait.
The second set of t elements in the mogm output correspond to the t PAFD outputs for each trait.
The third set of t elements in the mogm output correspond to the t SPstdA outputs for each trait.
- Return type:
numpy.ndarray
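The following sketch computes the un-transformed score vector described above for two-way crosses (d = 2), using the example mat and a (p, t) version of the example tfreq. It is an illustration under stated assumptions, not the pybrops implementation: the helper name mogm_sketch is hypothetical, parents are pooled across selected crosses with multiplicity, xmap is built from unordered pairs of distinct candidates, and a target is treated as unattainable when the allele it needs is absent (p_x = 0 while p_t > 0) or fixed (p_x = 1 while p_t < 1).

```python
import itertools

import numpy


def mogm_sketch(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat):
    """Hypothetical un-transformed MOGM score for two-way crosses.

    Assumptions: parents of all selected crosses are pooled (with multiplicity);
    a target is unattainable if its allele is absent or already fixed.
    Returns an array of shape (3t,) ordered [PAU..., PAFD..., SPstdA...].
    """
    parents = xmap[sel].ravel()                                  # parent row indices
    pfreq = mat[parents].sum(axis=0) / (ploidy * len(parents))   # shape (p,)
    pfreq = pfreq[:, None]                                       # (p, 1) broadcasts against (p, t)
    unavail = ((pfreq == 0.0) & (tfreq > 0.0)) | ((pfreq == 1.0) & (tfreq < 1.0))
    pau = (mkrwt * unavail).sum(axis=0)                          # w . u, shape (t,)
    pafd = (mkrwt * numpy.abs(pfreq - tfreq)).sum(axis=0)        # w . |p_x - p_t|, shape (t,)
    crosses = xmap[sel]                                          # shape (k, 2)
    # sigma_A,c = sqrt(vmat[parent1, parent2, trait]), summed over selected crosses
    spstda = numpy.sqrt(vmat[crosses[:, 0], crosses[:, 1], :]).sum(axis=0)  # shape (t,)
    return numpy.concatenate([pau, pafd, spstda])


mat = numpy.array([[0, 2, 1, 0], [2, 2, 1, 1], [0, 1, 0, 2]])   # (n = 3, p = 4)
xmap = numpy.array(list(itertools.combinations(range(3), 2)))  # (s = 3, d = 2): (0,1), (0,2), (1,2)
tfreq = numpy.array([[0.2], [0.6], [0.7], [0.5]])               # (p = 4, t = 1)
mkrwt = numpy.ones((4, 1))                                      # (p = 4, t = 1)
vmat = numpy.full((3, 3, 1), 4.0)                               # (n, n, t): every cross has variance 4
print(mogm_sketch(numpy.array([0]), xmap, mat, 2, tfreq, mkrwt, vmat))
```

For the first cross (candidates 0 and 1) the pooled frequencies are [0.5, 1.0, 0.5, 0.25]. The second locus is fixed while its target is 0.6, so PAU is 1; PAFD is 0.3 + 0.4 + 0.2 + 0.25 = 1.15; and SPstdA is sqrt(4) = 2.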
- property objfn_trans: Callable | None
Get data for property objfn_trans.
- property objfn_trans_kwargs: dict
Get data for property objfn_trans_kwargs.
- objfn_vec(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, **kwargs)
Return a vectorized selection objective function for the provided datasets.
- Parameters:
pgmat (PhasedGenotypeMatrix) – Not used by this function.
gmat (GenotypeMatrix) – Input genotype matrix.
ptdf (pandas.DataFrame) – Not used by this function.
bvmat (BreedingValueMatrix) – Not used by this function.
gpmod (AdditiveLinearGenomicModel) – Linear genomic prediction model.
- Returns:
outfn – A vectorized selection objective function for the specified problem.
- Return type:
function
- static objfn_vec_static(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat, trans, kwargs)
A vectorized multi-objective genomic mating objective function.
The goal is to minimize all objectives for this function.
This is a bare bones function. Minimal error checking is done.
Objectives: \(F(\textbf{x})\)
\[F(\textbf{x}) = {[f^{\textup{PAU}}(\textbf{x}), f^{\textup{PAFD}}(\textbf{x}), f^{\textup{SPstdA}}(\textbf{x})]}'\]
Population Allele Unavailability (PAU): \(f^{\textup{PAU}}(\textbf{x})\)
Formal PAU definition:
\[f^{\textup{PAU}}(\textbf{x}) = \textbf{w} \cdot \textbf{u}\]
Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency. From the selection allele frequencies and the target allele frequencies tfreq, determine if the target frequencies can be attained after unlimited generations of selection. If the target allele frequency at a locus cannot be attained, score the locus as 1; otherwise, score it as 0. Store these scores in a binary score vector \(\textbf{u}\). Take the dot product between the binary score vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAU}}(\textbf{x})\) and return the result.
Population Allele Frequency Distance (PAFD): \(f^{\textup{PAFD}}(\textbf{x})\)
Formal PAFD definition:
\[f^{\textup{PAFD}}(\textbf{x}) = \textbf{w} \cdot \left | \textbf{p}_{x} - \textbf{p}_{t} \right |\]
Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency \(\textbf{p}_{x}\). From the selection allele frequencies and the target allele frequencies \(\textbf{p}_{t} =\) tfreq, calculate the absolute value of the difference between the two vectors. Finally, take the dot product between the difference vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAFD}}(\textbf{x})\) and return the result.
Sum of Progeny Standard Deviations of Additive Variance (SPstdA): \(f^{\textup{SPstdA}}(\textbf{x})\)
Formal SPstdA definition:
\[f^{\textup{SPstdA}}(\textbf{x}) = \sum_{c \in S} \sigma_{A,c}\]
Given a progeny variance matrix \(\Sigma_{A} =\) vmat and a selection indices vector \(\textbf{x} =\) sel, take the sum of the square roots of the progeny variances \(\sigma_{A,c} = \sqrt{\Sigma_{A,c}}\) over each cross \(c\) in the set \(S\) of crosses selected by \(\textbf{x}\).
- Parameters:
sel (numpy.ndarray) –
A cross selection indices matrix of shape (j,k).
Where:
j is the number of configurations to score.
k is the number of crosses to select.
Each index indicates which cross specified by xmap to select. sel cannot be None.
xmap (numpy.ndarray) –
A cross selection index map array of shape (s,d).
Where:
s is the size of the sample space (number of cross combinations for d parents).
d is the number of parents.
mat (numpy.ndarray) –
A genotype matrix of shape (n,p) representing only biallelic loci. One of the two alleles at a locus is coded using a 1. The other allele is coded as a 0. mat holds the counts of the allele coded by 1.
Where:
n is the number of individuals.
p is the number of markers.
Example:
    # matrix of shape (n = 3, p = 4)
    mat = numpy.array([[0,2,1,0], [2,2,1,1], [0,1,0,2]])
ploidy (int) – Number of phases that the genotype matrix mat represents.
tfreq (floating, numpy.ndarray) –
A target allele frequency matrix of shape (p,t).
Where:
p is the number of markers.
t is the number of traits.
Example:
    tfreq = numpy.array([0.2, 0.6, 0.7, 0.5])
mkrwt (numpy.ndarray) –
A marker weight coefficients matrix of shape (p,t).
Where:
p is the number of markers.
t is the number of traits.
Remarks:
All values in mkrwt must be non-negative.
vmat (numpy.ndarray, Matrix) –
A variance matrix of shape (n,...,n,t). Can be a numpy.ndarray or a Matrix of some sort. Must have the [] operator to access elements of the matrix.
Where:
n is the number of parental candidates.
t is the number of traits.
(n,...,n,t) is a tuple of length d + 1.
d is the number of parents for a cross.
trans (function or callable) –
A transformation operator to alter the output. Function must adhere to the following standard:
Must accept a single numpy.ndarray argument.
Must return a single object, whether scalar or numpy.ndarray.
kwargs (dict) – Dictionary of keyword arguments to pass to the trans function.
- Returns:
mogm – A MOGM score matrix of shape (j,t + t + t) if trans is None. Otherwise, of shape specified by trans.
Where:
j is the number of selection configurations.
t is the number of traits.
Matrix element ordering for un-transformed MOGM score matrix:
The first set of t elements in the mogm output correspond to the t PAU outputs for each trait.
The second set of t elements in the mogm output correspond to the t PAFD outputs for each trait.
The third set of t elements in the mogm output correspond to the t SPstdA outputs for each trait.
- Return type:
numpy.ndarray
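The vectorized variant scores j configurations in one call. Broadcasting over a (j, p, t) array is one way to do this; the sketch below computes only the PAFD block, assuming that the parents of each configuration's crosses are pooled with multiplicity and that xmap lists unordered pairs of distinct candidates. pafd_vec_sketch is a hypothetical name, not part of pybrops.

```python
import itertools

import numpy


def pafd_vec_sketch(sel, xmap, mat, ploidy, tfreq, mkrwt):
    """Hypothetical vectorized PAFD for j configurations at once.

    sel: (j, k) cross indices into xmap; xmap: (s, d); mat: (n, p);
    tfreq, mkrwt: (p, t). Returns an array of shape (j, t).
    """
    parents = xmap[sel].reshape(sel.shape[0], -1)                    # (j, k*d) parent indices
    pfreq = mat[parents].sum(axis=1) / (ploidy * parents.shape[1])   # (j, p)
    dist = numpy.abs(pfreq[:, :, None] - tfreq[None, :, :])          # (j, p, t)
    return (mkrwt[None, :, :] * dist).sum(axis=1)                    # (j, t)


mat = numpy.array([[0, 2, 1, 0], [2, 2, 1, 1], [0, 1, 0, 2]])   # (n = 3, p = 4)
xmap = numpy.array(list(itertools.combinations(range(3), 2)))  # (0,1), (0,2), (1,2)
tfreq = numpy.array([[0.2], [0.6], [0.7], [0.5]])               # (p = 4, t = 1)
mkrwt = numpy.ones((4, 1))
sel = numpy.array([[0], [1]])                                   # j = 2 configurations, k = 1 cross each
print(pafd_vec_sketch(sel, xmap, mat, 2, tfreq, mkrwt))
```

The intermediate arrays grow as j * k * d * p and j * p * t, so very large batches may need to be scored in chunks.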
- property objfn_wt: ndarray
Get data for property objfn_wt.
- pareto(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, miscout=None, **kwargs)
Calculate a Pareto frontier for objectives.
- Parameters:
pgmat (PhasedGenotypeMatrix) – Genomes
gmat (GenotypeMatrix) – Genotypes
ptdf (pandas.DataFrame) – Phenotype dataframe
bvmat (BreedingValueMatrix) – Breeding value matrix
gpmod (GenomicModel) – Genomic prediction model
t_cur (int) – Current generation number.
t_max (int) – Maximum (deadline) generation number.
miscout (dict, None, default = None) – Pointer to a dictionary for miscellaneous user-defined output. If dict, write to dict (may overwrite previously defined fields). If None, user-defined output is not calculated or stored.
kwargs (dict) – Additional keyword arguments.
- Returns:
out – A tuple containing two objects (frontier, sel_config).
Where:
frontier is a numpy.ndarray of shape (q,v) containing Pareto frontier points.
sel_config is a numpy.ndarray of shape (q,k) containing parent selection decisions for each corresponding point in the Pareto frontier.
Where:
q is the number of points in the frontier.
v is the number of objectives for the frontier.
k is the number of search space decision variables.
- Return type:
tuple
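Each point in frontier is nondominated: no other point is at least as good in every objective and strictly better in at least one. The sketch below applies that dominance test to a set of objective vectors, assuming all objectives are minimized (the stated goal of objfn_static). It is not the NSGA-II search that the default moalgo runs; nondominated_mask is a hypothetical helper.

```python
import numpy


def nondominated_mask(points):
    """Boolean mask of Pareto-optimal rows, assuming every objective is minimized.

    points has shape (m, v).
    """
    points = numpy.asarray(points, dtype=float)
    # Pairwise comparison: entry [a, b] compares row b against row a.
    no_worse = (points[None, :, :] <= points[:, None, :]).all(axis=2)  # b <= a everywhere
    better = (points[None, :, :] < points[:, None, :]).any(axis=2)     # b < a somewhere
    dominated = no_worse & better                                      # b dominates a
    return ~dominated.any(axis=1)


pts = numpy.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
mask = nondominated_mask(pts)
print(mask)        # [3, 3] is dominated by [2, 2]
print(pts[mask])   # the (q, v) frontier
```

The pairwise comparison needs O(m^2 v) memory, which is acceptable for fronts of a few hundred points.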
- property rng: Generator | RandomState
Get data for property rng.
- select(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, miscout=None, **kwargs)
Select individuals for breeding.
- Parameters:
pgmat (PhasedGenotypeMatrix) – Genomes
gmat (GenotypeMatrix) – Genotypes (unphased most likely)
ptdf (pandas.DataFrame) – Phenotype dataframe
bvmat (BreedingValueMatrix) – Breeding value matrix
gpmod (GenomicModel) – Genomic prediction model
t_cur (int) – Current generation number.
t_max (int) – Maximum (deadline) generation number.
miscout (dict, None, default = None) – Pointer to a dictionary for miscellaneous user-defined output. If dict, write to dict (may overwrite previously defined fields). If None, user-defined output is not calculated or stored.
kwargs (dict) – Additional keyword arguments.
- Returns:
out – A tuple containing four objects: (pgmat, sel, ncross, nprogeny).
Where:
pgmat is a PhasedGenotypeMatrix of parental candidates.
sel is a numpy.ndarray of indices specifying a cross pattern. Each index corresponds to an individual in pgmat.
ncross is a numpy.ndarray specifying the number of crosses to perform per cross pattern.
nprogeny is a numpy.ndarray specifying the number of progeny to generate per cross.
- Return type:
tuple
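The constructor examples (for instance, nconfig = 20 and nparent = 2 for 20 two-way crosses) suggest how the sel returned here can be read. The sketch assumes, without confirmation from the documentation, that sel lists parents configuration by configuration so that it reshapes to (nconfig, nparent), and that ncross and nprogeny each hold one entry per configuration; describe_selection is hypothetical.

```python
import numpy


def describe_selection(sel, nparent, ncross, nprogeny):
    """Hypothetical helper: group a flat parent-index array into cross
    configurations and count the progeny they would produce.

    Assumes sel reshapes to (nconfig, nparent) and that ncross and nprogeny
    have one entry per configuration.
    """
    configs = numpy.asarray(sel).reshape(-1, nparent)                    # (nconfig, nparent)
    total = int(numpy.sum(numpy.asarray(ncross) * numpy.asarray(nprogeny)))
    return configs, total


sel = numpy.array([0, 3, 1, 4, 2, 5])   # three two-way crosses: (0,3), (1,4), (2,5)
configs, total = describe_selection(sel, nparent=2, ncross=[1, 1, 1], nprogeny=[10, 10, 10])
print(configs)
print(total)
```

Under these assumptions the total progeny count is the sum over configurations of ncross times nprogeny, here 3 x 1 x 10 = 30.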
- property soalgo: UnconstrainedOptimizationAlgorithm
Get data for property soalgo.
- property target: ndarray | Callable | str
Get data for property target.
- property unique_parents: bool
Get data for property unique_parents.
- property vmatfcty: GeneticVarianceMatrixFactory
Get data for property vmatfcty.
- property weight: ndarray | Callable | str
Get data for property weight.