MultiObjectiveGenomicMating#
- class pybrops.breed.prot.sel.UnconstrainedMultiObjectiveGenomicMating.MultiObjectiveGenomicMating(nconfig, nparent, ncross, nprogeny, vmatfcty, nself, gmapfn, weight=<function weight_absolute>, target=<function target_positive>, unique_parents=True, mem=1024, method='single', objfn_trans=None, objfn_trans_kwargs=None, objfn_wt=1.0, ndset_trans=None, ndset_trans_kwargs=None, ndset_wt=1.0, rng=None, soalgo=None, moalgo=None, **kwargs)[source]#
Bases:
UnconstrainedSelectionProtocol
Class implementing selection protocols for multi-objective genomic mating.
# TODO: add formulae for methodology.
Constructor for MultiObjectiveGenomicMating class.
- Parameters:
nconfig (int) –
Number of cross configurations to consider.
Examples:
20 two-way crosses would be:
nconfig = 20
20 three-way crosses would be:
nconfig = 20
nparent (int) –
Number of parents per configuration.
Example:
20 two-way crosses would be:
nparent = 2
20 three-way crosses would be:
nparent = 3
ncross (int) – Number of crosses per configuration.
nprogeny (int) – Number of progeny to derive from each cross.
vmatfcty (GeneticVarianceMatrixFactory) – Variance matrix factory from which to construct additive variance matrices.
nself (int) –
Used for ‘vmatfcty’ matrix construction. Number of selfing generations post-cross pattern before ‘nprogeny’ individuals are simulated.
Example        Description
nself = 0      Derive gametes from F1
nself = 1      Derive gametes from F2
nself = 2      Derive gametes from F3
...            etc.
nself = inf    Derive gametes from SSD
gmapfn (GeneticMapFunction) – Used for ‘vmatfcty’ matrix construction. GeneticMapFunction to use to estimate covariance induced by recombination.
mem (int, default = 1024) –
Used for ‘vmatfcty’ matrix construction. Memory chunk size to use during matrix operations. If None, then memory chunk size is not limited.
WARNING: Setting mem = None might result in memory allocation errors! For reference, mem = 1024 refers to a matrix of size 1024x1024, which needs about 8.4 MB of storage with 64-bit floats. Matrices need a quadratic amount of memory: \(O(n^2)\).
unique_parents (bool, default = True) – Whether to force unique parents. If True, all parents in the mating configuration must be unique. If False, non-unique parents are allowed, in which case self-fertilization is considered a viable option.
method (str) –
Method of selecting parents.
Method      Description
"single"    MOGM is transformed to a single objective and optimization is done on the transformed function. This is done using the objfn_trans function provided: optimize : objfn_trans(MOGM)
"pareto"    MOGM is transformed by a transformation function, but NOT reduced to a single objective. The Pareto frontier for this transformed function is mapped using a multi-objective genetic algorithm. Objectives are scaled to \([0,1]\) and a vector orthogonal to the hyperplane defined by the extremes of the front is drawn starting at the point defined by ndset_trans. The closest point on the Pareto frontier to the orthogonal vector is selected.
target (str or numpy.ndarray) –
If target is a string, check value and follow these rules:
Value            Description
"positive"       Select alleles with the most positive effect.
"negative"       Select alleles with the most negative effect.
"stabilizing"    Set target allele frequency to 0.5.
numpy.ndarray    Use frequency values in target as is.
weight (str or numpy.ndarray) –
If weight is a string, check value and follow these rules:
Value          Description
"magnitude"    Assign weights using the magnitudes of regression coefficients.
"equal"        Assign weights equally.
objfn_trans (function, callable) –
Function to transform the MOGM function. If method = “single”, this function must return a scalar. If method = “pareto”, this function must return a numpy.ndarray.
Function definition:
objfn_trans(obj, **kwargs: dict):
    Parameters
        obj : scalar, numpy.ndarray
            Objective scalar or vector to be transformed
        kwargs : dict
            Additional keyword arguments
    Returns
        out : scalar, numpy.ndarray
            Transformed objective scalar or vector.
objfn_trans_kwargs (dict) – Dictionary of keyword arguments to be passed to ‘objfn_trans’.
objfn_wt (float, numpy.ndarray) –
Weight applied to transformed objective function. Indicates whether a function is maximizing or minimizing:
1.0 for maximizing function.
-1.0 for minimizing function.
ndset_trans (function, callable) –
Function to transform nondominated points along the Pareto frontier into a single score for each point.
Function definition:
ndset_trans(ndset, **kwargs: dict):
    Parameters
        ndset : numpy.ndarray
            Array of shape (j,o) containing nondominated points.
            Where 'j' is the number of nondominated points and 'o' is the number of objectives.
        kwargs : dict
            Additional keyword arguments.
    Returns
        out : numpy.ndarray
            Array of shape (j,) containing transformed Pareto frontier points.
ndset_trans_kwargs (dict) – Dictionary of keyword arguments to be passed to ‘ndset_trans’.
ndset_wt (float) –
Weight applied to transformed nondominated points along Pareto frontier. Indicates whether a function is maximizing or minimizing.
1.0 for maximizing function. -1.0 for minimizing function.
soalgo (OptimizationAlgorithm) –
Single-objective optimization algorithm to optimize the objective function. If None, use a SteepestAscentSetHillClimber with the following parameters:
soalgo = SteepestAscentSetHillClimber(
    rng = self.rng  # PRNG source
)
moalgo (OptimizationAlgorithm) –
Multi-objective optimization algorithm to optimize the objective functions. If None, use an NSGA2SetGeneticAlgorithm with the following parameters:
moalgo = NSGA2SetGeneticAlgorithm(
    ngen = 250,     # number of generations to evolve
    mu = 100,       # number of parents in population
    lamb = 100,     # number of progeny to produce
    M = 1.5,        # algorithm crossover genetic map length
    rng = self.rng  # PRNG source
)
rng (numpy.random.Generator or None) – A random number generator source. Used for optimization algorithms. If rng is None, use pybrops.core.random module (NOT THREAD SAFE!).
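The function definitions given for objfn_trans and ndset_trans above can be illustrated with a minimal sketch. The function names weighted_sum_trans and sum_ndset_trans and the keyword wt are hypothetical, chosen only for illustration; they are not part of PyBrOpS. The sketch only shows callables that match the documented signatures and return shapes.

```python
import numpy as np

def weighted_sum_trans(obj, wt=None, **kwargs):
    """Hypothetical objfn_trans for method = "single": vector in, scalar out."""
    obj = np.asarray(obj, dtype=float)
    if wt is None:
        wt = np.ones(obj.shape[-1])       # equal weights when none are given
    return float(np.dot(obj, wt))         # collapse the objectives to one number

def sum_ndset_trans(ndset, **kwargs):
    """Hypothetical ndset_trans: shape (j,o) in, shape (j,) out."""
    return np.asarray(ndset, dtype=float).sum(axis=1)

# Extra arguments travel through the *_kwargs dictionaries.
objfn_trans_kwargs = {"wt": np.array([1.0, 0.0, 2.0])}
scalar = weighted_sum_trans(np.array([1.0, 2.0, 3.0]), **objfn_trans_kwargs)

ndset = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]])
point_scores = sum_ndset_trans(ndset)
```

Such callables would be passed as objfn_trans = weighted_sum_trans with objfn_trans_kwargs as above, and ndset_trans = sum_ndset_trans; whether a sum is a sensible score depends on the breeding objectives.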
Methods
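objfn(pgmat, gmat, ptdf, bvmat, gpmod, ...) – Return a selection objective function for the provided datasets.
objfn_static(sel, xmap, mat, ploidy, ...) – Multi-objective genomic mating objective function.
objfn_vec(pgmat, gmat, ptdf, bvmat, ...) – Return a vectorized selection objective function for the provided datasets.
objfn_vec_static(sel, xmap, mat, ...) – A vectorized multi-objective genomic mating objective function.
pareto(pgmat, gmat, ptdf, bvmat, gpmod, ...) – Calculate a Pareto frontier for objectives.
select(pgmat, gmat, ptdf, bvmat, gpmod, ...) – Select individuals for breeding.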
Attributes
Get data for property gmapfn.
Get data for property mem.
Get data for property method.
Get data for property moalgo.
Get data for property nconfig.
Get data for property ncross.
Get data for property ndset_trans.
Get data for property ndset_trans_kwargs.
Get data for property ndset_wt.
Get data for property nparent.
Get data for property nprogeny.
Get data for property nself.
Get data for property objfn_trans.
Get data for property objfn_trans_kwargs.
Get data for property objfn_wt.
Get data for property rng.
Get data for property soalgo.
Get data for property target.
Get data for property unique_parents.
Get data for property vmatfcty.
Get data for property weight.
- property gmapfn: GeneticMapFunction#
Get data for property gmapfn.
- property mem: int#
Get data for property mem.
- property method: str#
Get data for property method.
- property moalgo: UnconstrainedOptimizationAlgorithm#
Get data for property moalgo.
- property nconfig: int#
Get data for property nconfig.
- property ncross: int#
Get data for property ncross.
- property ndset_trans: Callable | None#
Get data for property ndset_trans.
- property ndset_trans_kwargs: dict#
Get data for property ndset_trans_kwargs.
- property ndset_wt: ndarray#
Get data for property ndset_wt.
- property nparent: int#
Get data for property nparent.
- property nprogeny: int#
Get data for property nprogeny.
- property nself: int | Real#
Get data for property nself.
- objfn(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, **kwargs)[source]#
Return a selection objective function for the provided datasets.
- Parameters:
pgmat (PhasedGenotypeMatrix) – Phased genotype matrix.
gmat (GenotypeMatrix) – Input genotype matrix.
ptdf (pandas.DataFrame) – Not used by this function.
bvmat (BreedingValueMatrix) – Not used by this function.
gpmod (AdditiveLinearGenomicModel) – Linear genomic prediction model.
- Returns:
outfn – A selection objective function for the specified problem.
- Return type:
function
- static objfn_static(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat, trans, kwargs)[source]#
Multi-objective genomic mating objective function.
The goal is to minimize all objectives for this function.
This is a bare bones function. Minimal error checking is done.
Objectives: \(F(\textbf{x})\)
\[F(\textbf{x}) = {[f^{\textup{PAU}}(\textbf{x}), f^{\textup{PAFD}}(\textbf{x})]}'\]
Population Allele Unavailability (PAU): \(f^{\textup{PAU}}(\textbf{x})\)
Formal PAU definition:
\[f^{\textup{PAU}}(\textbf{x}) = \textbf{w} \cdot \textbf{u}\]
Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency. From the selection allele frequencies and the target allele frequencies tfreq, determine if the target frequencies can be attained after unlimited generations of selection. If the target allele frequency at a locus cannot be attained, score locus as 1, otherwise score as 0. Store this into a binary score vector \(\textbf{u}\). Take the dot product between the binary score vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAU}}(\textbf{x})\) and return the result.
Population Allele Frequency Distance (PAFD): \(f^{\textup{PAFD}}(\textbf{x})\)
Formal PAFD definition:
\[f^{\textup{PAFD}}(\textbf{x}) = \textbf{w} \cdot \left | \textbf{p}_{x} - \textbf{p}_{t} \right |\]
Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency \(\textbf{p}_{x}\). From the selection allele frequencies and the target allele frequencies \(\textbf{p}_{t} =\) tfreq, calculate the absolute value of the difference between the two vectors. Finally, take the dot product between the difference vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAFD}}(\textbf{x})\) and return the result.
Sum of Progeny Standard Deviations of Additive Variance (SPstdA): \(f^{\textup{SPstdA}}(\textbf{x})\)
Formal SPstdA definition:
\[f^{\textup{SPstdA}}(\textbf{x}) = \sum_{c \in S} \sigma_{A,c}\]
Given a progeny variance matrix \(\Sigma_{A} =\) vmat and a selection indices vector \(\textbf{x} =\) sel, take the sum of the square root of the progeny variance \(\sigma_{A,c} = \sqrt{\Sigma_{A,c}}\) for each cross.
- Parameters:
sel (numpy.ndarray) –
A cross selection indices matrix of shape (k,).
Where:
k is the number of crosses to select.
Each index indicates which cross specified by xmap to select.
xmap (numpy.ndarray) –
A cross selection index map array of shape (s,d).
Where:
s is the size of the sample space (number of cross combinations for d parents).
d is the number of parents.
mat (numpy.ndarray) –
A genotype matrix of shape (n,p) representing only biallelic loci. One of the two alleles at a locus is coded using a 1. The other allele is coded as a 0. mat holds the counts of the allele coded by 1.
Where:
n is the number of individuals.
p is the number of markers.
Example:
# matrix of shape (n = 3, p = 4)
mat = numpy.array([[0,2,1,0],
                   [2,2,1,1],
                   [0,1,0,2]])
ploidy (int) – Number of phases that the genotype matrix mat represents.
tfreq (floating, numpy.ndarray) –
A target allele frequency matrix of shape (p,t).
Where:
p is the number of markers.
t is the number of traits.
Example:
tfreq = numpy.array([0.2, 0.6, 0.7, 0.5])
mkrwt (numpy.ndarray) –
A marker weight coefficients matrix of shape (p,t).
Where:
p is the number of markers.
t is the number of traits.
Remarks:
All values in mkrwt must be non-negative.
vmat (numpy.ndarray, Matrix) –
A variance matrix of shape (n,...,n,t). Can be a numpy.ndarray or a Matrix of some sort. Must have the [] operator to access elements of the matrix.
Where:
n is the number of parental candidates.
t is the number of traits.
(n,...,n,t) is a tuple of length d + 1.
d is the number of parents for a cross.
trans (function or callable) –
A transformation operator to alter the output. Function must adhere to the following standard:
Must accept a single numpy.ndarray argument.
Must return a single object, whether scalar or numpy.ndarray.
kwargs (dict) – Dictionary of keyword arguments to pass to trans function.
- Returns:
mogm – A MOGM score matrix of shape (t + t + t,) if trans is None. Otherwise, of shape specified by trans.
Where:
t is the number of traits.
Matrix element ordering for un-transformed MOGM score matrix:
The first set of t elements in the mogm output correspond to the t PAU outputs for each trait.
The second set of t elements in the mogm output correspond to the t PAFD outputs for each trait.
The third set of t elements in the mogm output correspond to the t SPstdA outputs for each trait.
- Return type:
numpy.ndarray
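The three objectives described for objfn_static can be sketched in plain NumPy. This is an illustrative reading of the definitions, not the PyBrOpS source: the function name mogm_scores is hypothetical, the rule used to decide that a target frequency "cannot be attained" (the selection is fixed for the wrong allele) is an assumption, no trans function is applied, and the sign convention needed to make SPstdA a minimizing objective is not addressed.

```python
import numpy as np

def mogm_scores(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat):
    """Illustrative, untransformed MOGM scores of shape (t + t + t,)."""
    parents = xmap[sel]                                # (k,d) parent indices per selected cross
    nphase = ploidy * parents.size                     # total number of phases in the selection
    pfreq = mat[parents.ravel()].sum(axis=0) / nphase  # (p,) selection allele frequencies
    pfreq = pfreq[:, None]                             # (p,1), broadcasts against (p,t)

    # ASSUMPTION: a target is unattainable when the selection is fixed at 0
    # but the target is above 0, or fixed at 1 but the target is below 1.
    unavail = ((pfreq <= 0.0) & (tfreq > 0.0)) | ((pfreq >= 1.0) & (tfreq < 1.0))

    pau = (mkrwt * unavail).sum(axis=0)                   # (t,) w . u
    pafd = (mkrwt * np.abs(pfreq - tfreq)).sum(axis=0)    # (t,) w . |p_x - p_t|
    spstda = np.sqrt(vmat[tuple(parents.T)]).sum(axis=0)  # (t,) sum of sqrt(variance) over crosses
    return np.concatenate([pau, pafd, spstda])

# Toy data: 3 individuals, 4 markers, 1 trait, two-way crosses.
mat = np.array([[0, 2, 1, 0],
                [2, 2, 1, 1],
                [0, 1, 0, 2]])
xmap = np.array([[0, 1], [0, 2], [1, 2]])          # every pair of distinct parents
tfreq = np.array([[0.2], [0.6], [0.7], [0.5]])     # shape (p,t)
mkrwt = np.ones((4, 1))                            # shape (p,t), non-negative
vmat = np.zeros((3, 3, 1))                         # made-up progeny variances
vmat[0, 1, 0] = 4.0
vmat[1, 2, 0] = 9.0

scores = mogm_scores(np.array([0]), xmap, mat, 2, tfreq, mkrwt, vmat)
```

Selecting cross 0 (parents 0 and 1) gives selection allele frequencies 0.5, 1.0, 0.5, 0.25, so only the second marker is scored as unavailable.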
- property objfn_trans: Callable | None#
Get data for property objfn_trans.
- property objfn_trans_kwargs: dict#
Get data for property objfn_trans_kwargs.
- objfn_vec(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, **kwargs)[source]#
Return a vectorized selection objective function for the provided datasets.
- Parameters:
pgmat (PhasedGenotypeMatrix) – Not used by this function.
gmat (GenotypeMatrix) – Input genotype matrix.
ptdf (pandas.DataFrame) – Not used by this function.
bvmat (BreedingValueMatrix) – Not used by this function.
gpmod (AdditiveLinearGenomicModel) – Linear genomic prediction model.
- Returns:
outfn – A vectorized selection objective function for the specified problem.
- Return type:
function
- static objfn_vec_static(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat, trans, kwargs)[source]#
A vectorized multi-objective genomic mating objective function.
The goal is to minimize all objectives for this function.
This is a bare bones function. Minimal error checking is done.
Objectives: \(F(\textbf{x})\)
\[F(\textbf{x}) = {[f^{\textup{PAU}}(\textbf{x}), f^{\textup{PAFD}}(\textbf{x})]}'\]
Population Allele Unavailability (PAU): \(f^{\textup{PAU}}(\textbf{x})\)
\[f^{\textup{PAU}}(\textbf{x}) = \textbf{w} \cdot \textbf{u}\]
Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency. From the selection allele frequencies and the target allele frequencies tfreq, determine if the target frequencies can be attained after unlimited generations of selection. If the target allele frequency at a locus cannot be attained, score locus as 1, otherwise score as 0. Store this into a binary score vector \(\textbf{u}\). Take the dot product between the binary score vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAU}}(\textbf{x})\) and return the result.
Population Allele Frequency Distance (PAFD): \(f^{\textup{PAFD}}(\textbf{x})\)
\[f^{\textup{PAFD}}(\textbf{x}) = \textbf{w} \cdot \left | \textbf{p}_{x} - \textbf{p}_{t} \right |\]
Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency \(\textbf{p}_{x}\). From the selection allele frequencies and the target allele frequencies \(\textbf{p}_{t} =\) tfreq, calculate the absolute value of the difference between the two vectors. Finally, take the dot product between the difference vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAFD}}(\textbf{x})\) and return the result.
Sum of Progeny Standard Deviations of Additive Variance (SPstdA): \(f^{\textup{SPstdA}}(\textbf{x})\)
Formal SPstdA definition:
\[f^{\textup{SPstdA}}(\textbf{x}) = \sum_{c \in S} \sigma_{A,c}\]
Given a progeny variance matrix \(\Sigma_{A} =\) vmat and a selection indices vector \(\textbf{x} =\) sel, take the sum of the square root of the progeny variance \(\sigma_{A,c} = \sqrt{\Sigma_{A,c}}\) for each cross.
- Parameters:
sel (numpy.ndarray) –
A selection indices matrix of shape (j,k).
Where:
j is the number of configurations to score.
k is the number of individuals to select.
Each index indicates which individuals to select. Each index in sel represents a single individual’s row. sel cannot be None.
xmap (numpy.ndarray) –
A cross selection index map array of shape (s,d).
Where:
s is the size of the sample space (number of cross combinations for d parents).
d is the number of parents.
mat (numpy.ndarray) –
A genotype matrix of shape (n,p) representing only biallelic loci. One of the two alleles at a locus is coded using a 1. The other allele is coded as a 0. mat holds the counts of the allele coded by 1.
Where:
n is the number of individuals.
p is the number of markers.
Example:
# matrix of shape (n = 3, p = 4)
mat = numpy.array([[0,2,1,0],
                   [2,2,1,1],
                   [0,1,0,2]])
ploidy (int) – Number of phases that the genotype matrix mat represents.
tfreq (floating, numpy.ndarray) –
A target allele frequency matrix of shape (p,t).
Where:
p is the number of markers.
t is the number of traits.
Example:
tfreq = numpy.array([0.2, 0.6, 0.7, 0.5])
mkrwt (numpy.ndarray) –
A marker weight coefficients matrix of shape (p,t).
Where:
p is the number of markers.
t is the number of traits.
Remarks:
All values in mkrwt must be non-negative.
vmat (numpy.ndarray, Matrix) –
A variance matrix of shape (n,...,n,t). Can be a numpy.ndarray or a Matrix of some sort. Must have the [] operator to access elements of the matrix.
Where:
n is the number of parental candidates.
t is the number of traits.
(n,...,n,t) is a tuple of length d + 1.
d is the number of parents for a cross.
trans (function or callable) –
A transformation operator to alter the output. Function must adhere to the following standard:
Must accept a single numpy.ndarray argument.
Must return a single object, whether scalar or numpy.ndarray.
kwargs (dict) – Dictionary of keyword arguments to pass to trans function.
- Returns:
mogm – A MOGM score matrix of shape (j,t + t + t) if trans is None. Otherwise, of shape specified by trans.
Where:
j is the number of selection configurations.
t is the number of traits.
Matrix element ordering for un-transformed MOGM score matrix:
The first set of t elements in the mogm output correspond to the t PAU outputs for each trait.
The second set of t elements in the mogm output correspond to the t PAFD outputs for each trait.
The third set of t elements in the mogm output correspond to the t SPstdA outputs for each trait.
- Return type:
numpy.ndarray
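The vectorized variant scores j configurations at once and returns one row per configuration. The sketch below is illustrative only: the name mogm_scores_vec is hypothetical, the indices in sel are assumed to refer to rows of xmap (as in objfn_static), the unavailability rule is the same assumption of "fixed for the wrong allele", and no trans function is applied.

```python
import numpy as np

def mogm_scores_vec(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat):
    """Illustrative, untransformed MOGM scores of shape (j, t + t + t)."""
    parents = xmap[sel]                                # (j,k,d) parent indices
    j, k, d = parents.shape
    nphase = ploidy * k * d                            # phases per configuration
    pfreq = mat[parents.reshape(j, -1)].sum(axis=1) / nphase  # (j,p)
    pfreq = pfreq[:, :, None]                          # (j,p,1), broadcasts against (p,t)

    # ASSUMPTION: unattainable means fixed for the allele opposite to the target.
    unavail = ((pfreq <= 0.0) & (tfreq > 0.0)) | ((pfreq >= 1.0) & (tfreq < 1.0))

    pau = (mkrwt * unavail).sum(axis=1)                # (j,t)
    pafd = (mkrwt * np.abs(pfreq - tfreq)).sum(axis=1) # (j,t)
    idx = tuple(np.moveaxis(parents, -1, 0))           # d index arrays of shape (j,k)
    spstda = np.sqrt(vmat[idx]).sum(axis=1)            # (j,k,t) -> (j,t)
    return np.concatenate([pau, pafd, spstda], axis=1)

# Toy data: 3 individuals, 4 markers, 1 trait, two-way crosses.
mat = np.array([[0, 2, 1, 0],
                [2, 2, 1, 1],
                [0, 1, 0, 2]])
xmap = np.array([[0, 1], [0, 2], [1, 2]])
tfreq = np.array([[0.2], [0.6], [0.7], [0.5]])
mkrwt = np.ones((4, 1))
vmat = np.zeros((3, 3, 1))                             # made-up progeny variances
vmat[0, 1, 0] = 4.0
vmat[1, 2, 0] = 9.0

sel = np.array([[0], [2]])                             # two configurations, one cross each
scores = mogm_scores_vec(sel, xmap, mat, 2, tfreq, mkrwt, vmat)
```

Each row of scores follows the documented ordering: PAU first, then PAFD, then SPstdA.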
- property objfn_wt: ndarray#
Get data for property objfn_wt.
- pareto(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, miscout=None, **kwargs)[source]#
Calculate a Pareto frontier for objectives.
- Parameters:
pgmat (PhasedGenotypeMatrix) – Genomes
gmat (GenotypeMatrix) – Genotypes
ptdf (pandas.DataFrame) – Phenotype dataframe
bvmat (BreedingValueMatrix) – Breeding value matrix
gpmod (GenomicModel) – Genomic prediction model
t_cur (int) – Current generation number.
t_max (int) – Maximum (deadline) generation number.
miscout (dict, None, default = None) – Pointer to a dictionary for miscellaneous user defined output. If dict, write to dict (may overwrite previously defined fields). If None, user defined output is not calculated or stored.
kwargs (dict) – Additional keyword arguments.
- Returns:
out – A tuple containing two objects (frontier, sel_config).
Where:
frontier is a numpy.ndarray of shape (q,v) containing Pareto frontier points.
sel_config is a numpy.ndarray of shape (q,k) containing parent selection decisions for each corresponding point in the Pareto frontier.
Where:
q is the number of points in the frontier.
v is the number of objectives for the frontier.
k is the number of search space decision variables.
- Return type:
tuple
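One way to use the (frontier, sel_config) pair is to score every frontier point and keep the best one. The sketch below assumes the score is ndset_wt times ndset_trans(frontier) and that the largest weighted score wins; this combination rule, the helper name pick_from_frontier, and the distance-based scoring function are assumptions for illustration, not confirmed PyBrOpS behavior.

```python
import numpy as np

def pick_from_frontier(frontier, sel_config, ndset_trans,
                       ndset_trans_kwargs=None, ndset_wt=1.0):
    """Return the frontier point and selection with the best weighted score."""
    kwargs = ndset_trans_kwargs or {}
    score = ndset_wt * ndset_trans(frontier, **kwargs)  # shape (q,)
    ix = int(np.argmax(score))                          # best weighted score
    return frontier[ix], sel_config[ix]

def dist_to_origin(ndset, **kwargs):
    """Hypothetical ndset_trans: Euclidean distance of each point to the origin."""
    return np.linalg.norm(ndset, axis=1)

# Made-up frontier with q = 3 points, v = 2 objectives, k = 2 decision variables.
frontier = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.2]])
sel_config = np.array([[0, 1], [2, 3], [4, 5]])

# ndset_wt = -1.0 turns the distance into a minimizing criterion.
point, chosen = pick_from_frontier(frontier, sel_config, dist_to_origin, ndset_wt=-1.0)
```

With these values the middle point is closest to the origin, so its selection configuration is returned.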
- property rng: Generator | RandomState#
Get data for property rng.
- select(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, miscout=None, **kwargs)[source]#
Select individuals for breeding.
- Parameters:
pgmat (PhasedGenotypeMatrix) – Genomes
gmat (GenotypeMatrix) – Genotypes (unphased most likely)
ptdf (pandas.DataFrame) – Phenotype dataframe
bvmat (BreedingValueMatrix) – Breeding value matrix
gpmod (GenomicModel) – Genomic prediction model
t_cur (int) – Current generation number.
t_max (int) – Maximum (deadline) generation number.
miscout (dict, None, default = None) – Pointer to a dictionary for miscellaneous user defined output. If dict, write to dict (may overwrite previously defined fields). If None, user defined output is not calculated or stored.
kwargs (dict) – Additional keyword arguments.
- Returns:
out – A tuple containing four objects: (pgmat, sel, ncross, nprogeny).
Where:
pgmat is a PhasedGenotypeMatrix of parental candidates.
sel is a numpy.ndarray of indices specifying a cross pattern. Each index corresponds to an individual in pgmat.
ncross is a numpy.ndarray specifying the number of crosses to perform per cross pattern.
nprogeny is a numpy.ndarray specifying the number of progeny to generate per cross.
- Return type:
tuple
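The returned arrays can be read as a crossing plan. The sketch below uses made-up values and assumes that sel is a flat array in which each consecutive group of nparent indices forms one cross configuration, and that ncross and nprogeny hold one entry per configuration; the actual layout should be checked against the library before relying on it.

```python
import numpy as np

# Made-up output for nconfig = 3 two-way crosses (nparent = 2).
nconfig, nparent = 3, 2
sel = np.array([4, 7, 1, 2, 0, 9])     # indices of individuals in pgmat
ncross = np.array([1, 1, 1])           # crosses per configuration
nprogeny = np.array([10, 10, 10])      # progeny per cross

# ASSUMPTION: consecutive groups of nparent indices form one configuration.
crosses = sel.reshape(nconfig, nparent)

# Progeny expected from the whole plan.
total_progeny = int((ncross * nprogeny).sum())

for parents, nc, npg in zip(crosses, ncross, nprogeny):
    print(f"parents {parents.tolist()}: {nc} cross(es) x {npg} progeny")
```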
- property soalgo: UnconstrainedOptimizationAlgorithm#
Get data for property soalgo.
- property target: ndarray | Callable | str#
Get data for property target.
- property unique_parents: bool#
Get data for property unique_parents.
- property vmatfcty: GeneticVarianceMatrixFactory#
Get data for property vmatfcty.
- property weight: ndarray | Callable | str#
Get data for property weight.