MultiObjectiveGenomicMating

class pybrops.breed.prot.sel.UnconstrainedMultiObjectiveGenomicMating.MultiObjectiveGenomicMating(nconfig, nparent, ncross, nprogeny, vmatfcty, nself, gmapfn, weight=<function weight_absolute>, target=<function target_positive>, unique_parents=True, mem=1024, method='single', objfn_trans=None, objfn_trans_kwargs=None, objfn_wt=1.0, ndset_trans=None, ndset_trans_kwargs=None, ndset_wt=1.0, rng=None, soalgo=None, moalgo=None, **kwargs)

Bases: UnconstrainedSelectionProtocol

Class implementing selection protocols for multi-objective genomic mating.

# TODO: add formulae for methodology.

Constructor for MultiObjectiveGenomicMating class.

Parameters:

nconfig (int) –
Number of cross configurations to consider.

Examples:
- 20 two-way crosses would be: nconfig = 20
- 20 three-way crosses would be: nconfig = 20
nparent (int) –
Number of parents per configuration.

Example:
- 20 two-way crosses would be: nparent = 2
- 20 three-way crosses would be: nparent = 3
ncross (int) – Number of crosses per configuration.
nprogeny (int) – Number of progeny to derive from each cross.
vmatfcty (GeneticVarianceMatrixFactory) – Factory used to construct additive variance matrices.
nself (int or Real) –
Used for ‘vmatfcty’ matrix construction. Number of selfing generations after the cross pattern before ‘nprogeny’ individuals are simulated.

Example	Description
`nself = 0`	Derive gametes from F1
`nself = 1`	Derive gametes from F2
`nself = 2`	Derive gametes from F3
`...`	etc.
`nself = inf`	Derive gametes from SSD
gmapfn (GeneticMapFunction) – Used for ‘vmatfcty’ matrix construction. GeneticMapFunction used to estimate the covariance induced by recombination.
mem (int, default = 1024) –
Used for ‘vmatfcty’ matrix construction. Memory chunk size to use during matrix operations. If None, the memory chunk size is not limited.

WARNING: Setting mem = None might result in memory allocation errors! For reference, mem = 1024 refers to a matrix of size 1024x1024, which needs about 8.4 MB (8 MiB) of storage when stored as 64-bit floats. Memory requirements grow quadratically with matrix dimension: $O(n^{2})$.
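As a rough check of the figure above, the sketch below computes the storage needed by a dense square matrix, assuming 64-bit floating-point entries (8 bytes each). The helper `dense_matrix_bytes` is hypothetical and not part of PyBrOpS.

```python
def dense_matrix_bytes(n: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense n x n matrix (itemsize=8 assumes float64)."""
    return n * n * itemsize

for n in (1024, 2048, 4096):
    nbytes = dense_matrix_bytes(n)
    # report in mebibytes (2**20 bytes) and megabytes (10**6 bytes)
    print(f"n={n}: {nbytes / 2**20:.1f} MiB ({nbytes / 1e6:.1f} MB)")
# first line printed: n=1024: 8.0 MiB (8.4 MB)
```

Doubling n quadruples the memory, which is the quadratic growth described in the warning.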
unique_parents (bool, default = True) – Whether to force unique parents. If True, all parents in a mating configuration must be unique. If False, non-unique parents are allowed; in this case, self-fertilization is considered a viable option.
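To illustrate how this option changes the space of candidate crosses (the kind of space a cross map such as `xmap`, documented under `objfn_static`, enumerates), the sketch below assumes crosses are unordered parent combinations, so that A x B and B x A count once. The helper `cross_map` is hypothetical; the library may enumerate crosses differently.

```python
import itertools

import numpy

def cross_map(n: int, d: int, unique_parents: bool = True) -> numpy.ndarray:
    """Enumerate unordered d-parent crosses over n candidates as an (s,d) array.

    Assumption: crosses are unordered. With unique_parents=False a parent may
    repeat, so self-crosses such as (0, 0) are included.
    """
    if unique_parents:
        combos = itertools.combinations(range(n), d)
    else:
        combos = itertools.combinations_with_replacement(range(n), d)
    return numpy.array(list(combos), dtype=int)

print(cross_map(3, 2, unique_parents=True).tolist())
# [[0, 1], [0, 2], [1, 2]]
print(cross_map(3, 2, unique_parents=False).tolist())
# [[0, 0], [0, 1], [0, 2], [1, 1], [1, 2], [2, 2]]
```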

method (str) –

Method of selecting parents.

Method	Description
`"single"`	MOGM is transformed to a single objective and optimization is done on the transformed function. This is done using the `objfn_trans` function provided: optimize : objfn_trans(MOGM)
`"pareto"`	MOGM is transformed by a transformation function, but NOT reduced to a single objective. The Pareto frontier for this transformed function is mapped using a multi-objective genetic algorithm. Objectives are scaled to $[0, 1]$ and a vector orthogonal to the hyperplane defined by the extremes of the front is drawn starting at the point defined by `ndset_trans`. The closest point on the Pareto frontier to the orthogonal vector is selected.

target (str, numpy.ndarray, or Callable) –

target is interpreted according to its type and value:

Value	Description
`"positive"`	Select alleles with the most positive effect.
`"negative"`	Select alleles with the most negative effect.
`"stabilizing"`	Set target allele frequency to `0.5`.
`numpy.ndarray`	Use frequency values in `target` as is.
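The string rules above can be sketched as a mapping from marker effects to target allele frequencies. This is a minimal sketch, assuming the effects are regression coefficients `u` of shape (p,t) and that "positive" means a target frequency of 1.0 for the allele coded 1 wherever its effect is positive; `target_frequencies` is a hypothetical helper, not the library's `target_positive`, and markers with exactly zero effect are given a target of 0.0 here, which the library may handle differently.

```python
import numpy

def target_frequencies(u: numpy.ndarray, target="positive") -> numpy.ndarray:
    """Map marker effects u (p,t) to target allele frequencies (p,t)."""
    if isinstance(target, str):
        if target == "positive":
            # fix the allele coded 1 where its effect is positive
            return numpy.where(u > 0.0, 1.0, 0.0)
        if target == "negative":
            # fix the allele coded 1 where its effect is negative
            return numpy.where(u < 0.0, 1.0, 0.0)
        if target == "stabilizing":
            return numpy.full(u.shape, 0.5)
        raise ValueError(f"unknown target: {target!r}")
    # numpy.ndarray: use the supplied frequencies as is
    return numpy.asarray(target, dtype=float)

u = numpy.array([[0.3], [-0.1], [0.0], [1.2]])  # (p=4, t=1)
print(target_frequencies(u, "positive").ravel().tolist())  # [1.0, 0.0, 0.0, 1.0]
```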

weight (str, numpy.ndarray, or Callable) –
If weight is a string, it is interpreted as follows:

Value	Description
`"magnitude"`	Assign weights using the magnitudes of regression coefficients.
`"equal"`	Assign weights equally.
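A minimal sketch of the two string rules, assuming marker effects `u` of shape (p,t). The default, `weight_absolute`, is named after the magnitude rule; the sketch assumes it simply takes absolute values, which also guarantees the non-negative weights that `mkrwt` requires. `marker_weights` is a hypothetical helper.

```python
import numpy

def marker_weights(u: numpy.ndarray, weight="magnitude") -> numpy.ndarray:
    """Derive non-negative marker weights (p,t) from marker effects u (p,t)."""
    if isinstance(weight, str):
        if weight == "magnitude":
            return numpy.absolute(u)  # |u| is always non-negative
        if weight == "equal":
            return numpy.ones_like(u, dtype=float)
        raise ValueError(f"unknown weight: {weight!r}")
    # numpy.ndarray: use the supplied weights as is
    return numpy.asarray(weight, dtype=float)

u = numpy.array([[0.3], [-0.1], [0.0], [1.2]])  # (p=4, t=1)
print(marker_weights(u).ravel().tolist())  # [0.3, 0.1, 0.0, 1.2]
```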

objfn_trans (function, callable) –

Function to transform the MOGM function. If method = “single”, this function must return a scalar. If method = “pareto”, this function must return a numpy.ndarray.

Function definition:

objfn_trans(obj, **kwargs: dict):
    Parameters
        obj : scalar, numpy.ndarray
            Objective scalar or vector to be transformed
        kwargs : dict
            Additional keyword arguments
    Returns
        out : scalar, numpy.ndarray
            Transformed objective scalar or vector.
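Two minimal transforms that follow the definition above: a weighted sum that returns a scalar (suitable for method = "single") and an identity that returns the vector (suitable for method = "pareto"). Both function names and the `wt` keyword are hypothetical; `wt` would be supplied through `objfn_trans_kwargs`, and the weights used here are arbitrary.

```python
import numpy

def trans_weighted_sum(obj: numpy.ndarray, wt: numpy.ndarray, **kwargs) -> float:
    """Candidate objfn_trans for method = "single": reduce the vector to a scalar."""
    return float(numpy.dot(obj, wt))

def trans_identity(obj: numpy.ndarray, **kwargs) -> numpy.ndarray:
    """Candidate objfn_trans for method = "pareto": keep one value per objective."""
    return numpy.asarray(obj, dtype=float)

# hypothetical un-transformed MOGM vector for t = 1 trait: [PAU, PAFD, SPstdA]
obj = numpy.array([2.0, 0.5, 3.0])
objfn_trans_kwargs = {"wt": numpy.array([1.0, 1.0, 0.5])}  # illustrative weights

print(trans_weighted_sum(obj, **objfn_trans_kwargs))        # 4.0
print(trans_identity(obj, **objfn_trans_kwargs).tolist())   # [2.0, 0.5, 3.0]
```

Accepting `**kwargs` in both lets the same keyword dictionary be passed to either transform; `trans_identity` simply ignores `wt`.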

objfn_trans_kwargs (dict) – Dictionary of keyword arguments to be passed to ‘objfn_trans’.
objfn_wt (float, numpy.ndarray) –
Weight applied to transformed objective function. Indicates whether a function is maximizing or minimizing:
- 1.0 for maximizing function.
- -1.0 for minimizing function.

ndset_trans (function, callable) –

Function to transform nondominated points along the Pareto frontier into a single score for each point.

Function definition:

ndset_trans(ndset, **kwargs: dict):
    Parameters
        ndset : numpy.ndarray
            Array of shape (j,o) containing nondominated points.
            Where 'j' is the number of nondominated points and
            'o' is the number of objectives.
        kwargs : dict
            Additional keyword arguments.
    Returns
        out : numpy.ndarray
            Array of shape (j,) containing transformed Pareto
            frontier points.

ndset_trans_kwargs (dict) – Dictionary of keyword arguments to be passed to ‘ndset_trans’.
ndset_wt (float) –
Weight applied to transformed nondominated points along the Pareto frontier. Indicates whether a function is maximizing or minimizing:
- 1.0 for maximizing function.
- -1.0 for minimizing function.
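`ndset_trans` and `ndset_wt` together reduce a frontier to a single choice. The sketch below assumes the chosen point is the one with the highest value of `ndset_wt * ndset_trans(ndset)`; the distance-based transform `ndset_trans_dist` is a hypothetical, illustrative choice, not the library's default. It scales each objective to [0, 1] and scores each point by its distance from the scaled origin (the best value of every objective when all objectives are minimized).

```python
import numpy

def ndset_trans_dist(ndset: numpy.ndarray, **kwargs) -> numpy.ndarray:
    """Map (j,o) nondominated points to (j,) scores: distance from the scaled origin."""
    lo = ndset.min(axis=0)
    span = ndset.max(axis=0) - lo
    span[span == 0.0] = 1.0            # avoid division by zero for constant objectives
    scaled = (ndset - lo) / span       # each objective now lies in [0, 1]
    return numpy.linalg.norm(scaled, axis=1)

# three nondominated points for two minimized objectives (j=3, o=2)
ndset = numpy.array([[0.0, 4.0],
                     [1.0, 1.0],
                     [4.0, 0.0]])
ndset_wt = -1.0                        # a smaller distance is better
score = ndset_wt * ndset_trans_dist(ndset)
best = int(numpy.argmax(score))        # assumed rule: keep the highest weighted score
print(best)  # 1
```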
soalgo (OptimizationAlgorithm) –
Single-objective optimization algorithm to optimize the objective function. If None, use a SteepestAscentSetHillClimber with the following parameters:
```
soalgo = SteepestAscentSetHillClimber(
    rng = self.rng  # PRNG source
)
```

moalgo (OptimizationAlgorithm) –

Multi-objective optimization algorithm to optimize the objective functions. If None, use a NSGA2SetGeneticAlgorithm with the following parameters:

```
moalgo = NSGA2SetGeneticAlgorithm(
    ngen = 250,     # number of generations to evolve
    mu = 100,       # number of parents in population
    lamb = 100,     # number of progeny to produce
    M = 1.5,        # algorithm crossover genetic map length
    rng = self.rng  # PRNG source
)
```

rng (numpy.random.Generator or None) – A random number generator source. Used for optimization algorithms. If rng is None, use pybrops.core.random module (NOT THREAD SAFE!).

Methods

`objfn`	Return a selection objective function for the provided datasets.
`objfn_static`	Multi-objective genomic mating objective function.
`objfn_vec`	Return a vectorized selection objective function for the provided datasets.
`objfn_vec_static`	A vectorized multi-objective genomic mating objective function.
`pareto`	Calculate a Pareto frontier for objectives.
`select`	Select individuals for breeding.

Attributes

`gmapfn`	Get data for property gmapfn.
`mem`	Get data for property mem.
`method`	Get data for property method.
`moalgo`	Get data for property moalgo.
`nconfig`	Get data for property nconfig.
`ncross`	Get data for property ncross.
`ndset_trans`	Get data for property ndset_trans.
`ndset_trans_kwargs`	Get data for property ndset_trans_kwargs.
`ndset_wt`	Get data for property ndset_wt.
`nparent`	Get data for property nparent.
`nprogeny`	Get data for property nprogeny.
`nself`	Get data for property nself.
`objfn_trans`	Get data for property objfn_trans.
`objfn_trans_kwargs`	Get data for property objfn_trans_kwargs.
`objfn_wt`	Get data for property objfn_wt.
`rng`	Get data for property rng.
`soalgo`	Get data for property soalgo.
`target`	Get data for property target.
`unique_parents`	Get data for property unique_parents.
`vmatfcty`	Get data for property vmatfcty.
`weight`	Get data for property weight.

property gmapfn: GeneticMapFunction – Get data for property gmapfn.

property mem: int – Get data for property mem.

property method: str – Get data for property method.

property moalgo: UnconstrainedOptimizationAlgorithm – Get data for property moalgo.

property nconfig: int – Get data for property nconfig.

property ncross: int – Get data for property ncross.

property ndset_trans: Callable | None – Get data for property ndset_trans.

property ndset_trans_kwargs: dict – Get data for property ndset_trans_kwargs.

property ndset_wt: ndarray – Get data for property ndset_wt.

property nparent: int – Get data for property nparent.

property nprogeny: int – Get data for property nprogeny.

property nself: int | Real – Get data for property nself.

objfn(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, **kwargs)

Return a selection objective function for the provided datasets.

Parameters:

pgmat (PhasedGenotypeMatrix) – Phased genotype matrix.
gmat (GenotypeMatrix) – Input genotype matrix.
ptdf (pandas.DataFrame) – Not used by this function.
bvmat (BreedingValueMatrix) – Not used by this function.
gpmod (AdditiveLinearGenomicModel) – Linear genomic prediction model.

Returns:

outfn – A selection objective function for the specified problem.

Return type:

function

static objfn_static(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat, trans, kwargs)

Multi-objective genomic mating objective function.

The goal is to minimize all objectives for this function.
This is a bare bones function. Minimal error checking is done.

Objectives: $F (x)$

F(x) = [f^{PAU}(x), f^{PAFD}(x), f^{SPstdA}(x)]^{'}

Population Allele Unavailability (PAU): $f^{PAU} (x)$

Formal PAU definition:

f^{PAU} (x) = w \cdot u

Given a genotype matrix mat and a selection indices vector $x =$ sel, calculate the selection allele frequency. From the selection allele frequencies and the target allele frequencies tfreq, determine if the target frequencies can be attained after unlimited generations of selection. If the target allele frequency at a locus cannot be attained, score locus as 1, otherwise score as 0. Store this into a binary score vector $u$ . Take the dot product between the binary score vector and the marker weight vector $w =$ mkrwt to calculate $f^{PAU} (x)$ and return the result.

Population Allele Frequency Distance (PAFD): $f^{PAFD} (x)$

Formal PAFD definition:

f^{PAFD} (x) = w \cdot | p_{x} - p_{t} |

Given a genotype matrix mat and a selection indices vector $x =$ sel, calculate the selection allele frequency $p_{x}$ . From the selection allele frequencies and the target allele frequencies $p_{t} =$ tfreq, calculate the absolute value of the difference between the two vectors. Finally, take the dot product between the difference vector and the marker weight vector $w =$ mkrwt to calculate $f^{PAFD} (x)$ and return the result.

Sum of Progeny Standard Deviations of Additive Variance (SPstdA): $f^{SPstdA} (x)$

Formal SPstdA definition:

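f^{SPstdA}(x) = \sum_{c \in S} \sigma_{A, c}

Given a progeny variance matrix $\Sigma_{A} =$ vmat and a selection indices vector $x =$ sel, let $S$ be the set of crosses selected by $x$. For each cross $c \in S$, take the square root of its progeny additive variance, $\sigma_{A, c} = \sqrt{\Sigma_{A, c}}$, and sum these standard deviations to obtain $f^{SPstdA}(x)$.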
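The three objectives can be sketched in NumPy as follows. This is a minimal sketch of the definitions above, not the PyBrOpS implementation, and `mogm_objectives` is a hypothetical name. It assumes allele frequencies are computed over all parents of the selected crosses (a parent used in several crosses is counted each time), that `vmat` accepts one integer index array per parent axis (as a `numpy.ndarray` does), and it interprets "cannot be attained after unlimited generations of selection" as: a target of 1 is unattainable if the allele is absent, a target of 0 is unattainable if the allele is fixed, and an intermediate target is unattainable unless the allele is segregating. SPstdA is returned as the plain sum from the formula; any sign change the library may apply so that every objective is minimized is not reproduced.

```python
import numpy

def mogm_objectives(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat):
    """Return a (3t,) vector ordered as PAU, PAFD, SPstdA for one selection."""
    crosses = xmap[sel, :]                       # (k,d) parents of each selected cross
    parents = crosses.ravel()                    # (k*d,) parents, repeats counted
    pfreq = mat[parents, :].sum(axis=0) / (ploidy * parents.size)  # (p,)
    pfreq = pfreq[:, None]                       # (p,1) to broadcast against (p,t)

    # assumed unavailability rule (see lead-in), evaluated per marker and trait
    unavail = numpy.where(
        tfreq >= 1.0, pfreq <= 0.0,
        numpy.where(tfreq <= 0.0, pfreq >= 1.0, (pfreq <= 0.0) | (pfreq >= 1.0)),
    )
    pau = (mkrwt * unavail).sum(axis=0)                          # (t,)
    pafd = (mkrwt * numpy.absolute(pfreq - tfreq)).sum(axis=0)   # (t,)

    # look up vmat[c_1, ..., c_d, :] for every selected cross -> (k,t)
    var_a = vmat[tuple(crosses.T)]
    spstda = numpy.sqrt(var_a).sum(axis=0)                       # (t,)
    return numpy.concatenate([pau, pafd, spstda])

mat = numpy.array([[0, 2, 1, 0],
                   [2, 2, 1, 1],
                   [0, 1, 0, 2]])                      # (n=3, p=4), diploid counts
xmap = numpy.array([[0, 1], [0, 2], [1, 2]])           # (s=3, d=2) two-way crosses
tfreq = numpy.array([[1.0], [1.0], [0.0], [0.5]])      # (p=4, t=1)
mkrwt = numpy.ones((4, 1))                             # (p=4, t=1)
vmat = numpy.full((3, 3, 1), 4.0)                      # every cross: variance 4
out = mogm_objectives(numpy.array([1]), xmap, mat, 2, tfreq, mkrwt, vmat)
print(out.tolist())  # [1.0, 1.5, 2.0]
```

Selecting cross (0, 2) gives allele frequencies [0.0, 0.75, 0.25, 0.5]: the first marker's allele is absent although its target is 1 (PAU = 1), the absolute distances sum to 1.5 (PAFD), and the single cross contributes a standard deviation of 2 (SPstdA).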

Parameters:

sel (numpy.ndarray) –
A cross selection indices vector of shape (k,).

Where:
- k is the number of crosses to select.
Each index indicates which cross specified by xmap to select.
xmap (numpy.ndarray) –
A cross selection index map array of shape (s,d).

Where:
- s is the size of the sample space (number of cross combinations for d parents).
- d is the number of parents.
mat (numpy.ndarray) –
A genotype matrix of shape (n,p) representing only biallelic loci. One of the two alleles at a locus is coded using a 1. The other allele is coded as a 0. mat holds the counts of the allele coded by 1.

Where:
- n is the number of individuals.
- p is the number of markers.
Example:
```
import numpy

# matrix of shape (n = 3, p = 4)
mat = numpy.array([[0,2,1,0],
                   [2,2,1,1],
                   [0,1,0,2]])
```
ploidy (int) – Number of phases that the genotype matrix mat represents.
tfreq (floating, numpy.ndarray) –
A target allele frequency matrix of shape (p,t).

Where:
- p is the number of markers.
- t is the number of traits.
Example:
```
import numpy

# matrix of shape (p = 4, t = 1)
tfreq = numpy.array([[0.2], [0.6], [0.7], [0.5]])
```
mkrwt (numpy.ndarray) –
A marker weight coefficients matrix of shape (p,t).

Where:
- p is the number of markers.
- t is the number of traits.
Remarks:
- All values in mkrwt must be non-negative.
vmat (numpy.ndarray, Matrix) –
A variance matrix of shape (n,...,n,t). Can be a numpy.ndarray or a Matrix of some sort. Must support the [] operator to access elements of the matrix.

Where:
- n is the number of parental candidates.
- t is the number of traits.
- (n,...,n,t) is a tuple of length d + 1.
- d is the number of parents for a cross.
trans (function or callable) –
A transformation operator to alter the output. Function must adhere to the following standard:
- Must accept a single numpy.ndarray argument.
- Must return a single object, whether scalar or numpy.ndarray.
kwargs (dict) – Dictionary of keyword arguments to pass to trans function.

Returns:

mogm – A MOGM score matrix of shape (t + t + t,) if trans is None. Otherwise, of shape specified by trans.

Where:

t is the number of traits.

Matrix element ordering for un-transformed MOGM score matrix:

The first set of t elements in the mogm output correspond to the t PAU outputs for each trait.
The second set of t elements in the mogm output correspond to the t PAFD outputs for each trait.
The third set of t elements in the mogm output correspond to the t SPstdA outputs for each trait.

Return type:

numpy.ndarray

property objfn_trans: Callable | None – Get data for property objfn_trans.

property objfn_trans_kwargs: dict – Get data for property objfn_trans_kwargs.

objfn_vec(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, **kwargs)

Return a vectorized selection objective function for the provided datasets.

Parameters:

pgmat (PhasedGenotypeMatrix) – Not used by this function.
gmat (GenotypeMatrix) – Input genotype matrix.
ptdf (pandas.DataFrame) – Not used by this function.
bvmat (BreedingValueMatrix) – Not used by this function.
gpmod (AdditiveLinearGenomicModel) – Linear genomic prediction model.

Returns:

outfn – A vectorized selection objective function for the specified problem.

Return type:

function

static objfn_vec_static(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat, trans, kwargs)

A vectorized multi-objective genomic mating objective function.

The goal is to minimize all objectives for this function.
This is a bare bones function. Minimal error checking is done.

Objectives: $F (x)$

F(x) = [f^{PAU}(x), f^{PAFD}(x), f^{SPstdA}(x)]^{'}

Population Allele Unavailability (PAU): $f^{PAU} (x)$

f^{PAU} (x) = w \cdot u

Population Allele Frequency Distance (PAFD): $f^{PAFD} (x)$

f^{PAFD} (x) = w \cdot | p_{x} - p_{t} |

Sum of Progeny Standard Deviations of Additive Variance (SPstdA): $f^{SPstdA} (x)$

Formal SPstdA definition:

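f^{SPstdA}(x) = \sum_{c \in S} \sigma_{A, c}

Given a progeny variance matrix $\Sigma_{A} =$ vmat and a selection indices vector $x =$ sel, let $S$ be the set of crosses selected by $x$. For each cross $c \in S$, take the square root of its progeny additive variance, $\sigma_{A, c} = \sqrt{\Sigma_{A, c}}$, and sum these standard deviations to obtain $f^{SPstdA}(x)$.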
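A vectorized version scores j configurations at once by broadcasting over a leading axis. This is a minimal sketch, not the PyBrOpS implementation; `mogm_objectives_vec` is a hypothetical name. It assumes that each entry of a (j,k) `sel` indexes a row of `xmap`, that allele frequencies are computed over all parents of the selected crosses, that `vmat` accepts one integer index array per parent axis, and that a target of 1 is unattainable if the allele is absent, a target of 0 is unattainable if the allele is fixed, and an intermediate target is unattainable unless the allele is segregating.

```python
import numpy

def mogm_objectives_vec(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat):
    """Return a (j, 3t) array ordered as PAU, PAFD, SPstdA for each configuration."""
    crosses = xmap[sel]                               # (j,k,d)
    j = crosses.shape[0]
    parents = crosses.reshape(j, -1)                  # (j,k*d)
    pfreq = mat[parents].sum(axis=1) / (ploidy * parents.shape[1])  # (j,p)
    pfreq = pfreq[:, :, None]                         # (j,p,1) vs (p,t) -> (j,p,t)

    # assumed unavailability rule (see lead-in)
    unavail = numpy.where(
        tfreq >= 1.0, pfreq <= 0.0,
        numpy.where(tfreq <= 0.0, pfreq >= 1.0, (pfreq <= 0.0) | (pfreq >= 1.0)),
    )
    pau = (mkrwt * unavail).sum(axis=1)                          # (j,t)
    pafd = (mkrwt * numpy.absolute(pfreq - tfreq)).sum(axis=1)   # (j,t)

    # one (j,k) index array per parent axis -> vmat lookup of shape (j,k,t)
    var_a = vmat[tuple(numpy.moveaxis(crosses, -1, 0))]
    spstda = numpy.sqrt(var_a).sum(axis=1)                       # (j,t)
    return numpy.concatenate([pau, pafd, spstda], axis=1)

mat = numpy.array([[0, 2, 1, 0],
                   [2, 2, 1, 1],
                   [0, 1, 0, 2]])                      # (n=3, p=4), diploid counts
xmap = numpy.array([[0, 1], [0, 2], [1, 2]])           # (s=3, d=2)
tfreq = numpy.array([[1.0], [1.0], [0.0], [0.5]])      # (p=4, t=1)
mkrwt = numpy.ones((4, 1))
vmat = numpy.full((3, 3, 1), 4.0)
sel = numpy.array([[0], [1], [2]])                     # j=3 configurations, k=1 cross each
out = mogm_objectives_vec(sel, xmap, mat, 2, tfreq, mkrwt, vmat)
print(out.tolist())  # [[0.0, 1.25, 2.0], [1.0, 1.5, 2.0], [0.0, 1.25, 2.0]]
```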

Parameters:

sel (numpy.ndarray) –
A cross selection indices matrix of shape (j,k).

Where:
- j is the number of configurations to score.
- k is the number of crosses to select.
Each index indicates which cross specified by xmap to select. sel cannot be None.
xmap (numpy.ndarray) –
A cross selection index map array of shape (s,d).

Where:
- s is the size of the sample space (number of cross combinations for d parents).
- d is the number of parents.
mat (numpy.ndarray) –
A genotype matrix of shape (n,p) representing only biallelic loci. One of the two alleles at a locus is coded using a 1. The other allele is coded as a 0. mat holds the counts of the allele coded by 1.

Where:
- n is the number of individuals.
- p is the number of markers.
Example:
```
import numpy

# matrix of shape (n = 3, p = 4)
mat = numpy.array([[0,2,1,0],
                   [2,2,1,1],
                   [0,1,0,2]])
```
ploidy (int) – Number of phases that the genotype matrix mat represents.
tfreq (floating, numpy.ndarray) –
A target allele frequency matrix of shape (p,t).

Where:
- p is the number of markers.
- t is the number of traits.
Example:
```
import numpy

# matrix of shape (p = 4, t = 1)
tfreq = numpy.array([[0.2], [0.6], [0.7], [0.5]])
```
mkrwt (numpy.ndarray) –
A marker weight coefficients matrix of shape (p,t).

Where:
- p is the number of markers.
- t is the number of traits.
Remarks:
- All values in mkrwt must be non-negative.
vmat (numpy.ndarray, Matrix) –
A variance matrix of shape (n,...,n,t). Can be a numpy.ndarray or a Matrix of some sort. Must support the [] operator to access elements of the matrix.

Where:
- n is the number of parental candidates.
- t is the number of traits.
- (n,...,n,t) is a tuple of length d + 1.
- d is the number of parents for a cross.
trans (function or callable) –
A transformation operator to alter the output. Function must adhere to the following standard:
- Must accept a single numpy.ndarray argument.
- Must return a single object, whether scalar or numpy.ndarray.
kwargs (dict) – Dictionary of keyword arguments to pass to trans function.

Returns:

mogm – A MOGM score matrix of shape (j,t + t + t) if trans is None. Otherwise, of shape specified by trans.

Where:

j is the number of selection configurations.
t is the number of traits.

Matrix element ordering for un-transformed MOGM score matrix:

The first set of t elements in the mogm output correspond to the t PAU outputs for each trait.
The second set of t elements in the mogm output correspond to the t PAFD outputs for each trait.
The third set of t elements in the mogm output correspond to the t SPstdA outputs for each trait.

Return type:

numpy.ndarray

property objfn_wt: ndarray – Get data for property objfn_wt.

pareto(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, miscout=None, **kwargs)

Calculate a Pareto frontier for objectives.

Parameters:

pgmat (PhasedGenotypeMatrix) – Genomes
gmat (GenotypeMatrix) – Genotypes
ptdf (pandas.DataFrame) – Phenotype dataframe
bvmat (BreedingValueMatrix) – Breeding value matrix
gpmod (GenomicModel) – Genomic prediction model
t_cur (int) – Current generation number.
t_max (int) – Maximum (deadline) generation number.
miscout (dict, None, default = None) – Pointer to a dictionary for miscellaneous user defined output. If dict, write to dict (may overwrite previously defined fields). If None, user defined output is not calculated or stored.
kwargs (dict) – Additional keyword arguments.

Returns:

out – A tuple containing two objects (frontier, sel_config).

Where:

frontier is a numpy.ndarray of shape (q,v) containing Pareto frontier points.
sel_config is a numpy.ndarray of shape (q,k) containing parent selection decisions for each corresponding point in the Pareto frontier.

Where:

q is the number of points in the frontier.
v is the number of objectives for the frontier.
k is the number of search space decision variables.

Return type:

tuple

property rng: Generator | RandomState – Get data for property rng.

select(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, miscout=None, **kwargs)

Select individuals for breeding.

Parameters:

pgmat (PhasedGenotypeMatrix) – Genomes
gmat (GenotypeMatrix) – Genotypes (unphased most likely)
ptdf (pandas.DataFrame) – Phenotype dataframe
bvmat (BreedingValueMatrix) – Breeding value matrix
gpmod (GenomicModel) – Genomic prediction model
t_cur (int) – Current generation number.
t_max (int) – Maximum (deadline) generation number.
miscout (dict, None, default = None) – Pointer to a dictionary for miscellaneous user defined output. If dict, write to dict (may overwrite previously defined fields). If None, user defined output is not calculated or stored.
kwargs (dict) – Additional keyword arguments.

Returns:

out – A tuple containing four objects: (pgmat, sel, ncross, nprogeny).

Where:

pgmat is a PhasedGenotypeMatrix of parental candidates.
sel is a numpy.ndarray of indices specifying a cross pattern. Each index corresponds to an individual in pgmat.
ncross is a numpy.ndarray specifying the number of crosses to perform per cross pattern.
nprogeny is a numpy.ndarray specifying the number of progeny to generate per cross.

Return type:

tuple

property soalgo: UnconstrainedOptimizationAlgorithm – Get data for property soalgo.

property target: ndarray | Callable | str – Get data for property target.

property unique_parents: bool – Get data for property unique_parents.

property vmatfcty: GeneticVarianceMatrixFactory – Get data for property vmatfcty.

property weight: ndarray | Callable | str – Get data for property weight.