MultiObjectiveGenomicMating

class pybrops.breed.prot.sel.UnconstrainedMultiObjectiveGenomicMating.MultiObjectiveGenomicMating(nconfig, nparent, ncross, nprogeny, vmatfcty, nself, gmapfn, weight=<function weight_absolute>, target=<function target_positive>, unique_parents=True, mem=1024, method='single', objfn_trans=None, objfn_trans_kwargs=None, objfn_wt=1.0, ndset_trans=None, ndset_trans_kwargs=None, ndset_wt=1.0, rng=None, soalgo=None, moalgo=None, **kwargs)

Bases: UnconstrainedSelectionProtocol

Class implementing selection protocols for multi-objective genomic mating.

# TODO: add formulae for methodology.

Constructor for MultiObjectiveGenomicMating class.

Parameters:
  • nconfig (int) –

    Number of cross configurations to consider.

    Examples:

    • 20 two-way crosses would be: nconfig = 20

    • 20 three-way crosses would be: nconfig = 20

  • nparent (int) –

    Number of parents per configuration.

    Examples:

    • 20 two-way crosses would be: nparent = 2

    • 20 three-way crosses would be: nparent = 3

  • ncross (int) – Number of crosses per configuration.

  • nprogeny (int) – Number of progeny to derive from each cross.

  • vmatfcty (GeneticVarianceMatrixFactory) – Variance matrix factory from which to construct additive variance matrices.

  • nself (int or Real) –

    Used for ‘vmatfcty’ matrix construction. Number of selfing generations post-cross pattern before ‘nprogeny’ individuals are simulated.

    Examples:

    • nself = 0 – Derive gametes from F1

    • nself = 1 – Derive gametes from F2

    • nself = 2 – Derive gametes from F3

    • ... – etc.

    • nself = inf – Derive gametes from SSD

  • gmapfn (GeneticMapFunction) – Used for ‘vmatfcty’ matrix construction. GeneticMapFunction to use to estimate covariance induced by recombination.

  • mem (int, default = 1024) –

    Used for ‘vmatfcty’ matrix construction. Memory chunk size to use during matrix operations. If None, then memory chunk size is not limited.

    WARNING: Setting mem = None might result in memory allocation errors! For reference, mem = 1024 refers to a matrix of size 1024x1024, which needs about 8.4 MB (8 MiB) of storage when elements are 8-byte floats. Memory use grows quadratically with the chunk size: \(O(n^2)\).

  • unique_parents (bool, default = True) – Whether to force unique parents. If True, all parents in the mating configuration must be unique. If False, non-unique parents are allowed, so self-fertilization is considered a viable option.

  • method (str) –

    Method of selecting parents.

    • "single" – MOGM is transformed to a single objective and optimization is done on the transformed function. This is done using the objfn_trans function provided:

      optimize : objfn_trans(MOGM)

    • "pareto" – MOGM is transformed by a transformation function, but NOT reduced to a single objective. The Pareto frontier for this transformed function is mapped using a multi-objective genetic algorithm.

      Objectives are scaled to \([0,1]\) and a vector orthogonal to the hyperplane defined by the extremes of the front is drawn starting at the point defined by ndset_trans. The closest point on the Pareto frontier to the orthogonal vector is selected.

  • target (str or numpy.ndarray) –

    If target is a string, check value and follow these rules:

    • "positive" – Select alleles with the most positive effect.

    • "negative" – Select alleles with the most negative effect.

    • "stabilizing" – Set target allele frequency to 0.5.

    If target is a numpy.ndarray, use the frequency values in target as is.

  • weight (str or numpy.ndarray) –

    If weight is a string, check value and follow these rules:

    • "magnitude" – Assign weights using the magnitudes of regression coefficients.

    • "equal" – Assign weights equally.

  • objfn_trans (function, callable) –

    Function to transform the MOGM function. If method = “single”, this function must return a scalar. If method = “pareto”, this function must return a numpy.ndarray.

    Function definition:

    objfn_trans(obj, **kwargs: dict):
        Parameters
            obj : scalar, numpy.ndarray
                Objective scalar or vector to be transformed
            kwargs : dict
                Additional keyword arguments
        Returns
            out : scalar, numpy.ndarray
                Transformed objective scalar or vector.
    

  • objfn_trans_kwargs (dict) – Dictionary of keyword arguments to be passed to ‘objfn_trans’.

  • objfn_wt (float, numpy.ndarray) –

    Weight applied to transformed objective function. Indicates whether a function is maximizing or minimizing:

    • 1.0 for maximizing function.

    • -1.0 for minimizing function.

  • ndset_trans (function, callable) –

    Function to transform nondominated points along the Pareto frontier into a single score for each point.

    Function definition:

    ndset_trans(ndset, **kwargs: dict):
        Parameters
            ndset : numpy.ndarray
                Array of shape (j,o) containing nondominated points.
                Where 'j' is the number of nondominated points and
                'o' is the number of objectives.
            kwargs : dict
                Additional keyword arguments.
        Returns
            out : numpy.ndarray
                Array of shape (j,) containing transformed Pareto
                frontier points.
    

  • ndset_trans_kwargs (dict) – Dictionary of keyword arguments to be passed to ‘ndset_trans’.

  • ndset_wt (float) –

    Weight applied to transformed nondominated points along Pareto frontier. Indicates whether a function is maximizing or minimizing.

    • 1.0 for maximizing function.

    • -1.0 for minimizing function.

  • soalgo (OptimizationAlgorithm) –

    Single-objective optimization algorithm to optimize the objective function. If None, use a SteepestAscentSetHillClimber with the following parameters:

    soalgo = SteepestAscentSetHillClimber(
        rng = self.rng  # PRNG source
    )
    

  • moalgo (OptimizationAlgorithm) –

    Multi-objective optimization algorithm to optimize the objective functions. If None, use an NSGA2SetGeneticAlgorithm with the following parameters:

    moalgo = NSGA2SetGeneticAlgorithm(
        ngen = 250,     # number of generations to evolve
        mu = 100,       # number of parents in population
        lamb = 100,     # number of progeny to produce
        M = 1.5,        # algorithm crossover genetic map length
        rng = self.rng  # PRNG source
    )
    

  • rng (numpy.random.Generator or None) – A random number generator source. Used for optimization algorithms. If rng is None, use pybrops.core.random module (NOT THREAD SAFE!).
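The objfn_trans and ndset_trans signatures documented above can be illustrated with a minimal sketch. The function names weighted_sum_trans and weighted_sum_ndset_trans and the keyword wt are hypothetical and are not part of PyBrOpS; they only follow the documented calling convention (a positional array plus keyword arguments supplied through objfn_trans_kwargs or ndset_trans_kwargs).

```python
import numpy

def weighted_sum_trans(obj, **kwargs):
    """Hypothetical objfn_trans for method = "single": vector in, scalar out."""
    wt = kwargs.get("wt", 1.0)                  # hypothetical keyword argument
    return float(numpy.sum(wt * numpy.asarray(obj)))

def weighted_sum_ndset_trans(ndset, **kwargs):
    """Hypothetical ndset_trans: array of shape (j,o) in, array of shape (j,) out."""
    wt = kwargs.get("wt", 1.0)                  # hypothetical keyword argument
    return (wt * numpy.asarray(ndset)).sum(axis=1)

# made-up objective vector and nondominated set, for illustration only
obj = numpy.array([0.2, 0.5, 1.5])
score = weighted_sum_trans(obj, wt=numpy.array([1.0, 1.0, -1.0]))

ndset = numpy.array([[0.1, 0.9],
                     [0.5, 0.5]])
ndset_scores = weighted_sum_ndset_trans(ndset, wt=numpy.array([0.5, 0.5]))
```

With method = "single" the first function would be the kind of callable passed as objfn_trans; with method = "pareto" the second would be the kind passed as ndset_trans.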

Methods

  • objfn – Return a selection objective function for the provided datasets.

  • objfn_static – Multi-objective genomic mating objective function.

  • objfn_vec – Return a vectorized selection objective function for the provided datasets.

  • objfn_vec_static – A vectorized multi-objective genomic mating objective function.

  • pareto – Calculate a Pareto frontier for objectives.

  • select – Select individuals for breeding.

Attributes

  • gmapfn – Get data for property gmapfn.

  • mem – Get data for property mem.

  • method – Get data for property method.

  • moalgo – Get data for property moalgo.

  • nconfig – Get data for property nconfig.

  • ncross – Get data for property ncross.

  • ndset_trans – Get data for property ndset_trans.

  • ndset_trans_kwargs – Get data for property ndset_trans_kwargs.

  • ndset_wt – Get data for property ndset_wt.

  • nparent – Get data for property nparent.

  • nprogeny – Get data for property nprogeny.

  • nself – Get data for property nself.

  • objfn_trans – Get data for property objfn_trans.

  • objfn_trans_kwargs – Get data for property objfn_trans_kwargs.

  • objfn_wt – Get data for property objfn_wt.

  • rng – Get data for property rng.

  • soalgo – Get data for property soalgo.

  • target – Get data for property target.

  • unique_parents – Get data for property unique_parents.

  • vmatfcty – Get data for property vmatfcty.

  • weight – Get data for property weight.

property gmapfn: GeneticMapFunction

Get data for property gmapfn.

property mem: int

Get data for property mem.

property method: str

Get data for property method.

property moalgo: UnconstrainedOptimizationAlgorithm

Get data for property moalgo.

property nconfig: int

Get data for property nconfig.

property ncross: int

Get data for property ncross.

property ndset_trans: Callable | None

Get data for property ndset_trans.

property ndset_trans_kwargs: dict

Get data for property ndset_trans_kwargs.

property ndset_wt: ndarray

Get data for property ndset_wt.

property nparent: int

Get data for property nparent.

property nprogeny: int

Get data for property nprogeny.

property nself: int | Real

Get data for property nself.

objfn(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, **kwargs)

Return a selection objective function for the provided datasets.

Parameters:
Returns:

outfn – A selection objective function for the specified problem.

Return type:

function

static objfn_static(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat, trans, kwargs)

Multi-objective genomic mating objective function.

  • The goal is to minimize all objectives for this function.

  • This is a bare bones function. Minimal error checking is done.

Objectives: \(F(\textbf{x})\)

\[F(\textbf{x}) = {[f^{\textup{PAU}}(\textbf{x}), f^{\textup{PAFD}}(\textbf{x}), f^{\textup{SPstdA}}(\textbf{x})]}'\]

Population Allele Unavailability (PAU): \(f^{\textup{PAU}}(\textbf{x})\)

Formal PAU definition:

\[f^{\textup{PAU}}(\textbf{x}) = \textbf{w} \cdot \textbf{u}\]

Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency. From the selection allele frequencies and the target allele frequencies tfreq, determine if the target frequencies can be attained after unlimited generations of selection. If the target allele frequency at a locus cannot be attained, score locus as 1, otherwise score as 0. Store this into a binary score vector \(\textbf{u}\). Take the dot product between the binary score vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAU}}(\textbf{x})\) and return the result.

Population Allele Frequency Distance (PAFD): \(f^{\textup{PAFD}}(\textbf{x})\)

Formal PAFD definition:

\[f^{\textup{PAFD}}(\textbf{x}) = \textbf{w} \cdot \left | \textbf{p}_{x} - \textbf{p}_{t} \right |\]

Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency \(\textbf{p}_{x}\). From the selection allele frequencies and the target allele frequencies \(\textbf{p}_{t} =\) tfreq, calculate the absolute value of the difference between the two vectors. Finally, take the dot product between the difference vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAFD}}(\textbf{x})\) and return the result.
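The PAFD formula above translates almost directly into NumPy. The sketch below is hedged in the same way as any illustration on this page: pafd_sketch is a hypothetical name, a single trait is assumed (1-D tfreq and mkrwt), and the selection allele frequency is assumed to be taken over the parents of the crosses that sel picks out of xmap.

```python
import numpy

def pafd_sketch(sel, xmap, mat, ploidy, tfreq, mkrwt):
    """Population Allele Frequency Distance for a single trait (illustrative only)."""
    parents = xmap[sel].ravel()                 # parent rows of the selected crosses
    # p_x: allele counts over total number of phases in the selection
    pfreq = mat[parents].sum(axis=0) / (ploidy * parents.size)
    # w . |p_x - p_t|
    return float(mkrwt.dot(numpy.abs(pfreq - tfreq)))

mat = numpy.array([[0,2,1,0],
                   [2,2,1,1],
                   [0,1,0,2]])
xmap = numpy.array([[0,1],[0,2],[1,2]])         # all two-way crosses among 3 candidates
sel = numpy.array([0])                          # select the cross between individuals 0 and 1
tfreq = numpy.array([0.2, 0.6, 0.7, 0.5])
mkrwt = numpy.array([1.0, 2.0, 0.5, 1.5])       # made-up marker weights
pafd = pafd_sketch(sel, xmap, mat, ploidy=2, tfreq=tfreq, mkrwt=mkrwt)
```

For this input the selection frequencies are (0.5, 1.0, 0.5, 0.25), the absolute distances to the target are (0.3, 0.4, 0.2, 0.25), and the weighted sum is 1.575.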

Sum of Progeny Standard Deviations of Additive Variance (SPstdA): \(f^{\textup{SPstdA}}(\textbf{x})\)

Formal SPstdA definition:

\[f^{\textup{SPstdA}}(\textbf{x}) = \sum_{c \in S} \sigma_{A,c}\]

Given a progeny variance matrix \(\Sigma_{A} =\) vmat and a selection indices vector \(\textbf{x} =\) sel, take the sum of the square root of the progeny variance \(\sigma_{A,c} = \sqrt{\Sigma_{A,c}}\) for each cross.
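A minimal sketch of the SPstdA sum, assuming vmat is a plain numpy.ndarray of shape (n,...,n,t) and that each entry of sel indexes a row of xmap. The name spstda_sketch and the variance values are hypothetical; the formula is applied as written, with no sign change or scaling that an actual implementation might add.

```python
import numpy

def spstda_sketch(sel, xmap, vmat):
    """Sum of progeny additive standard deviations per trait (illustrative only)."""
    crosses = xmap[sel]                         # (k,d) parent indices of selected crosses
    var = vmat[tuple(crosses.T)]                # (k,t) progeny variance of each cross
    return numpy.sqrt(var).sum(axis=0)          # (t,) sum of standard deviations

# made-up variance matrix for n = 3 candidates, d = 2 parents, t = 1 trait
vmat = numpy.zeros((3, 3, 1))
vmat[0, 1, 0] = vmat[1, 0, 0] = 4.0
vmat[0, 2, 0] = vmat[2, 0, 0] = 9.0
vmat[1, 2, 0] = vmat[2, 1, 0] = 1.0

xmap = numpy.array([[0,1],[0,2],[1,2]])         # all two-way crosses among 3 candidates
sel = numpy.array([0, 2])                       # crosses (0,1) and (1,2)
spstda = spstda_sketch(sel, xmap, vmat)
```

Indexing with tuple(crosses.T) supplies one index array per parent axis, so the same function works for any number of parents d as long as vmat has d + 1 axes.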

Parameters:
  • sel (numpy.ndarray) –

    A cross selection indices array of shape (k,).

    Where:

    • k is the number of crosses to select.

    Each index indicates which cross specified by xmap to select.

  • xmap (numpy.ndarray) –

    A cross selection index map array of shape (s,d).

    Where:

    • s is the size of the sample space (number of cross combinations for d parents).

    • d is the number of parents.

  • mat (numpy.ndarray) –

    A genotype matrix of shape (n,p) representing only biallelic loci. One of the two alleles at a locus is coded using a 1. The other allele is coded as a 0. mat holds the counts of the allele coded by 1.

    Where:

    • n is the number of individuals.

    • p is the number of markers.

    Example:

    # matrix of shape (n = 3, p = 4)
    mat = numpy.array([[0,2,1,0],
                       [2,2,1,1],
                       [0,1,0,2]])
    

  • ploidy (int) – Number of phases that the genotype matrix mat represents.

  • tfreq (floating, numpy.ndarray) –

    A target allele frequency matrix of shape (p,t).

    Where:

    • p is the number of markers.

    • t is the number of traits.

    Example:

    tfreq = numpy.array([0.2, 0.6, 0.7, 0.5])
    

  • mkrwt (numpy.ndarray) –

    A marker weight coefficients matrix of shape (p,t).

    Where:

    • p is the number of markers.

    • t is the number of traits.

    Remarks:

    • All values in mkrwt must be non-negative.

  • vmat (numpy.ndarray, Matrix) –

    A variance matrix of shape (n,...,n,t). Can be a numpy.ndarray or a Matrix of some sort. Must support the [] operator to access elements of the matrix.

    Where:

    • n is the number of parental candidates.

    • t is the number of traits.

    • (n,...,n,t) is a tuple of length d + 1.

    • d is the number of parents for a cross.

  • trans (function or callable) –

    A transformation operator to alter the output. Function must adhere to the following standard:

    • Must accept a single numpy.ndarray argument.

    • Must return a single object, whether scalar or numpy.ndarray.

  • kwargs (dict) – Dictionary of keyword arguments to pass to trans function.
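To make the relationship between sel and xmap concrete, the sketch below enumerates a cross sample space with itertools. It is an assumption that the sample space is built this way: build_xmap is a hypothetical helper, not a PyBrOpS function, and the use of combinations with replacement to represent crosses with repeated parents is likewise only a plausible reading of the unique_parents option.

```python
import itertools
import numpy

def build_xmap(n, d, unique_parents=True):
    """Enumerate d-parent crosses among n candidates as an (s,d) index array."""
    if unique_parents:
        combos = itertools.combinations(range(n), d)
    else:
        # ASSUMPTION: repeated parents (including selfing) are allowed
        combos = itertools.combinations_with_replacement(range(n), d)
    return numpy.array(list(combos), dtype=int)

xmap = build_xmap(4, 2)                         # two-way crosses among 4 candidates
xmap_self = build_xmap(4, 2, unique_parents=False)

sel = numpy.array([0, 5])                       # pick the first and last cross in xmap
parents = xmap[sel]                             # (k,d) parent indices for each selected cross
```

With four candidates there are 6 two-way crosses with unique parents and 10 when repeated parents are allowed, so s grows quickly with n and d.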

Returns:

mogm – A MOGM score matrix of shape (t + t + t,) if trans is None. Otherwise, of shape specified by trans.

Where:

  • t is the number of traits.

Matrix element ordering for un-transformed MOGM score matrix:

  • The first set of t elements in the mogm output correspond to the t PAU outputs for each trait.

  • The second set of t elements in the mogm output correspond to the t PAFD outputs for each trait.

  • The third set of t elements in the mogm output correspond to the t SPstdA outputs for each trait.

Return type:

numpy.ndarray
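The element ordering described above means an un-transformed score vector can be split into three equal blocks. A short sketch with made-up values for t = 2 traits:

```python
import numpy

t = 2                                           # number of traits
# made-up un-transformed MOGM scores of shape (t + t + t,)
mogm = numpy.array([0.0, 1.0, 0.3, 0.4, 2.5, 3.5])

pau = mogm[:t]                                  # first t elements: PAU per trait
pafd = mogm[t:2*t]                              # second t elements: PAFD per trait
spstda = mogm[2*t:]                             # third t elements: SPstdA per trait
```

A trans function that receives this vector can use the same slicing to weight or combine the three groups of objectives.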

property objfn_trans: Callable | None

Get data for property objfn_trans.

property objfn_trans_kwargs: dict

Get data for property objfn_trans_kwargs.

objfn_vec(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, **kwargs)

Return a vectorized selection objective function for the provided datasets.

Parameters:
Returns:

outfn – A vectorized selection objective function for the specified problem.

Return type:

function

static objfn_vec_static(sel, xmap, mat, ploidy, tfreq, mkrwt, vmat, trans, kwargs)

A vectorized multi-objective genomic mating objective function.

  • The goal is to minimize all objectives for this function.

  • This is a bare bones function. Minimal error checking is done.

Objectives: \(F(\textbf{x})\)

\[F(\textbf{x}) = {[f^{\textup{PAU}}(\textbf{x}), f^{\textup{PAFD}}(\textbf{x}), f^{\textup{SPstdA}}(\textbf{x})]}'\]

Population Allele Unavailability (PAU): \(f^{\textup{PAU}}(\textbf{x})\)

\[f^{\textup{PAU}}(\textbf{x}) = \textbf{w} \cdot \textbf{u}\]

Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency. From the selection allele frequencies and the target allele frequencies tfreq, determine if the target frequencies can be attained after unlimited generations of selection. If the target allele frequency at a locus cannot be attained, score locus as 1, otherwise score as 0. Store this into a binary score vector \(\textbf{u}\). Take the dot product between the binary score vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAU}}(\textbf{x})\) and return the result.

Population Allele Frequency Distance (PAFD): \(f^{\textup{PAFD}}(\textbf{x})\)

\[f^{\textup{PAFD}}(\textbf{x}) = \textbf{w} \cdot \left | \textbf{p}_{x} - \textbf{p}_{t} \right |\]

Given a genotype matrix mat and a selection indices vector \(\textbf{x} =\) sel, calculate the selection allele frequency \(\textbf{p}_{x}\). From the selection allele frequencies and the target allele frequencies \(\textbf{p}_{t} =\) tfreq, calculate the absolute value of the difference between the two vectors. Finally, take the dot product between the difference vector and the marker weight vector \(\textbf{w} =\) mkrwt to calculate \(f^{\textup{PAFD}}(\textbf{x})\) and return the result.

Sum of Progeny Standard Deviations of Additive Variance (SPstdA): \(f^{\textup{SPstdA}}(\textbf{x})\)

Formal SPstdA definition:

\[f^{\textup{SPstdA}}(\textbf{x}) = \sum_{c \in S} \sigma_{A,c}\]

Given a progeny variance matrix \(\Sigma_{A} =\) vmat and a selection indices vector \(\textbf{x} =\) sel, take the sum of the square root of the progeny variance \(\sigma_{A,c} = \sqrt{\Sigma_{A,c}}\) for each cross.

Parameters:
  • sel (numpy.ndarray) –

    A cross selection indices matrix of shape (j,k).

    Where:

    • j is the number of configurations to score.

    • k is the number of crosses to select.

    Each index indicates which cross specified by xmap to select. sel cannot be None.

  • xmap (numpy.ndarray) –

    A cross selection index map array of shape (s,d).

    Where:

    • s is the size of the sample space (number of cross combinations for d parents).

    • d is the number of parents.

  • mat (numpy.ndarray) –

    A genotype matrix of shape (n,p) representing only biallelic loci. One of the two alleles at a locus is coded using a 1. The other allele is coded as a 0. mat holds the counts of the allele coded by 1.

    Where:

    • n is the number of individuals.

    • p is the number of markers.

    Example:

    # matrix of shape (n = 3, p = 4)
    mat = numpy.array([[0,2,1,0],
                       [2,2,1,1],
                       [0,1,0,2]])
    

  • ploidy (int) – Number of phases that the genotype matrix mat represents.

  • tfreq (floating, numpy.ndarray) –

    A target allele frequency matrix of shape (p,t).

    Where:

    • p is the number of markers.

    • t is the number of traits.

    Example:

    tfreq = numpy.array([0.2, 0.6, 0.7, 0.5])
    

  • mkrwt (numpy.ndarray) –

    A marker weight coefficients matrix of shape (p,t).

    Where:

    • p is the number of markers.

    • t is the number of traits.

    Remarks:

    • All values in mkrwt must be non-negative.

  • vmat (numpy.ndarray, Matrix) –

    A variance matrix of shape (n,...,n,t). Can be a numpy.ndarray or a Matrix of some sort. Must support the [] operator to access elements of the matrix.

    Where:

    • n is the number of parental candidates.

    • t is the number of traits.

    • (n,...,n,t) is a tuple of length d + 1.

    • d is the number of parents for a cross.

  • trans (function or callable) –

    A transformation operator to alter the output. Function must adhere to the following standard:

    • Must accept a single numpy.ndarray argument.

    • Must return a single object, whether scalar or numpy.ndarray.

  • kwargs (dict) – Dictionary of keyword arguments to pass to trans function.

Returns:

mogm – A MOGM score matrix of shape (j,t + t + t) if trans is None. Otherwise, of shape specified by trans.

Where:

  • j is the number of selection configurations.

  • t is the number of traits.

Matrix element ordering for un-transformed MOGM score matrix:

  • The first set of t elements in the mogm output correspond to the t PAU outputs for each trait.

  • The second set of t elements in the mogm output correspond to the t PAFD outputs for each trait.

  • The third set of t elements in the mogm output correspond to the t SPstdA outputs for each trait.

Return type:

numpy.ndarray
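The vectorized form scores j configurations in one call. The sketch below shows the idea for the PAFD objective only, for a single trait, and assumes (as in objfn_static) that each entry of sel indexes a row of xmap. The name pafd_vec_sketch and the marker weights are hypothetical.

```python
import numpy

def pafd_vec_sketch(sel, xmap, mat, ploidy, tfreq, mkrwt):
    """PAFD for j configurations at once (single trait, illustrative only)."""
    j = sel.shape[0]
    parents = xmap[sel].reshape(j, -1)          # (j, k*d) parent rows per configuration
    # (j,p) selection allele frequencies, one row per configuration
    pfreq = mat[parents].sum(axis=1) / (ploidy * parents.shape[1])
    return numpy.abs(pfreq - tfreq).dot(mkrwt)  # (j,) weighted distances

mat = numpy.array([[0,2,1,0],
                   [2,2,1,1],
                   [0,1,0,2]])
xmap = numpy.array([[0,1],[0,2],[1,2]])         # all two-way crosses among 3 candidates
sel = numpy.array([[0],                         # configuration 0: cross (0,1)
                   [2]])                        # configuration 1: cross (1,2)
tfreq = numpy.array([0.2, 0.6, 0.7, 0.5])
mkrwt = numpy.array([1.0, 2.0, 0.5, 1.5])       # made-up marker weights
pafd = pafd_vec_sketch(sel, xmap, mat, ploidy=2, tfreq=tfreq, mkrwt=mkrwt)
```

Each row of the result corresponds to one row of sel, which is the layout a population-based optimizer needs when it evaluates many candidate solutions per generation.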

property objfn_wt: ndarray

Get data for property objfn_wt.

pareto(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, miscout=None, **kwargs)

Calculate a Pareto frontier for objectives.

Parameters:
  • pgmat (PhasedGenotypeMatrix) – Genomes

  • gmat (GenotypeMatrix) – Genotypes

  • ptdf (pandas.DataFrame) – Phenotype dataframe

  • bvmat (BreedingValueMatrix) – Breeding value matrix

  • gpmod (GenomicModel) – Genomic prediction model

  • t_cur (int) – Current generation number.

  • t_max (int) – Maximum (deadline) generation number.

  • miscout (dict, None, default = None) – Pointer to a dictionary for miscellaneous user defined output. If dict, write to dict (may overwrite previously defined fields). If None, user defined output is not calculated or stored.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A tuple containing two objects (frontier, sel_config).

Where:

  • frontier is a numpy.ndarray of shape (q,v) containing Pareto frontier points.

  • sel_config is a numpy.ndarray of shape (q,k) containing parent selection decisions for each corresponding point in the Pareto frontier.

Where:

  • q is the number of points in the frontier.

  • v is the number of objectives for the frontier.

  • k is the number of search space decision variables.

Return type:

tuple
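Given a (frontier, sel_config) pair shaped as described above, one plausible way to reduce it to a single selection is to score every point with an ndset_trans-style function, apply ndset_wt, and keep the best-scoring point. The sketch below uses made-up arrays; pick_from_frontier is a hypothetical helper, and taking the maximum of the weighted score is an assumption about how the weight is meant to be used.

```python
import numpy

def pick_from_frontier(frontier, sel_config, ndset_trans, ndset_wt=1.0, **kwargs):
    """Return the frontier point and selection with the highest weighted score."""
    score = ndset_wt * ndset_trans(frontier, **kwargs)   # (q,) one score per point
    ix = int(numpy.argmax(score))               # ASSUMPTION: weighted score is maximized
    return frontier[ix], sel_config[ix]

# made-up frontier with q = 3 points, v = 2 objectives, k = 2 decision variables
frontier = numpy.array([[0.0, 3.0],
                        [1.0, 1.0],
                        [3.0, 0.0]])
sel_config = numpy.array([[0, 1],
                          [0, 2],
                          [1, 2]])

def sum_trans(ndset, **kwargs):
    """Hypothetical ndset_trans: sum of objectives for each point."""
    return ndset.sum(axis=1)

# ndset_wt = -1.0 treats the summed objectives as something to minimize
point, sel = pick_from_frontier(frontier, sel_config, sum_trans, ndset_wt=-1.0)
```

Here the middle point has the smallest objective sum, so it and its selection row are returned.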

property rng: Generator | RandomState

Get data for property rng.

select(pgmat, gmat, ptdf, bvmat, gpmod, t_cur, t_max, miscout=None, **kwargs)

Select individuals for breeding.

Parameters:
  • pgmat (PhasedGenotypeMatrix) – Genomes

  • gmat (GenotypeMatrix) – Genotypes (unphased most likely)

  • ptdf (pandas.DataFrame) – Phenotype dataframe

  • bvmat (BreedingValueMatrix) – Breeding value matrix

  • gpmod (GenomicModel) – Genomic prediction model

  • t_cur (int) – Current generation number.

  • t_max (int) – Maximum (deadline) generation number.

  • miscout (dict, None, default = None) – Pointer to a dictionary for miscellaneous user defined output. If dict, write to dict (may overwrite previously defined fields). If None, user defined output is not calculated or stored.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A tuple containing four objects: (pgmat, sel, ncross, nprogeny).

Where:

  • pgmat is a PhasedGenotypeMatrix of parental candidates.

  • sel is a numpy.ndarray of indices specifying a cross pattern. Each index corresponds to an individual in pgmat.

  • ncross is a numpy.ndarray specifying the number of crosses to perform per cross pattern.

  • nprogeny is a numpy.ndarray specifying the number of progeny to generate per cross.

Return type:

tuple

property soalgo: UnconstrainedOptimizationAlgorithm

Get data for property soalgo.

property target: ndarray | Callable | str

Get data for property target.

property unique_parents: bool

Get data for property unique_parents.

property vmatfcty: GeneticVarianceMatrixFactory

Get data for property vmatfcty.

property weight: ndarray | Callable | str

Get data for property weight.