rrBLUPModel0

class pybrops.model.gmod.rrBLUPModel0.rrBLUPModel0(beta, u_misc, u_a, trait=None, method='ML', model_name=None, hyperparams=None, **kwargs)

Bases: DenseAdditiveLinearGenomicModel

The rrBLUPModel0 class represents a simple RR-BLUP model with an intercept (fixed) and marker effects (random).

An rrBLUPModel0 is a Multivariate Multiple Linear Regression model defined as:

\[\mathbf{Y} = \mathbf{XB} + \mathbf{ZU} + \mathbf{E}\]

Where:

  • \(\mathbf{Y}\) is a matrix of response variables of shape (n,t).

  • \(\mathbf{X}\) is a matrix of ones for the intercept of shape (n,1).

  • \(\mathbf{B}\) is a matrix of intercept coefficients of shape (1,t).

  • \(\mathbf{Z}\) is a matrix of random effect predictors of shape (n,p).

  • \(\mathbf{U}\) is a matrix of random effect regression coefficients of shape (p,t).

  • \(\mathbf{E}\) is a matrix of error terms of shape (n,t).

Block matrix modifications to \(\mathbf{Z}\) and \(\mathbf{U}\):

Both matrices can be decomposed into block matrices pertaining to different sets of effects:

\[\mathbf{Z} = \begin{bmatrix} \mathbf{Z_{misc}} & \mathbf{Z_{a}} \end{bmatrix}\]

Where:

  • \(\mathbf{Z_{misc}}\) is a matrix of miscellaneous random effect predictors of shape (n,p_misc).

  • \(\mathbf{Z_{a}}\) is a matrix of additive genomic marker predictors of shape (n,p_a).

\[\begin{split}\mathbf{U} = \begin{bmatrix} \mathbf{U_{misc}} \\ \mathbf{U_{a}} \end{bmatrix}\end{split}\]

Where:

  • \(\mathbf{U_{misc}}\) is a matrix of miscellaneous random effects of shape (p_misc,t).

  • \(\mathbf{U_{a}}\) is a matrix of additive genomic marker effects of shape (p_a,t).

Shape definitions:

  • n is the number of individuals.

  • p is the number of random effect predictors.

  • p_misc is the number of miscellaneous random effect predictors.

  • p_a is the number of additive genomic marker predictors.

  • The sum of p_misc and p_a equals p.

  • t is the number of traits.
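The model and its block decomposition can be checked numerically. The following NumPy sketch uses arbitrary illustrative sizes and random values (nothing here comes from the library) to build \(\mathbf{Y} = \mathbf{XB} + \mathbf{ZU} + \mathbf{E}\) with the documented shapes and to confirm that the block product \(\mathbf{ZU}\) splits into one term per set of effects.

```python
import numpy as np

# Illustrative sizes (hypothetical): n individuals, p_misc misc effects, p_a markers, t traits
n, p_misc, p_a, t = 5, 2, 4, 3
rng = np.random.default_rng(0)

X = np.ones((n, 1))                                     # intercept incidence, shape (n,1)
B = rng.normal(size=(1, t))                             # intercepts, shape (1,t)
Z_misc = rng.normal(size=(n, p_misc))                   # misc random effect predictors
Z_a = rng.integers(0, 3, size=(n, p_a)).astype(float)   # marker genotypes coded 0/1/2
U_misc = rng.normal(size=(p_misc, t))                   # misc random effects
U_a = rng.normal(size=(p_a, t))                         # additive marker effects
E = rng.normal(scale=0.1, size=(n, t))                  # errors

# Block matrices: Z = [Z_misc  Z_a] (column-wise), U = [U_misc; U_a] (row-wise)
Z = np.hstack([Z_misc, Z_a])                            # shape (n, p), p = p_misc + p_a
U = np.vstack([U_misc, U_a])                            # shape (p, t)

Y = X @ B + Z @ U + E                                   # shape (n, t)

# The block product expands into one term per set of effects
assert np.allclose(Z @ U, Z_misc @ U_misc + Z_a @ U_a)
print(Y.shape)  # (5, 3)
```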

From Prototype class docstring:

RR-BLUP model for fitting a single random effect and a single intercept fixed effect for a single trait. If multiple traits are provided, fit independent models for each trait.

For a single trait, the model is:

y = Xb + Zu + e

Where:

- ``y`` are observations.
- ``X`` is a matrix of ones for the incidence of the intercept.
- ``b`` is the intercept.
- ``Z`` is a design matrix for genetic markers.
- ``u`` are marker effects which follow the distribution ``MVN(0, varU * I)``.
- ``e`` are errors which follow the distribution ``MVN(0, varE * I)``.

For a single trait, if the observations (y) are mean centered, then the model becomes:

y = Zu + e

Where:

- ``y`` are observations.
- ``Z`` is a design matrix for genetic markers.
- ``u`` are marker effects which follow the distribution ``MVN(0, varU * I)``.
- ``e`` are errors which follow the distribution ``MVN(0, varE * I)``.

Multiple traits are concatenated together into the model:

Y = XB + ZU + E
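For a single trait with known variance components, the model above can be solved with Henderson's mixed model equations, where the ridge (shrinkage) parameter is varE / varU. The sketch below assumes the variances are given; the class itself estimates them according to its `method` (e.g. "ML"). `rrblup_mme` and `rrblup_multi` are hypothetical helper names, not pybrops functions. The multi-trait helper fits each trait independently, as the prototype docstring describes.

```python
import numpy as np

def rrblup_mme(y, Z, var_u, var_e):
    """Solve Henderson's mixed model equations for y = 1*b + Z u + e,
    assuming u ~ MVN(0, var_u * I) and e ~ MVN(0, var_e * I) with KNOWN variances."""
    n, p = Z.shape
    X = np.ones((n, 1))                      # intercept incidence
    lam = var_e / var_u                      # shrinkage (ridge) parameter
    # Coefficient matrix of the MME, shape (1+p, 1+p)
    C = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.eye(p)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(C, rhs)
    return sol[0], sol[1:]                   # intercept b, marker effects u

def rrblup_multi(Y, Z, var_u, var_e):
    """Fit each trait (column of Y) as an independent single-trait model."""
    t = Y.shape[1]
    B = np.empty((1, t))
    U = np.empty((Z.shape[1], t))
    for j in range(t):
        B[0, j], U[:, j] = rrblup_mme(Y[:, j], Z, var_u[j], var_e[j])
    return B, U
```

Absorbing the intercept shows that this is equivalent to ridge regression on column-centered Z and mean-centered y, which connects to the mean-centered form y = Zu + e given above.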

Constructor for rrBLUPModel0 class.

Parameters:
  • beta (numpy.ndarray) –

    A float64 fixed effect regression coefficient matrix of shape (q,t).

    Where:

    • q is the number of fixed effect predictors (e.g. environments).

    • t is the number of traits.

  • u_misc (numpy.ndarray, None) –

    A float64 random effect regression coefficient matrix of shape (p_misc,t) containing miscellaneous effects.

    Where:

    • p_misc is the number of miscellaneous random effect predictors.

    • t is the number of traits.

    If None, then set to an empty array of shape (0,t).

  • u_a (numpy.ndarray, None) –

    A float64 random effect regression coefficient matrix of shape (p_a,t) containing additive marker effects.

    Where:

    • p_a is the number of additive marker effect predictors.

    • t is the number of traits.

    If None, then set to an empty array of shape (0,t).

  • trait (numpy.ndarray, None) –

    An object array of shape (t,) containing the names of traits.

    Where:

    • t is the number of traits.

  • method (str) – Fitting method to use. Options are {"ML"}.

  • model_name (str, None) – Name of the model.

  • hyperparams (dict, None) – Model hyperparameters.

  • kwargs (dict) – Used for cooperative inheritance. Dictionary passing unused arguments to the parent class constructor.
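The shape rules for the constructor arguments can be summarized in a short helper. This is a hypothetical sketch, not pybrops code: it applies the documented defaults (None becomes an empty (0,t) array) and assumes, following the block definition of \(\mathbf{U}\) above, that the full random effect matrix stacks `u_misc` on top of `u_a`. The library's own validation may differ.

```python
import numpy as np

def normalize_coefficients(beta, u_misc, u_a):
    """Hypothetical helper mirroring the documented shape rules:
    beta is (q,t); u_misc and u_a default to empty (0,t) arrays when None."""
    beta = np.asarray(beta, dtype=np.float64)
    if beta.ndim != 2:
        raise ValueError("beta must have shape (q,t)")
    t = beta.shape[1]
    u_misc = np.empty((0, t)) if u_misc is None else np.asarray(u_misc, dtype=np.float64)
    u_a = np.empty((0, t)) if u_a is None else np.asarray(u_a, dtype=np.float64)
    for name, arr in (("u_misc", u_misc), ("u_a", u_a)):
        if arr.ndim != 2 or arr.shape[1] != t:
            raise ValueError(f"{name} must have shape (*,{t})")
    # Assumed full random effect matrix U = [U_misc; U_a], shape (p_misc + p_a, t)
    u = np.vstack([u_misc, u_a])
    return beta, u_misc, u_a, u

# Example: one intercept per trait, no miscellaneous effects, 100 markers, 2 traits
beta, u_misc, u_a, u = normalize_coefficients(np.array([[10.0, 5.0]]), None, np.zeros((100, 2)))
print(u_misc.shape, u.shape)  # (0, 2) (100, 2)
```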

Methods

bulmer

Calculate the Bulmer effect.

bulmer_numpy

Calculate the Bulmer effect.

copy

Make a shallow copy of the GenomicModel.

daavail

Determine whether a deleterious allele is available in the present taxa.

dacount

Calculate the deleterious allele count across all taxa.

dafixed

Determine whether a deleterious allele is fixed across all taxa.

dafreq

Calculate the deleterious allele frequency across all taxa.

dapoly

Determine whether a deleterious allele is polymorphic across all taxa.

deepcopy

Make a deep copy of the GenomicModel.

faavail

Determine whether a favorable allele is polymorphic or fixed across all taxa.

facount

Calculate the favorable allele count across all taxa.

fafixed

Determine whether a favorable allele is fixed across all taxa.

fafreq

Calculate the favorable allele frequency across all taxa.

fapoly

Determine whether a favorable allele is polymorphic across all taxa.

fit

Fit a dense, additive linear genomic model.

fit_numpy

Fit a dense, additive linear genomic model.

from_csv_dict

Read a DenseAdditiveLinearGenomicModel from a set of CSV files specified by values in a dict.

from_hdf5

Read DenseAdditiveLinearGenomicModel from an HDF5 file.

from_pandas_dict

Read an object from a dict of pandas.DataFrame.

gebv

Calculate genomic estimated breeding values.

gebv_numpy

Calculate genomic estimated breeding values.

gegv

Calculate genomic estimated genotypic values.

gegv_numpy

Calculate genomic estimated genotypic values.

lsl

Calculate the lower selection limit for a population.

lsl_numpy

Calculate the lower selection limit for a population.

nafixed

Determine whether a neutral allele is fixed across all taxa.

napoly

Determine whether a neutral allele is polymorphic across all taxa.

predict

Predict breeding values.

predict_numpy

Predict breeding values.

score

Return the coefficient of determination R**2 of the prediction.

score_numpy

Return the coefficient of determination R**2 of the prediction.

to_csv_dict

Export a DenseAdditiveLinearGenomicModel to a set of CSV files specified by values in a dict.

to_hdf5

Write DenseAdditiveLinearGenomicModel to an HDF5 file.

to_pandas_dict

Export a DenseAdditiveLinearGenomicModel to a dict of pandas.DataFrame.

usl

Calculate the upper selection limit for a population.

usl_numpy

Calculate the upper selection limit for a population.

var_A

Calculate the population additive genetic variance.

var_A_numpy

Calculate the population additive genetic variance.

var_G

Calculate the population genetic variance.

var_G_numpy

Calculate the population genetic variance.

var_a

Calculate the population additive genic variance.

var_a_numpy

Calculate the population additive genic variance.

Attributes

beta

Fixed effect regression coefficients.

hyperparams

Model hyperparameters.

method

Method to be used to fit the RR-BLUP model.

model_name

Name of the model.

nexplan

Number of explanatory variables required by the model.

nexplan_beta

Number of fixed effect explanatory variables required by the model.

nexplan_u

Number of random effect explanatory variables required by the model.

nexplan_u_a

Number of additive genomic marker explanatory variables required by the model.

nexplan_u_misc

Number of miscellaneous random effect explanatory variables required by the model.

nparam

Number of model parameters.

nparam_beta

Number of fixed effect parameters.

nparam_u

Number of random effect parameters.

nparam_u_a

Number of additive genomic marker parameters.

nparam_u_misc

Number of miscellaneous random effect parameters.

ntrait

Number of traits predicted by the model.

trait

Names of the traits predicted by the model.

u

Random effect regression coefficients.

u_a

Additive genomic marker effects.

u_misc

Miscellaneous random effect regression coefficients.

property beta: ndarray

Fixed effect regression coefficients.

bulmer(gtobj, ploidy=None, **kwargs)

Calculate the Bulmer effect.

Parameters:
  • gtobj (GenotypeMatrix, numpy.ndarray) – An object containing genotype data. Must be a matrix of genotype values.

  • ploidy (int) –

    Ploidy of the species. If ploidy is None:

    • If gtobj is a GenotypeMatrix, then get ploidy from GenotypeMatrix.

    • If gtobj is a numpy.ndarray, then assumed to be 2 (diploid).

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population Bulmer effect statistics. If the additive genic variance is zero, NaN values are produced.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray

bulmer_numpy(Z, p, ploidy=2, **kwargs)

Calculate the Bulmer effect.

Parameters:
  • Z (numpy.ndarray) – A matrix of genotypes.

  • p (numpy.ndarray) – A vector of genotype allele frequencies of shape (p,).

  • ploidy (int) – Ploidy of the species.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population Bulmer effect statistics. If the additive genic variance is zero, NaN values are produced.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray
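The docstrings do not define the Bulmer statistic explicitly. The sketch below assumes, and this is not confirmed by the text, that it is the ratio of the additive genetic variance (variance of GEBVs across individuals) to the additive genic variance, which is consistent with the note that a zero genic variance yields NaN. It further assumes genotypes are allele counts in 0..ploidy and uses the linkage-equilibrium genic variance ploidy · Σ p_j (1 − p_j) u_j². `bulmer_sketch` is a hypothetical name and pybrops' exact scaling may differ.

```python
import numpy as np

def bulmer_sketch(Z, p, u_a, ploidy=2):
    """Sketch of a Bulmer-effect ratio var_A / var_a per trait.

    Assumptions (not taken from pybrops): Z (n,m) holds allele counts in
    0..ploidy, p (m,) holds allele frequencies, u_a (m,t) holds marker effects,
    var_A is the population variance of GEBVs (Z @ u_a), and var_a is the
    genic variance ploidy * sum_j p_j (1 - p_j) u_j^2."""
    gebv = Z @ u_a                                                    # (n,t)
    var_A = gebv.var(axis=0)                                          # (t,) additive genetic variance
    var_a = ploidy * ((p * (1.0 - p))[:, None] * u_a**2).sum(axis=0)  # (t,) additive genic variance
    with np.errstate(divide="ignore", invalid="ignore"):
        # NaN for any trait whose genic variance is zero
        out = np.where(var_a > 0.0, var_A / var_a, np.nan)
    return out
```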

copy()

Make a shallow copy of the GenomicModel.

Returns:

out – A shallow copy of the original GenomicModel.

Return type:

GenomicModel

daavail(gmat, dtype=None, **kwargs)

Determine whether a deleterious allele is available in the present taxa.

An allele is considered deleterious if its effect is less than zero. Alleles with zero effect are not considered deleterious; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine deleterious allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native boolean type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing whether a deleterious allele is available.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

dacount(gmat, dtype=None, **kwargs)

Calculate the deleterious allele count across all taxa.

An allele is considered deleterious if its effect is less than zero. Alleles with zero effect are not considered deleterious; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to count deleterious alleles.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing allele counts of the deleterious allele.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

dafixed(gmat, dtype=None, **kwargs)

Determine whether a deleterious allele is fixed across all taxa.

An allele is considered deleterious if its effect is less than zero. Alleles with zero effect are not considered deleterious; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine deleterious allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing whether a deleterious allele is fixed.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

dafreq(gmat, dtype=None, **kwargs)

Calculate the deleterious allele frequency across all taxa.

An allele is considered deleterious if its effect is less than zero. Alleles with zero effect are not considered deleterious; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine deleterious allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing allele frequencies of the deleterious allele.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

dapoly(gmat, dtype=None, **kwargs)

Determine whether a deleterious allele is polymorphic across all taxa.

An allele is considered deleterious if its effect is less than zero. Alleles with zero effect are not considered deleterious; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine deleterious allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing whether a deleterious allele is polymorphic.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray
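The deleterious-allele methods all follow from one rule: an allele is deleterious when its effect is negative. The sketch below computes counts, frequencies and the available/fixed/polymorphic flags under stated assumptions: Z holds counts of a coded allele in 0..ploidy, u_a holds the effect of one copy of that allele, so the coded allele is deleterious where u_a < 0 and the other allele is deleterious where u_a > 0. Neutral markers (u_a == 0) are given a count of zero here, which is an assumption about edge-case handling. `deleterious_allele_stats` is a hypothetical name.

```python
import numpy as np

def deleterious_allele_stats(Z, u_a, ploidy=2):
    """Sketch of dacount/dafreq/daavail/dafixed/dapoly, each of shape (p,t).

    Assumptions: Z (n,p) holds counts of the coded allele in 0..ploidy and
    u_a (p,t) is the effect of one copy of that allele."""
    n = Z.shape[0]
    total = ploidy * n                                   # allele copies per marker across all taxa
    c1 = Z.sum(axis=0)[:, None]                          # (p,1) coded-allele counts
    # Coded allele deleterious if u < 0; other allele deleterious if u > 0; neutral -> 0
    count = np.where(u_a < 0, c1, np.where(u_a > 0, total - c1, 0))
    freq = count / total
    avail = count > 0                                    # present in at least one taxon
    fixed = freq == 1.0                                  # every allele copy is deleterious
    poly = (freq > 0.0) & (freq < 1.0)                   # segregating
    return count, freq, avail, fixed, poly
```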

deepcopy(memo=None)

Make a deep copy of the GenomicModel.

Parameters:

memo (dict) – Dictionary of memo metadata.

Returns:

out – A deep copy of the original GenomicModel.

Return type:

GenomicModel

faavail(gmat, dtype=None, **kwargs)

Determine whether a favorable allele is polymorphic or fixed across all taxa.

An allele is considered favorable if its effect is greater than zero. Alleles with zero effect are not considered favorable; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine favorable allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing whether a favorable allele is available.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

facount(gmat, dtype=None, **kwargs)

Calculate the favorable allele count across all taxa.

An allele is considered favorable if its effect is greater than zero. Alleles with zero effect are not considered favorable; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to count favorable alleles.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing allele counts of the favorable allele.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

fafixed(gmat, dtype=None, **kwargs)

Determine whether a favorable allele is fixed across all taxa.

An allele is considered favorable if its effect is greater than zero. Alleles with zero effect are not considered favorable; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine favorable allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing whether a favorable allele is fixed.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

fafreq(gmat, dtype=None, **kwargs)

Calculate the favorable allele frequency across all taxa.

An allele is considered favorable if its effect is greater than zero. Alleles with zero effect are not considered favorable; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine favorable allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing allele frequencies of the favorable allele.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

fapoly(gmat, dtype=None, **kwargs)

Determine whether a favorable allele is polymorphic across all taxa.

An allele is considered favorable if its effect is greater than zero. Alleles with zero effect are not considered favorable; they are considered neutral.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine favorable allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing whether a favorable allele is polymorphic.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray
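The favorable-allele methods mirror the deleterious ones with the sign flipped. The sketch below works from allele frequencies rather than counts, under these assumptions: p holds the frequency of a coded allele at each marker, u_a holds that allele's effect, so the coded allele is favorable where u_a > 0 and the other allele is favorable where u_a < 0. Neutral markers get a frequency of zero, which is an assumption about edge-case handling. `favorable_allele_stats` is a hypothetical name.

```python
import numpy as np

def favorable_allele_stats(p, u_a):
    """Sketch of fafreq/faavail/fafixed/fapoly from coded-allele frequencies.

    Assumptions: p (m,) is the frequency of the coded allele at each marker
    and u_a (m,t) is that allele's effect."""
    p = np.asarray(p, dtype=float)[:, None]                        # (m,1), broadcast over traits
    freq = np.where(u_a > 0, p, np.where(u_a < 0, 1.0 - p, 0.0))   # (m,t)
    avail = freq > 0.0                                             # polymorphic or fixed
    fixed = freq == 1.0
    poly = (freq > 0.0) & (freq < 1.0)
    return freq, avail, fixed, poly
```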

classmethod fit(ptobj, cvobj, gtobj, trait=None, method='ML', model_name=None, hyperparams=None, **kwargs)

Fit a dense, additive linear genomic model.

Parameters:
  • ptobj (BreedingValueMatrix, numpy.ndarray) – An object containing phenotype data. Must be a matrix of breeding values or a phenotype data frame.

  • cvobj (numpy.ndarray) – An object containing covariate data.

  • gtobj (GenotypeMatrix, numpy.ndarray) – An object containing genotype data. Must be a matrix of genotype values.

  • trait (numpy.ndarray, None) – A trait name array of shape (t,).

  • method (str) – Fitting method to use. Options are {"ML"}.

  • model_name (str, None) – Name of the model.

  • hyperparams (dict, None) – Model hyperparameters.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An RR-BLUP model.

Return type:

rrBLUPModel0

classmethod fit_numpy(Y, X, Z, trait=None, method='ML', model_name=None, hyperparams=None, **kwargs)

Fit a dense, additive linear genomic model.

Parameters:
  • Y (numpy.ndarray) –

    A phenotype matrix of shape (n,t).

    Where:

    • n is the number of observations.

    • t is the number of traits.

  • X (numpy.ndarray) –

    Not used by this model. Assumed to be an (n,q) matrix of ones.

    Where:

    • n is the number of observations.

    • q is the number of fixed effects.

  • Z (numpy.ndarray) –

    A genotype matrix of shape (n,p).

    Where:

    • n is the number of observations.

    • p is the number of markers to be considered as random effects.

  • trait (numpy.ndarray, None) –

    A trait name array of shape (t,).

    Where:

    • t is the number of traits.

  • method (str) – Fitting method to use. Options are {"ML"}.

  • model_name (str, None) – Name of the model.

  • hyperparams (dict, None) – Model hyperparameters.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An RR-BLUP model.

Return type:

rrBLUPModel0
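The "ML" option estimates the variance components by maximum likelihood before computing marker effects. The following is a sketch of one standard way to do that for a single trait: eigendecompose K = ZZ', profile out the intercept and varU, and grid-search δ = varE / varU on a log scale. The BLUP of the marker effects is then Z' H⁻¹ (y − 1b) with H = K + δI. `rrblup_ml` and `rrblup_ml_multi` are hypothetical names; pybrops' actual optimizer and parameterization are not described here and may differ. Traits are fit independently, as the prototype docstring states.

```python
import numpy as np

def rrblup_ml(y, Z, log_delta_grid=np.linspace(-5.0, 5.0, 201)):
    """Sketch: ML fit of y = 1 b + Z u + e, u ~ MVN(0, varU I), e ~ MVN(0, varE I).

    Uses V = varU * (K + delta I) with K = Z Z' and delta = varE / varU;
    delta is chosen by grid search over log(delta)."""
    n = len(y)
    X = np.ones((n, 1))
    K = Z @ Z.T
    s, Q = np.linalg.eigh(K)                      # K = Q diag(s) Q'
    s = np.clip(s, 0.0, None)                     # guard tiny negative eigenvalues
    Xt, yt = Q.T @ X, Q.T @ y                     # rotate into the eigenbasis
    best = None
    for ld in log_delta_grid:
        d = s + np.exp(ld)                        # eigenvalues of H = K + delta I
        # GLS estimate of the intercept under H^{-1}
        b = np.sum(Xt[:, 0] * yt / d) / np.sum(Xt[:, 0] ** 2 / d)
        r = yt - Xt[:, 0] * b
        var_u = np.sum(r**2 / d) / n              # ML estimate of varU given delta
        loglik = -0.5 * (n * np.log(2 * np.pi * var_u) + np.sum(np.log(d)) + n)
        if best is None or loglik > best[0]:
            best = (loglik, ld, b, var_u)
    _, ld, b, var_u = best
    delta = np.exp(ld)
    # BLUP of marker effects: u = Z' H^{-1} (y - 1 b)
    Hinv_r = Q @ ((Q.T @ (y - b)) / (s + delta))
    u = Z.T @ Hinv_r
    return b, u, var_u, delta * var_u             # intercept, effects, varU, varE

def rrblup_ml_multi(Y, Z):
    """Fit each trait independently; returns B (1,t) and U (p,t)."""
    fits = [rrblup_ml(Y[:, j], Z) for j in range(Y.shape[1])]
    B = np.array([[f[0] for f in fits]])
    U = np.column_stack([f[1] for f in fits])
    return B, U
```

A grid search keeps the sketch dependency-free; a scalar optimizer over log δ would be the usual refinement.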

classmethod from_csv_dict(filenames, sep=',', header=0, trait_cols='infer', model_name=None, hyperparams=None, **kwargs)

Read a DenseAdditiveLinearGenomicModel from a set of CSV files specified by values in a dict.

Parameters:
  • filenames (Dict[str,str]) –

    Dictionary of CSV file names from which to read.

    Must have the following fields:

    • "beta" is a str of the CSV file path containing fixed effects.

    • "u_misc" is None or a str of the CSV file path containing miscellaneous random effects.

    • "u_a" is None or a str of the CSV file path containing additive genetic marker random effects.

  • sep (str, default = ',') – CSV delimiter to use.

  • header (int, list of int, default=0) – Row number(s) to use as the column names, and the start of the data.

  • trait_cols (Sequence, str, None, default = "infer") – Names of the trait columns from which to read regression coefficients. If Sequence, column names are given by the strings or integers in the trait_cols Sequence. If str, must be equal to "infer", in which case the columns in the "beta" input dataframe are used to load trait regression coefficients. If None, do not load any trait regression coefficients.

  • model_name (str, None) – Name of the model.

  • hyperparams (dict, None) – Model hyperparameters.

  • kwargs (dict) – Additional keyword arguments to use for dictating importing from a CSV.

Returns:

out – A DenseAdditiveLinearGenomicModel read from a set of CSV files.

Return type:

DenseAdditiveLinearGenomicModel

classmethod from_hdf5(filename, groupname=None)

Read DenseAdditiveLinearGenomicModel from an HDF5 file.

Parameters:
  • filename (str, Path, h5py.File) – If str or Path, an HDF5 file name from which to read. File is closed after reading. If h5py.File, an opened HDF5 file from which to read. File is not closed after reading.

  • groupname (str, None) – If str, an HDF5 group name under which DenseAdditiveLinearGenomicModel data is stored. If None, DenseAdditiveLinearGenomicModel is read from base HDF5 group.

Returns:

out – A DenseAdditiveLinearGenomicModel read from file.

Return type:

DenseAdditiveLinearGenomicModel

classmethod from_pandas_dict(dic, trait_cols='infer', model_name=None, hyperparams=None, **kwargs)

Read an object from a dict of pandas.DataFrame.

Parameters:
  • dic (dict) –

    Python dictionary containing pandas.DataFrame from which to read. Must have the following fields:

    • "beta" is a pandas.DataFrame containing fixed effects.

    • "u_misc" is None or a pandas.DataFrame containing miscellaneous random effects.

    • "u_a" is None or a pandas.DataFrame containing additive genetic marker random effects.

  • trait_cols (Sequence, str, None, default = "infer") – Names of the trait columns from which to read regression coefficients. If Sequence, column names are given by the strings or integers in the trait_cols Sequence. If str, must be equal to "infer", in which case the columns in the "beta" input dataframe are used to load trait regression coefficients. If None, do not load any trait regression coefficients.

  • model_name (str, None) – Name of the model.

  • hyperparams (dict, None) – Model hyperparameters.

  • kwargs (dict) – Additional keyword arguments to use for dictating importing from a dict of pandas.DataFrame.

Returns:

out – A DenseAdditiveLinearGenomicModel read from a dict of pandas.DataFrame.

Return type:

DenseAdditiveLinearGenomicModel

gebv(gtobj, **kwargs)

Calculate genomic estimated breeding values.

Remark: The difference between ‘predict’ and ‘gebv’ is that ‘predict’ can incorporate other factors (e.g., fixed effects) to provide prediction estimates.

Parameters:
  • gtobj (GenotypeMatrix) – An object containing genotype data. Must be a matrix of genotype values.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – Genomic estimated breeding values matrix.

Return type:

BreedingValueMatrix

gebv_numpy(Z, **kwargs)

Calculate genomic estimated breeding values.

Remark: The difference between ‘predict_numpy’ and ‘gebv_numpy’ is that ‘predict_numpy’ can incorporate other factors (e.g., fixed effects) to provide prediction estimates.

Parameters:
  • Z (numpy.ndarray) – A matrix of genotype values.

  • kwargs (dict) – Additional keyword arguments.

Returns:

gebv_hat – A matrix of genomic estimated breeding values.

Return type:

numpy.ndarray

gegv(gtobj, **kwargs)

Calculate genomic estimated genotypic values.

Parameters:
  • gtobj (GenotypeMatrix) – An object containing genotype data. Must be a matrix of genotype values.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A matrix of genomic estimated genotypic values.

Return type:

numpy.ndarray

gegv_numpy(Z, **kwargs)

Calculate genomic estimated genotypic values.

Parameters:
  • Z (numpy.ndarray) – A matrix of genotypic markers.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A matrix of genomic estimated genotypic values.

Return type:

numpy.ndarray

property hyperparams: dict

Model hyperparameters.

lsl(gtobj, ploidy=None, unscale=False, **kwargs)

Calculate the lower selection limit for a population.

Parameters:
  • gtobj (GenotypeMatrix) – An object containing genotype data. Must be a matrix of genotype values.

  • ploidy (int) – Ploidy of the species.

  • unscale (bool) – If True, then apply the mean of the fixed effects to the output.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population lower selection limit statistics.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray

lsl_numpy(p, ploidy, unscale=False, **kwargs)

Calculate the lower selection limit for a population.

Parameters:
  • p (numpy.ndarray) – A vector of genotype allele frequencies of shape (p,).

  • ploidy (int) – Ploidy of the species.

  • unscale (bool) – If True, then apply the mean of the fixed effects to the output.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population lower selection limit statistics.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray
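One way to read the lower selection limit is as the lowest genotypic value a population could reach by fixing, at every marker, the least favorable allele that is still present. The sketch below implements that reading under assumptions not stated in the docstring: p holds coded-allele frequencies, u_a holds the effect of one copy of the coded allele, genotypes are allele counts in 0..ploidy, and unscale=True adds the column means of beta. `lsl_sketch` is a hypothetical name; pybrops' exact definition may differ.

```python
import numpy as np

def lsl_sketch(p, u_a, ploidy, beta=None, unscale=False):
    """Sketch of a lower selection limit, shape (t,).

    Assumptions: p (m,) is the coded-allele frequency, u_a (m,t) is the effect
    of one copy of the coded allele, and a marker fixed for the coded allele
    contributes ploidy * u while one fixed for the other allele contributes 0."""
    p = np.asarray(p, dtype=float)[:, None]              # (m,1)
    has_coded = p > 0.0                                  # coded allele present
    has_other = p < 1.0                                  # alternative allele present
    # Only alleles still present can be fixed; absent options are excluded with +inf
    val_coded = np.where(has_coded, ploidy * u_a, np.inf)
    val_other = np.where(has_other, 0.0, np.inf)
    out = np.minimum(val_coded, val_other).sum(axis=0)   # (t,)
    if unscale and beta is not None:
        out = out + np.asarray(beta).mean(axis=0)        # assumed meaning of unscale
    return out
```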

property method: str

Method to be used to fit the RR-BLUP model.

property model_name: str

Name of the model.

nafixed(gmat, dtype=None, **kwargs)

Determine whether a neutral allele is fixed across all taxa.

An allele is considered neutral if its effect is equal to zero.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine neutral allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing whether a neutral allele is fixed.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

napoly(gmat, dtype=None, **kwargs)

Determine whether a neutral allele is polymorphic across all taxa.

An allele is considered neutral if its effect is equal to zero.

Parameters:
  • gmat (GenotypeMatrix) – Genotype matrix for which to determine neutral allele frequencies.

  • dtype (numpy.dtype, None) – Datatype of the returned array. If None, use the native type.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A numpy.ndarray of shape (p,t) containing whether a neutral allele is polymorphic.

Where:

  • p is the number of alleles.

  • t is the number of traits.

Return type:

numpy.ndarray

property nexplan: Integral

Number of explanatory variables required by the model.

property nexplan_beta: Integral

Number of fixed effect explanatory variables required by the model.

property nexplan_u: Integral

Number of random effect explanatory variables required by the model.

property nexplan_u_a: Integral

Number of additive genomic marker explanatory variables required by the model.

property nexplan_u_misc: Integral

Number of miscellaneous random effect explanatory variables required by the model.

property nparam: Integral

Number of model parameters.

property nparam_beta: Integral

Number of fixed effect parameters.

property nparam_u: Integral

Number of random effect parameters.

property nparam_u_a: Integral

Number of additive genomic marker parameters.

property nparam_u_misc: Integral

Number of miscellaneous random effect parameters.

property ntrait: int

Number of traits predicted by the model.

predict(cvobj, gtobj, **kwargs)

Predict breeding values.

Remark: The difference between ‘predict’ and ‘gebv’ is that ‘predict’ can incorporate other factors (e.g., fixed effects) to provide prediction estimates.

Parameters:
  • cvobj (numpy.ndarray) – An object containing covariate data.

  • gtobj (GenotypeMatrix, numpy.ndarray) – An object containing genotype data. Must be a matrix of genotype values.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – Estimated breeding values matrix.

Return type:

BreedingValueMatrix

predict_numpy(X, Z, **kwargs)

Predict breeding values.

Remark: The difference between predict_numpy and gebv_numpy is that predict_numpy can incorporate other factors (e.g., fixed effects) to provide prediction estimates.

Parameters:
  • X (numpy.ndarray) –

    A matrix of covariates of shape (n,q).

    Where:

    • n is the number of taxa (observations).

    • q is the number of fixed effects.

  • Z (numpy.ndarray) –

    A matrix of random predictors and/or genotype values of shape (n,p).

    Where:

    • n is the number of taxa (observations).

    • p is the number of random predictors.

  • kwargs (dict) – Additional keyword arguments.

Returns:

Y_hat – A matrix of estimated breeding values of shape (n,t).

Where:

  • n is the number of taxa (observations).

  • t is the number of traits.

Return type:

numpy.ndarray
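Following the model definition Y = XB + ZU + E, the difference between prediction and GEBV calculation can be illustrated directly: a predict-style output includes the fixed-effect term XB, while a GEBV-style output uses only the marker term. This sketch assumes GEBVs are Z_a @ u_a with no intercept and no centering; the function names are hypothetical, and whether pybrops adjusts GEBVs further is not stated here.

```python
import numpy as np

def predict_sketch(X, Z, beta, u):
    """Y_hat = X @ beta + Z @ u, shape (n,t): fixed effects plus random effects."""
    return X @ beta + Z @ u

def gebv_sketch(Z_a, u_a):
    """Marker-only values Z_a @ u_a, shape (n,t): no fixed effects."""
    return Z_a @ u_a

# With no miscellaneous effects (u = u_a), the two differ only by the intercept term X @ beta
rng = np.random.default_rng(4)
X = np.ones((4, 1))
beta = np.array([[10.0, -1.0]])                       # one intercept per trait
Z = rng.integers(0, 3, size=(4, 6)).astype(float)
u = rng.normal(size=(6, 2))
diff = predict_sketch(X, Z, beta, u) - gebv_sketch(Z, u)
```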

score(ptobj, cvobj, gtobj, **kwargs)

Return the coefficient of determination R**2 of the prediction.

Parameters:
  • ptobj (BreedingValueMatrix, pandas.DataFrame, numpy.ndarray) – An object containing phenotype data. Must be a matrix of breeding values or a phenotype data frame.

  • cvobj (numpy.ndarray) – An object containing covariate data.

  • gtobj (GenotypeMatrix, numpy.ndarray) – An object containing genotype data. Must be a matrix of genotype values.

  • kwargs (dict) – Additional keyword arguments.

Returns:

Rsq – A coefficient of determination array of shape (t,).

Where:

  • t is the number of traits.

Return type:

numpy.ndarray

score_numpy(Y, X, Z, **kwargs)

Return the coefficient of determination R**2 of the prediction.

Parameters:
  • Y (numpy.ndarray) –

    A matrix of phenotypes of shape (n,t).

    Where:

    • n is the number of taxa (observations).

    • t is the number of traits.

  • X (numpy.ndarray) –

    A matrix of covariates of shape (n,q).

    Where:

    • n is the number of taxa (observations).

    • q is the number of fixed effects.

  • Z (numpy.ndarray) –

    A matrix of random predictors and/or genotype values of shape (n,p).

    Where:

    • n is the number of taxa (observations).

    • p is the number of random predictors.

  • kwargs (dict) – Additional keyword arguments.

Returns:

Rsq – A coefficient of determination array of shape (t,).

Where:

  • t is the number of traits.

Return type:

numpy.ndarray
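The coefficient of determination is computed per trait. The sketch below uses the standard definition R² = 1 − SS_res / SS_tot applied column-wise to observed Y and predicted Y_hat (for example Y_hat = XB + ZU). Whether score_numpy uses exactly this formula, rather than, say, a squared correlation, is an assumption; `r2_per_trait` is a hypothetical name.

```python
import numpy as np

def r2_per_trait(Y, Y_hat):
    """Coefficient of determination per trait: 1 - SS_res / SS_tot, shape (t,)."""
    ss_res = ((Y - Y_hat) ** 2).sum(axis=0)            # residual sum of squares
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)   # total sum of squares about the trait mean
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1 for every trait, and predicting each trait's mean gives 0.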

to_csv_dict(filenames, trait_cols='trait', sep=',', header=True, index=False, **kwargs)

Export a DenseAdditiveLinearGenomicModel to a set of CSV files specified by values in a dict.

Parameters:
  • filenames (dict of str) – CSV file names to which to write. Must have the keys: "beta", "u_misc", and "u_a" (case sensitive).

  • trait_cols (Sequence, str, None, default = "trait") – Names of the trait columns to which to write regression coefficients. If Sequence, column names are given by the strings in the trait_cols Sequence. If str, must be equal to "trait", in which case the trait names given in the trait property are used. If None, use numeric trait column names.

  • sep (str, default = ",") – Separator to use in the exported CSV files.

  • header (bool, default = True) – Whether to save header names.

  • index (bool, default = False) – Whether to save a row index in the exported CSV files.

  • kwargs (dict) – Additional keyword arguments to use for dictating export to a CSV.

Return type:

None

to_hdf5(filename, groupname=None, overwrite=True)

Write DenseAdditiveLinearGenomicModel to an HDF5 file.

Parameters:
  • filename (str, Path, h5py.File) – If str or Path, an HDF5 file name to which to write. File is closed after writing. If h5py.File, an opened HDF5 file to which to write. File is not closed after writing.

  • groupname (str, None) – If str, an HDF5 group name under which DenseAdditiveLinearGenomicModel data is stored. If None, DenseAdditiveLinearGenomicModel is written to the base HDF5 group.

  • overwrite (bool) – Whether to overwrite data fields if they are present in the HDF5 file.

Return type:

None

to_pandas_dict(trait_cols='trait', **kwargs)

Export a DenseAdditiveLinearGenomicModel to a dict of pandas.DataFrame.

Parameters:
  • trait_cols (Sequence, str, None, default = "trait") – Names of the trait columns to which to write regression coefficients. If Sequence, column names are given by the strings in the trait_cols Sequence. If str, must be equal to "trait", in which case the trait names given in the trait property are used. If None, numeric trait column names are used.

  • kwargs (dict) – Additional keyword arguments controlling the export to a dict of pandas.DataFrame.

Returns:

out – A dict of output pandas.DataFrame objects.

Return type:

dict
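The trait_cols rules above accept three forms: a Sequence of names, the string "trait", or None. The following sketch shows how those rules can be applied with pandas. It is an illustration under assumptions, not pybrops' code. `resolve_trait_cols` and `coefficients_to_pandas` are hypothetical helpers, and the coefficient matrices are assumed to have one column per trait. The dict keys are borrowed from the to_csv_dict filenames.

```python
import numpy as np
import pandas as pd

def resolve_trait_cols(trait_cols, trait, ntrait):
    """Hypothetical helper: column names following the documented trait_cols rules."""
    if trait_cols is None:
        # numeric trait column names
        return list(range(ntrait))
    if isinstance(trait_cols, str):
        if trait_cols != "trait":
            raise ValueError('if a str, trait_cols must be equal to "trait"')
        # use the names stored in the model's trait property
        return list(trait)
    # any other Sequence: use its strings as given
    cols = list(trait_cols)
    if len(cols) != ntrait:
        raise ValueError("trait_cols must provide one name per trait")
    return cols

def coefficients_to_pandas(coefs, trait, trait_cols="trait"):
    """Hypothetical helper: wrap each (rows, t) coefficient matrix in a DataFrame."""
    ntrait = coefs["beta"].shape[1]
    cols = resolve_trait_cols(trait_cols, trait, ntrait)
    return {key: pd.DataFrame(mat, columns=cols) for key, mat in coefs.items()}

coefs = {
    "beta": np.array([[10.0, 5.0]]),
    "u_misc": np.zeros((0, 2)),
    "u_a": np.array([[0.5, -0.1], [0.2, 0.3]]),
}
out = coefficients_to_pandas(coefs, trait=np.array(["yield", "height"]))
```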

property trait: ndarray#

Names of the traits.

property u: ndarray#

Random effect regression coefficients.

property u_a: ndarray#

Additive genomic marker effects.

property u_misc: ndarray#

Miscellaneous random effect regression coefficients.

usl(gtobj, ploidy=None, unscale=False, **kwargs)#

Calculate the upper selection limit for a population.

Parameters:
  • gtobj (GenotypeMatrix, numpy.ndarray) – An object containing genotype data. Must be a matrix of genotype values.

  • ploidy (int) – Ploidy of the species.

  • unscale (bool) – If True, then apply the mean of the fixed effects to the output.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population upper selection limit statistics.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray
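usl accepts a genotype matrix, whereas usl_numpy works from a vector of allele frequencies of shape (p,). The following sketch shows one way to perform that reduction. It assumes the genotype matrix has shape (n,p) and stores counts of the coded allele from 0 to ploidy. The helper `allele_frequency` is hypothetical, and the library's own genotype coding or conversion may differ.

```python
import numpy as np

def allele_frequency(Z: np.ndarray, ploidy: int = 2) -> np.ndarray:
    """Frequency of the coded allele at each marker; Z has shape (n, p)."""
    n = Z.shape[0]
    # allele copies observed per marker, divided by the chromosome copies sampled
    return Z.sum(axis=0) / (ploidy * n)

# three diploid individuals, three markers; the last marker is fixed
Z = np.array([[0, 1, 2], [2, 1, 2], [0, 0, 2]])
p = allele_frequency(Z, ploidy=2)
```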

usl_numpy(p, ploidy, unscale=False, **kwargs)#

Calculate the upper selection limit for a population.

Parameters:
  • p (numpy.ndarray) – A vector of genotype allele frequencies of shape (p,).

  • ploidy (int) – Ploidy of the species.

  • unscale (bool) – If True, then apply the mean of the fixed effects to the output.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population upper selection limit statistics.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray
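The upper selection limit is commonly defined as the genotypic value of a hypothetical individual fixed for the most favorable allele still available at every marker. The following sketch computes it for an additive model. It assumes biallelic markers coded as counts of the allele whose effect is stored in u_a, and it ignores linkage. It also omits the intercept, which the unscale option appears to control. The function name is hypothetical, and pybrops' implementation may differ in detail.

```python
import numpy as np

def upper_selection_limit(p: np.ndarray, u_a: np.ndarray, ploidy: int) -> np.ndarray:
    """USL per trait for allele frequencies p (p,) and marker effects u_a (p, t)."""
    p = p.reshape(-1, 1)  # (p, 1) so it broadcasts against (p, t)
    # Best attainable dosage of the coded allele, as a fraction of ploidy:
    #   positive effect -> fix the allele if it is present at all (p > 0)
    #   negative effect -> forced to carry it only if it is fixed (p >= 1)
    best = np.where(u_a > 0.0, p > 0.0, p >= 1.0)
    return ploidy * (u_a * best).sum(axis=0)

p = np.array([0.5, 0.0, 1.0, 0.3])
u_a = np.array([[1.0], [2.0], [-0.25], [-0.5]])
usl = upper_selection_limit(p, u_a, ploidy=2)
```

The second marker's large positive effect does not count because that allele is absent (p = 0). The third marker's negative effect cannot be avoided because its allele is fixed.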

var_A(gtobj, **kwargs)#

Calculate the population additive genetic variance.

Parameters:
  • gtobj (GenotypeMatrix) – An object containing genotype data. Must be a matrix of genotype values.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population additive genetic variances.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray

var_A_numpy(Z, **kwargs)#

Calculate the population additive genetic variance.

Parameters:
  • Z (numpy.ndarray) – A matrix of genotypes.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population additive genetic variances.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray
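One common way to obtain this quantity is to take the variance, across individuals, of the breeding values Z U_a. The following sketch assumes that Z has shape (n,p_a) and matches marker effects u_a of shape (p_a,t). It uses the population (ddof=0) variance. Both the function name and the divisor are assumptions rather than documented pybrops behavior.

```python
import numpy as np

def additive_genetic_variance(Z: np.ndarray, u_a: np.ndarray) -> np.ndarray:
    """Variance of breeding values Z @ u_a across individuals, one value per trait."""
    gebv = Z @ u_a           # (n, p_a) @ (p_a, t) -> (n, t)
    return gebv.var(axis=0)  # population variance (ddof=0) per trait

Z = np.array([[0, 2], [1, 1], [2, 0]])
u_a = np.array([[1.0], [0.5]])
var_A = additive_genetic_variance(Z, u_a)
```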

var_G(gtobj, **kwargs)#

Calculate the population genetic variance.

Parameters:
  • gtobj (GenotypeMatrix, numpy.ndarray) – An object containing genotype data. Must be a matrix of genotype values.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population genetic variances.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray

var_G_numpy(Z, **kwargs)#

Calculate the population genetic variance.

Parameters:
  • Z (numpy.ndarray) – A matrix of genotypes.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population genetic variances.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray

var_a(gtobj, ploidy=None, **kwargs)#

Calculate the population additive genic variance.

Parameters:
  • gtobj (GenotypeMatrix, numpy.ndarray) – An object containing genotype data. Must be a matrix of genotype values.

  • ploidy (Integral, None) –

    Ploidy of the species.

    If ploidy is None:

    • If gtobj is a GenotypeMatrix, then get ploidy from GenotypeMatrix.

    • If gtobj is a numpy.ndarray, then ploidy is assumed to be 2 (diploid).

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population additive genic variances.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray
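The ploidy default described above can be sketched as a small resolution step. Under this logic, an explicit value wins. Otherwise, the value is read from the genotype object, or 2 is assumed for a raw array. The following sketch assumes that a GenotypeMatrix exposes a `ploidy` attribute. `resolve_ploidy` and `FakeGenotypeMatrix` are hypothetical stand-ins, not pybrops classes or functions.

```python
import numpy as np

def resolve_ploidy(gtobj, ploidy=None) -> int:
    """Apply the documented default: explicit value, else the object's ploidy, else 2."""
    if ploidy is not None:
        return int(ploidy)
    if isinstance(gtobj, np.ndarray):
        return 2  # raw arrays are assumed diploid
    # assumption: genotype matrix objects expose a `ploidy` attribute
    return int(gtobj.ploidy)

class FakeGenotypeMatrix:
    """Hypothetical stand-in for a tetraploid GenotypeMatrix."""
    ploidy = 4
```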

var_a_numpy(p, ploidy=2, **kwargs)#

Calculate the population additive genic variance.

Parameters:
  • p (numpy.ndarray) – A vector of genotype allele frequencies of shape (p,).

  • ploidy (Integral) – Ploidy of the species.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An array of shape (t,) containing population additive genic variances.

Where:

  • t is the number of traits.

Return type:

numpy.ndarray
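Under Hardy-Weinberg and linkage equilibrium, the allele count at a marker behaves like a binomial draw with ploidy trials and success probability p. Its variance is therefore ploidy·p(1 − p). Multiplying this variance by the squared marker effect and summing over markers gives the genic variance. The following sketch assumes biallelic markers coded as allele counts and effects u_a of shape (p,t). The helper name is hypothetical, and pybrops may use a different genotype coding or scaling.

```python
import numpy as np

def additive_genic_variance(p: np.ndarray, u_a: np.ndarray, ploidy: int = 2) -> np.ndarray:
    """Genic variance per trait: sum over markers of ploidy * p * (1 - p) * u^2."""
    p = p.reshape(-1, 1)  # (p, 1) to broadcast against u_a (p, t)
    return (ploidy * p * (1.0 - p) * u_a ** 2).sum(axis=0)

p = np.array([0.5, 0.2, 1.0])  # the last marker is fixed and contributes nothing
u_a = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 3.0]])
var_a = additive_genic_variance(p, u_a)
```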