Coancestry Matrices#
Class Family Overview#
The CoancestryMatrix family of classes is used to represent coancestry relationships between individuals. This includes additive relationship matrices and genomic relationship matrices. Coancestry matrices can be used in the estimation of genomic prediction models and to make selection decisions. CoancestryMatrix objects store additional taxa metadata which serve as labels for rows and columns in the matrix.
Summary of Coancestry Matrix Classes#
Coancestry matrix classes can be found in the pybrops.popgen.cmat module in PyBrOpS. Within this module are several interfaces and implemented classes, which are summarized in the table below.
Class Name | Class Type | Class Description |
---|---|---|
CoancestryMatrix | Abstract | Interface for all coancestry matrix child classes. |
DenseCoancestryMatrix | Semi-Abstract | Semi-implemented class for deriving new dense coancestry matrix child classes. |
DenseMolecularCoancestryMatrix | Concrete | Class representing dense molecular coancestry matrices. |
DenseVanRadenCoancestryMatrix | Concrete | Class representing a genomic relationship matrix defined by VanRaden (2008). |
DenseYangCoancestryMatrix | Concrete | Class representing a genomic relationship matrix defined by Yang. |
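To make the idea of a genomic relationship matrix concrete, the following is a minimal NumPy sketch of the first method described by VanRaden (2008) for diploid allele counts coded 0, 1, 2. It is an illustration only: the function name vanraden_grm is hypothetical, and it is an assumption that DenseVanRadenCoancestryMatrix uses this exact centering and scaling internally.

```python
import numpy

def vanraden_grm(geno: numpy.ndarray) -> numpy.ndarray:
    """Sketch of VanRaden (2008) method 1 for a (ntaxa, nvrnt) matrix of 0/1/2 allele counts."""
    # frequency of the counted allele at each marker
    p = geno.mean(axis=0) / 2.0
    # center each marker column by its expected allele count (2p)
    Z = geno - 2.0 * p
    # scale by the summed expected marker variance, 2 * sum(p * (1 - p))
    denom = 2.0 * numpy.sum(p * (1.0 - p))
    return (Z @ Z.T) / denom

# hypothetical random genotypes: 5 taxa, 200 markers
rng = numpy.random.default_rng(0)
geno = rng.integers(0, 3, size=(5, 200)).astype(float)
G = vanraden_grm(geno)
```

Because every marker column is centered, the resulting matrix is symmetric and each of its rows sums to approximately zero.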
Coancestry Matrix Properties#
Coancestry matrices have several properties, which fall into three groups: general properties, taxa properties, and square properties. These properties are summarized in the tables below.
General properties#
Coancestry matrices share several shape properties common to the Matrix family of classes. These common properties are summarized below.
Property | Description |
---|---|
| | The raw coancestry matrix pointer |
| | The number of dimensions for the coancestry matrix |
| | The coancestry matrix shape |
Taxa properties#
Coancestry matrices have several taxa related properties including taxa names, taxa group identities, and sorting metadata, which can be used for quick group access and sorting. These taxa related properties are summarized below.
Property | Description |
---|---|
| | The number of taxa represented by the coancestry matrix |
| | The names of the taxa |
| | The matrix axis along which taxa are stored |
| | An optional taxa group label |
| | If taxa are sorted by group: get the names of the groups |
| | If taxa are sorted by group: get the start indices (inclusive) for each group |
| | If taxa are sorted by group: get the stop indices (exclusive) for each group |
| | If taxa are sorted by group: get the length of each group |
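The group names, start indices, stop indices, and lengths described above can be pictured with a small NumPy sketch. The variable names grp_name, grp_stix, grp_spix, and grp_len are hypothetical and chosen for this illustration; how PyBrOpS derives its own group metadata is not shown in this section.

```python
import numpy

# hypothetical group labels, already sorted so that each group is contiguous
taxa_grp = numpy.array([1, 1, 1, 2, 2, 5, 5, 5, 5])

# unique group names, first index of each group, and size of each group
grp_name, grp_stix, grp_len = numpy.unique(
    taxa_grp, return_index=True, return_counts=True
)

# exclusive stop index of each group
grp_spix = grp_stix + grp_len

# members of the second group, accessed with a single slice
second_group = taxa_grp[grp_stix[1]:grp_spix[1]]
```

With inclusive start and exclusive stop indices, any group can be accessed with one slice, which is why the metadata is only meaningful once taxa are sorted by group.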
Square matrix properties#
Because coancestry matrices are square, they also have several properties that describe their square axes. These properties are summarized below.
Property | Description |
---|---|
| | The number of square axes for the coancestry matrix |
| | The axes indices for the square axes for the coancestry matrix |
| | The lengths of the square axes for the coancestry matrix |
Loading Coancestry Matrix Modules#
Importing coancestry matrix classes can be accomplished using the following import statements:
# import the CoancestryMatrix class (an abstract interface class)
from pybrops.popgen.cmat.CoancestryMatrix import CoancestryMatrix
# import the DenseCoancestryMatrix class (a semi-abstract class)
from pybrops.popgen.cmat.DenseCoancestryMatrix import DenseCoancestryMatrix
# import the DenseMolecularCoancestryMatrix class (a concrete implemented class)
from pybrops.popgen.cmat.DenseMolecularCoancestryMatrix import DenseMolecularCoancestryMatrix
# import the DenseVanRadenCoancestryMatrix class (a concrete implemented class)
from pybrops.popgen.cmat.DenseVanRadenCoancestryMatrix import DenseVanRadenCoancestryMatrix
# import the DenseYangCoancestryMatrix class (a concrete implemented class)
from pybrops.popgen.cmat.DenseYangCoancestryMatrix import DenseYangCoancestryMatrix
Creating Coancestry Matrices#
Coancestry matrices can be created from several sources, including raw NumPy arrays, GenotypeMatrix objects, Pandas DataFrames, CSV files, and HDF5 files. The following subsections detail the creation or loading of coancestry matrices from their corresponding sources.
Creating coancestry matrices from NumPy arrays#
Using the constructor of a CoancestryMatrix class, one can create coancestry matrices from NumPy arrays. The example below demonstrates the creation of a DenseMolecularCoancestryMatrix object from raw NumPy arrays.
# NumPy is needed for the examples in this section
import numpy
# shape parameters
ntaxa = 100
ngroup = 20
# create random coancestries
mat = numpy.random.uniform(0.0, 1.0, size = (ntaxa,ntaxa))
# create taxa names
taxa = numpy.array(
    ["taxon"+str(i+1).zfill(3) for i in range(ntaxa)],
    dtype = object
)
# create taxa groups
taxa_grp = numpy.random.randint(1, ngroup+1, ntaxa)
taxa_grp.sort()
# create a coancestry matrix from NumPy arrays
cmat = DenseMolecularCoancestryMatrix(
    mat = mat,
    taxa = taxa,
    taxa_grp = taxa_grp
)
Creating coancestry matrices from GenotypeMatrix objects#
Coancestry matrices may also be constructed from GenotypeMatrix objects. This can be accomplished using the from_gmat class method. The code below demonstrates how to use this method.
# import the DenseGenotypeMatrix class used to build the genotype matrix
from pybrops.popgen.gmat.DenseGenotypeMatrix import DenseGenotypeMatrix
# shape parameters for random genotypes
ntaxa = 100
nvrnt = 1000
ngroup = 20
nchrom = 10
ploidy = 2
# create random genotypes
mat = numpy.random.randint(0, ploidy+1, size = (ntaxa,nvrnt)).astype("int8")
# create taxa names
taxa = numpy.array(
    ["taxon"+str(i+1).zfill(3) for i in range(ntaxa)],
    dtype = object
)
# create taxa groups
taxa_grp = numpy.random.randint(1, ngroup+1, ntaxa)
taxa_grp.sort()
# create marker variant chromosome assignments
vrnt_chrgrp = numpy.random.randint(1, nchrom+1, nvrnt)
vrnt_chrgrp.sort()
# create marker physical positions
vrnt_phypos = numpy.random.choice(1000000, size = nvrnt, replace = False)
vrnt_phypos.sort()
# create marker variant names
vrnt_name = numpy.array(
    ["SNP"+str(i+1).zfill(4) for i in range(nvrnt)],
    dtype = object
)
# create a genotype matrix from scratch using NumPy arrays
gmat = DenseGenotypeMatrix(
    mat = mat,
    taxa = taxa,
    taxa_grp = taxa_grp,
    vrnt_chrgrp = vrnt_chrgrp,
    vrnt_phypos = vrnt_phypos,
    vrnt_name = vrnt_name,
    vrnt_genpos = None,
    vrnt_xoprob = None,
    vrnt_hapgrp = None,
    vrnt_hapalt = None,
    vrnt_hapref = None,
    vrnt_mask = None,
    ploidy = ploidy
)
# group taxa and variants
gmat.group_taxa()
gmat.group_vrnt()
# construct Coancestry Matrix from a Genotype Matrix
cmat = DenseMolecularCoancestryMatrix.from_gmat(gmat = gmat)
Creating coancestry matrices from Pandas DataFrames#
Coancestry matrices may be read from Pandas DataFrames using the from_pandas class method. The code example below demonstrates this method’s usage.
# load from pandas.DataFrame
tmp = DenseMolecularCoancestryMatrix.from_pandas(
    df = df,
    taxa_col = "taxa",          # column from which to load taxa
    taxa_grp_col = "taxa_grp",  # column from which to load taxa groups
    taxa = "all",               # load all taxa
)
Loading coancestry matrices from CSV files#
Coancestry matrices may also be read from CSV files in a manner similar to Pandas DataFrames. The from_csv class method may be used to load coancestry matrices from CSV files. The following code block demonstrates the usage of this method.
# load from a CSV file
tmp = DenseMolecularCoancestryMatrix.from_csv(
    filename = "saved_coancestry_matrix.csv",
    taxa_col = "taxa",          # column from which to load taxa
    taxa_grp_col = "taxa_grp",  # column from which to load taxa groups
    taxa = "all",               # load all taxa
)
Loading coancestry matrices from HDF5 files#
As with all classes in the Matrix family, CoancestryMatrix objects may be imported from and exported to an HDF5 format. To read saved coancestry matrices from an HDF5 file, use the from_hdf5 class method. The code below demonstrates the use of this method.
# read from file
cmat = DenseMolecularCoancestryMatrix.from_hdf5("sample_coancestry_matrix.h5")
Copying Coancestry Matrices#
Coancestry matrices may be copied using two methods: shallow copying and deep copying.
Shallow copying#
In shallow copying, references to a CoancestryMatrix’s data are copied to a new coancestry matrix object. Copying is only one level deep, which means that changes to the original object may affect data values in the copied object. The code below illustrates the use of the copy method bound to CoancestryMatrix objects and the base Python function copy.copy, both of which can be used to shallow copy a coancestry matrix object.
# copy a coancestry matrix
tmp = copy.copy(cmat)
tmp = cmat.copy()
Deep copying#
In deep copying, data in a CoancestryMatrix is recursively copied to a new coancestry matrix object. Copying occurs down to the deepest levels so that changes to the original object will not affect data values in the copied object. The code below illustrates the use of the deepcopy method bound to CoancestryMatrix objects and the base Python function copy.deepcopy, both of which can be used to deep copy a coancestry matrix object.
# deep copy a coancestry matrix
tmp = copy.deepcopy(cmat)
tmp = cmat.deepcopy()
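The practical difference between the two copy types can be seen with a small stand-alone sketch. TinyMatrix is a hypothetical stand-in class that holds a NumPy array; it is not a PyBrOpS class, and the copy methods of CoancestryMatrix objects may be implemented differently.

```python
import copy
import numpy

class TinyMatrix:
    """Hypothetical stand-in for a matrix object that holds a NumPy array."""
    def __init__(self, mat):
        self.mat = mat

orig = TinyMatrix(numpy.zeros((2, 2)))
shallow = copy.copy(orig)       # new object that references the same array
deep = copy.deepcopy(orig)      # new object with its own independent array

# modify the original array in place
orig.mat[0, 0] = 1.0

shallow_sees_change = bool(shallow.mat[0, 0] == 1.0)   # shared data follows the edit
deep_sees_change = bool(deep.mat[0, 0] == 1.0)         # independent data does not
```

Shallow copies are cheap but share memory with the original, so deep copies are the safer choice before any in-place modification.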
Copy-On Element Manipulation#
Coancestry matrices have several methods by which modified copies of the original matrix can be made. These are called copy-on element manipulation methods. Matrices may have rows and/or columns adjoined, deleted, inserted, or selected. The following sections demonstrate the use of these method families.
Adjoin elements#
The adjoin family of methods allows for taxa rows and columns of a coancestry matrix to be adjoined together, creating a new matrix in the process. Use of the adjoin method family is demonstrated in the code below.
# create a new coancestry matrix to demonstrate
new = cmat.deepcopy()
# adjoin coancestry matrices along the taxa axis
tmp = cmat.adjoin(new, axis = cmat.taxa_axis)
tmp = cmat.adjoin_taxa(new)
Delete elements#
The delete family of methods allows for taxa rows and columns of a coancestry matrix to be removed in a copy of the original. Use of the delete method family is demonstrated in the code below.
# delete first taxon using an integer
tmp = cmat.delete(0, axis = cmat.taxa_axis)
tmp = cmat.delete_taxa(0)
# delete first five taxa using a slice
tmp = cmat.delete(slice(0,5), axis = cmat.taxa_axis)
tmp = cmat.delete_taxa(slice(0,5))
# delete first five taxa using a Sequence
tmp = cmat.delete([0,1,2,3,4], axis = cmat.taxa_axis)
tmp = cmat.delete_taxa([0,1,2,3,4])
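Because each taxon occupies both a row and a column of a square matrix, deleting a taxon means dropping the same index along both axes. The sketch below shows this on a bare NumPy array; it is an assumption that this mirrors what the delete family does to the underlying matrix.

```python
import numpy

# toy symmetric 4 x 4 matrix standing in for coancestry values
K = numpy.arange(16, dtype=float).reshape(4, 4)
K = (K + K.T) / 2.0

# taxa indices to delete
drop = [0, 2]

# remove the same indices from the rows and then from the columns
out = numpy.delete(numpy.delete(K, drop, axis=0), drop, axis=1)
```

The result stays square and symmetric, containing only the values among the remaining taxa (indices 1 and 3 here).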
Insert elements#
The insert family of methods allows for taxa rows and columns of a coancestry matrix to be inserted into a copy of the original matrix.
Select elements#
The select family of methods allows for taxa rows and columns of the coancestry matrix to be selected and extracted to a copy of the original matrix. Use of the select method family is demonstrated in the code below.
# select first five taxa using a Sequence
tmp = cmat.select([0,1,2,3,4], axis = cmat.taxa_axis)
tmp = cmat.select_taxa([0,1,2,3,4])
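On a bare NumPy array, selecting taxa from a square matrix requires picking the same indices on both axes, which numpy.ix_ makes explicit. The sketch below illustrates the idea and a common pitfall; it is an assumption that the select family behaves equivalently on the underlying matrix.

```python
import numpy

# toy symmetric 4 x 4 matrix standing in for coancestry values
K = numpy.arange(16, dtype=float).reshape(4, 4)
K = (K + K.T) / 2.0

# taxa indices to keep
keep = [0, 1]

# square sub-matrix among the selected taxa
sub = K[numpy.ix_(keep, keep)]

# pitfall: paired fancy indexing returns only the diagonal elements
diag_only = K[keep, keep]
```

Indexing with K[keep, keep] pairs the index lists element by element, so it yields a one-dimensional array of self-coancestries instead of the sub-matrix.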
In-Place Element Manipulation#
Coancestry matrices also have several methods that modify the matrix in place. These are called in-place element manipulation methods. Coancestry matrices may have taxa rows and/or columns appended, removed, incorporated, or concatenated. The following sections demonstrate the use of these method families.
Append elements#
The append family of methods allows for new taxa rows and columns to be appended to the coancestry matrix. The code segment below demonstrates their use.
# append coancestry matrices along the taxa axis
tmp = cmat.deepcopy() # copy original
tmp.append(cmat, axis = tmp.taxa_axis) # append original to copy
tmp = cmat.deepcopy() # copy original
tmp.append_taxa(cmat) # append original to copy
Remove elements#
The remove family of methods allows for taxa rows and columns to be removed from a coancestry matrix. A demonstration of their use can be seen below.
# remove first taxon using an integer
tmp = cmat.deepcopy() # copy original
tmp.remove(0, axis = cmat.taxa_axis) # remove from copy
tmp = cmat.deepcopy() # copy original
tmp.remove_taxa(0) # remove from copy
# remove first five taxa using a slice
tmp = cmat.deepcopy() # copy original
tmp.remove(slice(0,5), axis = cmat.taxa_axis) # remove from copy
tmp = cmat.deepcopy() # copy original
tmp.remove_taxa(slice(0,5)) # remove from copy
# remove first five taxa using a Sequence
tmp = cmat.deepcopy() # copy original
tmp.remove([0,1,2,3,4], axis = cmat.taxa_axis) # remove from copy
tmp = cmat.deepcopy() # copy original
tmp.remove_taxa([0,1,2,3,4]) # remove from copy
Incorporate elements#
The incorp family of methods allows for new taxa rows and columns to be inserted at specific locations in a coancestry matrix. Use of the incorp family is demonstrated in the code segment below.
# incorp coancestry matrix along the taxa axis before index 0
tmp = cmat.deepcopy() # copy original
tmp.incorp(0, cmat, axis = cmat.taxa_axis) # incorporate into copy
tmp = cmat.deepcopy() # copy original
tmp.incorp_taxa(0, cmat) # incorporate into copy
Concatenate elements#
The concat family of methods allows for multiple coancestry matrices to be concatenated to each other.
Grouping and Sorting#
Coancestry matrices in PyBrOpS have several sorting and grouping focused methods. Sorting methods can be used to reorder, sort, and group taxa alphanumerically. The following sections demonstrate the use of the reorder, lexsort, sort, and group method families.
Reordering elements#
Taxa in a coancestry matrix can be reordered using the reorder family of methods. Demonstrations of this method family are below.
# create reordering indices
indices = numpy.arange(cmat.ntaxa)
numpy.random.shuffle(indices)
tmp = cmat.deepcopy()
# reorder values along the taxa axis
tmp.reorder(indices, axis = tmp.taxa_axis)
tmp.reorder_taxa(indices)
Lexsorting elements#
An indirect stable sort (lexsort) along the taxa axes can be performed using the lexsort family of methods. The code segment below illustrates the use of this family of methods.
# create lexsort keys for taxa
key1 = numpy.random.randint(0, 10, cmat.ntaxa)
key2 = numpy.arange(cmat.ntaxa)
numpy.random.shuffle(key2)
# lexsort along the taxa axis
cmat.lexsort((key2,key1), axis = cmat.taxa_axis)
cmat.lexsort_taxa((key2,key1))
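The order of keys in the tuple matters. The sketch below uses numpy.lexsort directly, where the last key in the tuple is the primary sort key; it is an assumption that the lexsort family of methods follows the same convention.

```python
import numpy

# primary key: listed last in the tuple passed to numpy.lexsort
key1 = numpy.array([1, 0, 1, 0])
# secondary key: breaks ties within equal values of key1
key2 = numpy.array([3, 2, 1, 0])

# indices that sort by key1 first, then by key2
order = numpy.lexsort((key2, key1))

sorted_key1 = key1[order]
sorted_key2 = key2[order]
```

Under that convention, a call with (key2, key1) sorts taxa by key1 and uses key2 only to order taxa that share the same key1 value.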
Sorting elements#
Alphanumeric sorting along taxa axes can be done using the sort family of methods. Sorting examples are illustrated below.
# make copy
tmp = cmat.deepcopy()
# sort along taxa axis
tmp.sort(axis = tmp.taxa_axis)
tmp.sort_taxa()
Grouping elements#
Grouping along taxa axes can be done using the group family of methods. The following code illustrates the use of the group method family along the taxa axes of a coancestry matrix.
# make copy
tmp = cmat.deepcopy()
# group along taxa axis
tmp.group(axis = tmp.taxa_axis)
tmp.group_taxa()
# determine whether grouping has occurred along the taxa axis
out = tmp.is_grouped(axis = tmp.taxa_axis)
out = tmp.is_grouped_taxa()
Coancestry and Kinship Methods#
Retrieving coancestry values#
Coancestry values may be retrieved using the coancestry method. Values may also be retrieved via indexing, but indexed values are not guaranteed to be in the expected format (coancestry or kinship) because the return format via indexing is implementation dependent. The code below demonstrates the use of this method.
# Get the coancestry at a specific matrix coordinate
out = cmat.coancestry(0,0)
out = cmat[0,0] # NOT guaranteed to be in correct format
Retrieving kinship values#
Kinship values may be retrieved using the kinship method. Like coancestry values, kinship values may also be retrieved via indexing, but indexed values are not guaranteed to be in the correct format since the internal matrix representation is implementation dependent. The code below demonstrates the use of this method.
# Get the kinship at a specific matrix coordinate
out = cmat.kinship(0,0)
out = 0.5 * cmat[0,0] # NOT guaranteed to be in correct format
Retrieving the coancestry matrix as a specific format#
Coancestry matrices may be extracted as bare-bones NumPy arrays in kinship or coancestry formats using the mat_asformat method. The code below demonstrates the usage of this method.
# Get the coancestry matrix as a specific format
out = cmat.mat_asformat(format = "kinship")
out = cmat.mat_asformat(format = "coancestry")
Determining if the coancestry matrix is positive semidefinite#
For some optimization problems, a coancestry matrix may need to be positive semidefinite. The is_positive_semidefinite method may be used to determine whether a coancestry matrix is positive semidefinite. An example of this method’s usage is below.
# Determine if the coancestry matrix is positive semidefinite (convex)
out = cmat.is_positive_semidefinite()
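For intuition, a symmetric matrix is positive semidefinite when all of its eigenvalues are nonnegative. The sketch below checks this on bare NumPy arrays; the function name is_psd and the tolerance are hypothetical, and the test used internally by is_positive_semidefinite may differ.

```python
import numpy

def is_psd(mat: numpy.ndarray, tol: float = 1e-8) -> bool:
    """Return True if mat is symmetric with all eigenvalues >= -tol."""
    # a non-symmetric matrix is rejected outright
    if not numpy.allclose(mat, mat.T):
        return False
    # eigvalsh is intended for symmetric (Hermitian) matrices
    eigvals = numpy.linalg.eigvalsh(mat)
    return bool(eigvals.min() >= -tol)

# eigenvalues 0.5 and 1.5: positive semidefinite
good = numpy.array([[1.0, 0.5], [0.5, 1.0]])
# eigenvalues -1 and 3: not positive semidefinite
bad = numpy.array([[1.0, 2.0], [2.0, 1.0]])

good_is_psd = is_psd(good)
bad_is_psd = is_psd(bad)
```

The small negative tolerance allows for round-off error in eigenvalues that are zero in exact arithmetic.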
Applying jitter values along the diagonal#
In the event that a coancestry matrix is not positive semidefinite, it may be helpful to apply a small jitter along the diagonal of the matrix. A jitter can be applied using the apply_jitter method, as demonstrated below.
# Apply a jitter along the diagonal to try to make the matrix positive semidefinite
out = cmat.apply_jitter()
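One way to picture a diagonal jitter is to add a small value to the diagonal and increase it until a Cholesky factorization succeeds. The sketch below is a generic strategy on bare NumPy arrays; the function add_jitter and its parameters are hypothetical, and the algorithm and arguments used by apply_jitter are not described in this section.

```python
import numpy

def add_jitter(mat, start=1e-8, factor=10.0, max_tries=10):
    """Return (jittered copy, success flag) for a symmetric matrix."""
    eye = numpy.eye(mat.shape[0])
    eps = start
    for _ in range(max_tries):
        trial = mat + eps * eye
        try:
            # Cholesky succeeds only for positive definite matrices
            numpy.linalg.cholesky(trial)
            return trial, True
        except numpy.linalg.LinAlgError:
            # jitter was too small: grow it and try again
            eps *= factor
    return mat.copy(), False

# singular toy matrix: two identical taxa
K = numpy.array([[1.0, 1.0], [1.0, 1.0]])
K_jit, ok = add_jitter(K)
```

Only the diagonal is changed, so relationships between different taxa are preserved while the matrix becomes numerically invertible.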
Calculating the matrix inverse#
The inverse of a coancestry matrix may be calculated using the inverse method. The format argument may be used to specify whether the inverse is for the kinship representation or the coancestry representation. The code below demonstrates this method’s usage.
# Calculate the inverse of the coancestry matrix
out = cmat.inverse()
out = cmat.inverse(format = "kinship")
out = cmat.inverse(format = "coancestry")
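The two formats give inverses that differ only by a constant factor. The sketch below assumes, following the 0.5 factor used in the kinship indexing example earlier, that kinship values are half the corresponding coancestry values; under that assumption the kinship inverse is twice the coancestry inverse.

```python
import numpy

# toy matrix on the coancestry scale
A = numpy.array([[2.0, 0.5], [0.5, 1.5]])
# same relationships on the kinship scale (assumed to be 0.5 * coancestry)
K = 0.5 * A

A_inv = numpy.linalg.inv(A)
K_inv = numpy.linalg.inv(K)

# scaling a matrix by 0.5 scales its inverse by 2
same = numpy.allclose(K_inv, 2.0 * A_inv)
```

Downstream calculations should therefore use the inverse in the same format as the other quantities in the model.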
Calculating maximum attainable inbreeding#
For particular tasks, it may be useful to calculate the maximum attainable level of inbreeding after one generation. This is equivalent to the maximum value along the diagonal of the coancestry matrix. This may be done using the max_inbreeding method, demonstrated below.
# Calculate the maximum attainable inbreeding after 1 generation
out = cmat.max_inbreeding()
out = cmat.max_inbreeding(format = "kinship")
out = cmat.max_inbreeding(format = "coancestry")
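The statement that the maximum attainable inbreeding equals the largest diagonal value can be illustrated on a bare NumPy array. The matrix values below are made up for this illustration, and the format conversion applied by max_inbreeding is not reproduced here.

```python
import numpy

# hypothetical kinship-scale matrix for three taxa
K = numpy.array([
    [0.50, 0.10, 0.05],
    [0.10, 0.75, 0.20],
    [0.05, 0.20, 0.60],
])

# self-relationships lie on the diagonal
diag = numpy.diag(K)

# largest diagonal value and the taxon that attains it
max_inb = float(diag.max())
max_taxon = int(numpy.argmax(diag))
```

The maximum is attained by selfing the taxon with the largest diagonal value, which is the second taxon (index 1) in this toy matrix.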
Calculating minimum attainable inbreeding#
For other tasks, it may be useful to calculate the minimum attainable level of inbreeding after one generation. This may be done using the min_inbreeding method, demonstrated below.
# Calculate the minimum attainable inbreeding after 1 generation
out = cmat.min_inbreeding()
out = cmat.min_inbreeding(format = "kinship")
out = cmat.min_inbreeding(format = "coancestry")
Summary Statistics#
Maximum coancestry#
The maximum coancestry value across the entire coancestry matrix may be calculated using the max method. Below is a demonstration of this method.
# get the max for the whole coancestry matrix
out = cmat.max()
Mean coancestry#
The mean coancestry across the entire coancestry matrix may be calculated using the mean method. Below is a demonstration of this method.
# get the mean for the whole coancestry matrix
out = cmat.mean()
Minimum coancestry#
The minimum coancestry value across the entire coancestry matrix may be calculated using the min method. Below is a demonstration of this method.
# get the min for the whole coancestry matrix
out = cmat.min()
Exporting Coancestry Matrices#
Coancestry matrices may be exported to multiple formats including Pandas DataFrames, CSV files, and HDF5 files. The following subsections demonstrate how to export coancestry matrices.
Exporting to Pandas DataFrame#
The to_pandas method can be used to export a coancestry matrix to a Pandas DataFrame. Column names may be optionally provided to override default column names.
# export to a pandas.DataFrame
# use default column names to export
df = cmat.to_pandas()
Exporting to CSV#
The to_csv method can be used to export a coancestry matrix to a CSV file. Column names may be optionally provided to override default column names.
# export to a CSV
# use default column names to export
cmat.to_csv("saved_coancestry_matrix.csv")
Exporting to HDF5#
To write coancestry matrices to an HDF5 file, use the to_hdf5 method. The code below demonstrates the use of this method.
# write a coancestry matrix to an HDF5 file
cmat.to_hdf5("saved_coancestry_matrix.h5")