DensePhasedGenotypeMatrix#

class pybrops.popgen.gmat.DensePhasedGenotypeMatrix.DensePhasedGenotypeMatrix(mat, taxa=None, taxa_grp=None, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)[source]#

Bases: DenseGenotypeMatrix, DensePhasedTaxaVariantMatrix, PhasedGenotypeMatrix

A concrete class for phased genotype matrix objects.

The purpose of this concrete class is to implement functionality for:
  1. Genotype matrix ploidy and phase metadata.

  2. Genotype matrix format conversion.

  3. Genotype matrix allele counting routines.

  4. Genotype matrix genotype counting routines.

  5. Loading phased genotype matrices from VCF and HDF5.

Constructor for the class DensePhasedGenotypeMatrix.

Parameters:
  • mat (numpy.ndarray) – An int8 haplotype matrix. Must be {0,1,2} format.

  • taxa (numpy.ndarray, None) – A numpy.ndarray of shape (n,) containing taxa names. If None, do not store any taxa name information.

  • taxa_grp (numpy.ndarray, None) – A numpy.ndarray of shape (n,) containing taxa groupings. If None, do not store any taxa group information.

  • vrnt_chrgrp (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing variant chromosome group labels. If None, do not store any variant chromosome group label information.

  • vrnt_phypos (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing variant chromosome physical positions. If None, do not store any variant chromosome physical position information.

  • vrnt_name (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing variant names. If None, do not store any variant names.

  • vrnt_genpos (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing variant chromosome genetic positions. If None, do not store any variant chromosome genetic position information.

  • vrnt_xoprob (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing variant crossover probabilities. If None, do not store any variant crossover probabilities.

  • vrnt_hapgrp (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing variant haplotype group labels. If None, do not store any variant haplotype group label information.

  • vrnt_hapalt (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing variant alternative alleles. If None, do not store any variant alternative allele information.

  • vrnt_hapref (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing variant reference alleles. If None, do not store any variant reference allele information.

  • vrnt_mask (numpy.ndarray, None) – A numpy.ndarray of shape (p,) containing a variant mask. If None, do not store any variant mask information.
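As a rough illustration of the constructor inputs, the sketch below builds a few of the arrays with NumPy. The (m, n, p) layout (phases, taxa, variants) follows the shape stated for append on this page. The toy values, the taxa names, and the object dtype for names are assumptions made for the example. The constructor call is left commented out because it needs PyBrOpS installed.

```python
import numpy as np

# Toy phased data: m = 2 phases, n = 3 taxa, p = 4 variants.
# Each entry is the allele (0 or 1) carried on one chromosome copy.
mat = np.array([
    [[0, 1, 0, 1],
     [1, 1, 0, 0],
     [0, 0, 0, 1]],   # phase 0
    [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [0, 0, 1, 1]],   # phase 1
], dtype=np.int8)

# Metadata arrays: taxa arrays have shape (n,), variant arrays shape (p,).
taxa = np.array(["line1", "line2", "line3"], dtype=object)  # made-up names
taxa_grp = np.array([1, 1, 2])
vrnt_chrgrp = np.array([1, 1, 2, 2])
vrnt_phypos = np.array([100, 250, 80, 400])

# With PyBrOpS installed, these would be passed to the constructor:
# from pybrops.popgen.gmat.DensePhasedGenotypeMatrix import DensePhasedGenotypeMatrix
# gmat = DensePhasedGenotypeMatrix(
#     mat, taxa=taxa, taxa_grp=taxa_grp,
#     vrnt_chrgrp=vrnt_chrgrp, vrnt_phypos=vrnt_phypos,
# )
```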

Methods

acount

Allele count of the non-zero allele across all taxa.

adjoin

Add additional elements to the end of the Matrix along an axis.

adjoin_phase

Adjoin values along the phase axis.

adjoin_taxa

Add additional elements to the end of the DensePhasedGenotypeMatrix along the taxa axis.

adjoin_vrnt

Add additional elements to the end of the DensePhasedGenotypeMatrix along the variant axis.

afixed

Determine allele fixation for loci across all taxa.

afreq

Allele frequency of the non-zero allele across all taxa.

apoly

Allele polymorphism presence or absence across all loci.

append

Append values to the matrix.

append_phase

Append values to the Matrix along the phase axis.

append_taxa

Append values to the Matrix along the taxa axis.

append_vrnt

Append values to the Matrix along the variant axis.

concat

Concatenate matrices together along an axis.

concat_phase

Concatenate list of Matrix together along the phase axis.

concat_taxa

Concatenate list of DensePhasedGenotypeMatrix together along the taxa axis.

concat_vrnt

Concatenate list of DensePhasedGenotypeMatrix together along the variant axis.

copy

Make a shallow copy of the DensePhasedGenotypeMatrix.

deepcopy

Make a deep copy of the DensePhasedGenotypeMatrix.

delete

Delete sub-arrays along an axis.

delete_phase

Delete sub-arrays along the phase axis.

delete_taxa

Delete sub-arrays along the taxa axis.

delete_vrnt

Delete sub-arrays along the variant axis.

from_hdf5

Read a DensePhasedGenotypeMatrix from an HDF5 file.

from_vcf

Read a DensePhasedGenotypeMatrix from a VCF file.

group

Sort the DensePhasedTaxaVariantMatrix along an axis, then populate grouping indices.

group_taxa

Sort the Matrix along the taxa axis, then populate grouping indices for the taxa axis.

group_vrnt

Sort the Matrix along the variant axis, then populate grouping indices for the variant axis.

gtcount

Gather genotype counts for homozygous major, heterozygous, homozygous minor for all individuals.

gtfreq

Gather genotype frequencies for homozygous major, heterozygous, homozygous minor across all individuals.

incorp

Incorporate values along the given axis before the given indices.

incorp_phase

Incorporate values along the phase axis before the given indices.

incorp_taxa

Incorporate values along the taxa axis before the given indices.

incorp_vrnt

Incorporate values along the variant axis before the given indices.

insert

Insert values along the given axis before the given indices.

insert_phase

Insert values along the phase axis before the given indices.

insert_taxa

Insert values along the taxa axis before the given indices.

insert_vrnt

Insert values along the variant axis before the given indices.

interp_genpos

Interpolate genetic map positions for variants using a GeneticMap.

interp_xoprob

Interpolate genetic map positions AND crossover probabilities between sequential markers using a GeneticMap and a GeneticMapFunction.

is_grouped

Determine whether the Matrix has been sorted and grouped.

is_grouped_taxa

Determine whether the Matrix has been sorted and grouped along the taxa axis.

is_grouped_vrnt

Determine whether the Matrix has been sorted and grouped along the variant axis.

lexsort

Perform an indirect stable sort using a tuple of keys.

lexsort_taxa

Perform an indirect stable sort using a sequence of keys along the taxa axis.

lexsort_vrnt

Perform an indirect stable sort using a sequence of keys along the variant axis.

maf

Minor allele frequency across all taxa.

mat_asformat

Get mat in a specific format type.

meh

Mean expected heterozygosity across all taxa.

remove

Remove sub-arrays along an axis.

remove_phase

Remove sub-arrays along the phase axis.

remove_taxa

Remove sub-arrays along the taxa axis.

remove_vrnt

Remove sub-arrays along the variant axis.

reorder

Reorder the VariantMatrix.

reorder_taxa

Reorder elements of the Matrix along the taxa axis using an array of indices.

reorder_vrnt

Reorder elements of the Matrix along the variant axis using an array of indices.

select

Select certain values from the matrix.

select_phase

Select certain values from the Matrix along the phase axis.

select_taxa

Select certain values from the DensePhasedGenotypeMatrix along the taxa axis.

select_vrnt

Select certain values from the DensePhasedGenotypeMatrix along the variant axis.

sort

Sort elements of the Matrix along an axis and reset the grouping metadata for that axis: name, stix, spix, len.

sort_taxa

Sort elements of the Matrix along the taxa axis using a sequence of keys.

sort_vrnt

Sort elements of the Matrix along the variant axis using a sequence of keys.

tacount

Allele count of the non-zero allele within each taxon.

tafreq

Allele frequency of the non-zero allele within each taxon.

to_hdf5

Write GenotypeMatrix to an HDF5 file.

ungroup

Ungroup the DensePhasedTaxaVariantMatrix along an axis by removing grouping metadata.

ungroup_taxa

Ungroup the DenseTaxaMatrix along the taxa axis by removing taxa group metadata.

ungroup_vrnt

Ungroup the DenseVariantMatrix along the variant axis by removing variant group metadata.

Attributes

mat

Pointer to raw numpy.ndarray object.

mat_format

Get matrix representation format

mat_ndim

Number of dimensions of the raw numpy.ndarray.

mat_shape

Shape of the raw numpy.ndarray.

nphase

Get number of phases

ntaxa

Number of taxa

nvrnt

Number of variants.

phase_axis

Get phase axis number

ploidy

Get matrix ploidy number

taxa

Taxa label array

taxa_axis

Get taxa axis number

taxa_grp

Taxa group label.

taxa_grp_len

Taxa group length.

taxa_grp_name

Taxa group name.

taxa_grp_spix

Taxa group stop index.

taxa_grp_stix

Taxa group start index.

vrnt_axis

Get variant axis

vrnt_chrgrp

Variant chromosome group label.

vrnt_chrgrp_len

Variant chromosome group length.

vrnt_chrgrp_name

Variant chromosome group names.

vrnt_chrgrp_spix

Variant chromosome group stop indices.

vrnt_chrgrp_stix

Variant chromosome group start indices.

vrnt_genpos

Variant genetic position.

vrnt_hapalt

Variant alternative haplotype sequence.

vrnt_hapgrp

Variant haplotype group label.

vrnt_hapref

Variant reference haplotype sequence.

vrnt_mask

Variant mask.

vrnt_name

Variant name.

vrnt_phypos

Variant physical position.

vrnt_xoprob

Variant crossover sequential probability.

__add__(value)#

Elementwise add matrices.

Parameters:

value (object) – Object which to add.

Returns:

out – An object resulting from the addition.

Return type:

object

__mul__(value)#

Elementwise multiply matrices.

Parameters:

value (object) – Object which to multiply.

Returns:

out – An object resulting from the multiplication.

Return type:

object

acount(dtype=None)[source]#

Allele count of the non-zero allele across all taxa.

Parameters:

dtype (dtype, None) – The data type of the returned array. If None, use the native type.

Returns:

out – A numpy.ndarray of shape (p,) containing allele counts of the allele coded as 1 for all p loci.

Return type:

numpy.ndarray
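The allele count that acount describes can be sketched in plain NumPy. This is an illustration of the quantity, not the library's implementation. It assumes a (m, n, p) array of 0/1 alleles, and the helper name allele_count is hypothetical.

```python
import numpy as np

def allele_count(mat, dtype=None):
    """Count copies of the allele coded as 1 at each locus.

    mat is assumed to have shape (m, n, p): phases, taxa, variants.
    """
    # Summing over the phase and taxa axes leaves one count per locus.
    out = mat.sum(axis=(0, 1))
    return out if dtype is None else out.astype(dtype)

mat = np.array([
    [[0, 1, 0], [1, 1, 0]],   # phase 0 for 2 taxa at 3 loci
    [[0, 1, 1], [1, 0, 0]],   # phase 1
], dtype=np.int8)

counts = allele_count(mat)
```

Here counts has shape (p,), one entry per locus, which matches the return shape documented for acount.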

adjoin(values, axis=-1, taxa=None, taxa_grp=None, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)#

Add additional elements to the end of the Matrix along an axis.

Parameters:
  • values (Matrix, numpy.ndarray) – Values to adjoin to the Matrix.

  • axis (int) – The axis along which values are adjoined.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A copy of DensePhasedTaxaVariantMatrix with values appended to axis. Note that adjoin does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix

adjoin_phase(values, **kwargs)#

Adjoin values along the phase axis.

Parameters:
  • values (Matrix or numpy.ndarray) – Values to adjoin along the phase axis.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A copy of DensePhasedTaxaVariantMatrix with values appended to axis. Note that adjoin does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix

adjoin_taxa(values, taxa=None, taxa_grp=None, **kwargs)[source]#

Add additional elements to the end of the DensePhasedGenotypeMatrix along the taxa axis.

Parameters:
  • values (Matrix, numpy.ndarray) – Values to adjoin to the DensePhasedGenotypeMatrix.

  • taxa (numpy.ndarray) – Taxa names to adjoin to the DensePhasedGenotypeMatrix. If values is a DenseHaplotypeMatrix that has a non-None taxa field, providing this argument overwrites the field.

  • taxa_grp (numpy.ndarray) – Taxa groups to adjoin to the DensePhasedGenotypeMatrix. If values is a DenseHaplotypeMatrix that has a non-None taxa_grp field, providing this argument overwrites the field.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A copy of mat with values appended to axis. Note that adjoin does not occur in-place: a new Matrix is allocated and filled.

Return type:

Matrix

adjoin_vrnt(values, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)[source]#

Add additional elements to the end of the DensePhasedGenotypeMatrix along the variant axis.

Parameters:
  • values (Matrix, numpy.ndarray) – Values to adjoin to the DensePhasedGenotypeMatrix.

  • vrnt_chrgrp (numpy.ndarray) – Variant chromosome groups to adjoin to the DensePhasedGenotypeMatrix.

  • vrnt_phypos (numpy.ndarray) – Variant chromosome physical positions to adjoin to the DensePhasedGenotypeMatrix.

  • vrnt_name (numpy.ndarray) – Variant names to adjoin to the DensePhasedGenotypeMatrix.

  • vrnt_genpos (numpy.ndarray) – Variant chromosome genetic positions to adjoin to the DensePhasedGenotypeMatrix.

  • vrnt_xoprob (numpy.ndarray) – Sequential variant crossover probabilities to adjoin to the DensePhasedGenotypeMatrix.

  • vrnt_hapgrp (numpy.ndarray) – Variant haplotype labels to adjoin to the DensePhasedGenotypeMatrix.

  • vrnt_hapalt (numpy.ndarray) – Variant alternative haplotype sequence.

  • vrnt_hapref (numpy.ndarray) – Variant haplotype reference sequence.

  • vrnt_mask (numpy.ndarray) – Variant mask to adjoin to the DensePhasedGenotypeMatrix.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A copy of mat with values appended to axis. Note that adjoin does not occur in-place: a new Matrix is allocated and filled.

Return type:

Matrix

afixed(dtype=None)#

Determine allele fixation for loci across all taxa.

Parameters:

dtype (dtype, None) – The data type of the returned array. If None, use the native type.

Returns:

out – A numpy.ndarray of shape (p,) containing indicator variables for whether an allele is fixed at each locus.

Return type:

numpy.ndarray

afreq(dtype=None)[source]#

Allele frequency of the non-zero allele across all taxa.

Parameters:

dtype (dtype, None) – The data type of the returned array. If None, use the native type.

Returns:

out – A numpy.ndarray of shape (p,) containing allele frequencies of the allele coded as 1 for all p loci.

Return type:

numpy.ndarray
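For afreq, the frequency of the allele coded as 1 is the allele count divided by the number of chromosome copies sampled at each locus. The sketch below shows that arithmetic in NumPy, assuming a (m, n, p) array of 0/1 alleles with no missing data. The helper name allele_freq is hypothetical.

```python
import numpy as np

def allele_freq(mat):
    """Frequency of the allele coded as 1 at each locus."""
    m, n, _ = mat.shape
    # m * n chromosome copies are observed at every locus.
    return mat.sum(axis=(0, 1)) / (m * n)

mat = np.array([
    [[0, 1, 0], [1, 1, 0]],   # phase 0 for 2 taxa at 3 loci
    [[0, 1, 1], [1, 0, 0]],   # phase 1
], dtype=np.int8)

freq = allele_freq(mat)
```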

apoly(dtype=None)[source]#

Allele polymorphism presence or absence across all loci.

Parameters:

dtype (dtype, None) – The data type of the returned array. If None, use the native type.

Returns:

out – A numpy.ndarray of shape (p,) containing indicator variables for whether the locus is polymorphic.

Return type:

numpy.ndarray
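One common way to read the apoly indicator is that a locus is polymorphic when both alleles are present, that is, when the allele frequency lies strictly between 0 and 1. The sketch below uses that rule as an assumption. The exact rule applied by the library, and by the complementary afixed, is not confirmed here, and the helper name is hypothetical.

```python
import numpy as np

def is_polymorphic(mat):
    """Boolean indicator per locus: True when both alleles are observed."""
    # Mean over phases and taxa gives the frequency of the allele coded as 1.
    freq = mat.mean(axis=(0, 1))
    return (freq > 0.0) & (freq < 1.0)

mat = np.array([
    [[0, 1, 1], [0, 0, 1]],   # phase 0 for 2 taxa at 3 loci
    [[0, 1, 1], [0, 1, 1]],   # phase 1
], dtype=np.int8)

poly = is_polymorphic(mat)    # locus 0 is all 0, locus 2 is all 1
fixed = ~poly                 # under this rule, fixed is the complement
```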

append(values, axis=-1, taxa=None, taxa_grp=None, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)#

Append values to the matrix.

Parameters:
  • values (Matrix, numpy.ndarray) – Values to append to the matrix. Must be of type int8. Must be of shape (m, n, p).

  • axis (int) – The axis along which values are appended.

Return type:

None

append_phase(values, **kwargs)#

Append values to the Matrix along the phase axis.

Parameters:
  • values (Matrix, numpy.ndarray) – Values to append to the matrix.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

append_taxa(values, taxa=None, taxa_grp=None, **kwargs)#

Append values to the Matrix along the taxa axis.

Parameters:
  • values (Matrix, numpy.ndarray) – Values to append to the matrix.

  • taxa (numpy.ndarray) – Taxa names to append to the Matrix.

  • taxa_grp (numpy.ndarray) – Taxa groups to append to the Matrix.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

append_vrnt(values, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)#

Append values to the Matrix along the variant axis.

Parameters:
  • values (Matrix, numpy.ndarray) – Values to append to the matrix.

  • vrnt_chrgrp (numpy.ndarray) – Variant chromosome groups to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_chrgrp field, providing this argument overwrites the field.

  • vrnt_phypos (numpy.ndarray) – Variant chromosome physical positions to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_phypos field, providing this argument overwrites the field.

  • vrnt_name (numpy.ndarray) – Variant names to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_name field, providing this argument overwrites the field.

  • vrnt_genpos (numpy.ndarray) – Variant chromosome genetic positions to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_genpos field, providing this argument overwrites the field.

  • vrnt_xoprob (numpy.ndarray) – Sequential variant crossover probabilities to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_xoprob field, providing this argument overwrites the field.

  • vrnt_hapgrp (numpy.ndarray) – Variant haplotype labels to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_hapgrp field, providing this argument overwrites the field.

  • vrnt_hapalt (numpy.ndarray) – Variant alternative haplotype labels to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_hapalt field, providing this argument overwrites the field.

  • vrnt_hapref (numpy.ndarray) – Variant reference haplotype labels to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_hapref field, providing this argument overwrites the field.

  • vrnt_mask (numpy.ndarray) – Variant mask to append to the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_mask field, providing this argument overwrites the field.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

classmethod concat(mats, axis=-1, **kwargs)#

Concatenate matrices together along an axis.

Parameters:
  • mats (Sequence of matrices) – List of Matrix to concatenate. The matrices must have the same shape, except in the dimension corresponding to axis.

  • axis (int) – The axis along which the arrays will be joined.

  • kwargs (dict) – Additional keyword arguments

Returns:

out – The concatenated DensePhasedTaxaVariantMatrix. Note that concat does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix

classmethod concat_phase(mats, **kwargs)#

Concatenate list of Matrix together along the phase axis.

Parameters:
  • mats (Sequence of Matrix) – List of Matrix to concatenate. The matrices must have the same shape, except in the dimension corresponding to axis.

  • kwargs (dict) – Additional keyword arguments

Returns:

out – The concatenated DensePhasedTaxaVariantMatrix. Note that concat does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix

classmethod concat_taxa(mats, **kwargs)[source]#

Concatenate list of DensePhasedGenotypeMatrix together along the taxa axis.

Parameters:
  • mats (Sequence of DensePhasedGenotypeMatrix) – List of DensePhasedGenotypeMatrix to concatenate. The matrices must have the same shape, except in the dimension corresponding to axis.

  • kwargs (dict) – Additional keyword arguments

Returns:

out – The concatenated DensePhasedGenotypeMatrix. Note that concat does not occur in-place: a new DensePhasedGenotypeMatrix is allocated and filled.

Return type:

DensePhasedGenotypeMatrix
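The shape rule for concat_taxa (all dimensions equal except the one being joined) can be seen with raw arrays. This sketch assumes the taxa axis is axis 1 of a (m, n, p) array and uses made-up taxa names. It shows only the array operation, not the library's handling of the remaining metadata.

```python
import numpy as np

TAXA_AXIS = 1  # assumption: (phases, taxa, variants) layout

a = np.zeros((2, 3, 4), dtype=np.int8)   # 3 taxa
b = np.ones((2, 2, 4), dtype=np.int8)    # 2 taxa, same phases and variants
taxa_a = np.array(["a1", "a2", "a3"], dtype=object)
taxa_b = np.array(["b1", "b2"], dtype=object)

# A new array is allocated; a and b are left unchanged.
joined = np.concatenate([a, b], axis=TAXA_AXIS)
joined_taxa = np.concatenate([taxa_a, taxa_b])
```

The taxa names are concatenated in the same order as the matrices so that each name stays aligned with its row.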

classmethod concat_vrnt(mats, **kwargs)[source]#

Concatenate list of DensePhasedGenotypeMatrix together along the variant axis.

Parameters:
  • mats (Sequence of DensePhasedGenotypeMatrix) – List of DensePhasedGenotypeMatrix to concatenate. The matrices must have the same shape, except in the dimension corresponding to axis.

  • kwargs (dict) – Additional keyword arguments

Returns:

out – The concatenated matrix. Note that concat does not occur in-place: a new DensePhasedGenotypeMatrix is allocated and filled.

Return type:

DensePhasedGenotypeMatrix

copy()[source]#

Make a shallow copy of the DensePhasedGenotypeMatrix.

Returns:

out – A shallow copy of the original DensePhasedGenotypeMatrix.

Return type:

DensePhasedGenotypeMatrix

deepcopy(memo=None)[source]#

Make a deep copy of the DensePhasedGenotypeMatrix.

Parameters:

memo (dict) – Dictionary of memo metadata.

Returns:

out – A deep copy of the original DensePhasedGenotypeMatrix.

Return type:

DensePhasedGenotypeMatrix
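The practical difference between copy and deepcopy is whether the underlying arrays are shared. The sketch below shows the general Python semantics with a hypothetical stand-in class, ToyMatrix, which is not part of PyBrOpS. Whether the library's copy() shares its arrays in exactly this way is an assumption.

```python
import copy
import numpy as np

class ToyMatrix:
    """Hypothetical stand-in holding one array; not a PyBrOpS class."""
    def __init__(self, mat):
        self.mat = mat

orig = ToyMatrix(np.zeros((2, 2, 3), dtype=np.int8))
shallow = copy.copy(orig)       # new wrapper, same ndarray object
deep = copy.deepcopy(orig)      # new wrapper, independent ndarray

# Modify the original array in place.
orig.mat[0, 0, 0] = 1

# The shallow copy sees the change; the deep copy does not.
shallow_value = int(shallow.mat[0, 0, 0])
deep_value = int(deep.mat[0, 0, 0])
```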

delete(obj, axis=-1, **kwargs)#

Delete sub-arrays along an axis.

Parameters:
  • obj (int, slice, or Sequence of ints) – Indicate indices of sub-arrays to remove along the specified axis.

  • axis (int) – The axis along which to delete the subarray defined by obj.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A DensePhasedTaxaVariantMatrix with deleted elements. Note that delete does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix
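The out-of-place behaviour described for delete mirrors numpy.delete, which also returns a new array. The sketch below removes two variants from a raw (m, n, p) array and from a matching name array. The variant axis index and the variant names are assumptions for the example.

```python
import numpy as np

VRNT_AXIS = 2  # assumption: (phases, taxa, variants) layout

mat = np.arange(2 * 3 * 4, dtype=np.int8).reshape(2, 3, 4)
vrnt_name = np.array(["v1", "v2", "v3", "v4"], dtype=object)

# Drop variants 0 and 2; the same indices are applied to the metadata
# so names stay aligned with columns.
drop = [0, 2]
trimmed = np.delete(mat, drop, axis=VRNT_AXIS)
trimmed_name = np.delete(vrnt_name, drop)
```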

delete_phase(obj, **kwargs)#

Delete sub-arrays along the phase axis.

Parameters:
  • obj (int, slice, or Sequence of ints) – Indicate indices of sub-arrays to remove along the specified axis.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A DensePhasedTaxaVariantMatrix with deleted elements. Note that delete does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix

delete_taxa(obj, **kwargs)[source]#

Delete sub-arrays along the taxa axis.

Parameters:
  • obj (int, slice, or Sequence of ints) – Indicate indices of sub-arrays to remove along the specified axis.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A DensePhasedGenotypeMatrix with deleted elements. Note that delete does not occur in-place: a new DensePhasedGenotypeMatrix is allocated and filled.

Return type:

DensePhasedGenotypeMatrix

delete_vrnt(obj, **kwargs)[source]#

Delete sub-arrays along the variant axis.

Parameters:
  • obj (int, slice, or Sequence of ints) – Indicate indices of sub-arrays to remove along the specified axis.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A DensePhasedGenotypeMatrix with deleted elements. Note that delete does not occur in-place: a new DensePhasedGenotypeMatrix is allocated and filled.

Return type:

DensePhasedGenotypeMatrix

classmethod from_hdf5(filename, groupname=None)[source]#

Read a DensePhasedGenotypeMatrix from an HDF5 file.

Parameters:
  • filename (str, Path, h5py.File) – If str or Path, an HDF5 file name from which to read. File is closed after reading. If h5py.File, an opened HDF5 file from which to read. File is not closed after reading.

  • groupname (str, None) – If str, HDF5 group name under which GenotypeMatrix data is stored. If None, GenotypeMatrix is read from base HDF5 group.

Returns:

gmat – A genotype matrix read from file.

Return type:

DensePhasedGenotypeMatrix

classmethod from_vcf(filename, auto_group_vrnt=True)[source]#

Read a DensePhasedGenotypeMatrix from a VCF file. This classmethod treats the VCF file as if it has been phased. It does not check if this assumption is correct.

Parameters:
  • filename (str) – Path to a VCF file.

  • auto_group_vrnt (bool) – Whether to automatically group variants into chromosome groupings.

Returns:

out – A DensePhasedGenotypeMatrix read from the VCF file.

Return type:

DensePhasedGenotypeMatrix
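Because from_vcf does not verify phasing, an unphased call such as 0/1 is effectively read in file order as if it were 0|1. The sketch below illustrates that idea on bare GT strings. It is not the library's parser, the helper parse_gt is hypothetical, and missing calls (".") are not handled.

```python
import numpy as np

def parse_gt(gt):
    """Split a VCF GT string into per-phase allele integers.

    "/" is treated like "|": allele order in the file is taken as the
    phase, with no check that the genotype was actually phased.
    """
    return [int(allele) for allele in gt.replace("/", "|").split("|")]

# Rows are variants, columns are taxa (as in the body of a VCF file).
records = [
    ["0|1", "1|1"],
    ["0/1", "0|0"],   # the unphased call is read as if phased
]

# Parsed calls have shape (p, n, m); reorder to (m, n, p).
calls = np.array([[parse_gt(gt) for gt in row] for row in records], dtype=np.int8)
mat = calls.transpose(2, 1, 0)
```

If the input may contain unphased calls, phasing it with a dedicated tool before loading avoids silently assigning alleles to arbitrary phases.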

group(axis=-1, **kwargs)#

Sort the DensePhasedTaxaVariantMatrix along an axis, then populate grouping indices.

Parameters:
  • axis (int) – The axis along which values are grouped.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

group_taxa(**kwargs)#

Sort the Matrix along the taxa axis, then populate grouping indices for the taxa axis.

Parameters:

kwargs (dict) – Additional keyword arguments.

Return type:

None

group_vrnt(**kwargs)#

Sort the Matrix along the variant axis, then populate grouping indices for the variant axis.

Parameters:

kwargs (dict) – Additional keyword arguments.

Return type:

None
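Grouping along the variant axis amounts to sorting and then recording where each group starts and stops. The sketch below reproduces that with NumPy for chromosome groups. The choice of sort keys (chromosome group, then physical position) and the exclusive stop index are assumptions. The variable names echo the vrnt_chrgrp_name, vrnt_chrgrp_stix, vrnt_chrgrp_spix and vrnt_chrgrp_len attributes listed above.

```python
import numpy as np

vrnt_chrgrp = np.array([2, 1, 2, 1, 3])
vrnt_phypos = np.array([50, 300, 10, 100, 7])

# Sort by chromosome group, then by physical position within a group.
# np.lexsort treats the LAST key in the tuple as the primary key.
order = np.lexsort((vrnt_phypos, vrnt_chrgrp))
chrgrp_sorted = vrnt_chrgrp[order]
phypos_sorted = vrnt_phypos[order]

# On sorted labels, np.unique yields each group name, the index of its
# first member, and its size.
name, stix, length = np.unique(chrgrp_sorted, return_index=True, return_counts=True)
spix = stix + length   # stop index, exclusive (assumed convention)
```

With these indices, the variants of group name[i] occupy the slice stix[i]:spix[i] of every sorted variant array.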

gtcount(dtype=None)[source]#

Gather genotype counts for homozygous major, heterozygous, homozygous minor for all individuals.

Parameters:

dtype (dtype, None) – The data type of the returned array. If None, use the native type.

Returns:

out – An int64 array of shape (g,p) containing genotype counts across all p loci for each of g genotype combinations.

Where:

  • out[0] is the count of 0 genotype across all loci

  • out[1] is the count of 1 genotype across all loci

  • out[2] is the count of 2 genotype across all loci

  • ...

  • out[g-1] is the count of g-1 genotype across all loci

Return type:

numpy.ndarray
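The (g,p) layout returned by gtcount can be sketched as follows: collapse the phase axis to get a genotype value per taxon and locus, then count how many taxa carry each value. The sketch assumes 0/1 allele coding and takes ploidy as the number of phases, so g = ploidy + 1. The helper name is hypothetical and this is not the library's code.

```python
import numpy as np

def genotype_count(mat):
    """Count taxa with each genotype value (0..ploidy) at every locus."""
    ploidy = mat.shape[0]
    # Summing phases gives the number of 1-alleles per taxon and locus.
    geno = mat.sum(axis=0)                      # shape (n, p)
    # Row k counts the taxa whose genotype equals k at each locus.
    return np.stack([(geno == k).sum(axis=0) for k in range(ploidy + 1)])

mat = np.array([
    [[0, 1, 0], [1, 1, 0], [0, 0, 0]],   # phase 0 for 3 taxa at 3 loci
    [[0, 1, 1], [1, 0, 0], [0, 0, 1]],   # phase 1
], dtype=np.int8)

gtc = genotype_count(mat)
```

Each column of gtc sums to the number of taxa, so dividing by that number gives the per-locus genotype frequencies described for gtfreq.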

gtfreq(dtype=None)[source]#

Gather genotype frequencies for homozygous major, heterozygous, homozygous minor across all individuals.

Returns:

out – A float64 array of shape (g,p) containing genotype frequencies across all p loci for all g genotype combinations.

Where:

  • out[0] is the frequency of 0 genotype across all loci

  • out[1] is the frequency of 1 genotype across all loci

  • out[2] is the frequency of 2 genotype across all loci

  • ...

  • out[g-1] is the frequency of g-1 genotype across all loci

Return type:

numpy.ndarray

incorp(obj, values, axis=-1, taxa=None, taxa_grp=None, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)#

Incorporate values along the given axis before the given indices.

Parameters:
  • obj (int, slice, or Sequence of ints) – Object that defines the index or indices before which values is incorporated.

  • values (array_like) – Values to incorporate into the matrix.

  • axis (int) – The axis along which values are incorporated.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

incorp_phase(obj, values, **kwargs)#

Incorporate values along the phase axis before the given indices.

Parameters:
  • obj (int, slice, or Sequence of ints) – Object that defines the index or indices before which values is incorporated.

  • values (Matrix, numpy.ndarray) – Values to incorporate into the matrix.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

incorp_taxa(obj, values, taxa=None, taxa_grp=None, **kwargs)#

Incorporate values along the taxa axis before the given indices.

Parameters:
  • obj (int, slice, or Sequence of ints) – Object that defines the index or indices before which values is incorporated.

  • values (Matrix, numpy.ndarray) – Values to incorporate into the matrix.

  • taxa (numpy.ndarray) – Taxa names to incorporate into the Matrix.

  • taxa_grp (numpy.ndarray) – Taxa groups to incorporate into the Matrix.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

incorp_vrnt(obj, values, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)#

Incorporate values along the variant axis before the given indices.

Parameters:
  • obj (int, slice, or Sequence of ints) – Object that defines the index or indices before which values is incorporated.

  • values (Matrix, numpy.ndarray) – Values to incorporate into the matrix.

  • vrnt_chrgrp (numpy.ndarray) – Variant chromosome groups to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_chrgrp field, providing this argument overwrites the field.

  • vrnt_phypos (numpy.ndarray) – Variant chromosome physical positions to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_phypos field, providing this argument overwrites the field.

  • vrnt_name (numpy.ndarray) – Variant names to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_name field, providing this argument overwrites the field.

  • vrnt_genpos (numpy.ndarray) – Variant chromosome genetic positions to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_genpos field, providing this argument overwrites the field.

  • vrnt_xoprob (numpy.ndarray) – Sequential variant crossover probabilities to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_xoprob field, providing this argument overwrites the field.

  • vrnt_hapgrp (numpy.ndarray) – Variant haplotype labels to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_hapgrp field, providing this argument overwrites the field.

  • vrnt_hapalt (numpy.ndarray) – Variant alternative haplotype labels to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_hapalt field, providing this argument overwrites the field.

  • vrnt_hapref (numpy.ndarray) – Variant reference haplotype labels to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_hapref field, providing this argument overwrites the field.

  • vrnt_mask (numpy.ndarray) – Variant mask to incorporate into the Matrix. If values is a DenseVariantMatrix that has a non-None vrnt_mask field, providing this argument overwrites the field.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

insert(obj, values, axis=-1, taxa=None, taxa_grp=None, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)#

Insert values along the given axis before the given indices.

Parameters:
  • obj (int, slice, or Sequence of ints) – Object that defines the index or indices before which values is inserted.

  • values (Matrix, numpy.ndarray) – Values to insert into the matrix.

  • axis (int) – The axis along which values are inserted.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A Matrix with values inserted. Note that insert does not occur in-place: a new Matrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix
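The "before the given indices" wording for insert follows the numpy.insert convention. The sketch below inserts one new taxon before index 1 on raw arrays, assuming the taxa axis is axis 1 and using made-up names. The index is passed as a one-element list because numpy.insert broadcasts values differently for a scalar index.

```python
import numpy as np

TAXA_AXIS = 1  # assumption: (phases, taxa, variants) layout

mat = np.zeros((2, 3, 4), dtype=np.int8)
taxa = np.array(["t1", "t2", "t3"], dtype=object)

new_taxon = np.ones((2, 1, 4), dtype=np.int8)   # one taxon, both phases

# Insert before index 1; a new array is returned and mat is unchanged.
out = np.insert(mat, [1], new_taxon, axis=TAXA_AXIS)
out_taxa = np.insert(taxa, [1], ["new"])
```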

insert_phase(obj, values, **kwargs)#

Insert values along the phase axis before the given indices.

Parameters:
  • obj (int, slice, or Sequence of ints) – Object that defines the index or indices before which values is inserted.

  • values (Matrix, numpy.ndarray) – Values to insert into the matrix.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A DensePhasedTaxaVariantMatrix with values inserted. Note that insert does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix

insert_taxa(obj, values, taxa=None, taxa_grp=None, **kwargs)[source]#

Insert values along the taxa axis before the given indices.

Parameters:
  • obj (int, slice, or Sequence of ints) – Object that defines the index or indices before which values is inserted.

  • values (Matrix, numpy.ndarray) – Values to insert into the matrix.

  • taxa (numpy.ndarray) – Taxa names to insert into the DensePhasedGenotypeMatrix.

  • taxa_grp (numpy.ndarray) – Taxa groups to insert into the DensePhasedGenotypeMatrix.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A DensePhasedGenotypeMatrix with values inserted. Note that insert does not occur in-place: a new DensePhasedGenotypeMatrix is allocated and filled.

Return type:

DensePhasedGenotypeMatrix

insert_vrnt(obj, values, vrnt_chrgrp=None, vrnt_phypos=None, vrnt_name=None, vrnt_genpos=None, vrnt_xoprob=None, vrnt_hapgrp=None, vrnt_hapalt=None, vrnt_hapref=None, vrnt_mask=None, **kwargs)[source]#

Insert values along the variant axis before the given indices.

Parameters:
  • obj (int, slice, or Sequence of ints) – Object that defines the index or indices before which values is inserted.

  • values (array_like) – Values to insert into the matrix.

  • vrnt_chrgrp (numpy.ndarray) – Variant chromosome groups to insert into the DensePhasedGenotypeMatrix.

  • vrnt_phypos (numpy.ndarray) – Variant chromosome physical positions to insert into the DensePhasedGenotypeMatrix.

  • vrnt_name (numpy.ndarray) – Variant names to insert into the DensePhasedGenotypeMatrix.

  • vrnt_genpos (numpy.ndarray) – Variant chromosome genetic positions to insert into the DensePhasedGenotypeMatrix.

  • vrnt_xoprob (numpy.ndarray) – Sequential variant crossover probabilities to insert into the DensePhasedGenotypeMatrix.

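  • vrnt_hapgrp (numpy.ndarray) – Variant haplotype labels to insert into the DensePhasedGenotypeMatrix.

  • vrnt_hapalt (numpy.ndarray) – Variant alternative haplotype sequences to insert into the DensePhasedGenotypeMatrix.

  • vrnt_hapref (numpy.ndarray) – Variant reference haplotype sequences to insert into the DensePhasedGenotypeMatrix.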

  • vrnt_mask (numpy.ndarray) – Variant mask to insert into the DensePhasedGenotypeMatrix.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – A DensePhasedGenotypeMatrix with values inserted. Note that insert does not occur in-place: a new DensePhasedGenotypeMatrix is allocated and filled.

Return type:

DensePhasedGenotypeMatrix

interp_genpos(gmap, **kwargs)#

Interpolate genetic map positions for variants using a GeneticMap.

Parameters:

gmap (GeneticMap) – A genetic map from which to interpolate genetic map positions for loci within the VariantMatrix.

Return type:

None
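
The idea of interpolating genetic positions can be sketched with numpy.interp. This is not the GeneticMap implementation: it assumes simple linear interpolation between flanking map markers on a single chromosome, and the map values below are made up for illustration.

```python
import numpy as np

# Hypothetical genetic map for one chromosome:
# physical position (bp) -> genetic position (Morgans).
map_phypos = np.array([0, 1_000_000, 2_000_000])
map_genpos = np.array([0.0, 0.5, 1.5])

# Physical positions of the variants to place on the map.
vrnt_phypos = np.array([500_000, 1_500_000])

# Linear interpolation between flanking map markers (an assumed scheme).
vrnt_genpos = np.interp(vrnt_phypos, map_phypos, map_genpos)
```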

interp_xoprob(gmap, gmapfn, **kwargs)#

Interpolate genetic map positions AND crossover probabilities between sequential markers using a GeneticMap and a GeneticMapFunction.

Parameters:
  • gmap (GeneticMap) – A genetic map from which to interpolate genetic map positions for loci within the VariantMatrix.

  • gmapfn (GeneticMapFunction) – A genetic map function from which to interpolate crossover probabilities for loci within the VariantMatrix.

Return type:

None
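
To show how a genetic map function turns map distances into crossover probabilities between sequential markers, here is a minimal sketch using the Haldane mapping function. Haldane is only one common choice and is used here as an assumption; the function name and positions are hypothetical, and how the first marker of a chromosome is handled is not covered.

```python
import numpy as np

def haldane(d):
    """Haldane mapping function: map distance d (Morgans) -> recombination probability."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.asarray(d, dtype=float)))

# Hypothetical genetic positions (Morgans) of three markers on one chromosome.
vrnt_genpos = np.array([0.0, 0.1, 0.3])

# Distances between sequential markers, then their crossover probabilities.
dist = np.diff(vrnt_genpos)
xoprob = haldane(dist)
```

Under this function the probability is 0 at zero distance and approaches 0.5 as the distance grows, which matches independent assortment for unlinked markers.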

is_grouped(axis=-1, **kwargs)#

Determine whether the Matrix has been sorted and grouped.

Parameters:
  • axis (int) – Axis to test for sorting and grouping

  • kwargs (dict) – Additional keyword arguments

Returns:

grouped – True or False indicating whether the Matrix has been sorted and grouped.

Return type:

bool

is_grouped_taxa(**kwargs)#

Determine whether the Matrix has been sorted and grouped along the taxa axis.

Parameters:

kwargs (dict) – Additional keyword arguments.

Returns:

grouped – True or False indicating whether the Matrix has been sorted and grouped.

Return type:

bool

is_grouped_vrnt(**kwargs)#

Determine whether the Matrix has been sorted and grouped along the variant axis.

Parameters:

kwargs (dict) – Additional keyword arguments.

Returns:

grouped – True or False indicating whether the Matrix has been sorted and grouped.

Return type:

bool

lexsort(keys=None, axis=-1, **kwargs)#

Perform an indirect stable sort using a tuple of keys.

Parameters:
  • keys (tuple, None) – A tuple of columns to be sorted. The last column is the primary sort key. If None, sort using vrnt_chrgrp as primary key, and vrnt_phypos as secondary key.

  • axis (int) – The axis of the Matrix over which to sort values.

  • kwargs (dict) – Additional keyword arguments.

Returns:

indices – Array of indices that sort the keys.

Return type:

numpy.ndarray
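
The "last column is the primary sort key" convention is the same as numpy.lexsort. The sketch below mirrors the documented default ordering (vrnt_chrgrp primary, vrnt_phypos secondary) on hypothetical arrays; it calls NumPy directly, not the method above.

```python
import numpy as np

# Hypothetical variant metadata for four variants.
vrnt_chrgrp = np.array([2, 1, 2, 1])
vrnt_phypos = np.array([50, 300, 10, 100])

# np.lexsort treats the LAST key as primary, so the chromosome group goes last.
indices = np.lexsort((vrnt_phypos, vrnt_chrgrp))

# indices orders variants by chromosome group, then by physical position.
sorted_chrgrp = vrnt_chrgrp[indices]
sorted_phypos = vrnt_phypos[indices]
```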

lexsort_taxa(keys=None, **kwargs)#

Perform an indirect stable sort using a sequence of keys along the taxa axis.

Parameters:
  • keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.

  • kwargs (dict) – Additional keyword arguments.

Returns:

indices – Array of indices that sort the keys along the specified axis.

Return type:

A (N,) ndarray of ints

lexsort_vrnt(keys=None, **kwargs)#

Perform an indirect stable sort using a sequence of keys along the variant axis.

Parameters:
  • keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.

  • kwargs (dict) – Additional keyword arguments.

Returns:

indices – Array of indices that sort the keys along the specified axis.

Return type:

A (N,) ndarray of ints

maf(dtype=None)[source]#

Minor allele frequency across all taxa.

Parameters:

dtype (dtype, None) – The data type of the returned array. If None, use the native type.

Returns:

out – A numpy.ndarray of shape (p,) containing allele frequencies for the minor allele.

Return type:

numpy.ndarray
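
A minor allele frequency of this shape can be sketched from a raw haplotype array as follows. The sketch assumes biallelic 0/1 coding and an array of shape (nphase, ntaxa, nvrnt); it illustrates the quantity, not the library's internal computation.

```python
import numpy as np

# Hypothetical phased haplotypes: shape (nphase=2, ntaxa=3, nvrnt=2), coded 0/1.
mat = np.array([
    [[0, 1], [1, 1], [0, 1]],
    [[0, 1], [0, 1], [0, 0]],
], dtype="int8")

# Frequency of the allele coded as 1 at each variant, pooled over phases and taxa.
afreq = mat.sum(axis=(0, 1)) / (mat.shape[0] * mat.shape[1])

# The minor allele is whichever of the two alleles is rarer.
maf = np.minimum(afreq, 1.0 - afreq)
```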

property mat: ndarray#

Pointer to raw numpy.ndarray object.

mat_asformat(format)[source]#

Get mat in a specific format type.

Parameters:

format (str) – Desired output format. Options are “{0,1,2}”, “{-1,0,1}”, “{-1,m,1}”.

Returns:

out – Matrix in the desired output format.

Return type:

numpy.ndarray
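
As a hedged sketch of what two of these formats could look like for a diploid matrix: the code assumes "{0,1,2}" is the count of the allele coded as 1 summed over phases, and that "{-1,0,1}" is the same coding shifted by one. Both are assumptions about the encoding, and the "{-1,m,1}" format is not covered.

```python
import numpy as np

# Hypothetical phased haplotypes: shape (nphase=2, ntaxa=1, nvrnt=3), coded 0/1.
mat = np.array([
    [[0, 1, 1]],
    [[0, 0, 1]],
], dtype="int8")

# Assumed "{0,1,2}" format: allele counts summed across the phase axis.
geno_012 = mat.sum(axis=0, dtype="int8")

# Assumed "{-1,0,1}" format: the same counts centered on the heterozygote.
geno_m101 = geno_012 - 1
```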

property mat_format#

Get matrix representation format

property mat_ndim: int#

Number of dimensions of the raw numpy.ndarray.

property mat_shape: tuple#

Shape of the raw numpy.ndarray.

meh(dtype=None)[source]#

Mean expected heterozygosity across all taxa.

Parameters:

dtype (dtype, None) – The data type of the returned array. If None, use the native type.

Returns:

out – A number representing the mean expected heterozygosity. If dtype is None, then a native 64-bit floating point is returned. Otherwise, the value has the type specified by dtype.

Return type:

Real
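
For orientation, one textbook definition of expected heterozygosity at a biallelic diploid locus is 2p(1-p), averaged over loci. The sketch below uses that definition as an assumption; the exact formula used by this method is not stated here and may differ (for example, for other ploidy levels).

```python
import numpy as np

# Hypothetical allele frequencies at three loci.
afreq = np.array([0.5, 0.25, 1.0])

# Expected heterozygosity per locus under Hardy-Weinberg proportions: 2p(1-p).
he = 2.0 * afreq * (1.0 - afreq)

# Mean across loci.
meh = he.mean()
```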

property nphase#

Get number of phases

property ntaxa: int#

Number of taxa

property nvrnt: int#

Number of variants.

property phase_axis#

Get phase axis number

property ploidy: int#

Get matrix ploidy number

remove(obj, axis=-1, **kwargs)#

Remove sub-arrays along an axis.

Parameters:
  • obj (int, slice, or Sequence of ints) – Indicate indices of sub-arrays to remove along the specified axis.

  • axis (int) – The axis along which to remove the subarray defined by obj.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

remove_phase(obj, **kwargs)#

Remove sub-arrays along the phase axis.

Parameters:
  • obj (int, slice, or Sequence of ints) – Indicate indices of sub-arrays to remove along the specified axis.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

remove_taxa(obj, **kwargs)#

Remove sub-arrays along the taxa axis.

Parameters:
  • obj (int, slice, or Sequence of ints) – Indicate indices of sub-arrays to remove along the specified axis.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

remove_vrnt(obj, **kwargs)#

Remove sub-arrays along the variant axis.

Parameters:
  • obj (int, slice, or Sequence of ints) – Indicate indices of sub-arrays to remove along the specified axis.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None
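
The effect of removing sub-arrays along one axis can be sketched with numpy.delete. Note the difference: numpy.delete returns new arrays, whereas the remove methods above are documented as returning None. The array shape (nphase, ntaxa, nvrnt) and the names are assumptions for illustration.

```python
import numpy as np

# Hypothetical haplotype array and variant names: 2 phases, 2 taxa, 4 variants.
mat = np.arange(2 * 2 * 4, dtype="int8").reshape(2, 2, 4)
vrnt_name = np.array(["v0", "v1", "v2", "v3"], dtype=object)

# Indices of the variants to drop along the assumed variant axis (axis=2).
drop = [1, 3]

# Remove the same indices from the values and from the metadata.
mat_kept = np.delete(mat, drop, axis=2)
vrnt_name_kept = np.delete(vrnt_name, drop)
```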

reorder(indices, axis=-1, **kwargs)#

Reorder the VariantMatrix.

Parameters:
  • indices (numpy.ndarray) – Indices of where to place elements.

  • axis (int) – The axis over which to reorder values.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

reorder_taxa(indices, **kwargs)#

Reorder elements of the Matrix along the taxa axis using an array of indices. Note this modifies the Matrix in-place.

Parameters:
  • indices (A (N,) ndarray of ints) – Array of indices that reorder the matrix along the specified axis.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

reorder_vrnt(indices, **kwargs)#

Reorder elements of the Matrix along the variant axis using an array of indices. Note this modifies the Matrix in-place.

Parameters:
  • indices (A (N,) ndarray of ints) – Array of indices that reorder the matrix along the specified axis.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

select(indices, axis=-1, **kwargs)#

Select certain values from the matrix.

Parameters:
  • indices (array_like (Nj, ...)) – The indices of the values to select.

  • axis (int) – The axis along which values are selected.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – The output DensePhasedTaxaVariantMatrix with values selected. Note that select does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix

select_phase(indices, **kwargs)#

Select certain values from the Matrix along the phase axis.

Parameters:
  • indices (array_like (Nj, ...)) – The indices of the values to select.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – The output DensePhasedTaxaVariantMatrix with values selected. Note that select does not occur in-place: a new DensePhasedTaxaVariantMatrix is allocated and filled.

Return type:

DensePhasedTaxaVariantMatrix

select_taxa(indices, **kwargs)[source]#

Select certain values from the DensePhasedGenotypeMatrix along the taxa axis.

Parameters:
  • indices (array_like (Nj, ...)) – The indices of the values to select.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – The output DensePhasedGenotypeMatrix with values selected. Note that select does not occur in-place: a new DensePhasedGenotypeMatrix is allocated and filled.

Return type:

DensePhasedGenotypeMatrix

select_vrnt(indices, **kwargs)[source]#

Select certain values from the DensePhasedGenotypeMatrix along the variant axis.

Parameters:
  • indices (array_like (Nj, ...)) – The indices of the values to select.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – The output DensePhasedGenotypeMatrix with values selected. Note that select does not occur in-place: a new DensePhasedGenotypeMatrix is allocated and filled.

Return type:

DensePhasedGenotypeMatrix
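
Selection along an axis behaves much like numpy.take: indices may repeat, and the result is a new array. The sketch below works on raw arrays under the assumption that the variant axis is the last axis of a (nphase, ntaxa, nvrnt) array; names are hypothetical.

```python
import numpy as np

# Hypothetical haplotype array and variant names: 2 phases, 2 taxa, 4 variants.
mat = np.arange(2 * 2 * 4, dtype="int8").reshape(2, 2, 4)
vrnt_name = np.array(["v0", "v1", "v2", "v3"], dtype=object)

# Select variants 2, 0, and 2 again along the assumed variant axis (axis=2).
indices = np.array([2, 0, 2])
sel_mat = np.take(mat, indices, axis=2)
sel_name = vrnt_name[indices]
```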

sort(keys=None, axis=-1, **kwargs)#

Sort the VariantMatrix along an axis using a tuple of keys. Group metadata for the corresponding axis (name, stix, spix, len) is reset.

Parameters:
  • keys (tuple, None) – A tuple of columns to be sorted. The last column is the primary sort key. If None, sort using vrnt_chrgrp as primary key, and vrnt_phypos as secondary key.

  • axis (int) – The axis over which to sort values.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

sort_taxa(keys=None, **kwargs)#

Sort elements of the Matrix along the taxa axis using a sequence of keys. Note this modifies the Matrix in-place.

Parameters:
  • keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

sort_vrnt(keys=None, **kwargs)#

Sort elements of the Matrix along the variant axis using a sequence of keys. Note this modifies the Matrix in-place.

Parameters:
  • keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None
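
Conceptually, a sort along the variant axis is an indirect sort followed by a reorder of the values and every per-variant metadata array. The sketch below shows that two-step pattern on raw NumPy arrays; the shape (nphase, ntaxa, nvrnt) and all names are assumptions, and it rebinds variables rather than modifying an object in place.

```python
import numpy as np

# Hypothetical data: 2 phases, 1 taxon, 4 variants.
mat = np.arange(8, dtype="int8").reshape(2, 1, 4)
vrnt_chrgrp = np.array([2, 1, 2, 1])
vrnt_phypos = np.array([50, 300, 10, 100])

# Step 1: indirect sort (last key is primary).
indices = np.lexsort((vrnt_phypos, vrnt_chrgrp))

# Step 2: apply the same permutation to values and metadata.
mat = mat[:, :, indices]
vrnt_chrgrp = vrnt_chrgrp[indices]
vrnt_phypos = vrnt_phypos[indices]
```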

tacount(dtype=None)[source]#

Allele count of the non-zero allele within each taxon.

Parameters:

dtype (dtype, None) – The data type of the accumulator and returned array. If None, use the native accumulator type (int or float).

Returns:

out – A numpy.ndarray of shape (n,p) containing allele counts of the allele coded as 1 for all n individuals, for all p loci.

Return type:

numpy.ndarray

tafreq(dtype=None)[source]#

Allele frequency of the non-zero allele within each taxon.

Parameters:

dtype (dtype, None) – The data type of the returned array. If None, use the native type.

Returns:

out – A numpy.ndarray of shape (n,p) containing allele frequencies of the allele coded as 1 for all n individuals, for all p loci.

Return type:

numpy.ndarray
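
The relationship between the two per-taxon quantities can be sketched on a raw array: the count sums the allele coded as 1 over phases, and the frequency divides that count by the number of phases. The shape (nphase, ntaxa, nvrnt) and 0/1 coding are assumptions, and the variable names simply echo the method names.

```python
import numpy as np

# Hypothetical phased haplotypes: shape (nphase=2, ntaxa=2, nvrnt=3), coded 0/1.
mat = np.array([
    [[0, 1, 1], [1, 0, 1]],
    [[0, 0, 1], [1, 0, 0]],
], dtype="int8")

# Per-taxon allele counts: sum over the phase axis, giving shape (n, p).
tacount = mat.sum(axis=0)

# Per-taxon allele frequencies: counts divided by the number of phases.
tafreq = tacount / mat.shape[0]
```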

property taxa: ndarray | None#

Taxa label array

property taxa_axis#

Get taxa axis number

property taxa_grp: ndarray | None#

Taxa group label.

property taxa_grp_len: ndarray | None#

Taxa group length.

property taxa_grp_name: ndarray | None#

Taxa group name.

property taxa_grp_spix: ndarray | None#

Taxa group stop index.

property taxa_grp_stix: ndarray | None#

Taxa group start index.
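
The four taxa group metadata arrays (name, start index, stop index, length) can be pictured with numpy.unique on an already sorted group array. This is a sketch under the assumption that stop indices are exclusive; it is not taken from the library source.

```python
import numpy as np

# Hypothetical taxa group labels, already sorted so each group is contiguous.
taxa_grp = np.array([1, 1, 2, 2, 2, 5])

# Unique group names, the first index of each group, and each group's size.
grp_name, grp_stix, grp_len = np.unique(
    taxa_grp, return_index=True, return_counts=True
)

# Assumed exclusive stop index: start index plus group length.
grp_spix = grp_stix + grp_len
```

With these arrays, the taxa of group i would be the slice grp_stix[i]:grp_spix[i] of any per-taxon array.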

to_hdf5(filename, groupname=None, overwrite=True)#

Write GenotypeMatrix to an HDF5 file.

Parameters:
  • filename (str, Path, h5py.File) – If str, an HDF5 file name to which to write. File is closed after writing. If h5py.File, an opened HDF5 file to which to write. File is not closed after writing.

  • groupname (str, None) – If str, an HDF5 group name under which GenotypeMatrix data is stored. If None, GenotypeMatrix is written to the base HDF5 group.

  • overwrite (bool) – Whether to overwrite values in an HDF5 file if a field already exists.

Return type:

None

ungroup(axis=-1, **kwargs)#

Ungroup the DensePhasedTaxaVariantMatrix along an axis by removing grouping metadata.

Parameters:
  • axis (int) – The axis along which values should be ungrouped.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

ungroup_taxa(**kwargs)#

Ungroup the DenseTaxaMatrix along the taxa axis by removing taxa group metadata.

Parameters:

kwargs (dict) – Additional keyword arguments.

Return type:

None

ungroup_vrnt(**kwargs)#

Ungroup the DenseVariantMatrix along the variant axis by removing variant group metadata.

Parameters:

kwargs (dict) – Additional keyword arguments.

Return type:

None

property vrnt_axis#

Get variant axis

property vrnt_chrgrp: ndarray | None#

Variant chromosome group label.

property vrnt_chrgrp_len: ndarray | None#

Variant chromosome group length.

property vrnt_chrgrp_name: ndarray | None#

Variant chromosome group names.

property vrnt_chrgrp_spix: ndarray | None#

Variant chromosome group stop indices.

property vrnt_chrgrp_stix: ndarray | None#

Variant chromosome group start indices.

property vrnt_genpos: ndarray | None#

Variant genetic position.

property vrnt_hapalt: ndarray | None#

Variant alternative haplotype sequence.

property vrnt_hapgrp: ndarray | None#

Variant haplotype group label.

property vrnt_hapref: ndarray | None#

Variant reference haplotype sequence.

property vrnt_mask: ndarray | None#

Variant mask.

property vrnt_name: ndarray | None#

Variant name.

property vrnt_phypos: ndarray | None#

Variant physical position.

property vrnt_xoprob: ndarray | None#

Variant crossover sequential probability.