GeneticMap#

class pybrops.popgen.gmap.GeneticMap.GeneticMap[source]#

Bases: PandasInputOutput, CSVInputOutput

An abstract class for genetic map objects.

The purpose of this abstract class is to provide base functionality for:
  1. Genetic map representation.

  2. Genetic map metadata.

  3. Genetic map routines.

  4. Genetic map interpolation spline construction.

  5. Genetic map spline interpolation.

  6. Import and export of genetic maps.
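Because GeneticMap is abstract, it cannot be instantiated directly; a concrete subclass has to implement every abstract member. The sketch below illustrates that pattern with Python's abc module. ToyMapBase and ToyMap are hypothetical names used only for illustration and are not PyBrOpS classes.

```python
from abc import ABC, abstractmethod


class ToyMapBase(ABC):
    """Hypothetical stand-in for an abstract genetic map interface."""

    @property
    @abstractmethod
    def nvrnt(self) -> int:
        """Number of variants in the map."""

    @abstractmethod
    def has_spline(self) -> bool:
        """Whether an interpolation spline has been built."""


class ToyMap(ToyMapBase):
    """Hypothetical concrete subclass implementing every abstract member."""

    def __init__(self, vrnt_genpos):
        self._vrnt_genpos = list(vrnt_genpos)
        self._spline = None  # no spline has been built yet

    @property
    def nvrnt(self) -> int:
        return len(self._vrnt_genpos)

    def has_spline(self) -> bool:
        return self._spline is not None


toy = ToyMap([0.0, 0.1, 0.25])

# Instantiating the abstract base raises TypeError.
try:
    ToyMapBase()
    instantiated = True
except TypeError:
    instantiated = False
```

Code written against the abstract interface (here, nvrnt and has_spline) should then work with any concrete subclass.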

Methods

build_spline

Build a spline for estimating genetic map distances.

congruence

Assess physical and genetic map site congruency.

copy

Make a shallow copy of the GeneticMap.

deepcopy

Make a deep copy of the GeneticMap.

from_csv

Read a GeneticMap from a CSV file.

from_pandas

Read an object from a pandas.DataFrame.

gdist1g

Calculate sequential genetic distances using genetic positions.

gdist1p

Calculate sequential genetic distances using physical positions.

gdist2g

Calculate pairwise genetic distances using genetic positions.

gdist2p

Calculate pairwise genetic distances using physical positions.

group

Sort the GeneticMap jointly by chromosome group and physical position, then populate grouping indices.

has_spline

Return whether or not the GeneticMap has a built spline.

interp_genpos

Interpolate genetic positions given variant physical positions.

interp_gmap

Interpolate a new genetic map from the current genetic map.

is_congruent

Determine if all sites in the genetic map demonstrate congruence with their supposed physical and genetic positions.

is_grouped

Determine whether the GeneticMap has been sorted and grouped.

lexsort

Perform an indirect stable sort using a sequence of keys.

remove

Remove indices from the GeneticMap.

remove_discrepancies

Remove discrepancies between the physical map and the genetic map.

reorder

Reorder markers in-place in the GeneticMap using an array of indices.

select

Keep only selected markers, removing all others from the GeneticMap.

sort

Sort elements of the GeneticMap using a sequence of keys.

to_csv

Write a GeneticMap to a CSV file.

to_pandas

Export a GeneticMap object to a pandas.DataFrame.

ungroup

Remove grouping metadata from the GeneticMap.

Attributes

nvrnt

Number of variants in the GeneticMap.

spline

Interpolation spline(s).

spline_fill_value

Default spline fill value.

spline_kind

Spline kind.

vrnt_chrgrp

Variant chromosome group label.

vrnt_chrgrp_len

Variant chromosome group length.

vrnt_chrgrp_name

Variant chromosome group names.

vrnt_chrgrp_spix

Variant chromosome group stop indices.

vrnt_chrgrp_stix

Variant chromosome group start indices.

vrnt_genpos

Variant genetic position.

vrnt_phypos

Variant physical position.

abstract build_spline(kind, fill_value, **kwargs)[source]#

Build a spline for estimating genetic map distances.

Parameters:
  • kind (str, default = 'linear') – Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.

  • fill_value (array-like, {'extrapolate'}, default = 'extrapolate') – If ‘extrapolate’, then points outside the data range will be extrapolated. If an ndarray (or float), this value will be used to fill in for requested points outside of the data range. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None
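The fill behaviour for points outside the data range can be illustrated with numpy.interp, whose left and right arguments play the role of the below and above fill values. This is a stand-in chosen for illustration under made-up data; it is not claimed to be how any GeneticMap implementation builds its spline.

```python
import numpy as np

# Hypothetical known map: physical positions and matching genetic positions.
phypos = np.array([100.0, 200.0, 400.0])
genpos = np.array([0.0, 0.1, 0.5])

# Query points: one below the data range, two inside it, one above it.
query = np.array([50.0, 150.0, 300.0, 500.0])

# Two-element fill: (below, above) applied outside [phypos[0], phypos[-1]].
below, above = 0.0, 0.5
filled = np.interp(query, phypos, genpos, left=below, right=above)

# A single fill value (here NaN) used for both bounds.
nan_filled = np.interp(query, phypos, genpos, left=np.nan, right=np.nan)
```

Points inside the range are interpolated linearly in both cases; only the out-of-range points differ.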

abstract congruence()[source]#

Assess physical and genetic map site congruency. If the genetic map is not grouped, it will be grouped.

Returns:

out – A boolean matrix of map concordances where:

  • True = the current marker has a map_pos >= the previous position

  • False = the current marker has a map_pos < the previous position

Return type:

numpy.ndarray

Notes

This assumes high contiguity between physical and genetic maps (i.e. a high quality reference genome). This assumption may cause major issues if there are incorrect markers at the beginning of the chromosome. This also assumes the first marker on the chromosome is placed correctly.
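A minimal NumPy sketch of this check, assuming the arrays are already sorted by chromosome group and physical position and that each marker is compared only with the marker immediately before it on the same chromosome. The data are made up, and the first marker of every chromosome is treated as correct, as stated in the notes.

```python
import numpy as np

# Hypothetical map, already sorted by chromosome group then physical position.
vrnt_chrgrp = np.array([1, 1, 1, 1, 2, 2, 2])
vrnt_genpos = np.array([0.0, 0.2, 0.1, 0.3, 0.0, 0.4, 0.4])

# Start by assuming every marker is congruent; the very first marker stays True.
congruent = np.ones(len(vrnt_genpos), dtype=bool)

# A marker is compared with its predecessor only when both share a group.
same_group = vrnt_chrgrp[1:] == vrnt_chrgrp[:-1]
non_decreasing = vrnt_genpos[1:] >= vrnt_genpos[:-1]

# The first marker of each group has no predecessor, so it is left True.
congruent[1:] = non_decreasing | ~same_group

# Whole-map check in the spirit of is_congruent.
all_congruent = bool(congruent.all())
```

Here the third marker on chromosome 1 is flagged because its genetic position (0.1) is lower than that of the marker before it (0.2).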

abstract copy()[source]#

Make a shallow copy of the GeneticMap.

Returns:

out – A shallow copy of the original GeneticMap.

Return type:

GeneticMap

abstract deepcopy(memo)[source]#

Make a deep copy of the GeneticMap.

Parameters:

memo (dict) – Dictionary of memo metadata.

Returns:

out – A deep copy of the original GeneticMap.

Return type:

GeneticMap
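The difference between copy and deepcopy, and the role of the memo dictionary, can be sketched with Python's copy module. ToyMap below is a hypothetical container used only to show the semantics; it does not reflect how any GeneticMap subclass stores its data.

```python
import copy


class ToyMap:
    """Hypothetical container used only to illustrate copy semantics."""

    def __init__(self, vrnt_genpos):
        self.vrnt_genpos = vrnt_genpos

    def __copy__(self):
        # Shallow copy: the new object shares the same underlying list.
        return ToyMap(self.vrnt_genpos)

    def __deepcopy__(self, memo):
        # Deep copy: nested data is duplicated; memo tracks objects already copied.
        return ToyMap(copy.deepcopy(self.vrnt_genpos, memo))


original = ToyMap([0.0, 0.1, 0.2])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Mutating the original's data is visible through the shallow copy only.
original.vrnt_genpos[0] = 9.9
```

After the mutation, shallow.vrnt_genpos reflects the change while deep.vrnt_genpos keeps the original values.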

abstract classmethod from_csv(filename, **kwargs)[source]#

Read a GeneticMap from a CSV file.

Parameters:
  • filename (str) – CSV file name from which to read.

  • kwargs (dict) – Additional keyword arguments to use for dictating importing from a CSV.

Returns:

out – A GeneticMap read from a CSV file.

Return type:

GeneticMap

abstract classmethod from_pandas(df, **kwargs)[source]#

Read an object from a pandas.DataFrame.

Parameters:
  • df (pandas.DataFrame) – Pandas dataframe from which to read.

  • kwargs (dict) – Additional keyword arguments to use for dictating importing from a pandas.DataFrame.

Returns:

out – A GeneticMap read from a pandas.DataFrame.

Return type:

GeneticMap

abstract gdist1g(vrnt_chrgrp, vrnt_genpos, ast, asp)[source]#

Calculate sequential genetic distances using genetic positions. Requires vrnt_chrgrp and vrnt_genpos to have been sorted jointly in ascending order.

Parameters:
  • vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_genpos.

  • vrnt_genpos (numpy.ndarray) – A 1D array of variant genetic positions. Must be sorted in ascending order jointly with vrnt_chrgrp.

  • ast (Integral, None) – Optional array start index (inclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.

  • asp (Integral, None) – Optional array stop index (exclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.

Returns:

out – A 1D array of genetic distances between each marker and the marker prior to it.

Return type:

numpy.ndarray

Notes

Sequential distance arrays will start every chromosome with numpy.inf!
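The numpy.inf convention can be sketched as follows, assuming the inputs are sorted jointly as required and using made-up positions. Each element holds the distance to the preceding marker, and the first marker of every chromosome group receives numpy.inf because it has no prior marker.

```python
import numpy as np

# Hypothetical inputs, sorted jointly by chromosome group and genetic position.
vrnt_chrgrp = np.array([1, 1, 1, 2, 2])
vrnt_genpos = np.array([0.0, 0.1, 0.4, 0.05, 0.25])

# Distance from each marker to the marker before it.
gdist = np.empty(len(vrnt_genpos), dtype=float)
gdist[0] = np.inf
gdist[1:] = vrnt_genpos[1:] - vrnt_genpos[:-1]

# Wherever the chromosome group changes there is no prior marker: use inf.
gdist[1:][vrnt_chrgrp[1:] != vrnt_chrgrp[:-1]] = np.inf
```

Downstream code that turns distances into recombination probabilities can then treat numpy.inf as "unlinked" without special-casing chromosome boundaries.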

abstract gdist1p(vrnt_chrgrp, vrnt_phypos, ast, asp)[source]#

Calculate sequential genetic distances using physical positions. Requires vrnt_chrgrp and vrnt_phypos to have been sorted jointly in ascending order. Requires an interpolation spline to have been built beforehand.

Parameters:
  • vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_phypos.

  • vrnt_phypos (numpy.ndarray) – A 1D array of variant physical positions. Must be sorted in ascending order jointly with vrnt_chrgrp.

  • ast (Integral, None) – Optional array start index (inclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.

  • asp (Integral, None) – Optional array stop index (exclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.

Returns:

out – A 1D array of genetic distances between each marker and the marker prior to it.

Return type:

numpy.ndarray

Notes

Sequential distance arrays will start every chromosome with numpy.inf!

abstract gdist2g(vrnt_chrgrp, vrnt_genpos, rst, rsp, cst, csp)[source]#

Calculate pairwise genetic distances using genetic positions. Requires vrnt_chrgrp and vrnt_genpos to have been sorted jointly in ascending order.

Parameters:
  • vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_genpos.

  • vrnt_genpos (numpy.ndarray) – A 1D array of variant genetic positions. Must be sorted in ascending order jointly with vrnt_chrgrp.

  • rst (Integral, None) – Optional row start index (inclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.

  • rsp (Integral, None) – Optional row stop index (exclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.

  • cst (Integral, None) – Optional column start index (inclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.

  • csp (Integral, None) – Optional column stop index (exclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.

Returns:

out – A 2D array of distances between marker pairs.

Return type:

numpy.ndarray
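A minimal sketch of a pairwise distance matrix and of how the row and column windows select a sub-block. The data are made up, and treating markers on different chromosome groups as infinitely distant is an assumption made here by analogy with the sequential case; it is not stated for this method.

```python
import numpy as np

# Hypothetical inputs, sorted jointly by chromosome group and genetic position.
vrnt_chrgrp = np.array([1, 1, 2])
vrnt_genpos = np.array([0.0, 0.3, 0.1])

# Absolute difference between every pair of genetic positions (broadcasting).
gdist = np.abs(vrnt_genpos[:, None] - vrnt_genpos[None, :])

# Assumption: markers on different chromosome groups are unlinked (inf).
gdist[vrnt_chrgrp[:, None] != vrnt_chrgrp[None, :]] = np.inf

# Row/column windows (start inclusive, stop exclusive) select a sub-block.
rst, rsp, cst, csp = 0, 2, 1, 3
block = gdist[rst:rsp, cst:csp]
```

Computing only a block keeps memory bounded when the full matrix would be too large to hold at once.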

abstract gdist2p(vrnt_chrgrp, vrnt_phypos, rst, rsp, cst, csp)[source]#

Calculate pairwise genetic distances using physical positions. Requires vrnt_chrgrp and vrnt_phypos to have been sorted jointly in ascending order. Requires an interpolation spline to have been built beforehand.

Parameters:
  • vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_phypos.

  • vrnt_phypos (numpy.ndarray) – A 1D array of variant physical positions. Must be sorted in ascending order jointly with vrnt_chrgrp.

  • rst (Integral, None) – Optional row start index (inclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.

  • rsp (Integral, None) – Optional row stop index (exclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.

  • cst (Integral, None) – Optional column start index (inclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.

  • csp (Integral, None) – Optional column stop index (exclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.

Returns:

out – A 2D array of distances between marker pairs.

Return type:

numpy.ndarray

abstract group(**kwargs)[source]#

Sort the GeneticMap jointly by chromosome group and physical position, then populate grouping indices.

Parameters:

kwargs (dict) – Additional keyword arguments.

Return type:

None
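The sort-then-index idea can be sketched with NumPy on plain arrays. The data are made up, and the sketch assumes that stop indices are exclusive, which is not stated for vrnt_chrgrp_spix; the local variable names only mirror the grouping attributes listed above.

```python
import numpy as np

# Hypothetical unsorted map.
vrnt_chrgrp = np.array([2, 1, 2, 1, 1])
vrnt_phypos = np.array([50, 300, 10, 100, 200])

# Sort jointly: chromosome group is the primary key, physical position secondary.
order = np.lexsort((vrnt_phypos, vrnt_chrgrp))
vrnt_chrgrp = vrnt_chrgrp[order]
vrnt_phypos = vrnt_phypos[order]

# Derive grouping metadata from the sorted labels.
names, stix, lens = np.unique(vrnt_chrgrp, return_index=True, return_counts=True)
spix = stix + lens  # assumed exclusive stop indices
```

With these indices, the markers of the i-th group are the slice stix[i]:spix[i] of every per-variant array.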

abstract has_spline()[source]#

Return whether or not the GeneticMap has a built spline.

Returns:

out – Whether the GeneticMap has a spline built.

Return type:

bool

abstract interp_genpos(vrnt_chrgrp, vrnt_phypos)[source]#

Interpolate genetic positions given variant physical positions.

Parameters:
  • vrnt_chrgrp (numpy.ndarray) – Chromosome/linkage group labels for each marker variant.

  • vrnt_phypos (numpy.ndarray) – Chromosome/linkage group physical positions for each marker variant.

Returns:

out – Interpolated genetic positions for each marker variant.

Return type:

numpy.ndarray
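Interpolation has to be carried out separately within each chromosome group, since physical positions restart on every chromosome. The sketch below shows this with linear interpolation via numpy.interp on made-up data; the ref_* arrays are hypothetical, and a real implementation would use whichever spline was built beforehand.

```python
import numpy as np

# Hypothetical reference map, grouped by chromosome.
ref_chrgrp = np.array([1, 1, 1, 2, 2])
ref_phypos = np.array([100.0, 200.0, 400.0, 100.0, 300.0])
ref_genpos = np.array([0.0, 0.1, 0.5, 0.0, 0.2])

# New variants whose genetic positions are unknown.
vrnt_chrgrp = np.array([1, 2, 1])
vrnt_phypos = np.array([150.0, 200.0, 300.0])

out = np.empty(len(vrnt_phypos), dtype=float)
for grp in np.unique(vrnt_chrgrp):
    ref_mask = ref_chrgrp == grp
    new_mask = vrnt_chrgrp == grp
    # Linear interpolation using only this chromosome group's reference markers.
    out[new_mask] = np.interp(
        vrnt_phypos[new_mask], ref_phypos[ref_mask], ref_genpos[ref_mask]
    )
```

The output keeps the order of the input variants, so it can be stored alongside vrnt_chrgrp and vrnt_phypos directly.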

abstract interp_gmap(vrnt_chrgrp, vrnt_phypos, **kwargs)[source]#

Interpolate a new genetic map from the current genetic map. Associate spline of current GeneticMap with new GeneticMap.

Parameters:
  • vrnt_chrgrp (numpy.ndarray) – Chromosome/linkage group labels for each marker variant.

  • vrnt_phypos (numpy.ndarray) – Chromosome/linkage group physical positions for each marker variant.

  • kwargs (dict) – Additional keyword arguments.

Returns:

out – An interpolated genetic map sharing a copy of the spline from the original genetic map.

Return type:

GeneticMap

abstract is_congruent()[source]#

Determine if all sites in the genetic map demonstrate congruence with their supposed physical and genetic positions.

Returns:

out – Whether all genetic map loci demonstrate congruence between physical and genetic positions.

Return type:

bool

abstract is_grouped(**kwargs)[source]#

Determine whether the GeneticMap has been sorted and grouped.

Parameters:

kwargs (dict) – Additional keyword arguments.

Returns:

grouped – True or False indicating whether the GeneticMap has been sorted and grouped.

Return type:

bool

abstract lexsort(keys, **kwargs)[source]#

Perform an indirect stable sort using a sequence of keys.

Parameters:
  • keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.

  • kwargs (dict) – Additional keyword arguments.

Returns:

indices – Array of indices that sort the keys along the specified axis.

Return type:

A (N,) ndarray of ints
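The key ordering follows the numpy.lexsort convention, where the last key is the primary one. A short sketch with made-up arrays, sorting by chromosome group first and physical position second:

```python
import numpy as np

vrnt_chrgrp = np.array([2, 1, 1, 2])
vrnt_phypos = np.array([10, 30, 20, 5])

# The last key (chromosome group) is the primary sort key.
indices = np.lexsort((vrnt_phypos, vrnt_chrgrp))

# An indirect sort returns indices; apply them to every parallel array.
sorted_chrgrp = vrnt_chrgrp[indices]
sorted_phypos = vrnt_phypos[indices]
```

Because the sort is indirect, the same indices can be reused to reorder any other per-variant array consistently.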

abstract property nvrnt: Integral#

Number of variants in the GeneticMap.

abstract remove(indices, **kwargs)[source]#

Remove indices from the GeneticMap. If the GeneticMap was grouped beforehand, then re-sort and re-group internal arrays after removing indices.

Parameters:
  • indices (numpy.ndarray, slice, int) –

    Array of shape (a,), slice or int of item(s) to remove.

    Where:

    • a is the number of indices to remove.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None
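The three accepted index forms (array, slice, int) behave like the corresponding arguments to numpy.delete. The sketch below applies them to a single made-up array; an implementation would need to apply the same removal to every per-variant array so that they stay aligned.

```python
import numpy as np

vrnt_phypos = np.array([10, 20, 30, 40, 50])

by_array = np.delete(vrnt_phypos, np.array([0, 2]))  # drop positions 0 and 2
by_slice = np.delete(vrnt_phypos, slice(1, 3))       # drop positions 1 and 2
by_int = np.delete(vrnt_phypos, 4)                   # drop the last position
```

numpy.delete returns a new array and leaves its input unchanged, whereas remove is documented as returning None.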

abstract remove_discrepancies()[source]#

Remove discrepancies between the physical map and the genetic map. In instances of conflict, assume that the physical map is correct.

Return type:

None

Notes

This assumption may cause major issues if there are incorrect markers at the beginning of the chromosome.

abstract reorder(indices, **kwargs)[source]#

Reorder markers in-place in the GeneticMap using an array of indices.

Parameters:
  • indices (A (N,) ndarray of ints) – Array of indices specifying the new order of the markers in the GeneticMap.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

abstract select(indices, **kwargs)[source]#

Keep only selected markers, removing all others from the GeneticMap. If the GeneticMap was grouped beforehand, then re-sort and re-group internal arrays after selecting markers.

Parameters:
  • indices (numpy.ndarray, slice, int) –

    Array of shape (a,), slice or int of item(s) to keep.

    Where:

    • a is the number of indices to keep.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

abstract sort(keys, **kwargs)[source]#

Sort elements of the GeneticMap using a sequence of keys. Note this modifies the GeneticMap in-place.

Parameters:
  • keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.

  • kwargs (dict) – Additional keyword arguments.

Return type:

None

abstract property spline: object#

Interpolation spline(s).

abstract property spline_fill_value: object#

Default spline fill value.

abstract property spline_kind: object#

Spline kind.

abstract to_csv(filename, **kwargs)[source]#

Write a GeneticMap to a CSV file.

Parameters:
  • filename (str) – CSV file name to which to write.

  • kwargs (dict) – Additional keyword arguments to use for dictating export to a CSV.

Return type:

None
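A CSV round trip can be sketched with the standard csv module. The column names "chr", "pos" and "cM" are hypothetical and chosen only for this example; the headers expected by a concrete GeneticMap implementation may differ and are typically set through keyword arguments.

```python
import csv
import io

# Hypothetical column names; a real implementation may use different headers.
fieldnames = ["chr", "pos", "cM"]
rows = [
    {"chr": "1", "pos": 100, "cM": 0.0},
    {"chr": "1", "pos": 200, "cM": 10.5},
    {"chr": "2", "pos": 150, "cM": 0.0},
]

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

# Read the same text back; csv returns every field as a string,
# so numeric columns are converted explicitly.
buffer.seek(0)
loaded = [
    {"chr": r["chr"], "pos": int(r["pos"]), "cM": float(r["cM"])}
    for r in csv.DictReader(buffer)
]
```

Keeping the chromosome label as a string avoids losing labels such as "X" or "1A" that are not numeric.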

abstract to_pandas(**kwargs)[source]#

Export a GeneticMap object to a pandas.DataFrame.

Parameters:

kwargs (dict) – Additional keyword arguments to use for dictating export to a pandas.DataFrame.

Returns:

out – An output dataframe.

Return type:

pandas.DataFrame

abstract ungroup(**kwargs)[source]#

Remove grouping metadata from the GeneticMap.

Parameters:

kwargs (dict) – Additional keyword arguments.

Return type:

None

abstract property vrnt_chrgrp: ndarray#

Variant chromosome group label.

abstract property vrnt_chrgrp_len: ndarray#

Variant chromosome group length.

abstract property vrnt_chrgrp_name: ndarray#

Variant chromosome group names.

abstract property vrnt_chrgrp_spix: ndarray#

Variant chromosome group stop indices.

abstract property vrnt_chrgrp_stix: ndarray#

Variant chromosome group start indices.

abstract property vrnt_genpos: ndarray#

Variant genetic position.

abstract property vrnt_phypos: ndarray#

Variant physical position.