GeneticMap#
- class pybrops.popgen.gmap.GeneticMap.GeneticMap[source]#
Bases: PandasInputOutput, CSVInputOutput
An abstract class for genetic map objects.
- The purpose of this abstract class is to provide base functionality for:
Genetic map representation.
Genetic map metadata.
Genetic map routines.
Genetic map interpolation spline construction.
Genetic map spline interpolation.
Import and export of genetic maps.
Methods
Build a spline for estimating genetic map distances.
Assess physical and genetic map site congruency.
Make a shallow copy of the GeneticMap.
Make a deep copy of the GeneticMap.
Read a GeneticMap from a CSV file.
Read an object from a pandas.DataFrame.
Calculate sequential genetic distances using genetic positions.
Calculate sequential genetic distances using physical positions.
Calculate pairwise genetic distances using genetic positions.
Calculate pairwise genetic distances using physical positions.
Sort the GeneticMap jointly by chromosome group and physical position, then populate grouping indices.
Return whether or not the GeneticMap has a built spline.
Interpolate genetic positions given variant physical positions.
Interpolate a new genetic map from the current genetic map.
Determine if all sites in the genetic map demonstrate congruence with their supposed physical and genetic positions.
Determine whether the GeneticMap has been sorted and grouped.
Perform an indirect stable sort using a sequence of keys.
Remove indices from the GeneticMap.
Remove discrepancies between the physical map and the genetic map.
Reorder markers in-place in the GeneticMap using an array of indices.
Keep only selected markers, removing all others from the GeneticMap.
Sort elements of the GeneticMap using a sequence of keys.
Write a GeneticMap to a CSV file.
Export a GeneticMap object to a pandas.DataFrame.
Remove grouping metadata from the GeneticMap.
Attributes
Number of variants in the GeneticMap.
Interpolation spline(s).
Default spline fill value.
Spline kind.
Variant chromosome group label.
Variant chromosome group length.
Variant chromosome group names.
Variant chromosome group stop indices.
Variant chromosome group start indices.
Variant genetic position.
Variant physical position.
- abstract build_spline(kind, fill_value, **kwargs)[source]#
Build a spline for estimating genetic map distances.
- Parameters:
kind (str, default = 'linear') – Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
fill_value (array-like, {'extrapolate'}, default = 'extrapolate') – If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
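The two-element form of fill_value described above can be illustrated with plain linear interpolation. This is a minimal sketch, not the library's implementation: interp_with_fill is a hypothetical helper, and numpy.interp stands in for whatever interpolator a concrete GeneticMap subclass builds. Only the 'linear' kind is covered.

```python
import numpy as np

def interp_with_fill(x_new, x, y, fill_value):
    """Hypothetical helper: linear interpolation with a (below, above) fill pair."""
    below, above = fill_value
    # numpy.interp returns `left` for x_new < x[0] and `right` for x_new > x[-1]
    return np.interp(x_new, x, y, left=below, right=above)

phypos = np.array([100.0, 200.0, 400.0])       # physical positions (increasing)
genpos = np.array([0.0, 1.0, 2.0])             # genetic positions at those sites
query = np.array([50.0, 150.0, 300.0, 500.0])  # two in-range, two out-of-range

# Out-of-range points take the boundary genetic positions.
clamped = interp_with_fill(query, phypos, genpos, (0.0, 2.0))
# Out-of-range points are marked as missing.
masked = interp_with_fill(query, phypos, genpos, (np.nan, np.nan))
```

Here the two in-range queries interpolate to 0.5 and 1.5 in both calls; only the treatment of the out-of-range queries differs.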
- abstract congruence()[source]#
Assess physical and genetic map site congruency. If the genetic map is not grouped, it will be grouped.
- Returns:
out – A boolean matrix of map concordances where:
True = the current marker has a map_pos >= the previous position.
False = the current marker has a map_pos < the previous position.
- Return type:
numpy.ndarray
Notes
This assumes high contiguity between physical and genetic maps (i.e. a high quality reference genome). This assumption may cause major issues if there are incorrect markers at the beginning of the chromosome. This also assumes the first marker on the chromosome is placed correctly.
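The True/False rule above can be sketched with plain arrays. This is an illustration under assumptions, not the library's code: congruence_sketch is a hypothetical function, the inputs are assumed to be already sorted by chromosome group and physical position, and the first marker of each chromosome is treated as correct, as the Notes state.

```python
import numpy as np

def congruence_sketch(chrgrp, genpos):
    """Hypothetical helper: flag markers whose genetic position does not decrease."""
    out = np.ones(len(genpos), dtype=bool)      # first marker is assumed correct
    same_chr = chrgrp[1:] == chrgrp[:-1]        # neighbor lies on the same chromosome
    non_decreasing = genpos[1:] >= genpos[:-1]  # map_pos >= previous position
    # The first marker of each chromosome has no predecessor, so it stays True.
    out[1:] = ~same_chr | non_decreasing
    return out

chrgrp = np.array([1, 1, 1, 2, 2])
genpos = np.array([0.0, 0.5, 0.3, 0.1, 0.4])
flags = congruence_sketch(chrgrp, genpos)
```

In this toy input only the third marker is flagged False, because its genetic position (0.3) falls below that of the marker before it (0.5).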
- abstract copy()[source]#
Make a shallow copy of the GeneticMap.
- Returns:
out – A shallow copy of the original GeneticMap.
- Return type:
GeneticMap
- abstract deepcopy(memo)[source]#
Make a deep copy of the GeneticMap.
- Parameters:
memo (dict) – Dictionary of memo metadata.
- Returns:
out – A deep copy of the original GeneticMap.
- Return type:
GeneticMap
- abstract classmethod from_csv(filename, **kwargs)[source]#
Read a GeneticMap from a CSV file.
- Parameters:
filename (str) – CSV file name from which to read.
kwargs (dict) – Additional keyword arguments to use for dictating importing from a CSV.
- Returns:
out – A GeneticMap read from a CSV file.
- Return type:
GeneticMap
- abstract classmethod from_pandas(df, **kwargs)[source]#
Read an object from a pandas.DataFrame.
- Parameters:
df (pandas.DataFrame) – Pandas dataframe from which to read.
kwargs (dict) – Additional keyword arguments to use for dictating importing from a pandas.DataFrame.
- Returns:
out – A GeneticMap read from a pandas.DataFrame.
- Return type:
GeneticMap
- abstract gdist1g(vrnt_chrgrp, vrnt_genpos, ast, asp)[source]#
Calculate sequential genetic distances using genetic positions. Requires vrnt_chrgrp and vrnt_genpos to have been sorted jointly in ascending order.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_genpos.
vrnt_genpos (numpy.ndarray) – A 1D array of variant genetic positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
ast (Integral, None) – Optional array start index (inclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
asp (Integral, None) – Optional array stop index (exclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
- Returns:
out – A 1D array of genetic distances between each marker and the marker prior to it.
- Return type:
numpy.ndarray
Notes
Sequential distance arrays will start every chromosome with numpy.inf!
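The numpy.inf convention in the note above can be sketched as follows. This is a minimal illustration, not the library's implementation: gdist1g_sketch is a hypothetical function, it omits the ast and asp arguments, and it assumes at least one marker and inputs sorted jointly by chromosome group and genetic position.

```python
import numpy as np

def gdist1g_sketch(chrgrp, genpos):
    """Hypothetical helper: distance from each marker to the one before it."""
    out = np.empty(len(genpos), dtype=float)
    out[0] = np.inf                        # the first marker has no predecessor
    same_chr = chrgrp[1:] == chrgrp[:-1]   # predecessor is on the same chromosome
    # A chromosome boundary gets numpy.inf instead of a finite distance.
    out[1:] = np.where(same_chr, np.diff(genpos), np.inf)
    return out

chrgrp = np.array([1, 1, 2, 2])
genpos = np.array([0.0, 0.25, 0.0, 0.5])
dist = gdist1g_sketch(chrgrp, genpos)
```

Code that sums or averages such an array should mask the infinite entries first.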
- abstract gdist1p(vrnt_chrgrp, vrnt_phypos, ast, asp)[source]#
Calculate sequential genetic distances using physical positions. Requires vrnt_chrgrp and vrnt_phypos to have been sorted jointly in ascending order. Requires an interpolation spline to have been built beforehand.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_phypos.
vrnt_phypos (numpy.ndarray) – A 1D array of variant physical positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
ast (Integral, None) – Optional array start index (inclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
asp (Integral, None) – Optional array stop index (exclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
- Returns:
out – A 1D array of genetic distances between each marker and the marker prior to it.
- Return type:
numpy.ndarray
Notes
Sequential distance arrays will start every chromosome with numpy.inf!
- abstract gdist2g(vrnt_chrgrp, vrnt_genpos, rst, rsp, cst, csp)[source]#
Calculate pairwise genetic distances using genetic positions. Requires vrnt_chrgrp and vrnt_genpos to have been sorted jointly in ascending order.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_genpos.
vrnt_genpos (numpy.ndarray) – A 1D array of variant genetic positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
rst (Integral, None) – Optional row start index (inclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.
rsp (Integral, None) – Optional row stop index (exclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.
cst (Integral, None) – Optional column start index (inclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.
csp (Integral, None) – Optional column stop index (exclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.
- Returns:
out – A 2D array of distances between marker pairs.
- Return type:
numpy.ndarray
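The row and column windows can be sketched with array broadcasting. This is an illustration under assumptions rather than the library's code: gdist2g_sketch is a hypothetical function, and filling pairs on different chromosomes with numpy.inf is an assumption carried over from the sequential-distance convention, not something this page states for the pairwise case.

```python
import numpy as np

def gdist2g_sketch(chrgrp, genpos, rst=None, rsp=None, cst=None, csp=None):
    """Hypothetical helper: pairwise |genpos_i - genpos_j| over a row/column window."""
    rows = slice(rst, rsp)   # None bounds select every row
    cols = slice(cst, csp)   # None bounds select every column
    # Broadcast a column vector against a row vector to get all pairs.
    dist = np.abs(genpos[rows, None] - genpos[None, cols])
    same_chr = chrgrp[rows, None] == chrgrp[None, cols]
    # Assumption: markers on different chromosomes are infinitely far apart.
    return np.where(same_chr, dist, np.inf)

chrgrp = np.array([1, 1, 2])
genpos = np.array([0.0, 0.5, 0.25])
full = gdist2g_sketch(chrgrp, genpos)
block = gdist2g_sketch(chrgrp, genpos, rst=0, rsp=2, cst=1, csp=3)
```

The windowed call computes only rows 0 to 1 and columns 1 to 2, which is useful when the full matrix would be too large to hold in memory.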
- abstract gdist2p(vrnt_chrgrp, vrnt_phypos, rst, rsp, cst, csp)[source]#
Calculate pairwise genetic distances using physical positions. Requires vrnt_chrgrp and vrnt_phypos to have been sorted jointly in ascending order.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_phypos.
vrnt_phypos (numpy.ndarray) – A 1D array of variant physical positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
rst (Integral, None) – Optional row start index (inclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.
rsp (Integral, None) – Optional row stop index (exclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.
cst (Integral, None) – Optional column start index (inclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.
csp (Integral, None) – Optional column stop index (exclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.
- Returns:
out – A 2D array of distances between marker pairs.
- Return type:
numpy.ndarray
- abstract group(**kwargs)[source]#
Sort the GeneticMap jointly by chromosome group and physical position, then populate grouping indices.
- Parameters:
kwargs (dict) – Additional keyword arguments.
- Return type:
None
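Grouping amounts to a joint sort followed by bookkeeping of where each chromosome starts and stops. The sketch below shows one way to derive that metadata with numpy; group_sketch is a hypothetical function, and the mapping of its outputs onto the vrnt_chrgrp_name, vrnt_chrgrp_stix, vrnt_chrgrp_spix and vrnt_chrgrp_len attributes is an assumption based on their descriptions.

```python
import numpy as np

def group_sketch(chrgrp, phypos):
    """Hypothetical helper: sort by (chrgrp, phypos) and compute group indices."""
    order = np.lexsort((phypos, chrgrp))   # last key (chrgrp) is the primary key
    chrgrp, phypos = chrgrp[order], phypos[order]
    # On a sorted array, return_index gives the first index of each group.
    name, stix, length = np.unique(chrgrp, return_index=True, return_counts=True)
    spix = stix + length                   # exclusive stop index of each group
    return chrgrp, phypos, name, stix, spix, length

chrgrp = np.array([2, 1, 2, 1])
phypos = np.array([50, 300, 10, 100])
s_chr, s_pos, name, stix, spix, length = group_sketch(chrgrp, phypos)
```

With start and stop indices in hand, the markers of one chromosome can be sliced out as s_pos[stix[i]:spix[i]] without scanning the whole array.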
- abstract has_spline()[source]#
Return whether or not the GeneticMap has a built spline.
- Returns:
out – Whether the GeneticMap has a spline built.
- Return type:
bool
- abstract interp_genpos(vrnt_chrgrp, vrnt_phypos)[source]#
Interpolate genetic positions given variant physical positions.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – Chromosome/linkage group labels for each marker variant.
vrnt_phypos (numpy.ndarray) – Chromosome/linkage group physical positions for each marker variant.
- Returns:
out – Interpolated genetic positions for each marker variant.
- Return type:
numpy.ndarray
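Interpolation is carried out per chromosome group, since physical positions restart on every chromosome. The following sketch illustrates the idea with numpy.interp; interp_genpos_sketch is a hypothetical function, the reference map arrays are passed explicitly rather than read from an object, and returning NaN for chromosomes absent from the reference map is an assumption.

```python
import numpy as np

def interp_genpos_sketch(map_chrgrp, map_phypos, map_genpos, vrnt_chrgrp, vrnt_phypos):
    """Hypothetical helper: linear interpolation of genetic positions per chromosome."""
    out = np.full(len(vrnt_phypos), np.nan)
    for grp in np.unique(vrnt_chrgrp):
        ref = map_chrgrp == grp    # reference markers on this chromosome
        qry = vrnt_chrgrp == grp   # query variants on this chromosome
        if ref.any():
            # map_phypos[ref] must be increasing for numpy.interp
            out[qry] = np.interp(vrnt_phypos[qry], map_phypos[ref], map_genpos[ref])
    return out

map_chrgrp = np.array([1, 1, 2, 2])
map_phypos = np.array([100.0, 300.0, 100.0, 200.0])
map_genpos = np.array([0.0, 1.0, 0.0, 2.0])
new_genpos = interp_genpos_sketch(
    map_chrgrp, map_phypos, map_genpos,
    np.array([1, 2, 3]), np.array([200.0, 150.0, 120.0]),
)
```

Note that numpy.interp clamps queries outside the reference range to the boundary values; it does not extrapolate.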
- abstract interp_gmap(vrnt_chrgrp, vrnt_phypos, **kwargs)[source]#
Interpolate a new genetic map from the current genetic map. Associate spline of current GeneticMap with new GeneticMap.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – Chromosome/linkage group labels for each marker variant.
vrnt_phypos (numpy.ndarray) – Chromosome/linkage group physical positions for each marker variant.
kwargs (dict) – Additional keyword arguments.
- Returns:
out – An interpolated genetic map sharing a copy of the spline from the original genetic map.
- Return type:
GeneticMap
- abstract is_congruent()[source]#
Determine if all sites in the genetic map demonstrate congruence with their supposed physical and genetic positions.
- Returns:
out – Whether all genetic map loci demonstrate congruence between physical and genetic positions.
- Return type:
bool
- abstract is_grouped(**kwargs)[source]#
Determine whether the GeneticMap has been sorted and grouped.
- Parameters:
kwargs (dict) – Additional keyword arguments.
- Returns:
grouped – True or False indicating whether the GeneticMap has been sorted and grouped.
- Return type:
bool
- abstract lexsort(keys, **kwargs)[source]#
Perform an indirect stable sort using a sequence of keys.
- Parameters:
keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.
kwargs (dict) – Additional keyword arguments.
- Returns:
indices – Array of indices that sort the keys along the specified axis.
- Return type:
A (N,) ndarray of ints
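The key-ordering rule (last key is primary) is the same as that of numpy.lexsort, which makes it easy to get backwards. A short sketch using numpy.lexsort directly, on the assumption that a concrete subclass follows the same convention:

```python
import numpy as np

chrgrp = np.array([2, 1, 2, 1])
phypos = np.array([50, 300, 10, 100])

# chrgrp is listed last, so it is the primary key; phypos breaks ties.
indices = np.lexsort((phypos, chrgrp))

sorted_chrgrp = chrgrp[indices]
sorted_phypos = phypos[indices]
```

Swapping the tuple to (chrgrp, phypos) would instead sort primarily by physical position, interleaving the chromosomes.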
- abstract property nvrnt: Integral#
Number of variants in the GeneticMap.
- abstract remove(indices, **kwargs)[source]#
Remove indices from the GeneticMap. If the GeneticMap was grouped beforehand, then re-sort and re-group internal arrays after removing indices.
- Parameters:
indices (numpy.ndarray, slice, int) – Array of shape (a,), slice, or int of item(s) to remove.
Where:
a is the number of indices to remove.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
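The three accepted index forms behave like those of numpy.delete. The sketch below applies the same removal to each parallel marker array; remove_sketch is a hypothetical function that returns new arrays, whereas the method documented here returns None and so presumably modifies the GeneticMap in place.

```python
import numpy as np

def remove_sketch(chrgrp, phypos, genpos, indices):
    """Hypothetical helper: drop the same markers from every parallel array."""
    # numpy.delete accepts an int, a slice, or an array of ints
    return tuple(np.delete(arr, indices) for arr in (chrgrp, phypos, genpos))

chrgrp = np.array([1, 1, 2, 2])
phypos = np.array([10, 20, 10, 30])
genpos = np.array([0.0, 0.1, 0.0, 0.3])

by_array = remove_sketch(chrgrp, phypos, genpos, np.array([1, 3]))
by_slice = remove_sketch(chrgrp, phypos, genpos, slice(0, 2))
by_int = remove_sketch(chrgrp, phypos, genpos, 0)
```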
- abstract remove_discrepancies()[source]#
Remove discrepancies between the physical map and the genetic map. In instances of conflict, assume that the physical map is correct.
- Return type:
None
Notes
This assumption may cause major issues if there are incorrect markers at the beginning of the chromosome.
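The sensitivity described in the note can be seen in a simple filtering scheme. The sketch below is one possible approach, not necessarily the one the library uses: remove_discrepancies_sketch is a hypothetical function that walks each chromosome in physical order and keeps a marker only if its genetic position is at least that of the last kept marker.

```python
import numpy as np

def remove_discrepancies_sketch(chrgrp, phypos, genpos):
    """Hypothetical helper: keep markers whose genetic position never decreases."""
    keep = np.zeros(len(genpos), dtype=bool)
    last = -np.inf
    for i in range(len(genpos)):
        if i == 0 or chrgrp[i] != chrgrp[i - 1]:
            last = -np.inf       # new chromosome: its first marker is trusted
        if genpos[i] >= last:
            keep[i] = True
            last = genpos[i]     # later markers are compared against this one
    return chrgrp[keep], phypos[keep], genpos[keep]

chrgrp = np.array([1, 1, 1, 1, 2, 2])
phypos = np.array([10, 20, 30, 40, 10, 20])
genpos = np.array([0.0, 0.5, 0.3, 0.6, 0.2, 0.1])
kept_chr, kept_phy, kept_gen = remove_discrepancies_sketch(chrgrp, phypos, genpos)
```

Under this scheme, an erroneously large genetic position on an early marker raises the threshold for everything after it, so many correct markers downstream would be discarded.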
- abstract reorder(indices, **kwargs)[source]#
Reorder markers in-place in the GeneticMap using an array of indices.
- Parameters:
indices (A (N,) ndarray of ints) – Array of indices that reorder the markers.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
- abstract select(indices, **kwargs)[source]#
Keep only selected markers, removing all others from the GeneticMap. If the GeneticMap was grouped beforehand, then re-sort and re-group internal arrays after removing indices.
- Parameters:
indices (numpy.ndarray, slice, int) – Array of shape (a,), slice, or int of item(s) to keep.
Where:
a is the number of indices to keep.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
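Selection is the complement of removal: the given indices are the markers that survive. A minimal sketch with a hypothetical select_sketch function that normalizes the three accepted index forms to an integer array before applying them to each parallel marker array:

```python
import numpy as np

def select_sketch(chrgrp, phypos, genpos, indices):
    """Hypothetical helper: keep only the markers at the given indices."""
    # Resolve an int, slice, or integer array against the marker positions;
    # atleast_1d turns a single int into a length-1 array.
    idx = np.atleast_1d(np.arange(len(chrgrp))[indices])
    return chrgrp[idx], phypos[idx], genpos[idx]

chrgrp = np.array([1, 1, 2, 2])
phypos = np.array([10, 20, 10, 30])
genpos = np.array([0.0, 0.1, 0.0, 0.3])

by_array = select_sketch(chrgrp, phypos, genpos, np.array([0, 3]))
by_slice = select_sketch(chrgrp, phypos, genpos, slice(1, 3))
by_int = select_sketch(chrgrp, phypos, genpos, 2)
```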
- abstract sort(keys, **kwargs)[source]#
Sort elements of the GeneticMap using a sequence of keys. Note that this modifies the GeneticMap in-place.
- Parameters:
keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
- abstract property spline: object#
Interpolation spline(s).
- abstract property spline_fill_value: object#
Default spline fill value.
- abstract property spline_kind: object#
Spline kind.
- abstract to_csv(filename, **kwargs)[source]#
Write a GeneticMap to a CSV file.
- Parameters:
filename (str) – CSV file name to which to write.
kwargs (dict) – Additional keyword arguments to use for dictating export to a CSV.
- Return type:
None
- abstract to_pandas(**kwargs)[source]#
Export a GeneticMap object to a pandas.DataFrame.
- Parameters:
kwargs (dict) – Additional keyword arguments to use for dictating export to a pandas.DataFrame.
- Returns:
out – An output dataframe.
- Return type:
pandas.DataFrame
- abstract ungroup(**kwargs)[source]#
Remove grouping metadata from the GeneticMap.
- Parameters:
kwargs (dict) – Additional keyword arguments.
- Return type:
None
- abstract property vrnt_chrgrp: ndarray#
Variant chromosome group label.
- abstract property vrnt_chrgrp_len: ndarray#
Variant chromosome group length.
- abstract property vrnt_chrgrp_name: ndarray#
Variant chromosome group names.
- abstract property vrnt_chrgrp_spix: ndarray#
Variant chromosome group stop indices.
- abstract property vrnt_chrgrp_stix: ndarray#
Variant chromosome group start indices.
- abstract property vrnt_genpos: ndarray#
Variant genetic position.
- abstract property vrnt_phypos: ndarray#
Variant physical position.