StandardGeneticMap
- class pybrops.popgen.gmap.StandardGeneticMap.StandardGeneticMap(vrnt_chrgrp, vrnt_phypos, vrnt_genpos, spline=None, spline_kind='linear', spline_fill_value='extrapolate', vrnt_genpos_units='M', auto_group=True, auto_build_spline=True, **kwargs)
Bases: GeneticMap
A concrete class for representing a standard genetic map format.
- The purpose of this concrete class is to implement functionality for:
Genetic map representation.
Genetic map metadata.
Genetic map routines.
Genetic map interpolation spline construction.
Genetic map spline interpolation.
Import and export of genetic maps.
Constructor for creating a standard genetic map object.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – Chromosome or linkage group assignment array of shape (n,) where n is the number of markers.
vrnt_phypos (numpy.ndarray) – Physical positions array of shape (n,) where n is the number of markers. This array contains the physical positions on the chromosome or linkage group for each marker.
vrnt_genpos (numpy.ndarray) – Genetic positions array of shape (n,) where n is the number of markers. This array contains the genetic positions on the chromosome or linkage group for each marker.
spline (dict, None, default = None) – Pre-built interpolation spline to associate with the genetic map.
spline_kind (str, default = "linear") – In automatic building of splines, the spline kind to be built. Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
spline_fill_value (str, numpy.ndarray, default = "extrapolate") – In automatic building of splines, the spline fill value to use. If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
vrnt_genpos_units (str, default = "M") – Units in which genetic positions in the vrnt_genpos array are stored. Options are listed below and are case-sensitive:
"M" - genetic position units are in Morgans
"Morgans" - genetic position units are in Morgans
"cM" - genetic position units are in centiMorgans
"centiMorgans" - genetic position units are in centiMorgans
Internally, all genetic positions are stored in Morgans. Providing the units of the input
auto_group (bool) – Whether to automatically sort and group variants into chromosome groups.
auto_build_spline (bool) – Whether to automatically construct a spline on object construction. If spline is provided, then this spline is overwritten.
kwargs (dict) – Additional keyword arguments.
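The unit handling described for vrnt_genpos_units can be pictured as a one-time rescaling to Morgans. The sketch below is plain Python and does not call pybrops; the helper name to_morgans and the lookup table are hypothetical and only illustrate the four case-sensitive unit strings listed above.

```python
# Hypothetical helper, not part of pybrops: rescale genetic positions to Morgans.
# Keys mirror the case-sensitive unit strings listed in the parameter description.
_UNITS_PER_MORGAN = {
    "M": 1.0,
    "Morgans": 1.0,
    "cM": 100.0,
    "centiMorgans": 100.0,
}

def to_morgans(genpos, units="M"):
    """Return a new list of genetic positions expressed in Morgans."""
    if units not in _UNITS_PER_MORGAN:
        raise ValueError(f"unknown genetic position units: {units!r}")
    divisor = _UNITS_PER_MORGAN[units]
    return [pos / divisor for pos in genpos]

positions_cm = [0.0, 50.0, 125.0]
positions_m = to_morgans(positions_cm, units="cM")
```

Because the lookup is case-sensitive, a string such as "cm" is rejected in this sketch rather than silently treated as centiMorgans.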
Methods
build_spline – Build a spline for estimating genetic map distances.
congruence – Assess physical and genetic map site congruency.
copy – Make a shallow copy of the StandardGeneticMap.
deepcopy – Make a deep copy of the StandardGeneticMap.
from_csv – Read a StandardGeneticMap from a CSV file.
from_pandas – Read a StandardGeneticMap from a pandas.DataFrame.
gdist1g – Calculate sequential genetic distances using genetic positions.
gdist1p – Calculate sequential genetic distances using physical positions.
gdist2g – Calculate pairwise genetic distances using genetic positions.
gdist2p – Calculate pairwise genetic distances using physical positions.
group – Sort the GeneticMap jointly by chromosome group and physical position, then populate grouping indices.
has_spline – Return whether or not the GeneticMap has a built spline.
interp_genpos – Interpolate genetic positions given variant physical positions.
interp_gmap – Interpolate a new genetic map from the current genetic map.
is_congruent – Determine if all sites in the genetic map demonstrate congruence with their supposed physical and genetic positions.
is_grouped – Determine whether the GeneticMap has been sorted and grouped.
lexsort – Perform an indirect stable sort using a sequence of keys.
remove – Remove indices from the GeneticMap.
remove_discrepancies – Remove discrepancies between the physical map and the genetic map.
reorder – Reorder markers in-place in the GeneticMap using an array of indices.
select – Keep only selected markers, removing all others from the GeneticMap.
sort – Sort elements of the GeneticMap using a sequence of keys.
to_csv – Write a StandardGeneticMap to a CSV file.
to_pandas – Export a GeneticMap to a pandas.DataFrame.
ungroup – Remove grouping metadata from the GeneticMap.
Attributes
nvrnt – Number of variants in the GeneticMap.
spline – Interpolation spline(s).
spline_fill_value – Default spline fill value.
spline_kind – Spline kind.
vrnt_chrgrp – Variant chromosome group label.
vrnt_chrgrp_len – Variant chromosome group length.
vrnt_chrgrp_name – Variant chromosome group names.
vrnt_chrgrp_spix – Variant chromosome group stop indices.
vrnt_chrgrp_stix – Variant chromosome group start indices.
vrnt_genpos – Variant genetic position in Morgans.
vrnt_phypos – Variant physical position.
- build_spline(kind='linear', fill_value='extrapolate', **kwargs)
Build a spline for estimating genetic map distances.
- Parameters:
kind (str, default = 'linear') – Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
fill_value (array-like, {'extrapolate'}, default = 'extrapolate') – If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
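To make the default settings concrete, here is a minimal sketch of what kind='linear' with fill_value='extrapolate' means for a map with several chromosome groups: one piecewise-linear function per group, with the end segments extended beyond the data range. It is written in plain Python and is not the library's implementation; the function names and the dictionary layout are assumptions made for illustration, and each group is assumed to have at least two markers with distinct physical positions.

```python
from bisect import bisect_right

def build_linear_splines(chrgrp, phypos, genpos):
    """Collect (physical, genetic) knots per chromosome group.

    Assumes the inputs are already sorted by group and physical position.
    """
    knots = {}
    for grp, x, y in zip(chrgrp, phypos, genpos):
        xs, ys = knots.setdefault(grp, ([], []))
        xs.append(x)
        ys.append(y)
    return knots

def interpolate(knots, grp, x):
    """Linearly interpolate within a group, extrapolating past either end."""
    xs, ys = knots[grp]
    # Index of the right-hand knot of the segment to use; clamping to
    # [1, len(xs) - 1] makes out-of-range points reuse the end segments.
    i = min(max(bisect_right(xs, x), 1), len(xs) - 1)
    slope = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + slope * (x - xs[i - 1])

knots = build_linear_splines(
    chrgrp=[1, 1, 1, 2, 2],
    phypos=[100, 200, 400, 50, 150],
    genpos=[0.0, 0.1, 0.2, 0.0, 0.3],
)
inside = interpolate(knots, 1, 300)   # between the 2nd and 3rd knots of group 1
outside = interpolate(knots, 2, 250)  # beyond the last knot of group 2
```

Keeping one set of knots per group prevents a position on one chromosome from being interpolated against markers on another.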
- congruence()
Assess physical and genetic map site congruency. If the genetic map is not grouped, it will be grouped.
- Returns:
out – A boolean matrix of map concordances where:
True = the current marker has a map_pos >= the previous position
False = the current marker has a map_pos < the previous position
- Return type:
numpy.ndarray
Notes
This assumes high contiguity between physical and genetic maps (i.e. a high quality reference genome). This assumption may cause major issues if there are incorrect markers at the beginning of the chromosome. This also assumes the first marker on the chromosome is placed correctly.
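The rule above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: the input is assumed to be sorted by chromosome group and physical position, each marker is compared with the marker immediately before it, and the first marker of every group is marked True because it is assumed to be placed correctly. The function name congruence_flags is hypothetical.

```python
def congruence_flags(chrgrp, genpos):
    """Return True where a marker's genetic position does not decrease.

    Assumes chrgrp and genpos are sorted by group and physical position.
    """
    flags = []
    for i in range(len(genpos)):
        if i == 0 or chrgrp[i] != chrgrp[i - 1]:
            # First marker of a chromosome group: assumed to be correct.
            flags.append(True)
        else:
            flags.append(genpos[i] >= genpos[i - 1])
    return flags

flags = congruence_flags(
    chrgrp=[1, 1, 1, 1, 2, 2],
    genpos=[0.0, 0.2, 0.1, 0.3, 0.5, 0.4],
)
all_congruent = all(flags)
```

The example also shows why a misplaced marker early in a chromosome is troublesome: every later comparison is made relative to it.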
- copy()
Make a shallow copy of the StandardGeneticMap.
- Returns:
out – A shallow copy of the original StandardGeneticMap.
- Return type:
StandardGeneticMap
- deepcopy(memo=None)
Make a deep copy of the StandardGeneticMap.
- Parameters:
memo (dict, None) – Dictionary of memo metadata.
- Returns:
out – A deep copy of the original StandardGeneticMap.
- Return type:
StandardGeneticMap
- classmethod from_csv(filename, vrnt_chrgrp_col='chr', vrnt_phypos_col='pos', vrnt_genpos_col='cM', spline=None, spline_kind='linear', spline_fill_value='extrapolate', vrnt_genpos_units='M', auto_group=True, auto_build_spline=True, sep=',', header=0, **kwargs)
Read a StandardGeneticMap from a CSV file.
- Parameters:
filename (str, path object, or file-like object) – Any valid string path, including URLs. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected (see pandas docs).
vrnt_chrgrp_col (str, Integral, default = "chr") – Name or number of the chromosome/linkage group name column from which to import.
vrnt_phypos_col (str, Integral, default = "pos") – Name or number of the physical position column from which to import.
vrnt_genpos_col (str, Integral, default = "cM") – Name or number of the genetic position column from which to import.
spline (dict, None, default = None) – Pre-built interpolation spline to associate with the genetic map.
spline_kind (str, default = "linear") – In automatic building of splines, the spline kind to be built. Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
spline_fill_value (str, numpy.ndarray, default = "extrapolate") – In automatic building of splines, the spline fill value to use. If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
vrnt_genpos_units (str, default = "M") – Units in which genetic positions in the vrnt_genpos array are stored. Options are listed below and are case-sensitive:
"M" - genetic position units are in Morgans
"Morgans" - genetic position units are in Morgans
"cM" - genetic position units are in centiMorgans
"centiMorgans" - genetic position units are in centiMorgans
Internally, all genetic positions are stored in Morgans. Providing the units of the input
auto_group (bool) – Whether to automatically sort and group variants into chromosome groups.
auto_build_spline (bool) – Whether to automatically construct a spline on object construction. If spline is provided, then this spline is overwritten.
sep (str, default = ',') – CSV delimiter to use.
header (int, list of int, default=0) – Row number(s) to use as the column names, and the start of the data.
kwargs (dict) – Additional keyword arguments to use for dictating importing from a CSV.
- Returns:
out – A StandardGeneticMap read from a CSV file.
- Return type:
StandardGeneticMap
- classmethod from_pandas(df, vrnt_chrgrp_col='chr', vrnt_phypos_col='pos', vrnt_genpos_col='cM', spline=None, spline_kind='linear', spline_fill_value='extrapolate', vrnt_genpos_units='M', auto_group=True, auto_build_spline=True, **kwargs)
Read a StandardGeneticMap from a pandas.DataFrame.
- Parameters:
df (pandas.DataFrame) – Pandas dataframe from which to read.
vrnt_chrgrp_col (str, Integral, default = "chr") – Name or number of the chromosome/linkage group name column from which to import.
vrnt_phypos_col (str, Integral, default = "pos") – Name or number of the physical position column from which to import.
vrnt_genpos_col (str, Integral, default = "cM") – Name or number of the genetic position column from which to import.
spline (dict, None, default = None) – Pre-built interpolation spline to associate with the genetic map.
spline_kind (str, default = "linear") – In automatic building of splines, the spline kind to be built. Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
spline_fill_value (str, numpy.ndarray, default = "extrapolate") – In automatic building of splines, the spline fill value to use. If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
vrnt_genpos_units (str, default = "M") – Units in which genetic positions in the vrnt_genpos array are stored. Options are listed below and are case-sensitive:
"M" - genetic position units are in Morgans
"Morgans" - genetic position units are in Morgans
"cM" - genetic position units are in centiMorgans
"centiMorgans" - genetic position units are in centiMorgans
Internally, all genetic positions are stored in Morgans. Providing the units of the input
auto_group (bool) – Whether to automatically sort and group variants into chromosome groups.
auto_build_spline (bool) – Whether to automatically construct a spline on object construction. If spline is provided, then this spline is overwritten.
kwargs (dict) – Additional keyword arguments to use for dictating importing from a pandas.DataFrame.
- Returns:
out – A StandardGeneticMap read from a pandas.DataFrame.
- Return type:
StandardGeneticMap
- gdist1g(vrnt_chrgrp, vrnt_genpos, ast=None, asp=None)
Calculate sequential genetic distances using genetic positions. Requires vrnt_chrgrp and vrnt_genpos to have been sorted jointly in ascending order.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_genpos.
vrnt_genpos (numpy.ndarray) – A 1D array of variant genetic positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
ast (Integral, None) – Optional array start index (inclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
asp (Integral, None) – Optional array stop index (exclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
- Returns:
out – A 1D array of distances between each marker and the marker prior to it.
- Return type:
numpy.ndarray
Notes
Sequential distance arrays will start every chromosome with numpy.inf!
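A small plain-Python sketch of the behavior described in the note: each entry is the distance to the previous marker, and the first marker of every chromosome group receives infinity because it has no predecessor on that group. The function name sequential_distances is hypothetical, and the code is an illustration rather than the library's implementation.

```python
import math

def sequential_distances(chrgrp, genpos):
    """Distance from each marker to the marker before it, per chromosome group.

    Assumes chrgrp and genpos are sorted jointly in ascending order.
    """
    out = []
    for i in range(len(genpos)):
        if i == 0 or chrgrp[i] != chrgrp[i - 1]:
            # No previous marker on this chromosome group.
            out.append(math.inf)
        else:
            out.append(genpos[i] - genpos[i - 1])
    return out

dist = sequential_distances(
    chrgrp=[1, 1, 1, 2, 2],
    genpos=[0.0, 0.25, 0.75, 0.0, 0.5],
)
```

Downstream code that sums or averages such an array needs to mask the infinite entries first.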
- gdist1p(vrnt_chrgrp, vrnt_phypos, ast=None, asp=None)
Calculate sequential genetic distances using physical positions. Requires vrnt_chrgrp and vrnt_phypos to have been sorted jointly in ascending order. Requires an interpolation spline to have been built beforehand.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_phypos.
vrnt_phypos (numpy.ndarray) – A 1D array of variant physical positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
ast (Integral, None) – Optional array start index (inclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
asp (Integral, None) – Optional array stop index (exclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
- Returns:
out – A 1D array of distances between each marker and the marker prior to it.
- Return type:
numpy.ndarray
Notes
Sequential distance arrays will start every chromosome with numpy.inf!
- gdist2g(vrnt_chrgrp, vrnt_genpos, rst=None, rsp=None, cst=None, csp=None)
Calculate pairwise genetic distances using genetic positions. Requires vrnt_chrgrp and vrnt_genpos to have been sorted jointly in ascending order.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_genpos.
vrnt_genpos (numpy.ndarray) – A 1D array of variant genetic positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
rst (Integral, None) – Optional row start index (inclusive). If None, assume that all rows in the pairwise genetic distance matrix are to be calculated.
rsp (Integral, None) – Optional row stop index (exclusive). If None, assume that all rows in the pairwise genetic distance matrix are to be calculated.
cst (Integral, None) – Optional column start index (inclusive). If None, assume that all columns in the pairwise genetic distance matrix are to be calculated.
csp (Integral, None) – Optional column stop index (exclusive). If None, assume that all columns in the pairwise genetic distance matrix are to be calculated.
- Returns:
out – A 2D array of distances between marker pairs.
- Return type:
numpy.ndarray
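The row and column bounds can be read as selecting a rectangular block of the full pairwise matrix. The sketch below illustrates that idea in plain Python. Two points are assumptions rather than documented behavior: distances are taken as absolute differences of genetic positions, and pairs on different chromosome groups are given infinity, by analogy with the sequential case. The function name pairwise_distances is hypothetical.

```python
import math

def pairwise_distances(chrgrp, genpos, rst=None, rsp=None, cst=None, csp=None):
    """Return the block [rst:rsp, cst:csp] of a pairwise distance matrix.

    Assumption: markers on different chromosome groups are infinitely far apart.
    """
    n = len(genpos)
    rows = range(n)[rst:rsp]  # None bounds select every row
    cols = range(n)[cst:csp]  # None bounds select every column
    return [
        [
            abs(genpos[i] - genpos[j]) if chrgrp[i] == chrgrp[j] else math.inf
            for j in cols
        ]
        for i in rows
    ]

chrgrp = [1, 1, 2]
genpos = [0.0, 0.5, 0.25]
full = pairwise_distances(chrgrp, genpos)
first_row = pairwise_distances(chrgrp, genpos, rst=0, rsp=1)
```

Computing the matrix block by block in this way keeps memory use bounded when the number of markers is large.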
- gdist2p(vrnt_chrgrp, vrnt_phypos, rst=None, rsp=None, cst=None, csp=None)
Calculate pairwise genetic distances using physical positions. Requires vrnt_chrgrp and vrnt_phypos to have been sorted jointly in ascending order.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_phypos.
vrnt_phypos (numpy.ndarray) – A 1D array of variant physical positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
rst (Integral, None) – Optional row start index (inclusive). If None, assume that all rows in the pairwise genetic distance matrix are to be calculated.
rsp (Integral, None) – Optional row stop index (exclusive). If None, assume that all rows in the pairwise genetic distance matrix are to be calculated.
cst (Integral, None) – Optional column start index (inclusive). If None, assume that all columns in the pairwise genetic distance matrix are to be calculated.
csp (Integral, None) – Optional column stop index (exclusive). If None, assume that all columns in the pairwise genetic distance matrix are to be calculated.
- Returns:
out – A 2D array of distances between marker pairs.
- Return type:
numpy.ndarray
- group(**kwargs)
Sort the GeneticMap jointly by chromosome group and physical position, then populate grouping indices.
- Parameters:
kwargs (dict) – Additional keyword arguments.
- Return type:
None
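As a rough picture of what sorting and populating grouping indices involves, the plain-Python sketch below sorts markers by chromosome group and physical position, then derives group names, start indices, stop indices (exclusive) and lengths. The function name group_map and the returned dictionary are hypothetical; the keys only echo the attribute names vrnt_chrgrp_name, vrnt_chrgrp_stix, vrnt_chrgrp_spix and vrnt_chrgrp_len for readability.

```python
def group_map(chrgrp, phypos, genpos):
    """Sort markers by (group, physical position) and compute group indices."""
    order = sorted(range(len(chrgrp)), key=lambda i: (chrgrp[i], phypos[i]))
    chrgrp = [chrgrp[i] for i in order]
    phypos = [phypos[i] for i in order]
    genpos = [genpos[i] for i in order]

    names, stix = [], []
    for i, grp in enumerate(chrgrp):
        if i == 0 or grp != chrgrp[i - 1]:
            names.append(grp)  # a new chromosome group starts here
            stix.append(i)
    # Each group stops where the next one starts; the last stops at the end.
    spix = stix[1:] + [len(chrgrp)] if stix else []
    lens = [stop - start for start, stop in zip(stix, spix)]
    return {
        "vrnt_chrgrp": chrgrp, "vrnt_phypos": phypos, "vrnt_genpos": genpos,
        "name": names, "stix": stix, "spix": spix, "len": lens,
    }

grouped = group_map(
    chrgrp=[2, 1, 1, 2, 1],
    phypos=[30, 20, 10, 5, 15],
    genpos=[0.3, 0.2, 0.1, 0.05, 0.15],
)
```

With start and stop indices available, the markers of group k are the slice stix[k]:spix[k] of every sorted array.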
- has_spline()
Return whether or not the GeneticMap has a built spline.
- Returns:
out – Whether the GeneticMap has a spline built.
- Return type:
bool
- interp_genpos(vrnt_chrgrp, vrnt_phypos)
Interpolate genetic positions given variant physical positions.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – Chromosome/linkage group labels for each marker variant.
vrnt_phypos (numpy.ndarray) – Chromosome/linkage group physical positions for each marker variant.
- Returns:
out – Interpolated genetic positions for each marker variant.
- Return type:
numpy.ndarray
- interp_gmap(vrnt_chrgrp, vrnt_phypos, **kwargs)
Interpolate a new genetic map from the current genetic map. Associate spline of current GeneticMap with new GeneticMap.
- Parameters:
vrnt_chrgrp (numpy.ndarray) – Chromosome/linkage group labels for each marker variant.
vrnt_phypos (numpy.ndarray) – Chromosome/linkage group physical positions for each marker variant.
kwargs (dict) – Additional keyword arguments.
- Returns:
out – An interpolated genetic map sharing a copy of the spline from the original genetic map.
- Return type:
StandardGeneticMap
- is_congruent()
Determine if all sites in the genetic map demonstrate congruence with their supposed physical and genetic positions.
- Returns:
out – Whether all genetic map loci demonstrate congruence between physical and genetic positions.
- Return type:
bool
- is_grouped()
Determine whether the GeneticMap has been sorted and grouped.
- Returns:
grouped – True or False indicating whether the GeneticMap has been sorted and grouped.
- Return type:
bool
- lexsort(keys=None, **kwargs)
Perform an indirect stable sort using a sequence of keys.
- Parameters:
keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.
kwargs (dict) – Additional keyword arguments.
- Returns:
indices – Array of indices that sort the keys along the specified axis.
- Return type:
A (N,) ndarray of ints
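The key ordering is easy to get backwards, so here is a plain-Python sketch of an indirect stable sort in which the last key is the primary one, matching the convention described above. It does not use numpy or pybrops; the function name lexsort_indices is hypothetical.

```python
def lexsort_indices(keys):
    """Return indices that sort by the last key first, then earlier keys.

    keys is a sequence of equally long sequences; Python's sorted() is
    stable, so ties keep their original relative order.
    """
    n = len(keys[0])
    # Reversing puts the last (primary) key first in the comparison tuple.
    return sorted(range(n), key=lambda i: tuple(k[i] for k in reversed(keys)))

phypos = [30, 10, 20, 5]
chrgrp = [2, 1, 1, 2]
# chrgrp is listed last, so it is the primary key; phypos breaks ties.
indices = lexsort_indices((phypos, chrgrp))
sorted_chrgrp = [chrgrp[i] for i in indices]
sorted_phypos = [phypos[i] for i in indices]
```

The result is a set of indices rather than sorted data, which is what allows several parallel arrays to be reordered consistently.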
- property nvrnt: Integral
Number of variants in the GeneticMap.
- remove(indices, **kwargs)
Remove indices from the GeneticMap. If the GeneticMap was grouped beforehand, then re-sort and re-group internal arrays after removing indices.
- Parameters:
indices (int, slice, Sequence) – Array of shape (a,), slice or int of item(s) to remove.
Where:
a is the number of indices to remove.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
- remove_discrepancies()
Remove discrepancies between the physical map and the genetic map. In instances of conflict, assume that the physical map is correct.
- Return type:
None
Notes
This assumption may cause major issues if there are incorrect markers at the beginning of the chromosome.
- reorder(indices)
Reorder markers in-place in the GeneticMap using an array of indices.
- Parameters:
indices (A (N,) ndarray of ints) – Array of indices that reorder the matrix along the specified axis.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
- select(indices, **kwargs)
Keep only selected markers, removing all others from the GeneticMap. If the GeneticMap was grouped beforehand, then re-sort and re-group internal arrays after removing indices.
- Parameters:
indices (int, slice, Sequence) – Array of shape (a,), slice or int of item(s) to keep.
Where:
a is the number of indices to keep.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
- sort(keys=None)
Sort elements of the GeneticMap using a sequence of keys. Note that this modifies the GeneticMap in-place.
- Parameters:
keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.
kwargs (dict) – Additional keyword arguments.
- Return type:
None
- property spline: dict | None
Interpolation spline(s).
- property spline_fill_value: str | ndarray | None
Default spline fill value.
- property spline_kind: str | None
Spline kind.
- to_csv(filename, vrnt_chrgrp_col='chr', vrnt_phypos_col='pos', vrnt_genpos_col='cM', vrnt_genpos_units='cM', sep=',', header=True, index=False, **kwargs)
Write a StandardGeneticMap to a CSV file.
- Parameters:
filename (str) – CSV file name to which to write.
vrnt_chrgrp_col (str, default = "chr") – Name of the chromosome/linkage group name column to which to export.
vrnt_phypos_col (str, default = "pos") – Name of the physical position column to which to export.
vrnt_genpos_col (str, default = "cM") – Name of the genetic position column to which to export.
vrnt_genpos_units (str, default = "cM") – Units of the genetic position column to which to export. Options are listed below and are case-sensitive:
"M" - genetic position units are in Morgans
"Morgans" - genetic position units are in Morgans
"cM" - genetic position units are in centiMorgans
"centiMorgans" - genetic position units are in centiMorgans
sep (str, default = ",") – Separator to use in the exported CSV file.
header (bool, default = True) – Whether to save header names.
index (bool, default = False) – Whether to save a row index in the exported CSV file.
kwargs (dict) – Additional keyword arguments to use for dictating export to a CSV.
- Return type:
None
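Since positions are held in Morgans while the export default is centiMorgans, an export involves a rescaling step. The sketch below shows that step with Python's csv module writing to an in-memory buffer. It is not the library's implementation: the function name write_map_csv is hypothetical, and only the default column names and the unit strings are taken from the description above.

```python
import csv
import io

def write_map_csv(handle, chrgrp, phypos, genpos_morgans,
                  units="cM", sep=",", header=True):
    """Write a genetic map to an open text handle, rescaling from Morgans."""
    if units in ("cM", "centiMorgans"):
        factor = 100.0
    elif units in ("M", "Morgans"):
        factor = 1.0
    else:
        raise ValueError(f"unknown genetic position units: {units!r}")
    writer = csv.writer(handle, delimiter=sep, lineterminator="\n")
    if header:
        writer.writerow(["chr", "pos", "cM"])  # default column names
    for grp, pos, gen in zip(chrgrp, phypos, genpos_morgans):
        writer.writerow([grp, pos, gen * factor])

buf = io.StringIO()
write_map_csv(buf, chrgrp=[1, 1], phypos=[100, 200], genpos_morgans=[0.0, 0.5])
text = buf.getvalue()
```

When exporting in Morgans it is worth also changing the genetic position column name, since the default name "cM" would otherwise mislabel the values.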
- to_pandas(vrnt_chrgrp_col='chr', vrnt_phypos_col='pos', vrnt_genpos_col='cM', vrnt_genpos_units='cM', **kwargs)
Export a GeneticMap to a pandas.DataFrame.
- Parameters:
vrnt_chrgrp_col (str, default = "chr") – Name of the chromosome/linkage group name column to which to export.
vrnt_phypos_col (str, default = "pos") – Name of the physical position column to which to export.
vrnt_genpos_col (str, default = "cM") – Name of the genetic position column to which to export.
vrnt_genpos_units (str, default = "cM") – Units of the genetic position column to which to export. Options are listed below and are case-sensitive:
"M" - genetic position units are in Morgans
"Morgans" - genetic position units are in Morgans
"cM" - genetic position units are in centiMorgans
"centiMorgans" - genetic position units are in centiMorgans
kwargs (dict) – Additional keyword arguments to use for dictating export to a pandas.DataFrame.
- Returns:
out – An output dataframe.
- Return type:
pandas.DataFrame
- ungroup(**kwargs)
Remove grouping metadata from the GeneticMap.
- Parameters:
kwargs (dict) – Additional keyword arguments.
- Return type:
None
- property vrnt_chrgrp: ndarray
Variant chromosome group label.
- property vrnt_chrgrp_len: ndarray | None
Variant chromosome group length.
- property vrnt_chrgrp_name: ndarray | None
Variant chromosome group names.
- property vrnt_chrgrp_spix: ndarray | None
Variant chromosome group stop indices.
- property vrnt_chrgrp_stix: ndarray | None
Variant chromosome group start indices.
- property vrnt_genpos: ndarray
Variant genetic position in Morgans.
- property vrnt_phypos: ndarray
Variant physical position.