ExtendedGeneticMap

class pybrops.popgen.gmap.ExtendedGeneticMap.ExtendedGeneticMap(vrnt_chrgrp, vrnt_phypos, vrnt_stop, vrnt_genpos, vrnt_name=None, vrnt_fncode=None, spline=None, spline_kind='linear', spline_fill_value='extrapolate', vrnt_genpos_units='M', auto_group=True, auto_build_spline=True, **kwargs)

Bases: GeneticMap

A concrete class for representing an extended genetic map format.

The purpose of this concrete class is to implement functionality for:

Extended genetic map representation.
Extended genetic map metadata.
Extended genetic map routines.
Extended genetic map interpolation spline construction.
Extended genetic map spline interpolation.
Import and export of extended genetic maps.

Constructor for creating an extended genetic map object.

Parameters:

vrnt_chrgrp (numpy.ndarray) – Chromosome or linkage group assignment array of shape (n,) where n is the number of markers.
vrnt_phypos (numpy.ndarray) – Physical positions array of shape (n,) where n is the number of markers. This array contains the physical positions on the chromosome or linkage group for each marker.
vrnt_stop (numpy.ndarray) – Physical positions array of shape (n,) where n is the number of markers. This array contains the physical positions on the chromosome or linkage group where the marker stops. Useful for markers which are longer than 1 nucleotide.
vrnt_genpos (numpy.ndarray) – Genetic positions array of shape (n,) where n is the number of markers. This array contains the genetic positions on the chromosome or linkage group for each marker.
vrnt_name (numpy.ndarray, None, default = None) – Array of shape (n,) where n is the number of markers. This array contains the names of each of the marker variants.
vrnt_fncode (numpy.ndarray, None, default = None) – Array of shape (n,) where n is the number of markers. This array contains codes for the mapping function used to position the marker on the genetic map.
spline (dict, None, default = None) – Pre-built interpolation spline to associate with the genetic map.
spline_kind (str, default = "linear") – In automatic building of splines, the spline kind to be built. Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
spline_fill_value (str, numpy.ndarray, default = "extrapolate") – In automatic building of splines, the spline fill value to use. If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
vrnt_genpos_units (str, default = "M") –
Units in which genetic positions in the vrnt_genpos array are stored. Options are listed below and are case-sensitive:
- "M" - genetic position units are in Morgans
- "Morgans" - genetic position units are in Morgans
- "cM" - genetic position units are in centiMorgans
- "centiMorgans" - genetic position units are in centiMorgans
Internally, all genetic positions are stored in Morgans; the units given here tell the constructor how to convert the input.
auto_group (bool) – Whether to automatically sort and group variants into chromosome groups.
auto_build_spline (bool) – Whether to automatically construct a spline on object construction. If spline is provided, then this spline is overwritten.
kwargs (dict) – Additional keyword arguments.
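
The unit options for vrnt_genpos_units can be illustrated with a small plain-Python sketch. The helper name `to_morgans` is hypothetical and not part of the library; it only assumes that centiMorgan input is divided by 100 before being stored in Morgans, and that the unit strings are matched case-sensitively as stated above.

```python
# Hypothetical helper (not a library function): express positions in Morgans.
_UNITS_PER_MORGAN = {
    "M": 1.0,
    "Morgans": 1.0,
    "cM": 100.0,
    "centiMorgans": 100.0,
}

def to_morgans(genpos, units="M"):
    """Return the genetic positions in `genpos` converted to Morgans."""
    if units not in _UNITS_PER_MORGAN:  # lookup is case-sensitive
        raise ValueError(f"unknown genetic position units: {units!r}")
    divisor = _UNITS_PER_MORGAN[units]
    return [x / divisor for x in genpos]

positions_cM = [0.0, 50.0, 150.0]
positions_M = to_morgans(positions_cM, units="cM")
```

Passing a lowercase string such as "cm" raises a ValueError in this sketch, which mirrors the case-sensitivity noted in the option list.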

Methods

`build_spline`	Build a spline for estimating genetic map distances.
`congruence`	Assess physical and genetic map site congruency.
`copy`	Make a shallow copy of the ExtendedGeneticMap.
`deepcopy`	Make a deep copy of the ExtendedGeneticMap.
`from_csv`	Create an ExtendedGeneticMap object from a csv or delimited file.
`from_egmap`	Read an extended genetic map file (.egmap).
`from_pandas`	Read an ExtendedGeneticMap from a pandas.DataFrame.
`gdist1g`	Calculate sequential genetic distances using genetic positions.
`gdist1p`	Calculate sequential genetic distances using physical positions.
`gdist2g`	Calculate pairwise genetic distances using genetic positions.
`gdist2p`	Calculate pairwise genetic distances using physical positions.
`group`	Sort the GeneticMap jointly by chromosome group and physical position, then populate grouping indices.
`has_spline`	Return whether or not the GeneticMap has a built spline.
`interp_genpos`	Interpolate genetic positions given variant physical positions.
`interp_gmap`	Interpolate a new genetic map from the current genetic map.
`is_congruent`	Determine if all sites in the genetic map demonstrate congruence with their supposed physical and genetic positions.
`is_grouped`	Determine whether the GeneticMap has been sorted and grouped.
`lexsort`	Perform an indirect stable sort using a sequence of keys.
`prune`	Prune markers evenly across all chromosomes.
`remove`	Remove indices from the GeneticMap.
`remove_discrepancies`	Remove discrepancies between the physical map and the genetic map.
`reorder`	Reorder markers in the GeneticMap using an array of indices.
`select`	Keep only selected markers, removing all others from the GeneticMap.
`sort`	Sort elements of the GeneticMap using a sequence of keys.
`to_csv`	Write an ExtendedGeneticMap to a CSV file.
`to_egmap`	Write an ExtendedGeneticMap to an extended genetic map (.egmap) file.
`to_pandas`	Export a GeneticMap to a pandas.DataFrame.
`ungroup`	Remove grouping metadata from the GeneticMap.

Attributes

`nvrnt`	Number of variants in the GeneticMap.
`spline`	Interpolation spline(s).
`spline_fill_value`	Default spline fill value.
`spline_kind`	Spline kind.
`vrnt_chrgrp`	Variant chromosome group label.
`vrnt_chrgrp_len`	Variant chromosome group length.
`vrnt_chrgrp_name`	Variant chromosome group names.
`vrnt_chrgrp_spix`	Variant chromosome group stop indices.
`vrnt_chrgrp_stix`	Variant chromosome group start indices.
`vrnt_fncode`	Variant function codes.
`vrnt_genpos`	Variant genetic position in Morgans.
`vrnt_name`	Variant names.
`vrnt_phypos`	Variant physical position.
`vrnt_stop`	Variant physical position stop position.

build_spline(kind='linear', fill_value='extrapolate', **kwargs)

Build a spline for estimating genetic map distances.

Parameters:

kind (str, default = 'linear') – Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
fill_value (array-like, {'extrapolate'}, default = 'extrapolate') – If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
kwargs (dict) – Additional keyword arguments.

Return type:

None

congruence()

Assess physical and genetic map site congruency. If the genetic map is not grouped, it will be grouped.

Returns:

out – A boolean matrix of map concordancies where:

True = the current marker has a map_pos >= the previous position
False = the current marker has a map_pos < the previous position

Return type:

numpy.ndarray

Notes

This assumes high contiguity between physical and genetic maps (i.e. a high quality reference genome). This assumption may cause major issues if there are incorrect markers at the beginning of the chromosome. This also assumes the first marker on the chromosome is placed correctly.
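
The congruence rule described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `congruence_sketch` is hypothetical, markers are assumed to be sorted by chromosome group and then physical position, "previous position" is taken to mean the immediately preceding marker on the same chromosome, and the first marker on each chromosome is treated as correct. The library's actual comparison may differ.

```python
def congruence_sketch(chrgrp, genpos):
    """Flag each marker whose genetic position is >= that of the prior marker.

    Inputs are assumed sorted by chromosome group, then physical position.
    """
    out = []
    for i in range(len(genpos)):
        if i == 0 or chrgrp[i] != chrgrp[i - 1]:
            # first marker on a chromosome: assumed to be placed correctly
            out.append(True)
        else:
            out.append(genpos[i] >= genpos[i - 1])
    return out

chrgrp = ["1", "1", "1", "1", "2", "2"]
genpos = [0.00, 0.10, 0.05, 0.20, 0.00, 0.30]
flags = congruence_sketch(chrgrp, genpos)

# Analogue of is_congruent(): every marker must be flagged True.
all_congruent = all(flags)
```

In this toy input the third marker on chromosome "1" moves backwards on the genetic map, so it is the only marker flagged False.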

copy()

Make a shallow copy of the ExtendedGeneticMap.

Returns: out – A shallow copy of the original ExtendedGeneticMap.
Return type: ExtendedGeneticMap

deepcopy(memo=None)

Make a deep copy of the ExtendedGeneticMap.

Parameters: memo (dict) – Dictionary of memo metadata.
Returns: out – A deep copy of the original ExtendedGeneticMap.
Return type: ExtendedGeneticMap

classmethod from_csv(filename, sep=',', header=0, vrnt_chrgrp_col='chr', vrnt_phypos_col='pos', vrnt_stop_col='stop', vrnt_genpos_col='cM', vrnt_name_col=None, vrnt_fncode_col=None, spline=None, spline_kind='linear', spline_fill_value='extrapolate', vrnt_genpos_units='M', auto_group=True, auto_build_spline=True, **kwargs)

Create an ExtendedGeneticMap object from a csv or delimited file.

Parameters:

filename (str, path object, or file-like object) – Any valid string path, including URLs. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected (see pandas docs).
sep (str, default = ',') – CSV delimiter to use.
header (int, list of int, default=0) – Row number(s) to use as the column names, and the start of the data.
vrnt_chrgrp_col (str, Integral, default = "chr") – Name or number of the chromosome/linkage group name column from which to import.
vrnt_phypos_col (str, Integral, default = "pos") – Name or number of the physical position column from which to import.
vrnt_stop_col (str, Integral, default = "stop") – Name or number of the physical position stop column from which to import.
vrnt_genpos_col (str, Integral, default = "cM") – Name or number of the genetic position column from which to import.
vrnt_name_col (str, Integral, None, default = None) – Name or number of the marker variant name column from which to import. If None, do not import any marker variant names.
vrnt_fncode_col (str, Integral, None, default = None) – Name or number of the marker variant function code column from which to import. If None, do not import any marker variant function codes.
spline (dict, None, default = None) – Pre-built interpolation spline to associate with the genetic map.
spline_kind (str, default = "linear") – In automatic building of splines, the spline kind to be built. Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
spline_fill_value (str, numpy.ndarray, default = "extrapolate") – In automatic building of splines, the spline fill value to use. If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
vrnt_genpos_units (str, default = "M") –
Units in which genetic positions in the vrnt_genpos array are stored. Options are listed below and are case-sensitive:
- "M" - genetic position units are in Morgans
- "Morgans" - genetic position units are in Morgans
- "cM" - genetic position units are in centiMorgans
- "centiMorgans" - genetic position units are in centiMorgans
Internally, all genetic positions are stored in Morgans; the units given here tell the method how to convert the input.
auto_group (bool) – Whether to automatically sort and group variants into chromosome groups.
auto_build_spline (bool) – Whether to automatically construct a spline on object construction. If spline is provided, then this spline is overwritten.
kwargs (dict) – Additional keyword arguments to use for dictating importing from a pandas.DataFrame.

Returns:

out – An ExtendedGeneticMap object containing all data required for genetic inferences.

Return type:

ExtendedGeneticMap

classmethod from_egmap(filename, spline=None, spline_kind='linear', spline_fill_value='extrapolate', auto_group=True, auto_build_spline=True)

Read an extended genetic map file (.egmap).

Parameters:

filename (str, path object, or file-like object) – Any valid string path, including URLs. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected (see pandas docs).
spline (dict, None, default = None) – Pre-built interpolation spline to associate with the genetic map.
spline_kind (str, default = "linear") – In automatic building of splines, the spline kind to be built. Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
spline_fill_value (str, numpy.ndarray, default = "extrapolate") – In automatic building of splines, the spline fill value to use. If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
auto_group (bool) – Whether to automatically sort and group variants into chromosome groups.
auto_build_spline (bool) – Whether to automatically construct a spline on object construction. If spline is provided, then this spline is overwritten.

Returns:

out – An ExtendedGeneticMap object containing all data required for genetic inferences.

Return type:

ExtendedGeneticMap

Notes

Extended genetic map file (.egmap) format (similar to BED file format).

Genetic map assumptions:

This file format assumes that we have a high quality genome assembly with near complete chromosome pseudomolecules and a low number of errors.
This file format also assumes that we have a high quality genetic map with minimal inversions and mis-assemblies.
The number of linkage groups in the genetic map should equal the number of whole chromosomes in our genome assembly.
For discrepancies in the physical map vs. the genetic map, we assume that the physical map is correct.

General Extended Genetic Map File (.egmap) specifications:

This file format is headerless.
This file format is tab delimited.

Extended Genetic Map File (.egmap) field specifications:

chrom (REQUIRED)
Name of the chromosome; equivalent to the linkage group. This is of type ‘str’.
chr_start (REQUIRED)
The start position of the feature on the chromosome or scaffold. This is 1-indexed (e.g. the first base of a chromosome is 1) and inclusive (e.g. chr_start <= sequence <= chr_stop). This is an integer type.
chr_stop (REQUIRED)
The stop position of the feature on the chromosome or scaffold. This is 1-indexed (e.g. the first base of a chromosome is 1) and inclusive (e.g. chr_start <= sequence <= chr_stop). This is an integer type.
map_pos (REQUIRED)
The genetic map position in Morgans. (NOT centiMorgans!) This is a floating-point type.
mkr_name (optional)
The name of the marker on the genetic map. This is of type ‘str’.
map_fncode (optional)
The mapping function code used to create this genetic map. This is of type ‘str’.

Mapping function codes:
- Haldane: ‘H’
- Kosambi: ‘K’
- Unknown: ‘U’
- Custom: <str of any length>
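
The field layout above can be illustrated by parsing a few lines with the standard library. This is a minimal sketch, not the reader used by from_egmap: the function name `read_egmap_rows`, the marker names, and the sample positions are all hypothetical. It assumes only what the specification states: no header, tab delimiters, four required fields, and two optional trailing fields.

```python
import csv
import io

# Hypothetical file content: headerless, tab delimited, map_pos in Morgans.
EGMAP_TEXT = (
    "1\t100\t100\t0.00\tmkr_a\tH\n"
    "1\t2500\t2503\t0.12\tmkr_b\tH\n"
    "2\t300\t300\t0.05\tmkr_c\tK\n"
)

def read_egmap_rows(handle):
    """Parse .egmap-style lines into a list of dictionaries."""
    rows = []
    for rec in csv.reader(handle, delimiter="\t"):
        if not rec:
            continue  # skip blank lines
        rows.append({
            "chrom": rec[0],                # str
            "chr_start": int(rec[1]),       # 1-indexed, inclusive
            "chr_stop": int(rec[2]),        # 1-indexed, inclusive
            "map_pos": float(rec[3]),       # Morgans
            "mkr_name": rec[4] if len(rec) > 4 else None,    # optional
            "map_fncode": rec[5] if len(rec) > 5 else None,  # optional
        })
    return rows

rows = read_egmap_rows(io.StringIO(EGMAP_TEXT))
```

A line with only the four required fields yields None for mkr_name and map_fncode in this sketch.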

classmethod from_pandas(df, vrnt_chrgrp_col='chr', vrnt_phypos_col='pos', vrnt_stop_col='stop', vrnt_genpos_col='cM', vrnt_name_col=None, vrnt_fncode_col=None, spline=None, spline_kind='linear', spline_fill_value='extrapolate', vrnt_genpos_units='M', auto_group=True, auto_build_spline=True, **kwargs)

Read an ExtendedGeneticMap from a pandas.DataFrame.

Parameters:

df (pandas.DataFrame) – Pandas dataframe from which to read.
vrnt_chrgrp_col (str, Integral, default = "chr") – Name or number of the chromosome/linkage group name column from which to import.
vrnt_phypos_col (str, Integral, default = "pos") – Name or number of the physical position column from which to import.
vrnt_stop_col (str, Integral, default = "stop") – Name or number of the physical position stop column from which to import.
vrnt_genpos_col (str, Integral, default = "cM") – Name or number of the genetic position column from which to import.
vrnt_name_col (str, Integral, None, default = None) – Name or number of the marker variant name column from which to import. If None, do not import any marker variant names.
vrnt_fncode_col (str, Integral, None, default = None) – Name or number of the marker variant function code column from which to import. If None, do not import any marker variant function codes.
spline (dict, None, default = None) – Pre-built interpolation spline to associate with the genetic map.
spline_kind (str, default = "linear") – In automatic building of splines, the spline kind to be built. Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use.
spline_fill_value (str, numpy.ndarray, default = "extrapolate") – In automatic building of splines, the spline fill value to use. If ‘extrapolate’, then points outside the data range will be extrapolated. If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.
vrnt_genpos_units (str, default = "M") –
Units in which genetic positions in the vrnt_genpos array are stored. Options are listed below and are case-sensitive:
- "M" - genetic position units are in Morgans
- "Morgans" - genetic position units are in Morgans
- "cM" - genetic position units are in centiMorgans
- "centiMorgans" - genetic position units are in centiMorgans
Internally, all genetic positions are stored in Morgans; the units given here tell the method how to convert the input.
auto_group (bool) – Whether to automatically sort and group variants into chromosome groups.
auto_build_spline (bool) – Whether to automatically construct a spline on object construction. If spline is provided, then this spline is overwritten.
kwargs (dict) – Additional keyword arguments to use for dictating importing from a pandas.DataFrame.

Returns:

out – An ExtendedGeneticMap read from a pandas.DataFrame.

Return type:

ExtendedGeneticMap

Notes

Genetic map assumptions:

This assumes that we have a high quality genome assembly with near complete chromosome pseudomolecules and a low number of errors.
This also assumes that we have a high quality genetic map with minimal inversions and mis-assemblies.
The number of linkage groups in the genetic map should equal the number of whole chromosomes in our genome assembly.
For discrepancies in the physical map vs. the genetic map, we assume that the physical map is correct.

Pandas DataFrame field specifications:

vrnt_chrgrp (REQUIRED)
Name of the chromosome; equivalent to the linkage group. This is of type ‘str’.
vrnt_phypos (REQUIRED)
The start position of the feature on the chromosome or scaffold. This is 1-indexed (e.g. the first base of a chromosome is 1) and inclusive (e.g. vrnt_phypos <= sequence <= vrnt_stop). This is an integer type.
vrnt_stop (REQUIRED)
The stop position of the feature on the chromosome or scaffold. This is 1-indexed (e.g. the first base of a chromosome is 1) and inclusive (e.g. vrnt_phypos <= sequence <= vrnt_stop). This is an integer type.
vrnt_genpos (REQUIRED)
The genetic map position in Morgans. (NOT centiMorgans!) This is a floating-point type.
vrnt_name (optional)
The name of the marker on the genetic map. This is of type ‘str’.
vrnt_fncode (optional)
The mapping function code used to create this genetic map. This is of type ‘str’.

Mapping function codes:
- Haldane: ‘H’
- Kosambi: ‘K’
- Unknown: ‘U’
- Custom: <str of any length>

gdist1g(vrnt_chrgrp, vrnt_genpos, ast=None, asp=None)

Calculate sequential genetic distances using genetic positions. Requires vrnt_chrgrp and vrnt_genpos to have been sorted jointly in ascending order.

Parameters:

vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_genpos.
vrnt_genpos (numpy.ndarray) – A 1D array of variant genetic positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
ast (Integral, None) – Optional array start index (inclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
asp (Integral, None) – Optional array stop index (exclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.

Returns:

out – A 1D array of genetic distances between each marker and the marker preceding it.

Return type:

numpy.ndarray

Notes

Sequential distance arrays will start every chromosome with numpy.inf!
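
A plain-Python sketch of the sequential distance calculation and the numpy.inf convention follows. The function name `gdist1_sketch` is hypothetical, the ast and asp arguments are omitted for brevity, and the inputs are assumed to be jointly sorted as required above.

```python
import math

def gdist1_sketch(chrgrp, genpos):
    """Distance from each marker to the one before it on the same chromosome.

    The first marker of every chromosome has no predecessor, so its
    distance is reported as infinity.
    """
    out = []
    for i in range(len(genpos)):
        if i == 0 or chrgrp[i] != chrgrp[i - 1]:
            out.append(math.inf)  # chromosome start
        else:
            out.append(genpos[i] - genpos[i - 1])
    return out

chrgrp = [1, 1, 1, 2, 2]
genpos = [0.0, 0.25, 0.75, 0.5, 1.0]  # Morgans
dist = gdist1_sketch(chrgrp, genpos)
```

Consumers of such an array generally need to treat the infinite entries as chromosome boundaries rather than as real distances.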

gdist1p(vrnt_chrgrp, vrnt_phypos, ast=None, asp=None)

Calculate sequential genetic distances using physical positions. Requires vrnt_chrgrp and vrnt_phypos to have been sorted jointly in ascending order. Requires an interpolation spline to have been built beforehand.

Parameters:

vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_phypos.
vrnt_phypos (numpy.ndarray) – A 1D array of variant physical positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
ast (Integral, None) – Optional array start index (inclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.
asp (Integral, None) – Optional array stop index (exclusive). If None, assume that all array elements are to be used for sequential genetic distance calculations.

Returns:

out – A 1D array of genetic distances between each marker and the marker preceding it.

Return type:

numpy.ndarray

Notes

Sequential distance arrays will start every chromosome with numpy.inf!

gdist2g(vrnt_chrgrp, vrnt_genpos, rst=None, rsp=None, cst=None, csp=None)

Calculate pairwise genetic distances using genetic positions. Requires vrnt_chrgrp and vrnt_genpos to have been sorted jointly in ascending order.

Parameters:

vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_genpos.
vrnt_genpos (numpy.ndarray) – A 1D array of variant genetic positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
rst (Integral, None) – Optional row start index (inclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.
rsp (Integral, None) – Optional row stop index (exclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.
cst (Integral, None) – Optional column start index (inclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.
csp (Integral, None) – Optional column stop index (exclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.

Returns:

out – A 2D array of distances between marker pairs.

Return type:

numpy.ndarray
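
The row and column bounds can be illustrated with a small plain-Python sketch. The function name `gdist2_sketch` is hypothetical. Reporting infinity for markers on different chromosomes is an assumption made here by analogy with the sequential distance methods; the text above does not state how cross-chromosome pairs are handled.

```python
import math

def gdist2_sketch(chrgrp, genpos, rst=None, rsp=None, cst=None, csp=None):
    """Pairwise |genpos[i] - genpos[j]| for rows rst:rsp and columns cst:csp."""
    n = len(genpos)
    rows = range(n)[rst:rsp]  # None bounds select every row
    cols = range(n)[cst:csp]  # None bounds select every column
    return [
        [
            abs(genpos[i] - genpos[j]) if chrgrp[i] == chrgrp[j] else math.inf
            for j in cols
        ]
        for i in rows
    ]

chrgrp = [1, 1, 2]
genpos = [0.0, 0.5, 0.25]  # Morgans

full = gdist2_sketch(chrgrp, genpos)                # 3 x 3 matrix
sub = gdist2_sketch(chrgrp, genpos, rst=1, csp=2)   # rows 1..2, columns 0..1
```

Restricting the bounds computes only a block of the full matrix, which is useful when the full matrix would be too large to hold at once.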

gdist2p(vrnt_chrgrp, vrnt_phypos, rst=None, rsp=None, cst=None, csp=None)

Calculate pairwise genetic distances using physical positions. Requires vrnt_chrgrp and vrnt_phypos to have been sorted jointly in ascending order.

Parameters:

vrnt_chrgrp (numpy.ndarray) – A 1D array of variant chromosome groups. Must be sorted in ascending order jointly with vrnt_phypos.
vrnt_phypos (numpy.ndarray) – A 1D array of variant physical positions. Must be sorted in ascending order jointly with vrnt_chrgrp.
rst (Integral, None) – Optional row start index (inclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.
rsp (Integral, None) – Optional row stop index (exclusive). If None, assume that all rows of the pairwise genetic distance matrix are to be calculated.
cst (Integral, None) – Optional column start index (inclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.
csp (Integral, None) – Optional column stop index (exclusive). If None, assume that all columns of the pairwise genetic distance matrix are to be calculated.

Returns:

out – A 2D array of distances between marker pairs.

Return type:

numpy.ndarray

group(**kwargs)

Sort the GeneticMap jointly by chromosome group and physical position, then populate grouping indices.

Parameters: kwargs (dict) – Additional keyword arguments.
Return type: None

has_spline()

Return whether or not the GeneticMap has a built spline.

Returns: out – Whether the GeneticMap has a spline built.
Return type: bool

interp_genpos(vrnt_chrgrp, vrnt_phypos)

Interpolate genetic positions given variant physical positions.

Parameters:

vrnt_chrgrp (numpy.ndarray) – Chromosome/linkage group labels for each marker variant.
vrnt_phypos (numpy.ndarray) – Chromosome/linkage group physical positions for each marker variant.

Returns:

out – Interpolated genetic positions for each marker variant.

Return type:

numpy.ndarray
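
To make the interpolation step concrete, here is a plain-Python stand-in for the 'linear' spline kind with 'extrapolate' fill, built per chromosome. It is a sketch under stated assumptions rather than the class's implementation: the names `build_linear_knots` and `interp_genpos_sketch` are hypothetical, knots are assumed sorted by physical position within each chromosome, and every chromosome is assumed to have at least two knots at distinct positions.

```python
from bisect import bisect_right

def build_linear_knots(chrgrp, phypos, genpos):
    """Collect (physical, genetic) knots per chromosome group."""
    knots = {}
    for c, x, y in zip(chrgrp, phypos, genpos):
        xs, ys = knots.setdefault(c, ([], []))
        xs.append(x)
        ys.append(y)
    return knots

def interp_genpos_sketch(knots, chrgrp, phypos):
    """Linearly interpolate genetic positions; extrapolate past the ends."""
    out = []
    for c, x in zip(chrgrp, phypos):
        xs, ys = knots[c]
        # segment containing x, clamped to the first/last segment so that
        # points outside the knot range reuse the nearest segment's slope
        k = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        slope = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])
        out.append(ys[k] + slope * (x - xs[k]))
    return out

# Reference map: two chromosomes, genetic positions in Morgans.
knots = build_linear_knots(
    chrgrp=["1", "1", "1", "2", "2"],
    phypos=[1024, 2048, 4096, 512, 1024],
    genpos=[0.0, 0.5, 1.0, 0.0, 0.25],
)

# Query markers: inside the knot range, beyond the last knot, and on chromosome "2".
est = interp_genpos_sketch(knots, ["1", "1", "2"], [1536, 5120, 768])
```

The second query lies beyond the last knot on chromosome "1", so its estimate continues the slope of the final segment instead of being clipped.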

interp_gmap(vrnt_chrgrp, vrnt_phypos, vrnt_stop, vrnt_name=None, vrnt_fncode=None, **kwargs)

Interpolate a new genetic map from the current genetic map. Associate spline of current GeneticMap with new GeneticMap.

Parameters:

vrnt_chrgrp (numpy.ndarray) – Chromosome/linkage group labels for each marker variant.
vrnt_phypos (numpy.ndarray) – Chromosome/linkage group physical positions for each marker variant.
vrnt_stop (numpy.ndarray) – Physical positions for the end of each marker variant.
vrnt_name (numpy.ndarray, None, default = None) – Marker variant names.
vrnt_fncode (numpy.ndarray, None, default = None) – Marker variant mapping function codes.
kwargs (dict) – Additional keyword arguments.

Returns:

out – An interpolated genetic map sharing a copy of the spline from the original genetic map.

Return type:

ExtendedGeneticMap

is_congruent()

Determine if all sites in the genetic map demonstrate congruence with their supposed physical and genetic positions.

Returns: out – Whether all genetic map loci demonstrate congruence between physical and genetic positions.
Return type: bool

is_grouped()

Determine whether the GeneticMap has been sorted and grouped.

Returns: grouped – True or False indicating whether the GeneticMap has been sorted and grouped.
Return type: bool

lexsort(keys=None, **kwargs)

Perform an indirect stable sort using a sequence of keys.

Parameters:

keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.
kwargs (dict) – Additional keyword arguments.

Returns:

indices – Array of indices that sort the keys along the specified axis.

Return type:

A (N,) ndarray of ints
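
The key ordering is easy to get backwards, so a plain-Python sketch may help. The function name `lexsort_sketch` is hypothetical; it reproduces the "last key is the primary sort key" convention with a stable sort and does not show the default keys used when keys is None.

```python
def lexsort_sketch(keys):
    """Return indices that sort by the LAST key first, then earlier keys.

    Python's sorted() is stable, so ties keep their original order.
    """
    n = len(keys[0])
    return sorted(range(n), key=lambda i: tuple(k[i] for k in reversed(keys)))

phypos = [300, 100, 200, 50]
chrgrp = [2, 1, 1, 2]

# chrgrp is last, so it is the primary key; phypos breaks ties within a group.
order = lexsort_sketch((phypos, chrgrp))

sorted_chrgrp = [chrgrp[i] for i in order]
sorted_phypos = [phypos[i] for i in order]
```

The sort is indirect: it returns indices and leaves the input sequences untouched, so the same index array can be applied to every per-marker array.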

property nvrnt: Integral

Number of variants in the GeneticMap.

prune(nt=None, M=None)

Prune markers evenly across all chromosomes.

Parameters:

nt (int) – Target distance between each selected marker in nucleotides.
M (float) – Target distance between each selected marker in Morgans. If this option is specified, selection based on Morgans takes priority. If the physical distance between two markers selected by genetic distance exceeds nt (if provided), additional markers are sought within that interval.
kwargs (dict) – Additional keyword arguments.

Return type:

None

remove(indices, **kwargs)

Remove indices from the GeneticMap. If the GeneticMap was grouped beforehand, then re-sort and re-group internal arrays after removing indices.

Parameters:

indices (int, slice, Sequence) –
Array of shape (a,), slice or int of item(s) to remove.

Where:
- a is the number of indices to remove.
kwargs (dict) – Additional keyword arguments.

Return type:

None

remove_discrepancies()

Remove discrepancies between the physical map and the genetic map. In instances of conflict, assume that the physical map is correct.

Return type: None

Notes

This assumption may cause major issues if there are incorrect markers at the beginning of the chromosome.

reorder(indices)

Reorder markers in the GeneticMap using an array of indices. Note this modifies the GeneticMap in-place.

Parameters:

indices (A (N,) ndarray of ints) – Array of indices that reorder the markers.
kwargs (dict) – Additional keyword arguments.

Return type:

None

select(indices, **kwargs)

Keep only selected markers, removing all others from the GeneticMap. If the GeneticMap was grouped beforehand, then re-sort and re-group internal arrays after the selection.

Parameters:

indices (int, slice, Sequence) –
Array of shape (a,), slice or int of item(s) to keep.

Where:
- a is the number of indices to keep.
kwargs (dict) – Additional keyword arguments.

Return type:

None
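
The relationship between remove and select can be sketched on a plain list. Several things here are assumptions for illustration: the helper names are hypothetical, the sketches return new lists whereas the methods above modify the map in place and return None, and keeping selected items in the order the indices are given is a guess about behaviour the text does not specify.

```python
def resolve_indices(indices, n):
    """Expand an int, slice, or sequence of ints into positions in range(n)."""
    if isinstance(indices, slice):
        return list(range(n))[indices]
    if isinstance(indices, int):
        indices = [indices]
    return [range(n)[i] for i in indices]  # range indexing handles negatives

def remove_sketch(values, indices):
    """Drop the items at `indices`, keeping everything else in order."""
    drop = set(resolve_indices(indices, len(values)))
    return [v for i, v in enumerate(values) if i not in drop]

def select_sketch(values, indices):
    """Keep only the items at `indices`."""
    return [values[i] for i in resolve_indices(indices, len(values))]

names = ["m0", "m1", "m2", "m3", "m4"]

removed = remove_sketch(names, [1, 3])       # drop two markers
kept = select_sketch(names, slice(0, 2))     # keep the first two markers
without_last = remove_sketch(names, -1)      # a single int is also accepted
```

Selecting a set of indices and removing its complement give the same set of markers, which is why the two methods share a parameter description.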

sort(keys=None)

Sort elements of the GeneticMap using a sequence of keys. Note this modifies the GeneticMap in-place.

Parameters:

keys (A (k, N) array or tuple containing k (N,)-shaped sequences) – The k different columns to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.
kwargs (dict) – Additional keyword arguments.

Return type:

None

property spline: dict | None

Interpolation spline(s).

property spline_fill_value: str | ndarray | None

Default spline fill value.

property spline_kind: str | None

Spline kind.

to_csv(filename, vrnt_chrgrp_col='chr', vrnt_phypos_col='pos', vrnt_stop_col='stop', vrnt_genpos_col='cM', vrnt_name_col='name', vrnt_fncode_col='fncode', vrnt_genpos_units='cM', sep=',', header=True, index=False, **kwargs)

Write an ExtendedGeneticMap to a CSV file.

Parameters:

filename (str) – CSV file name to which to write.
vrnt_chrgrp_col (str, default = "chr") – Name of the chromosome/linkage group name column to which to export.
vrnt_phypos_col (str, default = "pos") – Name of the physical position column to which to export.
vrnt_stop_col (str, default = "stop") – Name of the physical position stop column to which to export.
vrnt_genpos_col (str, default = "cM") – Name of the genetic position column to which to export.
vrnt_name_col (str, default = "name") – Name of the marker variant name column to which to export.
vrnt_fncode_col (str, default = "fncode") – Name of the marker variant function code column to which to export.
vrnt_genpos_units (str, default = "cM") –
Units of the genetic position column to which to export. Options are listed below and are case-sensitive:
- "M" - genetic position units are in Morgans
- "Morgans" - genetic position units are in Morgans
- "cM" - genetic position units are in centiMorgans
- "centiMorgans" - genetic position units are in centiMorgans
sep (str, default = ",") – Separator to use in the exported CSV file.
header (bool, default = True) – Whether to save header names.
index (bool, default = False) – Whether to save a row index in the exported CSV file.
kwargs (dict) – Additional keyword arguments to use for dictating export to a CSV.

Return type:

None
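Since the `vrnt_genpos` property is documented as being in Morgans, exporting with `vrnt_genpos_units="cM"` implies scaling by 100 (1 Morgan = 100 centiMorgans). The sketch below illustrates that conversion and the case-sensitive unit names. The table `UNIT_FACTOR` and the function `morgans_to` are hypothetical names, and raising `ValueError` for an unknown unit is an assumed behavior, not taken from the library.

```python
# Hypothetical lookup table: multiplier applied to positions in Morgans.
UNIT_FACTOR = {
    "M": 1.0,
    "Morgans": 1.0,
    "cM": 100.0,
    "centiMorgans": 100.0,
}

def morgans_to(genpos_morgans, units):
    """Convert genetic positions in Morgans to the requested units.

    Unit names are matched case-sensitively, so "cm" is rejected.
    """
    if units not in UNIT_FACTOR:
        raise ValueError(f"unrecognized genetic position units: {units!r}")
    factor = UNIT_FACTOR[units]
    return [factor * g for g in genpos_morgans]


genpos = [0.0, 0.25, 1.5]                 # positions in Morgans
genpos_cM = morgans_to(genpos, "cM")      # [0.0, 25.0, 150.0]
genpos_M = morgans_to(genpos, "Morgans")  # [0.0, 0.25, 1.5]
```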

to_egmap(filename)[source]#

Write an ExtendedGeneticMap to an extended genetic map (.egmap) file.

Parameters:

filename (str) – .egmap file name to which to write.

Return type:

None

to_pandas(vrnt_chrgrp_col='chr', vrnt_phypos_col='pos', vrnt_stop_col='stop', vrnt_genpos_col='cM', vrnt_name_col='name', vrnt_fncode_col='fncode', vrnt_genpos_units='cM', **kwargs)[source]#

Export a GeneticMap to a pandas.DataFrame.

Parameters:

vrnt_chrgrp_col (str, default = "chr") – Name of the chromosome/linkage group name column to which to export.
vrnt_phypos_col (str, default = "pos") – Name of the physical position column to which to export.
vrnt_stop_col (str, default = "stop") – Name of the physical position stop column to which to export.
vrnt_genpos_col (str, default = "cM") – Name of the genetic position column to which to export.
vrnt_name_col (str, default = "name") – Name of the marker variant name column to which to export.
vrnt_fncode_col (str, default = "fncode") – Name of the marker variant function code column to which to export.
vrnt_genpos_units (str, default = "cM") –
Units of the genetic position column to which to export. Options are listed below and are case-sensitive:
- "M" - genetic position units are in Morgans
- "Morgans" - genetic position units are in Morgans
- "cM" - genetic position units are in centiMorgans
- "centiMorgans" - genetic position units are in centiMorgans
kwargs (dict) – Additional keyword arguments to use for dictating export to a pandas.DataFrame.

Returns:

out – An output dataframe.

Return type:

pandas.DataFrame

ungroup(**kwargs)[source]#

Remove grouping metadata from the GeneticMap.

Parameters:

kwargs (dict) – Additional keyword arguments.

Return type:

None

property vrnt_chrgrp: ndarray#: Variant chromosome group label.

property vrnt_chrgrp_len: ndarray | None#: Variant chromosome group length.

property vrnt_chrgrp_name: ndarray | None#: Variant chromosome group names.

property vrnt_chrgrp_spix: ndarray | None#: Variant chromosome group stop indices.

property vrnt_chrgrp_stix: ndarray | None#: Variant chromosome group start indices.

property vrnt_fncode: ndarray | None#: Variant function codes.

property vrnt_genpos: ndarray#: Variant genetic position in Morgans.

property vrnt_name: ndarray | None#: Variant names.

property vrnt_phypos: ndarray#: Variant physical position.

property vrnt_stop: ndarray#: Variant physical stop position.