Genetic Maps#
Class Family Overview#
Genetic maps are used to estimate marker-wise genetic map positions in mating simulations. PyBrOpS provides functionality for reading and writing genetic maps and for interpolating genetic map positions via `GeneticMap` classes. Reading a genetic map from a file allows real recombination data to be used in a breeding simulation, which adds realism.
Summary of Genetic Map Classes#
Genetic map support in PyBrOpS is found in the `pybrops.popgen.gmap` module. This module contains several `GeneticMap` class type definitions, which are summarized in the table below.
| Class Name | Class Type | Class Description |
|---|---|---|
| `GeneticMap` | Abstract | Interface for all genetic map child classes. |
| `ExtendedGeneticMap` | Concrete | Class representing genetic maps with additional genetic map metadata. |
Loading Genetic Map Modules#
The various genetic map classes summarized above can be imported into a Python scope as demonstrated in the code example below.
```python
# import the GeneticMap class (an abstract interface class)
from pybrops.popgen.gmap.GeneticMap import GeneticMap

# import the ExtendedGeneticMap class (a concrete class)
from pybrops.popgen.gmap.ExtendedGeneticMap import ExtendedGeneticMap
```
Creating Genetic Maps#
Creating genetic maps from NumPy arrays#
Genetic map objects can be constructed from raw NumPy arrays, which may be useful for creating simulated genetic maps. The code below demonstrates the construction of an `ExtendedGeneticMap` from NumPy arrays.
```python
import numpy

# define number of variants
nvrnt = 100

# create random chromosome groups 1-9
chroms = list("123456789")
vrnt_chrgrp = numpy.random.choice(chroms, nvrnt, True).astype(object)
vrnt_chrgrp.sort()

# create variant physical positions in range [1, 2**20)
vrnt_phypos = numpy.random.randint(1, 2**20, nvrnt)
vrnt_phypos.sort()

# create variant genetic positions in range [0, 1)
vrnt_genpos = numpy.random.random(nvrnt)
vrnt_genpos.sort()

# create variant names: "SNP001", "SNP002", ...
vrnt_name = numpy.array(["SNP"+str(i+1).zfill(3) for i in range(nvrnt)], dtype=object)

# construct genetic map
gmap = ExtendedGeneticMap(
    vrnt_chrgrp=vrnt_chrgrp,
    vrnt_phypos=vrnt_phypos,
    vrnt_stop=vrnt_phypos,
    vrnt_genpos=vrnt_genpos,
    vrnt_name=vrnt_name,
    vrnt_fncode=None, # not needed
)
```
Reading genetic maps from a file#
`GeneticMap` objects can also be read from a file. This may be useful for users who have their own empirically determined genetic maps for an organism of their choosing. The example below reads an `ExtendedGeneticMap` from a file containing the maize genetic map constructed from the US NAM population, as published by McMullen et al. (2009).
```python
# read genetic map from file
# for the purpose of this example, do not automatically group markers
# and build an interpolation spline after reading genetic map data.
gmap = ExtendedGeneticMap.from_egmap(
    "McMullen_2009_US_NAM.egmap",
    auto_group = False,
    auto_build_spline = False
)
```
Genetic Map Properties#
`GeneticMap` objects have a set of properties shared by all genetic maps. These properties fall into two categories: marker variant properties and spline properties. The former describe the marker set constituting the genetic map, while the latter hold the spline model information needed to interpolate genetic map positions. Both sets are summarized in the tables below.
Marker variant properties#
| Property | Description |
|---|---|
| `nvrnt` | Number of variants in the genetic map |
| `vrnt_chrgrp` | Marker variant chromosome group labels |
| `vrnt_phypos` | Marker variant chromosome physical positions |
| `vrnt_genpos` | Marker variant chromosome genetic positions |
| `vrnt_name` | Marker variant names |
| | Names of chromosome groups |
| | Chromosome group start indices |
| | Chromosome group stop indices |
| | Number of marker variants on each chromosome group |
Spline properties#
| Property | Description |
|---|---|
| | Interpolation splines |
| | Interpolation spline type |
| | Interpolation spline default fill value |
Copying Genetic Maps#
At times, it may be necessary to copy a genetic map. There are two methods of copying genetic maps: shallow copying and deep copying.
Shallow copying#
In shallow copying, references to a `GeneticMap`’s variant and spline data are copied to a new genetic map object. Copying is only one level deep, so in-place changes to the data in the original object may affect data values in the copied object.
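```python
import copy

# shallow copy the genetic map using the standard library
tmp = copy.copy(gmap)

# or, equivalently, using the object's own method
tmp = gmap.copy()
```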
Deep copying#
In deep copying, a `GeneticMap`’s variant and spline data are recursively copied. Copying proceeds down to the deepest level, so changes to the data in the original object will not affect data values in the copied object.
```python
# deep copy the genetic map using the standard library
tmp = copy.deepcopy(gmap)

# or, equivalently, using the object's own method
tmp = gmap.deepcopy()
```
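The difference between the two copy modes can be seen without PyBrOpS at all. The sketch below uses a hypothetical stand-in class, `TinyMap` (not part of PyBrOpS), that holds a single mutable list, and assumes only the standard `copy` module semantics.

```python
import copy

class TinyMap:
    """Hypothetical stand-in for a genetic map; not a PyBrOpS class."""
    def __init__(self, genpos):
        self.genpos = genpos  # mutable data held by reference

original = TinyMap([0.0, 0.5, 1.0])
shallow = copy.copy(original)      # new object, same underlying list
deep = copy.deepcopy(original)     # new object, independent list

# mutate the original's data in place
original.genpos[0] = 9.9

# the shallow copy shares the list, so it observes the change
shallow_sees_change = shallow.genpos[0] == 9.9
# the deep copy owns its own list, so it does not
deep_sees_change = deep.genpos[0] == 9.9
```

A deep copy is the safer choice before any in-place operation, such as reordering or sorting, whose effects should not leak back into the original map.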
Sorting and Grouping Genetic Maps#
Reordering map elements#
In some instances, it may be useful to manually reorder genetic map elements. This can be accomplished by passing an array of reordering indices to the `reorder` method.
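```python
# create reordering indices (a random permutation of the markers)
indices = numpy.arange(gmap.nvrnt)
numpy.random.shuffle(indices)

# work on a deep copy so the original map is left untouched
tmp = gmap.deepcopy()

# reorder values
tmp.reorder(indices)
```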
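Conceptually, reordering applies one index array to every per-marker array so that the arrays stay aligned. The sketch below illustrates this with plain NumPy; the arrays and their values are made up for illustration, and it is not a description of how PyBrOpS implements `reorder` internally.

```python
import numpy

# three parallel per-marker arrays (hypothetical values)
chrgrp = numpy.array(["1", "1", "2", "2"], dtype=object)
phypos = numpy.array([100, 200, 50, 400])
genpos = numpy.array([0.0, 0.1, 0.0, 0.3])

# reordering indices: position i of the result takes element indices[i]
indices = numpy.array([2, 0, 3, 1])

# apply the same indices to every array to keep markers aligned
chrgrp_r = chrgrp[indices]
phypos_r = phypos[indices]
genpos_r = genpos[indices]
```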
Lexsorting map elements#
An indirect stable sort, which returns sorting indices rather than reordering the map itself, may be performed using the `lexsort` method. If `lexsort` is not given a set of `keys`, it defaults to using marker variant chromosome group assignments, chromosome physical positions, and chromosome genetic positions, in that order of priority.
```python
# create lexsort keys
key1 = numpy.random.randint(0, 10, gmap.nvrnt)
key2 = numpy.random.choice(gmap.nvrnt, gmap.nvrnt, False)

# lexsort using keys
out = gmap.lexsort((key2,key1))
```
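Key order in a lexsort is easy to get backwards. The sketch below uses `numpy.lexsort` directly, where the last key in the tuple is the primary sort key. It assumes, without confirmation from the text above, that the `lexsort` method follows the same key-order convention; the arrays are made-up examples.

```python
import numpy

# hypothetical marker data: chromosome, physical position, genetic position
chrgrp = numpy.array([2, 1, 2, 1, 1])
phypos = numpy.array([300, 500, 100, 200, 200])
genpos = numpy.array([0.3, 0.6, 0.1, 0.4, 0.2])

# numpy.lexsort sorts by the LAST key first, so to prioritize
# chromosome, then physical position, then genetic position,
# the keys are listed in reverse order of priority
order = numpy.lexsort((genpos, phypos, chrgrp))

# applying the indices yields the sorted arrays
chrgrp_sorted = chrgrp[order]
phypos_sorted = phypos[order]
```

Markers 3 and 4 tie on chromosome and physical position, so the genetic position breaks the tie.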
Sorting map elements#
In-place sorting of marker variants in a `GeneticMap` object can be accomplished using the `sort` method. The `sort` method optionally accepts a set of `keys` with which to sort the marker variants. If no `keys` are provided, the same defaults as in the `lexsort` method are used.
```python
# sort the genetic map in place using the default keys
gmap.sort()
```
Grouping map elements#
On import from a file, genetic map data need to be sorted and grouped into marker linkage groups so that an interpolation spline can be built. Often, the constructor or an import method for a `GeneticMap` object will automatically sort and group data, but occasionally it may be necessary to do so manually. This can be accomplished using the `group` method:
```python
# group markers based on their chromosome/linkage group
gmap.group()
```
To test whether a `GeneticMap`’s data have been sorted and grouped into linkage groups, the `is_grouped` method can be used:
```python
# determine whether a GeneticMap is grouped using the ``is_grouped`` method
value = gmap.is_grouped()
```
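The bookkeeping that grouping implies (group names, start indices, stop indices, and group sizes) can be sketched with plain NumPy. This is an illustration only, not PyBrOpS's implementation: the variable names `names`, `stix`, `spix`, and `length` are hypothetical, and treating stop indices as exclusive is an assumption.

```python
import numpy

# chromosome labels for markers that have ALREADY been sorted
chrgrp = numpy.array([1, 1, 1, 2, 2, 3])

# unique labels, index of each label's first occurrence, and counts
names, stix, length = numpy.unique(
    chrgrp, return_index=True, return_counts=True
)

# exclusive stop index of each group (an assumed convention)
spix = stix + length

# markers of the second group can then be sliced out directly
second_group = chrgrp[stix[1]:spix[1]]
```

Because the slices are contiguous, this only works on sorted data, which is why sorting precedes grouping.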
Genetic Map Congruency#
Checking for congruency#
Sometimes physical positions and genetic map positions disagree in their orderings. This may be caused by errors in genome assemblies and/or genetic map assemblies. These disagreements cause problems for interpolation spline construction and need to be removed or corrected. An elementwise physical position-genetic position congruence test can be conducted using the `congruence` method:
```python
# elementwise test of marker congruence
value = gmap.congruence()
```
To test whether all of the markers in the `GeneticMap` are congruent, use the `is_congruent` method:
```python
# whole genetic map congruence test
value = gmap.is_congruent()
```
Removing map discrepancies#
Loci where the physical positions and genetic positions are not in agreement may be automatically removed using the `remove_discrepancies` method. Be mindful that manual inspection and correction of a genetic map’s discrepancies may be superior to this automatic method.
The code below demonstrates automatic discrepancy removal:
```python
# automatically remove discrepancies
gmap.remove_discrepancies()
```
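To make the idea of congruence concrete, the sketch below tests and filters a single chromosome with plain NumPy. It assumes one possible definition: after sorting by physical position, a marker is congruent if its genetic position is not smaller than that of the preceding marker. PyBrOpS may define congruence and discrepancy removal differently, and the data are made up.

```python
import numpy

# one chromosome, sorted by physical position (hypothetical values)
phypos = numpy.array([100, 200, 300, 400, 500])
genpos = numpy.array([0.00, 0.10, 0.05, 0.20, 0.30])

# elementwise test: the first marker has no predecessor, so it passes
congruent = numpy.ones(len(genpos), dtype=bool)
congruent[1:] = genpos[1:] >= genpos[:-1]

# whole-map test: True only if every marker is congruent
all_congruent = bool(congruent.all())

# naive removal: keep only the congruent markers
phypos_clean = phypos[congruent]
genpos_clean = genpos[congruent]
```

A single filtering pass like this is not guaranteed to leave a monotonic map in general, which is one reason manual inspection can be preferable.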
Building Interpolation Splines#
Before a `GeneticMap` can be used to interpolate genetic position data, an interpolation spline must be constructed. Often, the constructor or an import method for a `GeneticMap` object will automatically construct a spline from the provided data. Occasionally, it may be necessary to construct one manually using the `build_spline` method:
```python
# construct a linear spline to interpolate genetic map positions
gmap.build_spline()
```
To test whether a `GeneticMap` has an interpolation spline, the `has_spline` method can be used:
```python
# determine whether a GeneticMap has an interpolation spline using the
# ``has_spline`` method
value = gmap.has_spline()
```
Interpolating Genetic Positions#
Genetic map positions can be interpolated from physical positions using the `interp_genpos` method:
```python
### create new positions to interpolate
# construct linkage group array: everything is on chromosome 1
chrgrp = numpy.array([1, 1, 1, 1, 1], dtype = int)

# construct physical position array
phypos = numpy.array([18203210,19293034,20110347,20474722,21398386], dtype = int)

# interpolate new genetic map positions
genpos = gmap.interp_genpos(
    vrnt_chrgrp = chrgrp,
    vrnt_phypos = phypos
)
```
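For intuition about what linear interpolation of genetic positions involves, the sketch below performs it per chromosome with `numpy.interp`. It is a stand-alone illustration under stated assumptions, not PyBrOpS's spline code: the helper `interp_linear` and the map values are hypothetical, and queries on chromosomes absent from the map are simply returned as NaN.

```python
import numpy

# a small known map spanning two chromosomes (hypothetical values)
map_chrgrp = numpy.array([1, 1, 1, 2, 2])
map_phypos = numpy.array([1_000_000, 2_000_000, 4_000_000, 500_000, 1_500_000])
map_genpos = numpy.array([0.00, 0.10, 0.20, 0.00, 0.40])

def interp_linear(chrgrp, phypos):
    """Linearly interpolate genetic positions, one chromosome at a time."""
    out = numpy.full(len(phypos), numpy.nan)
    for grp in numpy.unique(chrgrp):
        in_map = map_chrgrp == grp
        if not in_map.any():
            continue  # chromosome not in the map: leave NaN
        query = chrgrp == grp
        # numpy.interp requires increasing x-coordinates and clamps
        # queries outside the mapped range to the end values
        out[query] = numpy.interp(
            phypos[query], map_phypos[in_map], map_genpos[in_map]
        )
    return out

new_chrgrp = numpy.array([1, 1, 2])
new_phypos = numpy.array([1_500_000, 3_000_000, 1_000_000])
new_genpos = interp_linear(new_chrgrp, new_phypos)
```

Interpolating within each chromosome separately matters because physical positions restart on every chromosome, so a single interpolation across the whole genome would mix unrelated markers.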