Title: | Estimation of Recombination Rate and Maternal LD in Half-Sibs |
---|---|
Description: | Paternal recombination rate and maternal linkage disequilibrium (LD) are estimated for pairs of biallelic markers such as single nucleotide polymorphisms (SNPs) from progeny genotypes and sire haplotypes. The implementation relies on paternal half-sib families. If maternal half-sib families are used, the roles of sire/dam are swapped. Multiple families can be considered. For parameter estimation, at least one sire has to be double heterozygous at the investigated pairs of SNPs. Based on recombination rates, genetic distances between markers can be estimated. Markers with unusually large recombination rate to markers in close proximity (i.e. putatively misplaced markers) shall be discarded in this derivation. A workflow description is attached as vignette. *A pipeline is available at GitHub* <https://github.com/wittenburg/hsrecombi> Hampel, Teuscher, Gomez-Raya, Doschoris, Wittenburg (2018) "Estimation of recombination rate and maternal linkage disequilibrium in half-sibs" <doi:10.3389/fgene.2018.00186>. Gomez-Raya (2012) "Maximum likelihood estimation of linkage disequilibrium in half-sib families" <doi:10.1534/genetics.111.137521>. |
Authors: | Dörte Wittenburg [aut, cre] |
Maintainer: | Dörte Wittenburg <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2025-02-05 04:41:43 UTC |
Source: | https://github.com/cran/hsrecombi |
Approximation of mixing parameter of system of map functions
bestmapfun(theta, dist_M)
bestmapfun(theta, dist_M)
theta |
vector of recombination rates |
dist_M |
vector of genetic positions |
The genetic mapping function that fits best to the genetic data (recombination rate and genetic distances) is obtained from Rao's system of genetic-map functions. The corresponding mixing parameter is estimated via 1-dimensional constrained optimisation. See vignette for its application to estimated data.
list (LEN 2)
mixing parameter of system of genetic mapping functions
minimum value of target function (theta - dist_M)^2
Rao, D.C., Morton, N.E., Lindsten, J., Hulten, M. & Yee, S (1977) A mapping function for man. Human Heredity 27: 99-104. doi:10.1159/000152856
theta <- seq(0, 0.5, 0.01) gendist <- -log(1 - 2 * theta) / 2 bestmapfun(theta, gendist)
theta <- seq(0, 0.5, 0.01) gendist <- -log(1 - 2 * theta) / 2 bestmapfun(theta, gendist)
Search for SNPs with unusually large estimates of recombination rate
checkCandidates(final, map1, win = 30, quant = 0.99)
checkCandidates(final, map1, win = 30, quant = 0.99)
final |
table of results produced by |
map1 |
data.frame containing information on physical map, at least:
|
win |
optional value for window size; default value 30 |
quant |
optional value; default value 0.99, see details |
Markers with unusually large estimates of recombination rate to
close SNPs are candidates for misplacement in the underlying assembly. The
mean of recombination rate estimates with win
subsequent or
preceding markers is calculated and those SNPs with mean value exceeding
the quant
quantile are denoted as candidates which have to be
manually curated!
This can be done, for instance, by visual inspection of a correlation plot
containing estimates of recombination rate in a selected region.
vector of SNP IDs for further verification
Hampel, A., Teuscher, F., Gomez-Raya, L., Doschoris, M. & Wittenburg, D. (2018) Estimation of recombination rate and maternal linkage disequilibrium in half-sibs. Frontiers in Genetics 9:186. doi:10.3389/fgene.2018.00186
### test data data(targetregion) ### make list for paternal half-sib families hap <- makehaplist(daughterSire, hapSire) ### parameter estimates on a chromosome res <- hsrecombi(hap, genotype.chr) ### post-processing to achieve final and valid set of estimates final <- editraw(res, map.chr) ### check for candidates of misplacement snp <- checkCandidates(final, map.chr)
### test data data(targetregion) ### make list for paternal half-sib families hap <- makehaplist(daughterSire, hapSire) ### parameter estimates on a chromosome res <- hsrecombi(hap, genotype.chr) ### post-processing to achieve final and valid set of estimates final <- editraw(res, map.chr) ### check for candidates of misplacement snp <- checkCandidates(final, map.chr)
Count genotype combinations at 2 SNPs
X |
integer matrix of genotypes |
count
vector of counts of 9 possible genotypes at SNP pair
Vector of sire ID for each progeny
daughterSire
daughterSire
An object of class integer
of length 265.
Process raw results from hsrecombi
, decide which out of
two sets of estimates is more likely and prepare list of final results
editraw(Roh, map1)
editraw(Roh, map1)
Roh |
list of raw results from |
map1 |
data.frame containing information on physical map, at least:
|
final table of results
SNP1
index 1. SNP
SNP2
index 2. SNP
D
maternal LD
fAA
frequency of maternal haplotype 1-1
fAB
frequency of maternal haplotype 1-0
fBA
frequency of maternal haplotype 0-1
fBB
frequency of maternal haplotype 0-0
p1
Maternal allele frequency (allele 1) at SNP1
p2
Maternal allele frequency (allele 1) at SNP2
nfam1
size of genomic family 1
nfam2
size of genomic family 2
error
0 if computations were without error; 1 if EM algorithm did not converge
iteration
number of EM iterations
theta
paternal recombination rate
r2
of maternal LD
logL
value of log likelihood function
unimodal
1 if likelihood is unimodal; 0 if likelihood is bimodal
critical
0 if parameter estimates were unique; 1 if parameter estimates were obtained via decision process
locus_Mb
physical distance between SNPs in Mbp
### test data data(targetregion) ### make list for paternal half-sib families hap <- makehaplist(daughterSire, hapSire) ### parameter estimates on a chromosome res <- hsrecombi(hap, genotype.chr) ### post-processing to achieve final and valid set of estimates final <- editraw(res, map.chr)
### test data data(targetregion) ### make list for paternal half-sib families hap <- makehaplist(daughterSire, hapSire) ### parameter estimates on a chromosome res <- hsrecombi(hap, genotype.chr) ### post-processing to achieve final and valid set of estimates final <- editraw(res, map.chr)
Calculation of genetic distances from recombination rates given an interference parameter
felsenstein(K, x, inverse = F)
felsenstein(K, x, inverse = F)
K |
parameter (numeric) corresponding to the intensity of crossover interference |
x |
vector of recombination rates |
inverse |
logical, if FALSE recombination rate is mapped to Morgan unit, if TRUE Morgan unit is mapped to recombination rate (default is FALSE) |
vector of genetic positions in Morgan units
Felsenstein, J. (1979) A mathematically tractable family of genetic mapping functions with different amounts of interference. Genetics 91:769-775.
felsenstein(0.1, seq(0, 0.5, 0.01))
felsenstein(0.1, seq(0, 0.5, 0.01))
Estimation of genetic positions (in centi Morgan)
geneticPosition(final, map1, exclude = NULL, threshold = 0.05)
geneticPosition(final, map1, exclude = NULL, threshold = 0.05)
final |
table of results produced by |
map1 |
data.frame containing information on physical map, at least:
|
exclude |
optional vector (LEN < p) of SNP IDs to be excluded (e.g., candidates of misplaced SNPs; default NULL) |
threshold |
optional value; recombination rates <= threshold are considered for smoothing approach assuming theta ~ Morgan (default 0.05) |
Smoothing of recombination rates (theta) <= 0.05 via quadratic optimization provides an approximation of genetic distances (in Morgan) between SNPs. The cumulative sum * 100 yields the genetic positions in cM.
The minimization problem (theta - D d)^2
is solved s.t. d > 0 where
d is the vector of genetic distances between adjacent markers but theta is
not restricted to adjacent markers. The incidence matrix D contains 1's for
those intervals contributing to the total distance relevant for each theta.
Estimates of theta = 1e-6 are neglected as these values coincide with start values and indicate that (because of a very flat likelihood surface) no meaningful estimate of recombination rate has been obtained.
list (LEN 2)
vector (LEN p) of genetic positions of SNPs (in cM)
vector (LEN p) of physical positions of SNPs (in Mbp)
Qanbari, S. & Wittenburg, D. (2020) Male recombination map of the autosomal genome in German Holstein. Genetics Selection Evolution 52:73. doi:10.1186/s12711-020-00593-z
### test data data(targetregion) ### make list for paternal half-sib families hap <- makehaplist(daughterSire, hapSire) ### parameter estimates on a chromosome res <- hsrecombi(hap, genotype.chr) ### post-processing to achieve final and valid set of estimates final <- editraw(res, map.chr) ### approximation of genetic positions pos <- geneticPosition(final, map.chr)
### test data data(targetregion) ### make list for paternal half-sib families hap <- makehaplist(daughterSire, hapSire) ### parameter estimates on a chromosome res <- hsrecombi(hap, genotype.chr) ### post-processing to achieve final and valid set of estimates final <- editraw(res, map.chr) ### approximation of genetic positions pos <- geneticPosition(final, map.chr)
matrix of progeny genotypes in target region on chromosome BTA1
genotype.chr
genotype.chr
An object of class matrix
(inherits from array
) with 265 rows and 200 columns.
Calculation of genetic distances from recombination rates
haldane(x, inverse = F)
haldane(x, inverse = F)
x |
vector of recombination rates |
inverse |
logical, if FALSE recombination rate is mapped to Morgan unit, if TRUE Morgan unit is mapped to recombination rate (default is FALSE) |
vector of genetic positions in Morgan units
Haldane JBS (1919) The combination of linkage values, and the calculation of distances between the loci of linked factors. J Genet 8: 299-309.
haldane(seq(0, 0.5, 0.01))
haldane(seq(0, 0.5, 0.01))
matrix of sire haplotypes in target region on chromosome BTA1
hapSire
hapSire
An object of class matrix
(inherits from array
) with 10 rows and 201 columns.
Wrapper function for estimating recombination rate and maternal linkage disequilibrium between intra-chromosomal SNP pairs by calling EM algorithm
hsrecombi(hap, genotype.chr, exclude = NULL, only.adj = FALSE, prec = 1e-06)
hsrecombi(hap, genotype.chr, exclude = NULL, only.adj = FALSE, prec = 1e-06)
hap |
list (LEN 2) of lists
|
genotype.chr |
matrix (DIM n x p) of all progeny genotypes (0, 1, 2) on a chromosome with p SNPs; 9 indicates missing genotype |
exclude |
vector (LEN < p) of SNP IDs (for filtering column names of
|
only.adj |
logical; if |
prec |
scalar; precision of estimation |
Paternal recombination rate and maternal linkage disequilibrium (LD) are estimated for pairs of biallelic markers (such as single nucleotide polymorphisms; SNPs) from progeny genotypes and sire haplotypes. At least one sire has to be double heterozygous at the investigated pairs of SNPs. All progeny are merged in two genomic families: (1) coupling phase family if sires are double heterozygous 0-0/1-1 and (2) repulsion phase family if sires are double heterozygous 0-1/1-0. So far it is recommended processing the chromosomes separately. If maternal half-sib families are used, the roles of sire/dam are swapped. Multiple families can be considered.
list (LEN p - 1) of data.frames; for each SNP, parameters are estimated with all following SNPs; two solutions (prefix sln1 and sln2) are obtained for two runs of the EM algorithm
SNP1
ID of 1. SNP
SNP2
ID of 2. SNP
D
maternal LD
fAA
frequency of maternal haplotype 1-1
fAB
frequency of maternal haplotype 1-0
fBA
frequency of maternal haplotype 0-1
fBB
frequency of maternal haplotype 0-0
p1
Maternal allele frequency (allele 1) at SNP1
p2
Maternal allele frequency (allele 1) at SNP2
nfam1
size of genomic family 1
nfam2
size of genomic family 2
error
0 if computations were without error; 1 if EM algorithm did not converge
iteration
number of EM iterations
theta
paternal recombination rate
r2
of maternal LD
logL
value of log likelihood function
unimodal
1 if likelihood is unimodal; 0 if likelihood is bimodal
critical
0 if parameter estimates are unique; 1 if parameter estimates at both solutions are valid, then decision process follows in post-processing function "editraw"
Afterwards, solutions are compared and processed with function
editraw
, yielding the final estimates for each valid pair of SNPs.
Hampel, A., Teuscher, F., Gomez-Raya, L., Doschoris, M. & Wittenburg, D. (2018) Estimation of recombination rate and maternal linkage disequilibrium in half-sibs. Frontiers in Genetics 9:186. doi:10.3389/fgene.2018.00186
Gomez-Raya, L. (2012) Maximum likelihood estimation of linkage disequilibrium in half-sib families. Genetics 191:195-213.
### test data data(targetregion) ### make list for paternal half-sib families hap <- makehaplist(daughterSire, hapSire) ### parameter estimates on a chromosome res <- hsrecombi(hap, genotype.chr) ### post-processing to achieve final and valid set of estimates final <- editraw(res, map.chr)
### test data data(targetregion) ### make list for paternal half-sib families hap <- makehaplist(daughterSire, hapSire) ### parameter estimates on a chromosome res <- hsrecombi(hap, genotype.chr) ### post-processing to achieve final and valid set of estimates final <- editraw(res, map.chr)
Calculation of genetic distances from recombination rates given a parameter
karlin(N, x, inverse = F)
karlin(N, x, inverse = F)
N |
parameter (positive integer) required by the binomial model to
assess the count (of crossover) distribution; |
x |
vector of recombination rates |
inverse |
logical, if FALSE recombination rate is mapped to Morgan unit, if TRUE Morgan unit is mapped to recombination rate (default is FALSE) |
vector of genetic positions in Morgan units
Liberman, U. & Karlin, S. (1984) Theoretical models of genetic map functions. Theor Popul Biol 25:331-346.
karlin(2, seq(0, 0.5, 0.01))
karlin(2, seq(0, 0.5, 0.01))
Calculation of genetic distances from recombination rates
kosambi(x, inverse = F)
kosambi(x, inverse = F)
x |
vector of recombination rates |
inverse |
logical, if FALSE recombination rate is mapped to Morgan unit, if TRUE Morgan unit is mapped to recombination rate (default is FALSE) |
vector of genetic positions in Morgan units
Kosambi D.D. (1944) The estimation of map distance from recombination values. Ann. Eugen. 12: 172-175.
kosambi(seq(0, 0.5, 0.01))
kosambi(seq(0, 0.5, 0.01))
Expectation Maximisation (EM) algorithm
LDHScpp(XGF1, XGF2, fAA, fAB, fBA, theta, display, threshold)
LDHScpp(XGF1, XGF2, fAA, fAB, fBA, theta, display, threshold)
XGF1 |
integer matrix of progeny genotypes in genomic family 1 |
XGF2 |
integer matrix of progeny genotypes in genomic family 2 |
fAA |
frequency of maternal haplotype 1-1 |
fAB |
frequency of maternal haplotype 1-0 |
fBA |
frequency of maternal haplotype 0-1 |
theta |
paternal recombination rate |
display |
logical for displaying additional information |
threshold |
convergence criterion |
list of parameter estimates
D
maternal LD
fAA
frequency of maternal haplotype 1-1
fAB
frequency of maternal haplotype 1-0
fBA
frequency of maternal haplotype 0-1
fBB
frequency of maternal haplotype 0-0
p1
Maternal allele frequency (allele 1) at 1. SNP
p2
Maternal allele frequency (allele 1) at 2. SNP
nfam1
size of genomic family 1
nfam2
size of genomic family 2
error
0 if computations were without error; 1 if EM algorithm did not converge
iteration
number of EM iterations
theta
paternal recombination rate
r2
of maternal LD
logL
value of log likelihood function
Calculate log-likelihood function
counts |
integer vector of observed 2-locus genotype |
fAA |
frequency of maternal haplotype 1-1 |
fAB |
frequency of maternal haplotype 1-0 |
fBA |
frequency of maternal haplotype 0-1 |
fBB |
frequency of maternal haplotype 0-0 |
theta |
paternal recombination rate |
lik
value of log likelihood at parameter estimates
List of sire haplotypes is set up in the format required for
hsrecombi. Sire haplotypes are imputed from progeny genotypes using R
package hsphase
.
makehap(sireID, daughterSire, genotype.chr, nmin = 30, exclude = NULL)
makehap(sireID, daughterSire, genotype.chr, nmin = 30, exclude = NULL)
sireID |
vector (LEN N) of IDs of all sires |
daughterSire |
vector (LEN n) of sire ID for each progeny |
genotype.chr |
matrix (DIM n x p) of progeny genotypes (0, 1, 2) on a single chromosome with p SNPs; 9 indicates missing genotype |
nmin |
scalar, minimum required number of progeny for proper imputation, default 30 |
exclude |
vector (LEN < p) of SNP indices to be excluded from analysis |
list (LEN 2) of lists. For each sire:
famID
list (LEN N) of vectors (LEN n.progeny) of progeny indices relating to lines in genotype matrix
sireHap
list (LEN N) of matrices (DIM 2 x p) of sire haplotypes (0, 1) on investigated chromosome
Ferdosi, M., Kinghorn, B., van der Werf, J., Lee, S. & Gondro, C. (2014) hsphase: an R package for pedigree reconstruction, detection of recombination events, phasing and imputation of half-sib family groups BMC Bioinformatics 15:172. https://CRAN.R-project.org/package=hsphase
data(targetregion) hap <- makehap(unique(daughterSire), daughterSire, genotype.chr)
data(targetregion) hap <- makehap(unique(daughterSire), daughterSire, genotype.chr)
List of sire haplotypes is set up in the format required for hsrecombi. Haplotypes (obtained by external software) are provided.
makehaplist(daughterSire, hapSire, nmin = 1)
makehaplist(daughterSire, hapSire, nmin = 1)
daughterSire |
vector (LEN n) of sire ID for each progeny |
hapSire |
matrix (DIM 2N x p + 1) of sire haplotype at p SNPs; 2 lines per sire, 1. columns contains sire ID |
nmin |
scalar, minimum number of progeny required, default 1 |
list (LEN 2) of lists. For each sire:
famID
list (LEN N) of vectors (LEN n.progeny) of progeny indices relating to lines in genotype matrix
sireHap
list (LEN N) of matrices (DIM 2 x p) of sire haplotypes (0, 1) on investigated chromosome
data(targetregion) hap <- makehaplist(daughterSire, hapSire)
data(targetregion) hap <- makehaplist(daughterSire, hapSire)
List of sire haplotypes is set up in the format required for
hsrecombi. Sire haplotypes are imputed from progeny genotypes using R
package hsphase
. Furthermore, recombination rate estimates between
adjacent SNPs from hsphase are reported.
makehappm(sireID, daughterSire, genotype.chr, nmin = 30, exclude = NULL)
makehappm(sireID, daughterSire, genotype.chr, nmin = 30, exclude = NULL)
sireID |
vector (LEN N) of IDs of all sires |
daughterSire |
vector (LEN n) of sire ID for each progeny |
genotype.chr |
matrix (DIM n x p) of progeny genotypes (0, 1, 2) on a single chromosome with p SNPs; 9 indicates missing genotype |
nmin |
scalar, minimum required number of progeny for proper imputation, default 30 |
exclude |
vector (LEN < p) of SNP IDs (for filtering column names of
|
list (LEN 2) of lists. For each sire:
famID
list (LEN N) of vectors (LEN n.progeny) of progeny indices relating to lines in genotype matrix
sireHap
list (LEN N) of matrices (DIM 2 x p) of sire haplotypes (0, 1) on investigated chromosome
vector (LEN p - 1) of proportion of recombinant progeny over all families between adjacent SNPs
list (LEN N) of vectors (LEN n.progeny) of number of recombination events per animal
vector (LEN p) of genetic positions of SNPs (in cM)
Ferdosi, M., Kinghorn, B., van der Werf, J., Lee, S. & Gondro, C. (2014) hsphase: an R package for pedigree reconstruction, detection of recombination events, phasing and imputation of half-sib family groups BMC Bioinformatics 15:172. https://CRAN.R-project.org/package=hsphase
data(targetregion) hap <- makehappm(unique(daughterSire), daughterSire, genotype.chr, exclude = paste0('V', 301:310))
data(targetregion) hap <- makehappm(unique(daughterSire), daughterSire, genotype.chr, exclude = paste0('V', 301:310))
SNP marker map in target region on chromosome BTA1 according to ARS-UCD1.2
map.chr
map.chr
map.chr |
data frame
|
An object of class data.frame
with 200 rows and 6 columns.
Calculation of genetic distances from recombination rates given a mixing parameter
rao(p, x, inverse = F)
rao(p, x, inverse = F)
p |
mixing parameter (see details); |
x |
vector of recombination rates |
inverse |
logical, if FALSE recombination rate is mapped to Morgan unit, if TRUE Morgan unit is mapped to recombination rate (default is FALSE) |
Mixing parameter p=0
would match to Morgan, p=0.25
to
Carter, p=0.5
to Kosambi and p=1
to Haldane map function.
As an inverse of Rao's system of functions does not exist, NA will be
produced if inverse = T
. To approximate the inverse call function
rao.inv(p, x)
.
vector of genetic positions in Morgan units
Rao, D.C., Morton, N.E., Lindsten, J., Hulten, M. & Yee, S (1977) A mapping function for man. Human Heredity 27: 99-104. doi:10.1159/000152856
rao(0.25, seq(0, 0.5, 0.01))
rao(0.25, seq(0, 0.5, 0.01))
Calculation of recombination rates from genetic distances given a mixing parameter
rao.inv(p, x)
rao.inv(p, x)
p |
mixing parameter (see details); |
x |
vector in Morgan units |
Mixing parameter p=0
would match to Morgan, p=0.25
to
Carter, p=0.5
to Kosambi and p=1
to Haldane map function.
vector of recombination rates
Rao, D.C., Morton, N.E., Lindsten, J., Hulten, M. & Yee, S (1977) A mapping function for man. Human Heredity 27: 99-104. doi:10.1159/000152856
rao.inv(0.25, seq(0, 01, 0.1))
rao.inv(0.25, seq(0, 01, 0.1))
Determine default start values for Expectation Maximisation (EM) algorithm that is used to estimate paternal recombination rate and maternal haplotype frequencies
startvalue(Fam1, Fam2, Dd = 0, prec = 1e-06)
startvalue(Fam1, Fam2, Dd = 0, prec = 1e-06)
Fam1 |
matrix (DIM n.progeny x 2) of progeny genotypes (0, 1, 2) of genomic family with coupling phase sires (1) at SNP pair |
Fam2 |
matrix (DIM n.progeny x 2) of progeny genotypes (0, 1, 2) of genomic family with repulsion phase sires (2) at SNP pair |
Dd |
maternal LD, default 0 |
prec |
minimum accepted start value for fAA, fAB, fBA; default
|
list (LEN 8)
fAA.start
frequency of maternal haplotype 1-1
fAB.start
frequency of maternal haplotype 1-0
fBA.start
frequency of maternal haplotype 0-1
p1
estimate of maternal allele frequency (allele 1) when sire
is heterozygous at SNP1
p2
estimate of maternal allele frequency (allele 1) when sire
is heterozygous at SNP2
L1
lower bound of maternal LD
L2
upper bound for maternal LD
critical
0 if parameter estimates are unique; 1 if parameter estimates at both solutions are valid
n1 <- 100 n2 <- 20 G1 <- matrix(ncol = 2, nrow = n1, sample(c(0:2), replace = TRUE, size = 2 * n1)) G2 <- matrix(ncol = 2, nrow = n2, sample(c(0:2), replace = TRUE, size = 2 * n2)) startvalue(G1, G2)
n1 <- 100 n2 <- 20 G1 <- matrix(ncol = 2, nrow = n1, sample(c(0:2), replace = TRUE, size = 2 * n1)) G2 <- matrix(ncol = 2, nrow = n2, sample(c(0:2), replace = TRUE, size = 2 * n2)) startvalue(G1, G2)
The data set contains sire haplotypes, assignment of progeny to sire, progeny genotypes and physical map information in a target region
The raw data can be downloaded at the source given below. Then,
executing the following R code leads to the data provided in
targetregion.RData
.
hapSire
matrix of sire haplotypes of each sire; 2 lines per sire; 1. column contains sireID
daughterSire
vector of sire ID for each progeny
genotype.chr
matrix of progeny genotypes
map.chr
SNP marker map in target region
The data are available at RADAR doi:10.22000/280
## Not run: # download data from RADAR (requires about 1.4 GB) url <- "https://www.radar-service.eu/radar-backend/archives/fqSPQoIvjtOGJlav/versions/1/content" curl_download(url = url, 'tmp.tar') untar('tmp.tar') file.remove('tmp.tar') path <- '10.22000-280/data/dataset' ## list of haplotypes of sires for each chromosome load(file.path(path, 'sire_haplotypes.RData')) ## assign progeny to sire daughterSire <- read.table(file.path(path, 'assign_to_family.txt'))[, 1] ## progeny genotypes X <- as.matrix(read.table(file.path(path, 'XFam-ARS.txt'))) ## physical and approximated genetic map map <- read.table(file.path(path, 'map50K_ARS_reordered.txt'), header = T) ## select target region chr <- 1 window <- 301:500 ## map information of target region map.chr <- map[map$Chr == chr, ][window, ] ## matrix of sire haplotypes in target region hapSire <- rlist::list.rbind(haps[[chr]]) sireID <- 1:length(unique(daughterSire)) hapSire <- cbind(rep(sireID, each = 2), hapSire[, window]) ## matrix of progeny genotypes genotype.chr <- X[, map.chr$SNP] colnames(genotype.chr) <- map.chr$SNP save(list = c('genotype.chr', 'hapSire', 'map.chr', 'daughterSire'), file = 'targetregion.RData', compress = 'xz') ## End(Not run)
## Not run: # download data from RADAR (requires about 1.4 GB) url <- "https://www.radar-service.eu/radar-backend/archives/fqSPQoIvjtOGJlav/versions/1/content" curl_download(url = url, 'tmp.tar') untar('tmp.tar') file.remove('tmp.tar') path <- '10.22000-280/data/dataset' ## list of haplotypes of sires for each chromosome load(file.path(path, 'sire_haplotypes.RData')) ## assign progeny to sire daughterSire <- read.table(file.path(path, 'assign_to_family.txt'))[, 1] ## progeny genotypes X <- as.matrix(read.table(file.path(path, 'XFam-ARS.txt'))) ## physical and approximated genetic map map <- read.table(file.path(path, 'map50K_ARS_reordered.txt'), header = T) ## select target region chr <- 1 window <- 301:500 ## map information of target region map.chr <- map[map$Chr == chr, ][window, ] ## matrix of sire haplotypes in target region hapSire <- rlist::list.rbind(haps[[chr]]) sireID <- 1:length(unique(daughterSire)) hapSire <- cbind(rep(sireID, each = 2), hapSire[, window]) ## matrix of progeny genotypes genotype.chr <- X[, map.chr$SNP] colnames(genotype.chr) <- map.chr$SNP save(list = c('genotype.chr', 'hapSire', 'map.chr', 'daughterSire'), file = 'targetregion.RData', compress = 'xz') ## End(Not run)