Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data

Date
2022-01-03
Publisher
Frontiers Media S A
Rights
(c) 2022 The Author/s
CC BY 4.0
Abstract
Genotype imputation is the process of inferring unobserved genotypes in a sample of individuals. It is a key step before a genome-wide association study (GWAS) or genomic prediction, and imputation accuracy directly influences the results of these subsequent analyses. In this simulation-based study, we investigate the accuracy of genotype imputation in relation to several factors characterizing SNP chip or low-coverage whole-genome sequencing (LCWGS) data: the imputation reference population size, the proportion of target markers (SNP density), the genetic relationship (distance) between the target and reference populations, and the imputation method. Genotypes were simulated using coalescent theory, accounting for the demographic history of pigs. A population of simulated founders diverged to produce four separate but related populations of descendants, and genomic data for 20,000 individuals were simulated for a 10-Mb chromosome fragment. Our results showed that the proportion of target markers, or SNP density, was the most critical factor affecting imputation accuracy in all imputation scenarios. Compared with Minimac4, Beagle5.1 produced more accurate imputed genotypes in most cases, most notably when imputing from LCWGS data. Compared with SNP chip data, LCWGS data yielded more accurate genotype imputation. Our findings provide a relatively comprehensive insight into the accuracy of genotype imputation in a realistic population of domestic animals.
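The abstract does not state how imputation accuracy was scored. As a minimal sketch, assuming two metrics that are common in imputation studies (the genotype concordance rate and the squared correlation r² between true and imputed genotypes), accuracy on masked markers could be computed as below. The function names are hypothetical, genotypes are assumed to be coded as 0/1/2 copies of the alternative allele, and this is not the study's own code.

```python
"""Hypothetical sketch: two common ways to score imputation accuracy on
masked markers. Not necessarily the metrics used in the study above."""
from typing import Sequence


def concordance_rate(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Fraction of genotypes (coded 0/1/2) that were imputed exactly."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def squared_correlation(true_gt: Sequence[float], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_gt)
    if n != len(dosage) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(true_gt) / n
    mean_d = sum(dosage) / n
    # Covariance and variances share the 1/n factor, so it cancels in r^2.
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_gt, dosage))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        # Monomorphic marker or constant dosage: correlation is undefined.
        return float("nan")
    return cov * cov / (var_t * var_d)
```

For example, with true genotypes `[0, 1, 2, 1, 0]` and imputed hard calls `[0, 1, 2, 2, 0]`, `concordance_rate` returns 0.8. Concordance is easy to interpret but is inflated by correctly calling the common homozygote at low-frequency markers, which is why r² is often reported alongside it.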
Keywords
genotype imputation, SNP density, reference population size, imputation accuracy, SNP chip, sequencing
Citation
Deng T, Zhang P, Garrick D, Gao H, Wang L, Zhao F. (2022). Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data. Frontiers in Genetics, 12.