TY - JOUR
T1 - Extent to which array genotyping and imputation with large reference panels approximate deep whole-genome sequencing
AU - NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium
AU - Hanks, Sarah C.
AU - Forer, Lukas
AU - Schönherr, Sebastian
AU - LeFaive, Jonathon
AU - Martins, Taylor
AU - Welch, Ryan
AU - Gagliano Taliun, Sarah A.
AU - Braff, David
AU - Johnsen, Jill M.
AU - Kenny, Eimear E.
AU - Konkle, Barbara A.
AU - Laakso, Markku
AU - Loos, Ruth F.J.
AU - McCarroll, Steven
AU - Pato, Carlos
AU - Pato, Michele T.
AU - Smith, Albert V.
AU - Boehnke, Michael
AU - Scott, Laura J.
AU - Fuchsberger, Christian
N1 - Publisher Copyright: © 2022 American Society of Human Genetics
PY - 2022/9/1
Y1 - 2022/9/1
N2 - Understanding the genetic basis of human diseases and traits is dependent on the identification and accurate genotyping of genetic variants. Deep whole-genome sequencing (WGS), the gold standard technology for SNP and indel identification and genotyping, remains very expensive for most large studies. Here, we quantify the extent to which array genotyping followed by genotype imputation can approximate WGS in studies of individuals of African, Hispanic/Latino, and European ancestry in the US and of Finnish ancestry in Finland (a population isolate). For each study, we performed genotype imputation by using the genetic variants present on the Illumina Core, OmniExpress, MEGA, and Omni 2.5M arrays with the 1000G, HRC, and TOPMed imputation reference panels. Using the Omni 2.5M array and the TOPMed panel, ≥90% of bi-allelic single-nucleotide variants (SNVs) are well imputed (r² > 0.8) down to minor-allele frequencies (MAFs) of 0.14% in African, 0.11% in Hispanic/Latino, 0.35% in European, and 0.85% in Finnish ancestries. There was little difference in TOPMed-based imputation quality among the arrays with >700k variants. Individual-level imputation quality varied widely between and within the three US studies. Imputation quality also varied across genomic regions, producing regions where even common (MAF > 5%) variants were consistently not well imputed across ancestries. The extent to which array genotyping and imputation can approximate WGS therefore depends on reference panel, genotype array, sample ancestry, and genomic location. Imputation quality by variant or genomic region can be queried with our new tool, RsqBrowser, now deployed on the Michigan Imputation Server.
AB - Understanding the genetic basis of human diseases and traits is dependent on the identification and accurate genotyping of genetic variants. Deep whole-genome sequencing (WGS), the gold standard technology for SNP and indel identification and genotyping, remains very expensive for most large studies. Here, we quantify the extent to which array genotyping followed by genotype imputation can approximate WGS in studies of individuals of African, Hispanic/Latino, and European ancestry in the US and of Finnish ancestry in Finland (a population isolate). For each study, we performed genotype imputation by using the genetic variants present on the Illumina Core, OmniExpress, MEGA, and Omni 2.5M arrays with the 1000G, HRC, and TOPMed imputation reference panels. Using the Omni 2.5M array and the TOPMed panel, ≥90% of bi-allelic single-nucleotide variants (SNVs) are well imputed (r² > 0.8) down to minor-allele frequencies (MAFs) of 0.14% in African, 0.11% in Hispanic/Latino, 0.35% in European, and 0.85% in Finnish ancestries. There was little difference in TOPMed-based imputation quality among the arrays with >700k variants. Individual-level imputation quality varied widely between and within the three US studies. Imputation quality also varied across genomic regions, producing regions where even common (MAF > 5%) variants were consistently not well imputed across ancestries. The extent to which array genotyping and imputation can approximate WGS therefore depends on reference panel, genotype array, sample ancestry, and genomic location. Imputation quality by variant or genomic region can be queried with our new tool, RsqBrowser, now deployed on the Michigan Imputation Server.
KW - genotype imputation
KW - genotyping array
KW - whole-genome sequencing
UR - http://www.scopus.com/inward/record.url?scp=85137165762&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137165762&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2022.07.012
DO - 10.1016/j.ajhg.2022.07.012
M3 - Article
C2 - 35981533
AN - SCOPUS:85137165762
SN - 0002-9297
VL - 109
SP - 1653
EP - 1666
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 9
ER -