A new expectation-maximization statistical test for case-control association studies considering rare variants obtained by high-throughput sequencing

Derek Gordon, Stephen J. Finch, Francisco De La Vega

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk, and after meta-analysis of large cohorts, a large fraction of expected heritability remains unexplained. A possible explanation is that rare variants currently undetected by GWAS with SNP arrays could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples for testing disease association, including rare variants. We have developed a test statistic that allows for association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification for sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single locus results.
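
To make the genotype-probability step concrete, the following is a minimal sketch of an expectation-maximization loop at a single locus. It is not the authors' procedure: the abstract does not specify the read model, so the sketch assumes diploid genotypes, a binomial model for the number of variant reads, and a known symmetric per-read error rate. The function names (`binom_pmf`, `em_genotype_posteriors`) and the parameter `error_rate` are hypothetical, introduced only for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def em_genotype_posteriors(reads, error_rate=0.01, n_iter=200, tol=1e-10):
    """Estimate genotype frequencies and per-individual genotype posteriors.

    reads: list of (variant_reads, total_reads) pairs, one per individual.
    error_rate: ASSUMED known probability that a read reports the wrong allele.
    Genotypes are coded 0, 1, 2 (copies of the variant allele).
    """
    # ASSUMPTION: chance that a single read shows the variant allele,
    # given 0, 1, or 2 variant copies.
    p_variant = (error_rate, 0.5, 1.0 - error_rate)

    # Read likelihoods under each genotype do not change between iterations.
    lik = [[binom_pmf(k, n, p) for p in p_variant] for k, n in reads]

    freqs = [1 / 3, 1 / 3, 1 / 3]  # initial genotype frequencies
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior genotype probabilities for each individual.
        posteriors = []
        for row in lik:
            joint = [f * l for f, l in zip(freqs, row)]
            total = sum(joint)
            posteriors.append([j / total for j in joint])

        # M-step: new frequencies are the average posterior weights.
        new_freqs = [
            sum(w[g] for w in posteriors) / len(posteriors) for g in range(3)
        ]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs, posteriors
```

Calling `em_genotype_posteriors([(0, 20), (10, 20), (20, 20), (0, 15)])` returns the estimated genotype frequencies together with one posterior vector per individual; an association test could then compare such estimates between cases and controls. The sketch omits the case-control test statistic and the SumStat step over multiple loci.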

Original language: English (US)
Pages (from-to): 113-125
Number of pages: 13
Journal: Human Heredity
Volume: 71
Issue number: 2
State: Published - Jul 2011

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics (clinical)

Keywords

  • Expectation-maximization
  • Genetics
  • Misclassification
  • Multi-locus
  • Noncentrality parameter
  • Power
  • Sequence
  • Statistic
