Single nucleotide polymorphisms (SNPs) may be used in case-control designs to test for association between a SNP marker and a disease. Such designs may assume that the genotype data are reported without error. Our goal is to quantify the effects that genotyping errors have on sample size for case-control studies with haplotypes formed by a disease locus and a SNP marker locus in the presence of linkage disequilibrium (LD). We consider the effects of a recently published error model on 2 × 3 chi-square analysis. We study the joint relation of LD and errors with sample size for three specific genetic disease models and two settings each of marker allele frequencies (a total of six studies). The minimal sample size necessary for fixed asymptotic power is estimated as a fourth-degree polynomial in the variables S (error) and D' (LD measure) via backward stepwise regression. We find that increased error rates lower power. In all studies, we observe that LD and errors interact in a non-linear fashion. In particular, regression analyses show that several higher-order interaction terms have coefficients significantly different from 0 in each study, with the fraction of variance explained greater than 0.9999. Finally, the increase in sample size necessary to maintain constant asymptotic power and level of significance as a function of S is smallest when D' = 1 (perfect LD). The increase grows monotonically as D' decreases to 0.5 for all studies.
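The sample-size calculation described above can be illustrated with a minimal sketch. It is not the authors' code or error model. It assumes equal numbers of cases and controls, takes the three genotype frequencies in each group as given, and uses the standard noncentrality parameter for a 2 × 3 chi-square test (2 degrees of freedom). The function names, the example frequencies, and the symmetric misclassification matrix are all hypothetical, chosen only to show how genotyping errors shrink the noncentrality and so inflate the required sample size.

```python
import math


def power_2df(ncp, alpha=0.05, terms=400):
    """Asymptotic power of a 2-df chi-square test with noncentrality `ncp`.

    Uses the Poisson-mixture form of the noncentral chi-square distribution;
    with even degrees of freedom each central tail has a closed form.
    """
    half_crit = -math.log(alpha)   # critical value / 2, since P(X > c) = exp(-c/2) at 2 df
    weight = math.exp(-ncp / 2.0)  # Poisson(ncp / 2) probability of i = 0
    term = tail = 1.0              # tail = sum_{j <= i} half_crit**j / j!
    power = 0.0
    for i in range(terms):
        power += weight * math.exp(-half_crit) * tail
        term *= half_crit / (i + 1)
        tail += term
        weight *= (ncp / 2.0) / (i + 1)
    return power


def required_ncp(power=0.80, alpha=0.05):
    """Smallest noncentrality giving the target power, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if power_2df(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


def ncp_per_pair(cases, controls):
    """Noncentrality contributed by one case-control pair (equal group sizes)."""
    return sum((p - q) ** 2 / (p + q) for p, q in zip(cases, controls) if p + q > 0)


def apply_errors(freqs, matrix):
    """Observed genotype frequencies; matrix[i][j] = P(observe j | true i)."""
    return [sum(freqs[i] * matrix[i][j] for i in range(3)) for j in range(3)]


def pairs_needed(cases, controls, power=0.80, alpha=0.05):
    """Minimal number of case-control pairs for the target asymptotic power."""
    return math.ceil(required_ncp(power, alpha) / ncp_per_pair(cases, controls))


if __name__ == "__main__":
    # Hypothetical genotype frequencies (not taken from the paper).
    cases = [0.30, 0.50, 0.20]
    controls = [0.25, 0.50, 0.25]
    eps = 0.02  # hypothetical per-genotype miscall rate, not the paper's S
    errors = [[1 - 2 * eps, eps, eps],
              [eps, 1 - 2 * eps, eps],
              [eps, eps, 1 - 2 * eps]]
    print("pairs needed, no errors:  ", pairs_needed(cases, controls))
    print("pairs needed, with errors:",
          pairs_needed(apply_errors(cases, errors), apply_errors(controls, errors)))
```

Because the same misclassification is applied to cases and controls, the observed frequencies move closer together, so the number of pairs needed with errors is never smaller than the number needed without them. Repeating the calculation over a grid of error rates and D' values would give the kind of data to which a polynomial in S and D' could then be fitted.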
|Original language||English (US)|
|Number of pages||12|
|Journal||Pacific Symposium on Biocomputing|
|State||Published - 2003|