## Abstract

Logistic regression is widely used to evaluate the association between risk factors and a binary outcome. The logistic curve is symmetric around its point of inflection. Alternative families of curves, such as the additive Gompertz or Guerrero-Johnson models, have been proposed in various scenarios because they are asymmetric: disease risk may initially increase rapidly and then enter a longer period in which the rate of growth slowly decreases. When modeling binary outcomes in relation to risk factors, an additive logistic model may therefore not fit the data well. Suppose the outcome and an additive function of the risk factors are in fact related through an asymmetric function, but we model the relationship using a logistic function. We show, both mathematically and through a simulation-based evaluation, that higher-order terms, such as pairwise interactions and quadratic terms, may be required in a logistic regression model to obtain a good fit to the data. Importantly, because significant higher-order terms may be a manifestation of model misspecification, these terms should be interpreted cautiously; a more pragmatic approach is to develop contrasts of disease risk from a well-fitting model. We illustrate these concepts in two cohort studies examining early death for late-stage colorectal and pancreatic cancer cases, and two case-control studies investigating NAT2 acetylation, smoking, and advanced colorectal adenoma and bladder cancer.
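The central point can be sketched numerically for two binary risk factors. The snippet below is an illustration under stated assumptions, not the authors' derivation or simulation design: it assumes a Gompertz-type curve of the form `exp(-exp(-eta))`, an additive linear predictor with hypothetical coefficients `b0`, `b1`, `b2`, and it uses the fact that the interaction coefficient of a saturated logistic model for a 2x2 design equals a contrast of the four cell logits.

```python
import math


def logistic(eta):
    """Symmetric logistic curve."""
    return 1.0 / (1.0 + math.exp(-eta))


def gompertz(eta):
    """Asymmetric Gompertz-type curve (assumed form, for illustration only)."""
    return math.exp(-math.exp(-eta))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def logit_interaction(curve, b0, b1, b2):
    """Interaction term a saturated logistic model would need when the true
    risk is curve(b0 + b1*x1 + b2*x2) for binary x1, x2."""
    p = {
        (x1, x2): curve(b0 + b1 * x1 + b2 * x2)
        for x1 in (0, 1)
        for x2 in (0, 1)
    }
    return (
        logit(p[(1, 1)]) - logit(p[(1, 0)]) - logit(p[(0, 1)]) + logit(p[(0, 0)])
    )


# Hypothetical coefficients; the true model is additive in both cases.
inter_logistic = logit_interaction(logistic, -1.0, 1.0, 1.0)
inter_gompertz = logit_interaction(gompertz, -1.0, 1.0, 1.0)

print(f"interaction on the logit scale, logistic truth: {inter_logistic:.4f}")
print(f"interaction on the logit scale, Gompertz truth: {inter_gompertz:.4f}")
```

When the true curve is logistic, the contrast is zero up to floating-point error. When the true curve is the assumed Gompertz-type form, the contrast is clearly nonzero even though the risk factors act additively on the true scale, which is the kind of interaction term the abstract attributes to misspecification rather than to a genuine joint effect.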

Original language | English (US) |
---|---|
Pages (from-to) | 21-36 |
Number of pages | 16 |
Journal | Human Heredity |
Volume | 82 |
Issue number | 1-2 |
DOIs | |
State | Published - Sep 1 2017 |
Externally published | Yes |

## All Science Journal Classification (ASJC) codes

- Genetics
- Genetics (clinical)

## Keywords

- Growth curve
- Statistical interaction