Logistic regression is widely used to evaluate the association between risk factors and a binary outcome. The logistic curve is symmetric around its point of inflection. Alternative families of curves, such as the additive Gompertz or Guerrero-Johnson models, have been proposed in various scenarios because they are asymmetric: disease risk may initially increase rapidly and then enter a longer period in which the rate of growth slowly decreases. When binary outcomes are modeled in relation to risk factors, an additive logistic model may therefore not fit the data well. Suppose the outcome and an additive function of the risk factors are in fact related through an asymmetric function, but we model the relationship using a logistic function. We illustrate, both within a mathematical framework and through a simulation-based evaluation, that higher-order terms, such as pairwise interactions and quadratic terms, may then be required in the logistic regression model to obtain a good fit to the data. Importantly, because significant higher-order terms may be a manifestation of model misspecification, these terms should be interpreted cautiously; a more pragmatic approach is to develop contrasts of disease risk from a well-fitting model. We illustrate these concepts in 2 cohort studies examining early death for late-stage colorectal and pancreatic cancer cases, and 2 case-control studies investigating NAT2 acetylation, smoking, and advanced colorectal adenoma and bladder cancer.
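The central point can be checked numerically with a minimal sketch. It is not the authors' simulation. It assumes a Gompertz-type curve of the form exp(-exp(-eta)) with an additive predictor in two risk factors; the function names, the coefficients `b0`, `b1`, `b2`, and the grid are all hypothetical. Instead of fitting a logistic regression by maximum likelihood, it approximates the true log-odds surface by least squares, once with additive terms only and once with interaction and quadratic terms added.

```python
import numpy as np

def gompertz_risk(eta):
    # Assumed asymmetric risk curve: fast initial rise, slow approach to 1.
    return np.exp(-np.exp(-eta))

def logit(p):
    return np.log(p / (1.0 - p))

# Hypothetical coefficients; the true model is additive in x1 and x2.
b0, b1, b2 = 0.5, 0.8, 0.6

# Symmetric grid of values for the two risk factors.
g1, g2 = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
x1, x2 = g1.ravel(), g2.ravel()

# True log-odds implied by the asymmetric model.
true_logit = logit(gompertz_risk(b0 + b1 * x1 + b2 * x2))

def lstsq_fit(columns):
    # Least-squares approximation of the log-odds surface (not an MLE fit).
    X = np.column_stack([np.ones_like(x1)] + columns)
    coef, *_ = np.linalg.lstsq(X, true_logit, rcond=None)
    resid = true_logit - X @ coef
    return coef, float(resid @ resid)

# Additive linear predictor only.
coef_add, rss_add = lstsq_fit([x1, x2])

# Additive terms plus pairwise interaction and quadratic terms.
coef_full, rss_full = lstsq_fit([x1, x2, x1 * x2, x1**2, x2**2])

print("residual sum of squares, additive:", rss_add)
print("residual sum of squares, with higher-order terms:", rss_full)
print("coefficient on x1*x2:", coef_full[3])
```

Under these assumptions the additive approximation leaves systematic residual error, and the interaction coefficient is nonzero even though the data-generating model contains no interaction. A nonzero higher-order term on the logistic scale can therefore reflect the choice of curve rather than a joint effect of the risk factors.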
All Science Journal Classification (ASJC) codes
- Growth curve
- Statistical interaction