Linear models are the most common predictive models for a continuous, discrete, or categorical response, and they often include interaction terms. With more than a few predictors, however, interactions tend to be neglected because they add too many terms to the model. In this paper, we propose a simulation-based tree method for detecting the interactions that contribute to prediction. The method first bootstraps the observations and randomly chooses a subset of the variables to build trees. The interactions between the roots and the corresponding leaves are collected, and the number of times each interaction appears is counted. To obtain a benchmark for the number of times each interaction appears in the trees, the response values are replaced with randomly generated values and the procedure is repeated. Interactions whose occurrence frequency exceeds the benchmark are added to the regression model. Finally, we select variables by running LASSO on the model containing the main effects and the interactions obtained. In the experiments, our method performs well, especially on data sets with many interactions.
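The counting and benchmarking steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: trees are limited to depth two, so an "interaction" is the pair formed by the root split variable and the split variable of a child node; splits minimise the sum of squared errors; the random response is standard Gaussian; and a single benchmark is taken as the largest pair count observed under the random response. All function names (`best_split`, `tree_pairs`, `count_pairs`, `detect_interactions`) and the toy data are hypothetical. The final LASSO step is omitted.

```python
import random
from collections import Counter


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, rows, features):
    """Return the (feature, threshold) that most reduces the SSE, or None."""
    best, best_score = None, sse([y[i] for i in rows])
    for j in features:
        levels = sorted({X[i][j] for i in rows})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def tree_pairs(X, y, rows, features):
    """Grow a depth-2 tree; return its (root, child) split-variable pairs."""
    root = best_split(X, y, rows, features)
    if root is None:
        return set()
    j, t = root
    pairs = set()
    for side in ([i for i in rows if X[i][j] <= t],
                 [i for i in rows if X[i][j] > t]):
        child = best_split(X, y, side, features)
        if child is not None and child[0] != j:
            pairs.add(tuple(sorted((j, child[0]))))
    return pairs


def count_pairs(X, y, n_trees, n_features, rng):
    """Count how many trees contain each pair (at most once per tree)."""
    n, p = len(X), len(X[0])
    counts = Counter()
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        features = rng.sample(range(p), n_features)   # random variable subset
        counts.update(tree_pairs(X, y, rows, features))
    return counts


def detect_interactions(X, y, n_trees=200, n_features=3, seed=0):
    """Keep the pairs that occur more often than the random-response benchmark."""
    rng = random.Random(seed)
    observed = count_pairs(X, y, n_trees, n_features, rng)
    # Benchmark: repeat the procedure with randomly generated responses.
    noise = [rng.gauss(0.0, 1.0) for _ in y]
    null = count_pairs(X, noise, n_trees, n_features, rng)
    benchmark = max(null.values(), default=0)  # assumed thresholding rule
    selected = sorted(pair for pair, c in observed.items() if c > benchmark)
    return selected, observed, benchmark


# Toy data (hypothetical): four binary predictors, x0 and x1 interact.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(4)] for _ in range(200)]
y = [3.0 * row[0] * row[1] + data_rng.gauss(0.0, 1.0) for row in X]

selected, observed, benchmark = detect_interactions(X, y)
print("selected pairs:", selected, "benchmark:", benchmark)
```

In a full pipeline, the products of the selected pairs would be appended to the main-effect columns and passed to a LASSO fit for the final variable selection. Using the maximum null count as the threshold is a deliberately conservative choice in this sketch; a per-pair benchmark would be an equally plausible reading of the description.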
All Science Journal Classification (ASJC) codes
- Statistics and Probability