TY - JOUR
T1 - Hierarchical Total Variations and Doubly Penalized ANOVA Modeling for Multivariate Nonparametric Regression
AU - Yang, Ting
AU - Tan, Zhiqiang
N1 - Funding Information:
An earlier version of this work was completed as part of Ting Yang’s PhD thesis at Rutgers University. The authors thank two referees for helpful comments.
Publisher Copyright:
© 2021 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
PY - 2021
Y1 - 2021
N2 - For multivariate nonparametric regression, functional analysis of variance (ANOVA) modeling aims to capture the relationship between a response and covariates by decomposing the unknown function into various components, representing main effects, two-way interactions, etc. Such an approach has been pursued explicitly in smoothing spline ANOVA modeling and implicitly in various greedy methods such as MARS. We develop a new method for functional ANOVA modeling, based on doubly penalized estimation using total-variation and empirical-norm penalties, to achieve sparse selection of component functions and their basis functions. For this purpose, we formulate a new class of hierarchical total variations, which measures total variations at different levels including main effects and multi-way interactions, possibly after some order of differentiation. Furthermore, we derive suitable basis functions for multivariate splines such that the hierarchical total variation can be represented as a regular Lasso penalty, and hence we extend a previous backfitting algorithm to handle doubly penalized estimation for ANOVA modeling. We present extensive numerical experiments on simulations and real data to compare our method with existing methods including MARS, tree boosting, and random forest. The results are very encouraging and demonstrate notable gains from our method in prediction or classification accuracy and simplicity of the fitted functions. Supplementary materials for this article are available online.
AB - For multivariate nonparametric regression, functional analysis of variance (ANOVA) modeling aims to capture the relationship between a response and covariates by decomposing the unknown function into various components, representing main effects, two-way interactions, etc. Such an approach has been pursued explicitly in smoothing spline ANOVA modeling and implicitly in various greedy methods such as MARS. We develop a new method for functional ANOVA modeling, based on doubly penalized estimation using total-variation and empirical-norm penalties, to achieve sparse selection of component functions and their basis functions. For this purpose, we formulate a new class of hierarchical total variations, which measures total variations at different levels including main effects and multi-way interactions, possibly after some order of differentiation. Furthermore, we derive suitable basis functions for multivariate splines such that the hierarchical total variation can be represented as a regular Lasso penalty, and hence we extend a previous backfitting algorithm to handle doubly penalized estimation for ANOVA modeling. We present extensive numerical experiments on simulations and real data to compare our method with existing methods including MARS, tree boosting, and random forest. The results are very encouraging and demonstrate notable gains from our method in prediction or classification accuracy and simplicity of the fitted functions. Supplementary materials for this article are available online.
KW - ANOVA model
KW - Additive model
KW - Boosting
KW - Nonparametric regression
KW - Penalized estimation
KW - Total variation
UR - http://www.scopus.com/inward/record.url?scp=85108264765&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108264765&partnerID=8YFLogxK
U2 - 10.1080/10618600.2021.1923513
DO - 10.1080/10618600.2021.1923513
M3 - Article
AN - SCOPUS:85108264765
SN - 1061-8600
VL - 30
SP - 848
EP - 862
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 4
ER -