TY - JOUR

T1 - Statistical learning theory for high dimensional prediction

T2 - Application to criterion-keyed scale development

AU - Chapman, Benjamin P.

AU - Weiss, Alexander

AU - Duberstein, Paul R.

N1 - Funding Information:
Work on this project was supported by NIH Grant R01AG044588 to the first author.
Publisher Copyright:
© 2015 American Psychological Association.

PY - 2016/12/1

Y1 - 2016/12/1

N2 - Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high-dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different from maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms (supervised principal components, regularization, and boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods and as primary analytic tools in discovery-phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression.

KW - Machine learning theory

KW - Mortality

KW - Personality

KW - Psychometrics

KW - Statistical learning theory

UR - http://www.scopus.com/inward/record.url?scp=84988692252&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84988692252&partnerID=8YFLogxK

U2 - 10.1037/met0000088

DO - 10.1037/met0000088

M3 - Article

C2 - 27454257

AN - SCOPUS:84988692252

SN - 1082-989X

VL - 21

SP - 603

EP - 620

JO - Psychological Methods

JF - Psychological Methods

IS - 4

ER -