TY - JOUR

T1 - A lava attack on the recovery of sums of dense and sparse signals

AU - Chernozhukov, Victor

AU - Hansen, Christian

AU - Liao, Yuan

N1 - Funding Information:
Supported in part by University of Chicago Booth School of Business and The Wallace W. Booth Professorship.
Publisher Copyright:
© Institute of Mathematical Statistics, 2017.

PY - 2017/2

Y1 - 2017/2

N2 - Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and there are a small number of nonzero parameters that are large in magnitude, or a dense signal model, a model with no large parameters and very many small nonzero parameters. We consider a generalization of these two basic models, termed here a "sparse + dense" model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein's unbiased estimator for lava's prediction risk. A simulation example compares the performance of lava to lasso, ridge and elastic net in a regression example using data-dependent penalty parameters and illustrates lava's improved performance relative to these benchmarks.

AB - Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and there are a small number of nonzero parameters that are large in magnitude, or a dense signal model, a model with no large parameters and very many small nonzero parameters. We consider a generalization of these two basic models, termed here a "sparse + dense" model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein's unbiased estimator for lava's prediction risk. A simulation example compares the performance of lava to lasso, ridge and elastic net in a regression example using data-dependent penalty parameters and illustrates lava's improved performance relative to these benchmarks.

KW - High-dimensional models

KW - Nonsparse signal recovery

KW - Penalization

KW - Shrinkage

UR - http://www.scopus.com/inward/record.url?scp=85015067263&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015067263&partnerID=8YFLogxK

U2 - 10.1214/16-AOS1434

DO - 10.1214/16-AOS1434

M3 - Article

AN - SCOPUS:85015067263

VL - 45

SP - 39

EP - 76

JO - Annals of Statistics

JF - Annals of Statistics

SN - 0090-5364

IS - 1

ER -