Variable selection for high dimensional multivariate outcomes

Tamar Sofer, Lee Dicker, Xihong Lin

Research output: Contribution to journal, Article (peer-reviewed)

14 Scopus citations

Abstract

We consider variable selection for high-dimensional multivariate regression using penalized likelihoods when both the number of outcomes and the number of covariates may be large. To account for within-subject correlation, we consider variable selection both when a working precision matrix is used and when the precision matrix is jointly estimated using a two-stage procedure. We show that, under suitable regularity conditions, the penalized regression coefficient estimators are consistent for model selection for an arbitrary working precision matrix, and that they have the oracle property and are efficient when the true precision matrix is used or when it is consistently estimated using sparse regression. We develop an efficient computational procedure for estimating the regression coefficients that uses the coordinate descent algorithm in conjunction with sparse precision matrix estimation by the graphical LASSO (GLASSO) algorithm. We develop a Bayesian Information Criterion (BIC) for selecting the tuning parameter and show that this BIC is consistent for model selection. We evaluate the finite-sample performance of the proposed method through simulation studies and illustrate its application to type II diabetes gene expression pathway data.
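To make the estimation step concrete, the sketch below minimizes a penalized Gaussian objective for the coefficient matrix B (p covariates by q outcomes) with a fixed working precision matrix Omega, (1/(2n)) tr((Y - XB) Omega (Y - XB)^T) + lam * sum |B_jk|, by cyclic coordinate descent, and then picks lam from a grid by a BIC-type score. It is a minimal NumPy illustration under several assumptions, not the authors' implementation: it uses a plain L1 (lasso) penalty, whereas the penalty the paper uses to obtain oracle properties may differ (for example, an adaptive or nonconvex penalty); Omega is supplied by the user rather than estimated by GLASSO; and the criterion tr(R Omega R^T) + log(n) * (number of nonzero coefficients) is a generic BIC for known Omega, not necessarily the paper's exact definition. The function names `fit_penalized_mvreg` and `select_by_bic` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z)) * max(abs(z) - t, 0.0)


def fit_penalized_mvreg(X, Y, Omega, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for the L1-penalized objective (assumed form)
        (1/(2n)) * tr((Y - X B) Omega (Y - X B)^T) + lam * sum_{j,k} |B[j, k]|
    with a fixed, positive definite working precision matrix Omega (q x q)."""
    n, p = X.shape
    q = Y.shape[1]
    B = np.zeros((p, q))
    R = np.array(Y, dtype=float)          # residual matrix Y - X B (B starts at 0)
    col_sq = (X ** 2).sum(axis=0) / n     # ||x_j||^2 / n for each covariate
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            xj = X[:, j]
            for k in range(q):
                a = col_sq[j] * Omega[k, k]          # curvature of the objective in B[j, k]
                if a <= 0:
                    continue
                g = -(xj @ (R @ Omega[:, k])) / n    # gradient of the smooth part in B[j, k]
                old = B[j, k]
                new = soft_threshold(a * old - g, lam) / a
                if new != old:
                    R[:, k] -= xj * (new - old)      # keep the residual in sync with B
                    B[j, k] = new
                    max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return B


def select_by_bic(X, Y, Omega, lams):
    """Fit over a grid of penalties and return (best_lam, best_B, bic_values).
    Illustrative BIC: tr(R Omega R^T) + log(n) * (number of nonzero coefficients)."""
    n = X.shape[0]
    bics, fits = [], []
    for lam in lams:
        B = fit_penalized_mvreg(X, Y, Omega, lam)
        R = Y - X @ B
        loss = float(np.sum((R @ Omega) * R))   # equals trace(R Omega R^T)
        bics.append(loss + np.log(n) * np.count_nonzero(B))
        fits.append(B)
    best = int(np.argmin(bics))
    return lams[best], fits[best], bics


if __name__ == "__main__":
    # Simulated example: two active covariates, AR(1)-type error correlation.
    rng = np.random.default_rng(0)
    n, p, q = 200, 10, 3
    X = rng.standard_normal((n, p))
    B_true = np.zeros((p, q))
    B_true[0, :] = 2.0
    B_true[1, 0] = -1.5
    idx = np.arange(q)
    Sigma = 0.5 ** np.abs(np.subtract.outer(idx, idx))
    Y = X @ B_true + rng.multivariate_normal(np.zeros(q), Sigma, size=n)
    lam, B_hat, _ = select_by_bic(X, Y, np.linalg.inv(Sigma), np.geomspace(1.0, 0.01, 20))
    print("selected lambda:", lam)
    print("selected support:\n", B_hat != 0)
```

Keeping the residual matrix updated in place means each coordinate step costs O(nq) rather than recomputing X @ B. With lam = 0 the fit reduces to ordinary least squares for every positive definite Omega (all outcomes share the same design), so Omega affects the penalized fit only through how the L1 penalty interacts with the correlated loss. For the two-stage variant described in the abstract, one option (again an assumption, not necessarily the paper's procedure) is to replace Omega with a sparse precision estimate fitted to first-stage residuals, for example the `precision_` attribute of scikit-learn's `GraphicalLasso`.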

Original language: English (US)
Pages (from-to): 1633-1654
Number of pages: 22
Journal: Statistica Sinica
Volume: 24
Issue number: 4
State: Published - Oct 2014

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • BIC
  • Consistency
  • Correlation
  • Efficiency
  • Model selection
  • Multiple outcomes
  • Oracle estimator
