A Sparse Singular Value Decomposition Method for High-Dimensional Data

Dan Yang, Zongming Ma, Andreas Buja

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

We present a new computational approach to approximating a large, noisy data table by a low-rank matrix with sparse singular vectors. The approximation is obtained from thresholded subspace iterations that produce the singular vectors simultaneously, rather than successively as in competing proposals. We introduce novel ways to estimate thresholding parameters, which obviate the need for computationally expensive cross-validation. We also introduce a way to sparsely initialize the algorithm for computational savings that allow our algorithm to outperform the vanilla singular value decomposition (SVD) on the full data table when the signal is sparse. A comparison with two existing sparse SVD methods suggests that our algorithm is computationally always faster and statistically always at least comparable to the better of the two competing algorithms. Supplementary materials for the article are available in an online appendix. An R package ssvd implementing the algorithms introduced in this article is available on CRAN.
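The abstract's central step is a thresholded subspace iteration that updates all r singular vectors at once. The minimal NumPy sketch below illustrates that idea under stated assumptions. It is not the algorithm in the `ssvd` package. The following choices are all hypothetical and made for illustration: the function name `sparse_svd_sketch`, soft thresholding as the thresholding rule, the fixed hand-picked thresholds `gamma_u` and `gamma_v`, the dense SVD start, and the stopping rule. The article instead estimates its thresholds from the data and uses a sparse initialization.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise soft thresholding: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_svd_sketch(X, r, gamma_u, gamma_v, max_iter=200, tol=1e-10):
    """Rank-r sparse SVD via thresholded two-way subspace iterations (sketch).

    gamma_u, gamma_v: fixed, user-chosen thresholds. This is an assumption;
    the article estimates its thresholds from the data instead.
    Returns U (n x r), D (r x r), V (p x r) with X approximately U @ D @ V.T.
    """
    # Start from the top-r right singular vectors of X. This dense start is a
    # simplification; the article proposes a cheaper sparse initialization.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:r].T
    U = np.zeros((X.shape[0], r))
    for _ in range(max_iter):
        # Left update: multiply by the whole r-column block, threshold,
        # then re-orthonormalize all r columns together with a QR step.
        U_new, _ = np.linalg.qr(soft_threshold(X @ V, gamma_u))
        # Right update: the same three steps using the new left subspace.
        V_new, _ = np.linalg.qr(soft_threshold(X.T @ U_new, gamma_v))
        # Stop when the projection matrices stop changing; projections are
        # unaffected by sign flips or rotations of the basis vectors.
        du = np.linalg.norm(U_new @ U_new.T - U @ U.T)
        dv = np.linalg.norm(V_new @ V_new.T - V @ V.T)
        U, V = U_new, V_new
        if max(du, dv) < tol:
            break
    # For orthonormal U and V, D = U^T X V minimizes ||X - U D V^T||_F.
    D = U.T @ X @ V
    return U, D, V


# Usage: a rank-1 signal with 5-sparse left and right vectors plus small noise.
rng = np.random.default_rng(0)
n, p = 50, 40
u = np.zeros(n)
u[[3, 10, 17, 25, 40]] = 1 / np.sqrt(5)
v = np.zeros(p)
v[[1, 8, 15, 22, 30]] = 1 / np.sqrt(5)
X = 20 * np.outer(u, v) + 0.1 * rng.standard_normal((n, p))
U, D, V = sparse_svd_sketch(X, r=1, gamma_u=0.5, gamma_v=0.5)
print("estimated left support:", np.flatnonzero(np.abs(U[:, 0]) > 1e-12))
print("estimated right support:", np.flatnonzero(np.abs(V[:, 0]) > 1e-12))
```

Each update multiplies by the full r-column block and orthonormalizes it in one QR step, so the r vectors are estimated together. Successive methods instead extract one pair at a time, for example by deflation, and this is the contrast the abstract draws. A threshold set too large can zero out an entire column; the sketch does not guard against that case.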

Original language: English (US)
Pages (from-to): 923-942
Number of pages: 20
Journal: Journal of Computational and Graphical Statistics
Volume: 23
Issue number: 4
State: Published - Oct 25 2014

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Discrete Mathematics and Combinatorics

Keywords

  • Cross-validation
  • Denoising
  • Low-rank matrix approximation
  • Penalization
  • Power iterations
  • Principal component analysis
  • Thresholding
