TY - GEN

T1 - Very sparse random projections

AU - Li, Ping

AU - Hastie, Trevor J.

AU - Church, Kenneth W.

PY - 2006

Y1 - 2006

N2 - There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.

AB - There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.

KW - Random projections

KW - Rates of convergence

KW - Sampling

UR - http://www.scopus.com/inward/record.url?scp=33749573641&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749573641&partnerID=8YFLogxK

U2 - 10.1145/1150402.1150436

DO - 10.1145/1150402.1150436

M3 - Conference contribution

AN - SCOPUS:33749573641

SN - 1595933395

SN - 9781595933393

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 287

EP - 296

BT - KDD 2006

PB - Association for Computing Machinery (ACM)

T2 - KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Y2 - 20 August 2006 through 23 August 2006

ER -