TY - GEN

T1 - Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections

AU - Li, Ping

AU - Hastie, Trevor J.

AU - Church, Kenneth W.

PY - 2007

Y1 - 2007

N2 - For dimension reduction in l1, one can multiply a data matrix A ∈ ℝn×D by R ∈ ℝD×k (k ≪ D) whose entries are i.i.d. Cauchy samples. An impossibility result says one cannot recover the pairwise l1 distances in A from B = AR ∈ ℝn×k using linear estimators. However, nonlinear estimators are still useful for certain applications in data stream computations, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. We derive tail bounds for the geometric mean estimator and establish that k = O(log n/ε2) suffices, with the constants explicitly given. Asymptotically (as k → ∞), both the sample median estimator and the geometric mean estimator are about 80% efficient compared to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating the distribution of the MLE by an inverse Gaussian.

AB - For dimension reduction in l1, one can multiply a data matrix A ∈ ℝn×D by R ∈ ℝD×k (k ≪ D) whose entries are i.i.d. Cauchy samples. An impossibility result says one cannot recover the pairwise l1 distances in A from B = AR ∈ ℝn×k using linear estimators. However, nonlinear estimators are still useful for certain applications in data stream computations, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. We derive tail bounds for the geometric mean estimator and establish that k = O(log n/ε2) suffices, with the constants explicitly given. Asymptotically (as k → ∞), both the sample median estimator and the geometric mean estimator are about 80% efficient compared to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating the distribution of the MLE by an inverse Gaussian.

UR - http://www.scopus.com/inward/record.url?scp=38049003198&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38049003198&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-72927-3_37

DO - 10.1007/978-3-540-72927-3_37

M3 - Conference contribution

AN - SCOPUS:38049003198

SN - 9783540729259

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 514

EP - 529

BT - Learning Theory - 20th Annual Conference on Learning Theory, COLT 2007, Proceedings

PB - Springer-Verlag

T2 - 20th Annual Conference on Learning Theory, COLT 2007

Y2 - 13 June 2007 through 15 June 2007

ER -