TY - GEN
T1 - Sign-full random projections
AU - Li, Ping
N1 - Publisher Copyright:
© 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org).
PY - 2019
Y1 - 2019
N2 - The method of 1-bit (“sign-sign”) random projections has been a popular tool for efficient search and machine learning on large datasets. Given two D-dim data vectors u, v ∈ R^D, one can generate x = Σ_{i=1}^D u_i r_i and y = Σ_{i=1}^D v_i r_i, where r_i ∼ N(0, 1) iid. Then one can estimate the cosine similarity ρ from sgn(x) and sgn(y). In this paper, we study a series of estimators for “sign-full” random projections. First we prove E(sgn(x) y) = √(2/π) ρ, which provides an estimator for ρ. Interestingly, this estimator can be substantially improved by normalizing y. Then we study estimators based on E(y_− 1_{x≥0} + y_+ 1_{x<0}) and its normalized version. We analyze the theoretical limit (using the MLE) and conclude that, among the proposed estimators, no single estimator can achieve (close to) the theoretically optimal asymptotic variance for the entire range of ρ. On the other hand, the estimators can be combined to achieve a variance close to that of the MLE. In applications such as near-neighbor search, duplicate detection, k-NN classification, etc., the training data are first transformed via random projections and then only the signs of the projected data points are stored (i.e., the sgn(x)). The original training data are discarded. When a new data point arrives, we apply random projections but do not necessarily need to quantize the projected data (i.e., the y) to 1 bit. Therefore, sign-full random projections can be practically useful. This gain essentially comes at no additional cost.
AB - The method of 1-bit (“sign-sign”) random projections has been a popular tool for efficient search and machine learning on large datasets. Given two D-dim data vectors u, v ∈ R^D, one can generate x = Σ_{i=1}^D u_i r_i and y = Σ_{i=1}^D v_i r_i, where r_i ∼ N(0, 1) iid. Then one can estimate the cosine similarity ρ from sgn(x) and sgn(y). In this paper, we study a series of estimators for “sign-full” random projections. First we prove E(sgn(x) y) = √(2/π) ρ, which provides an estimator for ρ. Interestingly, this estimator can be substantially improved by normalizing y. Then we study estimators based on E(y_− 1_{x≥0} + y_+ 1_{x<0}) and its normalized version. We analyze the theoretical limit (using the MLE) and conclude that, among the proposed estimators, no single estimator can achieve (close to) the theoretically optimal asymptotic variance for the entire range of ρ. On the other hand, the estimators can be combined to achieve a variance close to that of the MLE. In applications such as near-neighbor search, duplicate detection, k-NN classification, etc., the training data are first transformed via random projections and then only the signs of the projected data points are stored (i.e., the sgn(x)). The original training data are discarded. When a new data point arrives, we apply random projections but do not necessarily need to quantize the projected data (i.e., the y) to 1 bit. Therefore, sign-full random projections can be practically useful. This gain essentially comes at no additional cost.
UR - http://www.scopus.com/inward/record.url?scp=85084807688&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084807688&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85084807688
T3 - 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
SP - 4205
EP - 4212
BT - 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
PB - AAAI Press
T2 - 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Annual Conference on Innovative Applications of Artificial Intelligence, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
Y2 - 27 January 2019 through 1 February 2019
ER -