TY - GEN
T1 - Rejection Sampling for Weighted Jaccard Similarity Revisited
AU - Li, Xiaoyun
AU - Li, Ping
N1 - Publisher Copyright:
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2021
Y1 - 2021
N2 - Efficiently computing the weighted Jaccard similarity has become an active research topic in machine learning and theory. For sparse data, the standard technique is based on the consistent weighted sampling (CWS). For dense data, however, methods based on rejection sampling (RS) can be much more efficient. Nevertheless, existing RS methods are still slow for practical purposes. In this paper, we propose to improve RS by a strategy, which we call efficient rejection sampling (ERS), based on “early stopping + densification”. We analyze the statistical property of ERS and provide experimental results to compare ERS with RS and other algorithms for hashing weighted Jaccard. The results demonstrate that ERS significantly improves the existing methods for estimating the weighted Jaccard similarity in relatively dense data.
AB - Efficiently computing the weighted Jaccard similarity has become an active research topic in machine learning and theory. For sparse data, the standard technique is based on the consistent weighted sampling (CWS). For dense data, however, methods based on rejection sampling (RS) can be much more efficient. Nevertheless, existing RS methods are still slow for practical purposes. In this paper, we propose to improve RS by a strategy, which we call efficient rejection sampling (ERS), based on “early stopping + densification”. We analyze the statistical property of ERS and provide experimental results to compare ERS with RS and other algorithms for hashing weighted Jaccard. The results demonstrate that ERS significantly improves the existing methods for estimating the weighted Jaccard similarity in relatively dense data.
UR - http://www.scopus.com/inward/record.url?scp=85107943566&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107943566&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85107943566
T3 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
SP - 4197
EP - 4205
BT - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
PB - Association for the Advancement of Artificial Intelligence
T2 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Y2 - 2 February 2021 through 9 February 2021
ER -