Improved densification of one permutation hashing

Anshumali Shrivastava, Ping Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

22 Scopus citations

Abstract

The existing work on densification of one permutation hashing [24] reduces the query processing cost of the (K,L)-parameterized Locality Sensitive Hashing (LSH) algorithm with minwise hashing from O(dKL) to merely O(d + KL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. While that is a substantial improvement, our analysis reveals that the existing densification scheme in [24] is sub-optimal. In particular, there is not enough randomness in that procedure, which affects its accuracy on very sparse datasets. In this paper, we provide a new densification procedure which is provably better than the existing scheme [24]. This improvement is more significant for very sparse datasets, which are common on the web. The improved technique has the same O(d + KL) query processing cost, thereby making it strictly preferable to the existing procedure. Experimental evaluations on public datasets, in the task of hashing-based near neighbor search, support our theoretical findings.
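To make the idea of densification concrete, the following is a minimal Python sketch, not the authors' procedure as published. It assumes one shared random permutation, K equal-width bins (with K dividing the dimension), and one shared random direction per bin that decides whether an empty bin borrows from its nearest non-empty neighbor on the left or on the right, circularly. The function names, the seeds, and the offset constant `c = bin_size + 1` are illustrative assumptions. The nearest-neighbor walk is written for clarity and makes no attempt to match the O(d + KL) bound stated in the abstract.

```python
import random


def one_permutation_bins(nonzeros, dim, k, seed=0):
    """Return the minimum permuted offset in each of k bins (None if the bin is empty)."""
    if dim % k != 0:
        raise ValueError("this sketch assumes k divides dim")
    rng = random.Random(seed)
    perm = list(range(dim))
    rng.shuffle(perm)  # one shared permutation: index i moves to position perm[i]
    bin_size = dim // k
    bins = [None] * k
    for i in nonzeros:  # a single pass over the d nonzeros
        b, offset = divmod(perm[i], bin_size)
        if bins[b] is None or offset < bins[b]:
            bins[b] = offset
    return bins


def densify(bins, bin_size, seed=1):
    """Fill each empty bin from the nearest non-empty bin in a randomly chosen direction."""
    k = len(bins)
    if all(v is None for v in bins):
        raise ValueError("cannot densify an all-zero vector")
    rng = random.Random(seed)
    # Direction choices depend only on the seed, so every vector shares them.
    directions = [rng.choice((-1, 1)) for _ in range(k)]
    c = bin_size + 1  # assumed offset constant, larger than any in-bin value
    dense = []
    for j in range(k):
        t = 0
        while bins[(j + directions[j] * t) % k] is None:
            t += 1  # circular walk; terminates because some bin is non-empty
        # Non-empty bins keep their value (t == 0); borrowed values are shifted by t * c.
        dense.append(bins[(j + directions[j] * t) % k] + t * c)
    return dense


def densified_hashes(nonzeros, dim, k):
    """Convenience wrapper: bin the vector, then densify the empty bins."""
    return densify(one_permutation_bins(nonzeros, dim, k), dim // k)
```

Under these assumptions, two vectors can be compared by counting the positions where their densified hash lists agree, and the list can be split into L groups of K values to index the hash tables.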

Original language: English (US)
Title of host publication: Uncertainty in Artificial Intelligence - Proceedings of the 30th Conference, UAI 2014
Editors: Nevin L. Zhang, Jin Tian
Publisher: AUAI Press
Pages: 732-741
Number of pages: 10
ISBN (Electronic): 9780974903910
State: Published - Jan 1 2014
Event: 30th Conference on Uncertainty in Artificial Intelligence, UAI 2014 - Quebec City, Canada
Duration: Jul 23 2014 to Jul 27 2014

Publication series

Name: Uncertainty in Artificial Intelligence - Proceedings of the 30th Conference, UAI 2014

Other

Other: 30th Conference on Uncertainty in Artificial Intelligence, UAI 2014
Country: Canada
City: Quebec City
Period: 7/23/14 to 7/27/14

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
