GPU-based minwise hashing

Ping Li, Anshumali Shrivastava, Arnd Christian König

Research output: Chapter in Book/Report/Conference proceedingConference contribution

10 Scopus citations

Abstract

Minwise hashing [1] is a standard technique for efficient set similarity estimation in the context of search. The recent work of b-bit minwise hashing [3] provided a substantial improvement by storing only the lowest b bits of each hashed value. Both minwise hashing and b-bit minwise hashing require an expensive preprocessing step for applying κ (e.g., κ = 500) permutations on the entire data in order to compute κ minimal values as the hashed data. In this paper, we developed a parallelization scheme using GPUs, which reduced the processing time by a factor of 20 ∼ 80. Reducing the preprocessing time is highly beneficial in practice, for example, for duplicate web page detection (where minwise hashing is a major step in the crawling pipeline) or for increasing the testing speed of online classifiers (when the test data are not preprocessed). Copyright is held by the author/owner(s).

Original languageEnglish (US)
Title of host publicationWWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion
Pages565-566
Number of pages2
DOIs
StatePublished - May 21 2012
Externally publishedYes
Event21st Annual Conference on World Wide Web, WWW'12 - Lyon, France
Duration: Apr 16 2012Apr 20 2012

Publication series

NameWWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion

Other

Other21st Annual Conference on World Wide Web, WWW'12
CountryFrance
CityLyon
Period4/16/124/20/12

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

Keywords

  • GPU
  • Hashing
  • Large-scale learning

Fingerprint Dive into the research topics of 'GPU-based minwise hashing'. Together they form a unique fingerprint.

  • Cite this

    Li, P., Shrivastava, A., & König, A. C. (2012). GPU-based minwise hashing. In WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion (pp. 565-566). (WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion). https://doi.org/10.1145/2187980.2188129