b-bit minwise hashing for estimating three-way similarities?

Ping Li, Arnd Christian König, Wenhao Gui

Research output: Chapter in Book/Report/Conference proceedingConference contribution

24 Scopus citations

Abstract

Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 23
Subtitle of host publication24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010
StatePublished - Dec 1 2010
Externally publishedYes
Event24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010 - Vancouver, BC, Canada
Duration: Dec 6 2010Dec 9 2010

Publication series

NameAdvances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010

Other

Other24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010
CountryCanada
CityVancouver, BC
Period12/6/1012/9/10

All Science Journal Classification (ASJC) codes

  • Information Systems

Cite this

Li, P., König, A. C., & Gui, W. (2010). b-bit minwise hashing for estimating three-way similarities? In Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010 (Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010).