b-bit minwise hashing for estimating three-way similarities?

Ping Li, Arnd Christian König, Wenhao Gui

Research output: Chapter in Book/Report/Conference proceedingConference contribution

23 Citations (Scopus)

Abstract

Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 23
Subtitle of host publication24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010
StatePublished - Dec 1 2010
Event24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010 - Vancouver, BC, Canada
Duration: Dec 6 2010Dec 9 2010

Publication series

NameAdvances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010

Other

Other24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010
CountryCanada
CityVancouver, BC
Period12/6/1012/9/10

All Science Journal Classification (ASJC) codes

  • Information Systems

Cite this

Li, P., König, A. C., & Gui, W. (2010). b-bit minwise hashing for estimating three-way similarities? In Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010 (Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010).
Li, Ping ; König, Arnd Christian ; Gui, Wenhao. / b-bit minwise hashing for estimating three-way similarities?. Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010. 2010. (Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010).
@inproceedings{63718b3917394f09bd8245c1805afb0f,
title = "b-bit minwise hashing for estimating three-way similarities?",
abstract = "Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.",
author = "Ping Li and K{\"o}nig, {Arnd Christian} and Wenhao Gui",
year = "2010",
month = "12",
day = "1",
language = "English (US)",
isbn = "9781617823800",
series = "Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010",
booktitle = "Advances in Neural Information Processing Systems 23",

}

Li, P, König, AC & Gui, W 2010, b-bit minwise hashing for estimating three-way similarities? in Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010. Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010, 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010, Vancouver, BC, Canada, 12/6/10.

b-bit minwise hashing for estimating three-way similarities? / Li, Ping; König, Arnd Christian; Gui, Wenhao.

Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010. 2010. (Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - b-bit minwise hashing for estimating three-way similarities?

AU - Li, Ping

AU - König, Arnd Christian

AU - Gui, Wenhao

PY - 2010/12/1

Y1 - 2010/12/1

N2 - Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

AB - Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

UR - http://www.scopus.com/inward/record.url?scp=84860628964&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860628964&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84860628964

SN - 9781617823800

T3 - Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010

BT - Advances in Neural Information Processing Systems 23

ER -

Li P, König AC, Gui W. b-bit minwise hashing for estimating three-way similarities? In Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010. 2010. (Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010).