TY - GEN

T1 - Extended Boolean matrix decomposition

AU - Lu, Haibing

AU - Vaidya, Jaideep

AU - Atluri, Vijayalakshmi

AU - Hong, Yuan

PY - 2009

Y1 - 2009

N2 - With the vast increase in collection and storage of data, the problem of data summarization is most critical for effective data management. Since much of this data is categorical in nature, it can be viewed in terms of a Boolean matrix. Boolean matrix decomposition (BMD) has been used to provide concise and interpretable representations of Boolean data sets. A Boolean matrix can be expressed as a product of two Boolean matrices, where the first matrix represents a set of meaningful concepts, and the second describes how the observed data can be expressed as combinations of those concepts. Typically, the combination is only in terms of the set union. In other words, a successful Boolean matrix decomposition gives a set of concepts and shows how every column of the input data can be expressed as a union of some subset of those concepts. However, this way of modeling only incompletely represents real data semantics. Essentially, it ignores a critical component - the set difference operation: a column can be expressed as the combination of union of certain concepts as well as the exclusion of other concepts. This has two significant benefits. First, the total number of concepts required to describe the data may itself be reduced. Second, a more succinct summarization may be found for every column. In this paper, we propose the extended Boolean matrix decomposition (EBMD) problem, which aims to factor Boolean matrices using both the set union and set difference operations. We study several variants of the problem, show that they are NP-hard, and propose efficient heuristics to solve them. Extensive experimental results demonstrate the power of EBMD.

AB - With the vast increase in collection and storage of data, the problem of data summarization is most critical for effective data management. Since much of this data is categorical in nature, it can be viewed in terms of a Boolean matrix. Boolean matrix decomposition (BMD) has been used to provide concise and interpretable representations of Boolean data sets. A Boolean matrix can be expressed as a product of two Boolean matrices, where the first matrix represents a set of meaningful concepts, and the second describes how the observed data can be expressed as combinations of those concepts. Typically, the combination is only in terms of the set union. In other words, a successful Boolean matrix decomposition gives a set of concepts and shows how every column of the input data can be expressed as a union of some subset of those concepts. However, this way of modeling only incompletely represents real data semantics. Essentially, it ignores a critical component - the set difference operation: a column can be expressed as the combination of union of certain concepts as well as the exclusion of other concepts. This has two significant benefits. First, the total number of concepts required to describe the data may itself be reduced. Second, a more succinct summarization may be found for every column. In this paper, we propose the extended Boolean matrix decomposition (EBMD) problem, which aims to factor Boolean matrices using both the set union and set difference operations. We study several variants of the problem, show that they are NP-hard, and propose efficient heuristics to solve them. Extensive experimental results demonstrate the power of EBMD.

UR - http://www.scopus.com/inward/record.url?scp=77951154736&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951154736&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2009.61

DO - 10.1109/ICDM.2009.61

M3 - Conference contribution

AN - SCOPUS:77951154736

SN - 9780769538952

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 317

EP - 326

BT - ICDM 2009 - The 9th IEEE International Conference on Data Mining

T2 - 9th IEEE International Conference on Data Mining, ICDM 2009

Y2 - 6 December 2009 through 9 December 2009

ER -