TY - GEN

T1 - Advantage of overlapping clusters for minimizing conductance

AU - Khandekar, Rohit

AU - Kortsarz, Guy

AU - Mirrokni, Vahab

PY - 2012

Y1 - 2012

N2 - Graph clustering is an important problem with applications to bioinformatics, community discovery in social networks, distributed computing, etc. While most of the research in this area has focused on clustering using disjoint clusters, many real datasets have inherently overlapping clusters. We compare overlapping and non-overlapping clusterings in graphs in the context of minimizing their conductance. It is known that allowing clusters to overlap gives better results in practice. We prove that overlapping clustering may be significantly better than non-overlapping clustering with respect to conductance, even in a theoretical setting. For minimizing the maximum conductance over the clusters, we give examples demonstrating that allowing overlaps can yield significantly better clusterings, namely, ones with a much smaller optimum. In addition, for the min-max variant, the overlapping version admits a simple approximation algorithm, while our algorithm for the non-overlapping version is complex and yields a worse approximation ratio due to the presence of the additional constraint. Somewhat surprisingly, for the problem of minimizing the sum of conductances, we find that allowing overlap does not really help. We show how to apply a general technique to transform any overlapping clustering into a non-overlapping one with only a modest increase in the sum of conductances. This uncrossing technique is of independent interest and may find further applications in the future.

AB - Graph clustering is an important problem with applications to bioinformatics, community discovery in social networks, distributed computing, etc. While most of the research in this area has focused on clustering using disjoint clusters, many real datasets have inherently overlapping clusters. We compare overlapping and non-overlapping clusterings in graphs in the context of minimizing their conductance. It is known that allowing clusters to overlap gives better results in practice. We prove that overlapping clustering may be significantly better than non-overlapping clustering with respect to conductance, even in a theoretical setting. For minimizing the maximum conductance over the clusters, we give examples demonstrating that allowing overlaps can yield significantly better clusterings, namely, ones with a much smaller optimum. In addition, for the min-max variant, the overlapping version admits a simple approximation algorithm, while our algorithm for the non-overlapping version is complex and yields a worse approximation ratio due to the presence of the additional constraint. Somewhat surprisingly, for the problem of minimizing the sum of conductances, we find that allowing overlap does not really help. We show how to apply a general technique to transform any overlapping clustering into a non-overlapping one with only a modest increase in the sum of conductances. This uncrossing technique is of independent interest and may find further applications in the future.

KW - dynamic programming

KW - graph clustering

KW - overlapping clustering

KW - tree decomposition

UR - http://www.scopus.com/inward/record.url?scp=84860833067&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860833067&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-29344-3_42

DO - 10.1007/978-3-642-29344-3_42

M3 - Conference contribution

AN - SCOPUS:84860833067

SN - 9783642293436

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 494

EP - 505

BT - LATIN 2012

PB - Springer Verlag

T2 - 10th Latin American Symposium on Theoretical Informatics, LATIN 2012

Y2 - 16 April 2012 through 20 April 2012

ER -