Improving clustering by learning a bi-stochastic data similarity matrix

Fei Wang, Ping Li, Arnd Christian König, Muting Wan

Research output: Contribution to journalArticlepeer-review

13 Scopus citations

Abstract

An idealized clustering algorithm seeks to learn a cluster-adjacency matrix such that, if two data points belong to the same cluster, the corresponding entry would be 1; otherwise, the entry would be 0. This integer (1/0) constraint makes it difficult to find the optimal solution. We propose a relaxation on the cluster-adjacency matrix, by deriving a bi-stochastic matrix from a data similarity (e. g., kernel) matrix according to the Bregman divergence. Our general method is named the Bregmanian Bi-Stochastication (BBS) algorithm. We focus on two popular choices of the Bregman divergence: the Euclidean distance and the Kullback-Leibler (KL) divergence. Interestingly, the BBS algorithm using the KL divergence is equivalent to the Sinkhorn-Knopp (SK) algorithm for deriving bi-stochastic matrices. We show that the BBS algorithm using the Euclidean distance is closely related to the relaxed k-means clustering and can often produce noticeably superior clustering results to the SK algorithm (and other algorithms such as Normalized Cut), through extensive experiments on public data sets.

Original languageEnglish (US)
Pages (from-to)351-382
Number of pages32
JournalKnowledge and Information Systems
Volume32
Issue number2
DOIs
StatePublished - Aug 2012
Externally publishedYes

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence

Keywords

  • Bi-stochastic matrix
  • Bregman divergence
  • Clustering

Fingerprint Dive into the research topics of 'Improving clustering by learning a bi-stochastic data similarity matrix'. Together they form a unique fingerprint.

Cite this