## Abstract

An idealized clustering algorithm seeks to learn a cluster-adjacency matrix such that, if two data points belong to the same cluster, the corresponding entry would be 1; otherwise, the entry would be 0. This integer (1/0) constraint makes it difficult to find the optimal solution. We propose a relaxation on the cluster-adjacency matrix, by deriving a bi-stochastic matrix from a data similarity (e. g., kernel) matrix according to the Bregman divergence. Our general method is named the Bregmanian Bi-Stochastication (BBS) algorithm. We focus on two popular choices of the Bregman divergence: the Euclidean distance and the Kullback-Leibler (KL) divergence. Interestingly, the BBS algorithm using the KL divergence is equivalent to the Sinkhorn-Knopp (SK) algorithm for deriving bi-stochastic matrices. We show that the BBS algorithm using the Euclidean distance is closely related to the relaxed k-means clustering and can often produce noticeably superior clustering results to the SK algorithm (and other algorithms such as Normalized Cut), through extensive experiments on public data sets.

Original language | English (US) |
---|---|

Pages (from-to) | 351-382 |

Number of pages | 32 |

Journal | Knowledge and Information Systems |

Volume | 32 |

Issue number | 2 |

DOIs | |

State | Published - Aug 2012 |

Externally published | Yes |

## All Science Journal Classification (ASJC) codes

- Software
- Information Systems
- Human-Computer Interaction
- Hardware and Architecture
- Artificial Intelligence

## Keywords

- Bi-stochastic matrix
- Bregman divergence
- Clustering