A generalization of proximity functions for K-means

Junjie Wu, Hui Xiong, Jian Chen, Wenjun Zhou

Research output: Chapter in Book/Report/Conference proceedingConference contribution

25 Scopus citations

Abstract

K-means is a widely used partitional clustering method. A large amount of effort has been made on finding better proximity (distance) functions for K-means. However, the common characteristics of proximity functions remain unknown. To this end, in this paper, we show that all proximity functions that fit K-means clustering can be generalized as K-means distance, which can be derived by a differen-tiable convex function. A general proof of sufficient and necessary conditions for K-means distance functions is also provided. In addition, we reveal that K-means has a general uniformization effect; that is, K-means tends to produce clusters with relatively balanced cluster sizes. This uniformization effect of K-means exists regardless of proximity functions. Finally, we have conducted extensive experiments on various real-world data sets, and the results show the evidence of the uniformization effect. Also, we observed that external clustering validation measures, such as Entropy and Variance of Information (VI), have difficulty in measuring clustering quality if data have skewed distributions on class sizes.

Original languageEnglish (US)
Title of host publicationProceedings of the 7th IEEE International Conference on Data Mining, ICDM 2007
Pages361-370
Number of pages10
DOIs
StatePublished - 2007
Event7th IEEE International Conference on Data Mining, ICDM 2007 - Omaha, NE, United States
Duration: Oct 28 2007Oct 31 2007

Publication series

NameProceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)1550-4786

Other

Other7th IEEE International Conference on Data Mining, ICDM 2007
Country/TerritoryUnited States
CityOmaha, NE
Period10/28/0710/31/07

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Fingerprint

Dive into the research topics of 'A generalization of proximity functions for K-means'. Together they form a unique fingerprint.

Cite this