COG: Local decomposition for rare class analysis

Junjie Wu, Hui Xiong, Jian Chen

Research output: Contribution to journal › Article › peer-review

45 Scopus citations

Abstract

Given its importance, the problem of predicting rare classes in large-scale multi-labeled data sets has attracted great attention in the literature. However, rare class analysis remains a critical challenge, because no natural way has yet been developed for handling imbalanced class distributions. This paper addresses this gap by developing a method for classification using local clustering (COG). Specifically, for a data set with an imbalanced class distribution, we perform clustering within each large class to produce sub-classes of relatively balanced sizes. We then apply traditional supervised learning algorithms, such as support vector machines (SVMs), for classification. Along this line, we explore key properties of local clustering to better understand the effect of COG on rare class analysis. We also provide a systematic analysis of the time and space complexity of the COG method. Experimental results on various real-world data sets show that COG achieves significantly higher prediction accuracy on rare classes than state-of-the-art methods, and that the COG scheme can greatly improve the computational performance of SVMs. Furthermore, we show that COG can also improve the performance of traditional supervised learning algorithms on data sets with balanced class distributions. Finally, as two case studies, we apply COG to two real-world applications: credit card fraud detection and network intrusion detection.
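The decomposition idea described in the abstract (cluster inside each large class, train on the resulting sub-classes, then map predictions back) can be illustrated with the minimal, dependency-free sketch below. It is not the authors' implementation. Several choices are assumptions made only for illustration: the number of sub-classes per class is taken as round(class size / smallest class size), k-means uses a deterministic farthest-first initialization, and a simple nearest-centroid classifier is swapped in for the SVM used in the paper. All function names (`local_decompose`, `fit_nearest_centroid`, `predict`, and so on) are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point whose nearest already-chosen centroid is farthest away.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def local_decompose(X, y):
    """Cluster within each class so sub-classes are roughly rare-class sized.

    Hypothetical rule (not from the paper): k = round(class size / smallest class size).
    Returns one (original_label, cluster_id) sub-class label per sample.
    """
    counts = Counter(y)
    smallest = min(counts.values())
    sub = [None] * len(y)
    for label, size in counts.items():
        idx = [i for i, t in enumerate(y) if t == label]
        k = max(1, round(size / smallest))
        clusters = kmeans([X[i] for i in idx], k) if k > 1 else [0] * len(idx)
        for i, c in zip(idx, clusters):
            sub[i] = (label, c)
    return sub


def fit_nearest_centroid(X, labels):
    """Stand-in classifier (the paper uses SVMs): one centroid per label."""
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    return {lab: mean(pts) for lab, pts in groups.items()}


def predict(model, x):
    """Predict the nearest sub-class, then map it back to its original class."""
    sub_label = min(model, key=lambda lab: sq_dist(x, model[lab]))
    return sub_label[0]


# Toy data: class 0 is two separated blobs (18 points), class 1 is rare (6 points).
grid = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
large = [(cx + dx, dy) for cx in (0, 10) for dx, dy in grid]
rare = [(5 + dx, 8 + dy) for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (1, 1)]]
X = large + rare
y = [0] * len(large) + [1] * len(rare)

sub = local_decompose(X, y)
cog_model = fit_nearest_centroid(X, sub)
plain_model = fit_nearest_centroid(X, [(lab, 0) for lab in y])  # no decomposition
print(predict(cog_model, (5, 3.5)), predict(plain_model, (5, 3.5)))  # prints: 1 0
```

In this toy case the undecomposed model places the single centroid of the large class at (5, 0), between its two blobs, so the query point (5, 3.5) is assigned to the large class. After local clustering, the large class is represented by compact sub-class centroids inside each blob, and the same point falls to the rare class. A real COG pipeline would replace the nearest-centroid stand-in with an SVM or another supervised learner.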

Original language: English (US)
Pages (from-to): 191-220
Number of pages: 30
Journal: Data Mining and Knowledge Discovery
Volume: 20
Issue number: 2
State: Published - Mar 2010

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

Keywords

  • K-means clustering
  • Local clustering
  • Rare class analysis
  • Support vector machines (SVMs)
