Hypergraph partitioning for document clustering: A unified clique perspective

Hu Tianming, Xiong Hui, Zhou Wenjun, Yuan Sung Sam, Luo Hangzai

Research output: Chapter in Book/Report/Conference proceedingConference contribution

12 Scopus citations

Abstract

Hypergraph partitioning has been considered as a promising method to address the challenges of high dimensionality in document clustering. With documents modeled as vertices and the relationship among documents captured by the hyperedges, the goal of graph partitioning is to minimize the edge cut. Therefore, the definition of hyperedges is vital to the clustering performance. While several definitions of hyperedges have been proposed, a systematic understanding of desired characteristics of hyperedges is still missing. To that end, in this paper, we first provide a unified clique perspective of the definition of hyperedges, which serves as a guide to define hyperedges. With this perspective, based on the concepts of hypercliques and shared (reverse) nearest neighbors, we propose three new types of clique hyperedges and analyze their properties regarding purity and size issues. Finally, we present an extensive evaluation using real-world document datasets. The experimental results show that, with shared (reverse) nearest neighbor based hyperedges, the clustering performance can be improved significantly in terms of various external validation measures without the need for fine tuning of parameters.

Original languageEnglish (US)
Title of host publicationACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings
Pages871-872
Number of pages2
DOIs
StatePublished - 2008
Externally publishedYes
Event31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008 - Singapore, Singapore
Duration: Jul 20 2008Jul 24 2008

Publication series

NameACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings

Other

Other31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008
Country/TerritorySingapore
CitySingapore
Period7/20/087/24/08

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software

Keywords

  • Clique
  • Document clustering
  • Hypergraph partitioning
  • Shared nearest neighbor

Fingerprint

Dive into the research topics of 'Hypergraph partitioning for document clustering: A unified clique perspective'. Together they form a unique fingerprint.

Cite this