An alternative prior process for nonparametric Bayesian clustering

Hanna M. Wallach, Shane T. Jensen, Lee Dicker, Katherine A. Heller

Research output: Contribution to journal › Conference article › peer-review

27 Scopus citations

Abstract

Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
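The abstract does not state the predictive probabilities themselves, so the following is only a minimal sketch of how the three priors are commonly written. It assumes the standard sequential (Chinese-restaurant-style) forms for the Dirichlet and Pitman-Yor processes, and assumes that the uniform process assigns equal mass 1/(K + theta) to each of the K existing clusters and theta/(K + theta) to a new one. The names `predictive_probs` and `sample_partition`, and the parameters `theta` and `discount`, are hypothetical and chosen for illustration.

```python
import random


def predictive_probs(counts, theta, process="dirichlet", discount=0.0):
    """Return (probabilities of joining each existing cluster, probability of a new cluster).

    counts   -- sizes of the existing clusters
    theta    -- concentration parameter (assumed > 0)
    discount -- Pitman-Yor discount in [0, 1); ignored by the other processes
    These are assumed textbook forms, not formulas quoted from the abstract.
    """
    n = sum(counts)   # items assigned so far
    k = len(counts)   # clusters created so far
    if process == "dirichlet":
        denom = n + theta
        existing = [c / denom for c in counts]          # proportional to cluster size
        new = theta / denom
    elif process == "pitman-yor":
        denom = n + theta
        existing = [(c - discount) / denom for c in counts]
        new = (theta + k * discount) / denom
    elif process == "uniform":
        denom = k + theta
        existing = [1.0 / denom for _ in counts]        # independent of cluster size
        new = theta / denom
    else:
        raise ValueError(f"unknown process: {process}")
    return existing, new


def sample_partition(n_items, theta, process="dirichlet", discount=0.0, seed=0):
    """Sequentially assign n_items to clusters and return the cluster sizes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_items):
        existing, new = predictive_probs(counts, theta, process, discount)
        weights = existing + [new]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(counts):
            counts.append(1)       # open a new cluster
        else:
            counts[choice] += 1    # join an existing cluster
    return counts
```

Under these assumed forms, the Dirichlet and Pitman-Yor weights grow with cluster size, which is the "rich-get-richer" effect, while the uniform weights depend only on the number of clusters. Comparing `sample_partition(1000, 1.0, "dirichlet")` with `sample_partition(1000, 1.0, "uniform")` gives a rough feel for the difference in cluster-size profiles.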

Original language: English (US)
Pages (from-to): 892-899
Number of pages: 8
Journal: Journal of Machine Learning Research
Volume: 9
State: Published - 2010
Externally published: Yes
Event: 13th International Conference on Artificial Intelligence and Statistics, AISTATS 2010 - Sardinia, Italy
Duration: May 13 2010 to May 15 2010

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
