Efficient document clustering via online nonnegative matrix factorizations

Fei Wang, Chenhao Tan, Arnd Christian König, Ping Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

50 Scopus citations

Abstract

In recent years, Nonnegative Matrix Factorization (NMF) has received considerable interest from the data mining and information retrieval fields. NMF has been successfully applied to document clustering, image representation, and other domains. This study proposes an online NMF (ONMF) algorithm to efficiently handle very large-scale and/or streaming datasets. Unlike conventional NMF solutions, which require the entire data matrix to reside in memory, our ONMF algorithm processes one data point or one chunk of data points at a time. Experiments with one-pass and multi-pass ONMF on real datasets are presented.
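
The chunk-at-a-time idea in the abstract can be illustrated with a minimal sketch. This is not the authors' ONMF algorithm, whose update rules the abstract does not give. It assumes a generic scheme: each chunk X (documents by terms) is approximated as H @ W, the codes H are fitted with the basis W held fixed, and W is then refreshed from two small accumulated matrices, so earlier chunks never need to be kept in memory. The names `fit_chunk_codes`, `online_nmf`, `A`, `B`, and all parameter values are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def fit_chunk_codes(X, W, n_iter=50):
    """Fit nonnegative codes H (n_docs x k) so that X ~= H @ W, with W fixed."""
    rng = np.random.default_rng(0)
    H = rng.random((X.shape[0], W.shape[0]))
    XWt = X @ W.T   # (n_docs x k), constant while W is fixed
    WWt = W @ W.T   # (k x k)
    for _ in range(n_iter):
        # Standard multiplicative update for squared error; keeps H >= 0.
        H *= XWt / (H @ WWt + EPS)
    return H


def online_nmf(chunks, n_features, k, n_passes=1, w_iter=20, seed=0):
    """Learn a nonnegative basis W (k x n_features) one chunk at a time.

    `chunks` must be re-iterable (for example a list) when n_passes > 1.
    Assumption: statistics from earlier passes are kept as-is, which is a
    simplification of what a tuned multi-pass method might do.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((k, n_features))
    A = np.zeros((k, k))            # running sum of H.T @ H
    B = np.zeros((k, n_features))   # running sum of H.T @ X
    for _ in range(n_passes):
        for X in chunks:
            H = fit_chunk_codes(X, W)
            A += H.T @ H
            B += H.T @ X
            for _ in range(w_iter):
                # Multiplicative update for W using only the small summaries.
                W *= B / (A @ W + EPS)
    return W
```

Under these assumptions, a document could be assigned to a cluster by taking the largest entry of its code row, for example `fit_chunk_codes(X, W).argmax(axis=1)`. Memory use depends on the chunk size and on k, not on the total number of documents.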

Original language: English (US)
Title of host publication: Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 908-919
Number of pages: 12
ISBN (Print): 9780898719925
State: Published - 2011
Externally published: Yes
Event: 11th SIAM International Conference on Data Mining, SDM 2011 - Mesa, AZ, United States
Duration: Apr 28 2011 to Apr 30 2011

Publication series

Name: Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011

Other

Other: 11th SIAM International Conference on Data Mining, SDM 2011
Country: United States
City: Mesa, AZ
Period: 4/28/11 to 4/30/11

All Science Journal Classification (ASJC) codes

  • Software
