Exploiting associations between word clusters and document classes for cross-domain text categorization

Fuzhen Zhuang, Ping Luo, Hui Xiong, Qing He, Yuhong Xiong, Zhongzhi Shi

Research output: Contribution to conference › Paper › peer-review

32 Scopus citations

Abstract

Cross-domain text categorization aims to adapt the knowledge learnt from a labeled source domain to an unlabeled target domain, where the documents from the source and target domains are drawn from different distributions. However, in spite of the different distributions in raw word features, the associations between word clusters (conceptual features) and document classes may remain stable across different domains. In this paper, we exploit these unchanged associations as the bridge for knowledge transfer from the source domain to the target domain, using nonnegative matrix tri-factorization. Specifically, we formulate a joint optimization framework of two matrix tri-factorizations, one for the source-domain data and one for the target-domain data, in which the associations between word clusters and document classes are shared between them. We then give an iterative algorithm for this optimization and theoretically show its convergence. Comprehensive experiments show the effectiveness of this method. In particular, we show that the proposed method can deal with some difficult scenarios where baseline methods usually do not perform well.
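
The joint factorization described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain Lee-Seung-style multiplicative updates on an unregularized squared-error objective, fixes the source document-class matrix to the one-hot source labels, and omits any normalization steps. The function name joint_nmtf and all parameter names are hypothetical. Xs and Xt are assumed to be nonnegative word-by-document matrices, Fs and Ft hold word-cluster memberships, Gt holds target document-class memberships, and S is the shared association matrix between word clusters and document classes.

```python
import numpy as np

def joint_nmtf(Xs, ys, Xt, n_word_clusters, n_classes, n_iter=100, seed=0, eps=1e-9):
    """Simplified joint tri-factorization Xs ~ Fs S Gs^T and Xt ~ Ft S Gt^T with a shared S.

    Assumptions (not taken from the paper): Gs is fixed to the one-hot source
    labels, updates are generic multiplicative rules, no normalization is applied.
    """
    rng = np.random.default_rng(seed)
    m, _ = Xs.shape
    nt = Xt.shape[1]
    k, c = n_word_clusters, n_classes

    Gs = np.eye(c)[ys]                 # (ns, c) one-hot source labels, kept fixed
    Fs = rng.random((m, k)) + eps      # source word-cluster memberships
    Ft = rng.random((m, k)) + eps      # target word-cluster memberships
    Gt = rng.random((nt, c)) + eps     # target document-class memberships
    S = rng.random((k, c)) + eps       # shared cluster-class associations

    def loss():
        return (np.linalg.norm(Xs - Fs @ S @ Gs.T) ** 2
                + np.linalg.norm(Xt - Ft @ S @ Gt.T) ** 2)

    history = [loss()]
    for _ in range(n_iter):
        # Each factor is rescaled by (negative gradient part) / (positive gradient part),
        # which keeps every entry nonnegative.
        Fs *= (Xs @ Gs @ S.T) / (Fs @ S @ Gs.T @ Gs @ S.T + eps)
        Ft *= (Xt @ Gt @ S.T) / (Ft @ S @ Gt.T @ Gt @ S.T + eps)
        Gt *= (Xt.T @ Ft @ S) / (Gt @ S.T @ Ft.T @ Ft @ S + eps)
        # S receives contributions from both domains because it is shared.
        S *= (Fs.T @ Xs @ Gs + Ft.T @ Xt @ Gt) / (
            Fs.T @ Fs @ S @ Gs.T @ Gs + Ft.T @ Ft @ S @ Gt.T @ Gt + eps)
        history.append(loss())

    # Predicted target class = column of Gt with the largest membership.
    return Gt.argmax(axis=1), S, history
```

In this sketch the only coupling between the two domains is S, which is what lets the source labels influence the target predictions. How well that works on real text data is not something the sketch demonstrates.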

Original language: English (US)
Pages: 13-24
Number of pages: 12
State: Published - 2010
Externally published: Yes
Event: 10th SIAM International Conference on Data Mining, SDM 2010 - Columbus, OH, United States
Duration: Apr 29 2010 → May 1 2010

Other

Other: 10th SIAM International Conference on Data Mining, SDM 2010
Country/Territory: United States
City: Columbus, OH
Period: 4/29/10 → 5/1/10

All Science Journal Classification (ASJC) codes

  • Software

Keywords

  • Cross-domain learning
  • Domain adaptation
  • Text categorization
  • Transfer learning
