A framework for pre-training hidden-unit conditional random fields and its extension to long short term memory networks

Young Bum Kim, Karl Stratos, Ruhi Sarikaya

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

In this paper, we introduce a simple unsupervised framework for pre-training hidden-unit conditional random fields (HUCRFs), i.e., learning initial parameter estimates for HUCRFs prior to supervised training. Our framework exploits the model structure of HUCRFs to make effective use of unlabeled data from the same domain or labeled data from a different domain. The key idea is to exploit the separation of HUCRF parameters between observations and labels, which allows us to pre-train observation parameters independently of label parameters. Pre-training is achieved by creating pseudo-labels from such resources; in the case of unlabeled data, we cluster observations and use the resulting clusters as pseudo-labels. Observation parameters trained on these resources are then transferred to initialize supervised training on the target labeled data. Experiments on various sequence labeling tasks demonstrate that the proposed pre-training method consistently yields significant performance improvements. The core idea extends to other learning techniques, including deep learning: we applied it to recurrent neural networks (RNNs) with a long short term memory (LSTM) architecture and obtained similar gains.
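To make the pseudo-labeling step concrete, below is a minimal Python sketch of the idea described in the abstract. It assumes pre-computed word vectors (random stand-ins here) and uses scikit-learn's KMeans as an illustrative clustering choice; the HUCRF class, obs_params attribute, and fit calls in the closing comments are hypothetical names standing in for the parameter-transfer step, not an actual API from the paper.

    # Sketch: cluster observations to create pseudo-labels for pre-training.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Hypothetical unlabeled corpus: each word type gets a stand-in embedding.
    vocab = ["book", "flight", "to", "boston", "on", "friday"]
    word_vectors = {w: rng.normal(size=50) for w in vocab}

    # Step 1: cluster the observations; cluster IDs serve as pseudo-labels.
    X = np.stack([word_vectors[w] for w in vocab])
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    pseudo_label = {w: int(c) for w, c in zip(vocab, kmeans.labels_)}

    # Step 2: an unlabeled sentence becomes a pseudo-labeled training sequence.
    sentence = ["book", "flight", "to", "boston"]
    pseudo_sequence = [(w, pseudo_label[w]) for w in sentence]
    print(pseudo_sequence)

    # Step 3 (schematic, hypothetical API): train a HUCRF on pseudo-labeled
    # sequences, then transfer only the observation-to-hidden parameters into
    # the model used for supervised training; the hidden-to-label parameters
    # are discarded, since the pseudo-label space differs from the true one.
    # pretrained = HUCRF().fit(pseudo_corpus)
    # model = HUCRF()
    # model.obs_params = pretrained.obs_params   # transferred initialization
    # model.fit(labeled_corpus)                  # supervised training

The same recipe applies when the extra resource is labeled data from a different domain: its labels play the role of the pseudo-labels, and only the observation parameters carry over to the target domain.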

Original language: English (US)
Pages (from-to): 311-326
Number of pages: 16
Journal: Computer Speech and Language
Volume: 46
State: Published - Nov 2017
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction

Keywords

  • Conditional random fields
  • Hidden unit conditional random fields
  • LSTMs
  • Multi-sense clustering
  • Pre-training
  • Sequence labeling
  • Spoken language understanding
  • Transfer learning
  • Word embedding
