Simple semi-supervised POS tagging

Karl Stratos, Michael Collins

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Scopus citations

Abstract

We tackle the question: how much supervision is needed to achieve state-of-the-art performance in part-of-speech (POS) tagging, if we leverage lexical representations given by the model of Brown et al. (1992)? It has become a standard practice to use automatically induced “Brown clusters” in place of POS tags. We claim that the underlying sequence model for these clusters is particularly well-suited for capturing POS tags. We empirically demonstrate this claim by drastically reducing supervision in POS tagging with these representations. Using either the bit-string form given by the algorithm of Brown et al. (1992) or the (less well-known) embedding form given by the canonical correlation analysis algorithm of Stratos et al. (2014), we can obtain 93% tagging accuracy with just 400 labeled words and achieve state-of-the-art accuracy (> 97%) with less than 1 percent of the original training data.
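The bit-string form mentioned in the abstract can be illustrated with a minimal sketch. It assumes a common way of turning Brown cluster bit strings into tagger features (prefixes of the bit string), which is not necessarily the feature set used in the paper. The mapping `TOY_CLUSTERS`, the function `bitstring_prefix_features`, and the prefix lengths are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical toy mapping from words to Brown cluster bit strings.
# Real bit strings would come from running Brown clustering on unlabeled text.
TOY_CLUSTERS: Dict[str, str] = {
    "the": "0010",
    "a": "0011",
    "dog": "1100",
    "cat": "1101",
    "runs": "0110",
}


def bitstring_prefix_features(
    word: str,
    clusters: Dict[str, str],
    prefix_lengths: Sequence[int] = (2, 4),
) -> List[str]:
    """Return prefix features of a word's cluster bit string.

    Shorter prefixes correspond to coarser clusters in the hierarchy,
    so words in nearby clusters share their short-prefix features.
    """
    bits = clusters.get(word.lower())
    if bits is None:
        # Word not seen during clustering: emit a single fallback feature.
        return ["brown=UNK"]
    return [f"brown[:{n}]={bits[:n]}" for n in prefix_lengths if n <= len(bits)]


if __name__ == "__main__":
    for w in ["the", "dog", "cat", "xyzzy"]:
        print(w, bitstring_prefix_features(w, TOY_CLUSTERS))
```

In this toy mapping, "dog" and "cat" share the length-2 prefix feature, so a tagger that sees a label for one of them can generalize to the other. That is the intuition behind needing few labeled words.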

Original language: English (US)
Title of host publication: 1st Workshop on Vector Space Modeling for Natural Language Processing, VS 2015 at the Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, NAACL-HLT 2015
Editors: Phil Blunsom, Shay Cohen, Paramveer Dhillon, Percy Liang
Publisher: Association for Computational Linguistics (ACL)
Pages: 79-87
Number of pages: 9
ISBN (Electronic): 9781941643464
State: Published - 2015
Externally published: Yes
Event: 1st Workshop on Vector Space Modeling for Natural Language Processing, VS 2015 at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2015 - Denver, United States
Duration: Jun 5 2015 → …

Publication series

Name: 1st Workshop on Vector Space Modeling for Natural Language Processing, VS 2015 at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2015

Conference

Conference: 1st Workshop on Vector Space Modeling for Natural Language Processing, VS 2015 at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2015
Country/Territory: United States
City: Denver
Period: 6/5/15 → …

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Linguistics and Language
