A sub-character architecture for Korean language processing

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

13 Scopus citations

Abstract

We introduce a novel sub-character architecture that exploits the unique compositional structure of the Korean language. Our method decomposes each character into a small set of primitive phonetic units called jamo letters, from which character- and word-level representations are induced. The jamo letters reveal syntactic and semantic information that is difficult to access with conventional character-level units. They greatly alleviate the data sparsity problem, reducing the observation space to 1.6% of the original while increasing accuracy in our experiments. We apply our architecture to dependency parsing and achieve dramatic improvements over strong lexical baselines.
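Each modern Hangul syllable is built from an onset consonant, a vowel, and an optional coda consonant, so the decomposition step can be done deterministically. The sketch below is only an illustration, not the paper's preprocessing: it uses the arithmetic Hangul decomposition defined in the Unicode Standard (Sec. 3.12). The function names `to_jamo` and `unit_counts` are hypothetical. `unit_counts` compares how many distinct characters and distinct jamo appear in a text, which is one simple way to see why jamo shrink the observation space.

```python
# Hypothetical helpers (not from the paper): arithmetic decomposition of
# precomposed Hangul syllables (U+AC00..U+D7A3) into conjoining jamo,
# using the constants from the Unicode Standard, Sec. 3.12.
S_BASE = 0xAC00                         # first precomposed syllable
L_BASE = 0x1100                         # first onset (leading consonant)
V_BASE = 0x1161                         # first vowel
T_BASE = 0x11A7                         # one before the first coda (trailing consonant)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes the "no coda" case
N_COUNT = V_COUNT * T_COUNT             # 588 syllables per onset
S_COUNT = L_COUNT * N_COUNT             # 11,172 syllables in total


def to_jamo(ch: str) -> list[str]:
    """Split one character into onset, vowel and optional coda jamo.

    Characters outside the precomposed Hangul block are returned
    unchanged as a single unit.
    """
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return [ch]
    onset = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    coda_index = index % T_COUNT
    # Onsets and codas have distinct code points (e.g. U+1100 vs U+11A8 for
    # the same consonant), so a jamo's position in the syllable is preserved.
    coda = [chr(T_BASE + coda_index)] if coda_index else []
    return [onset, vowel] + coda


def unit_counts(text: str) -> tuple[int, int]:
    """Return (distinct non-space characters, distinct jamo units) in text."""
    chars = {c for c in text if not c.isspace()}
    jamo = {j for c in chars for j in to_jamo(c)}
    return len(chars), len(jamo)


if __name__ == "__main__":
    print([hex(ord(j)) for j in to_jamo("한")])  # ['0x1112', '0x1161', '0x11ab']
    print(unit_counts("한국"))                    # (2, 6)
```

Under this scheme the jamo inventory for modern Hangul is capped at 19 + 21 + 27 = 67 conjoining code points, while the number of distinct syllables can reach 11,172. The exact reduction on a given corpus depends on which syllables actually occur.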

Original language: English (US)
Title of host publication: EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 721-726
Number of pages: 6
ISBN (Electronic): 9781945626838
State: Published - 2017
Externally published: Yes
Event: 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017 - Copenhagen, Denmark
Duration: Sep 9 2017 to Sep 11 2017

Publication series

Name: EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017
Country/Territory: Denmark
City: Copenhagen
Period: 9/9/17 to 9/11/17

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Information Systems
  • Computational Theory and Mathematics
