Compact lexicon selection with spectral methods

Young Bum Kim, Karl Stratos, Xiaohu Liu, Ruhi Sarikaya

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

In this paper, we introduce the task of selecting a compact lexicon from large, noisy gazetteers. This scenario arises often in practice, in particular in spoken language understanding (SLU). We propose a simple and effective solution based on two matrix decomposition techniques: canonical correlation analysis (CCA) and rank-revealing QR (RRQR) factorization. CCA is first used to derive low-dimensional gazetteer embeddings from domain-specific search logs. RRQR is then used to find a subset of these embeddings whose span approximates the entire lexicon space. Experiments on slot tagging show that our method yields a small set of lexicon entities with an average relative error reduction of more than 50% over a randomly selected lexicon.
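The abstract describes a two-stage pipeline: CCA embeddings, then RRQR column selection. The sketch below shows one way to realize it under assumptions the abstract does not state. The search-log statistics are taken to be an entity-by-context count matrix. CCA is computed in its closed form for one-hot views, as an SVD of the count matrix whitened by its row and column totals. RRQR is approximated by greedy QR with column pivoting. The function names `cca_embeddings` and `rrqr_select` and the toy counts are hypothetical and are not taken from the paper.

```python
import numpy as np


def cca_embeddings(counts, dim):
    """Return one unit-length `dim`-dimensional embedding per entity (row).

    Assumption (not from the paper): counts[i, j] is how often gazetteer
    entity i co-occurs with context feature j in the search logs. With one-hot
    views, CCA reduces to an SVD of D_x^{-1/2} C D_y^{-1/2}, where D_x and D_y
    hold the row and column totals of C.
    """
    c = np.asarray(counts, dtype=float)
    row_totals = c.sum(axis=1)
    col_totals = c.sum(axis=0)
    # Whiten by the marginal counts; the epsilon avoids dividing by zero.
    omega = c / np.sqrt(np.outer(row_totals, col_totals) + 1e-12)
    u, _, _ = np.linalg.svd(omega, full_matrices=False)
    emb = u[:, :dim]  # top-`dim` left singular vectors, one row per entity
    # Length-normalize rows so selection compares directions, not frequencies.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)


def rrqr_select(emb, k):
    """Greedily pick up to k entity indices via QR with column pivoting.

    Columns of A = emb.T are entity embeddings. Each step takes the column
    with the largest residual norm, then projects its direction out of all
    columns. At most rank(A) <= dim entities can be selected.
    """
    a = np.array(emb, dtype=float).T
    chosen = []
    for _ in range(min(k, a.shape[1])):
        residual_norms = (a ** 2).sum(axis=0)
        if chosen:
            residual_norms[chosen] = 0.0  # never re-pick a selected column
        j = int(np.argmax(residual_norms))
        if residual_norms[j] <= 1e-12:
            break  # every remaining column already lies in the chosen span
        chosen.append(j)
        q = a[:, j] / np.sqrt(residual_norms[j])
        a = a - np.outer(q, q @ a)  # Gram-Schmidt step on all columns
    return chosen


if __name__ == "__main__":
    # Hypothetical toy counts: 4 entities x 3 context features.
    toy = np.array([[5, 0, 1], [4, 1, 0], [0, 6, 2], [1, 0, 7]])
    print(rrqr_select(cca_embeddings(toy, dim=2), k=2))
```

Because the embeddings have rank at most `dim`, asking for `k > dim` entities returns at most `dim` of them. In practice, `scipy.linalg.qr(A, pivoting=True)` applies the same greedy pivoting rule through LAPACK and returns the pivot order as its third output. The explicit loop is shown only to make the selection criterion visible. Greedy pivoting is not guaranteed to be rank-revealing in the worst case, and stronger RRQR variants exist.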

Original language: English (US)
Title of host publication: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 806-811
Number of pages: 6
ISBN (Electronic): 9781941643730
State: Published - 2015
Externally published: Yes
Event: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015 - Beijing, China
Duration: Jul 26 2015 to Jul 31 2015

Publication series

Name: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Volume: 2

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Language and Linguistics
  • Linguistics and Language
