Hierarchical multi-task word embedding learning for synonym prediction

Hongliang Fei, Shulong Tan, Ping Li

Research output: Chapter in Book/Report/Conference proceedingConference contribution

20 Scopus citations

Abstract

Automatic synonym recognition is of great importance for entity-centric text mining and interpretation. Due to the high language use variability in real-life, manual construction of semantic resources to cover all synonyms is prohibitively expensive and may also result in limited coverage. Although there are public knowledge bases, they only have limited coverage for languages other than English. In this paper, we focus on medical domain and propose an automatic way to accelerate the process of medical synonymy resource development for Chinese, including both formal entities from healthcare professionals and noisy descriptions from end-users. Motivated by the success of distributed word representations, we design a multi-task model with hierarchical task relationship to learn more representative entity/term embeddings and apply them to synonym prediction. In our model, we extend the classical skip-gram word embedding model by introducing an auxiliary task “neighboring word semantic type prediction” and hierarchically organize them based on the task complexity. Meanwhile, we incorporate existing medical term-term synonymous knowledge into our word embedding learning framework. We demonstrate that the embeddings trained from our proposed multi-task model yield significant improvement for entity semantic relatedness evaluation, neighboring word semantic type prediction and synonym prediction compared with baselines. Furthermore, we create a large medical text corpus in Chinese that includes annotations for entities, descriptions and synonymous pairs for future research in this direction.

Original languageEnglish (US)
Title of host publicationKDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages834-842
Number of pages9
ISBN (Electronic)9781450362016
DOIs
StatePublished - Jul 25 2019
Event25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019 - Anchorage, United States
Duration: Aug 4 2019Aug 8 2019

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019
Country/TerritoryUnited States
CityAnchorage
Period8/4/198/8/19

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Hierarchical multi-task word embedding learning for synonym prediction'. Together they form a unique fingerprint.

Cite this