Medical concept embeddings via labeled background corpora

Eneldo Loza Mencía, Gerard De Melo, Jinseok Nam

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

11 Scopus citations

Abstract

In recent years, we have seen increasing interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The best-known algorithms for producing representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Due to the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any system relying on medical concept similarity measures.
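As an illustration of how embeddings are typically compared against human similarity assessments, the following is a minimal sketch and not the authors' evaluation code. It assumes cosine similarity as the vector similarity measure and Spearman rank correlation as the agreement measure; the abstract names neither. The concept names, vectors, and human scores are invented toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy embeddings (assumption: not taken from the paper).
embeddings = {
    "concept_a": [0.9, 0.1, 0.0],
    "concept_b": [0.8, 0.2, 0.1],
    "concept_c": [0.0, 0.1, 0.9],
}
pairs = [
    ("concept_a", "concept_b"),
    ("concept_a", "concept_c"),
    ("concept_b", "concept_c"),
]
human_scores = [3.8, 1.0, 1.5]  # invented ratings for illustration only

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation with human scores: {rho:.3f}")
```

In this toy setup the model and the invented human ratings rank the three pairs in the same order, so the correlation comes out at essentially 1. A real evaluation would load pretrained concept vectors and a published benchmark of rated concept pairs instead.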

Original language: English (US)
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Editors: Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
Publisher: European Language Resources Association (ELRA)
Pages: 4629-4636
Number of pages: 8
ISBN (Electronic): 9782951740891
State: Published - Jan 1 2016
Externally published: Yes
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: May 23 2016 to May 28 2016

All Science Journal Classification (ASJC) codes

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Keywords

  • Embeddings
  • Medical concepts
  • MeSH
  • Semantic similarity
