Inducing conceptual embedding spaces from Wikipedia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

The word2vec word vector representations are among the best-known semantic resources to have appeared in recent years. While large sets of pre-trained vectors are available, they focus on frequent words and multi-word expressions and lack sufficient coverage of named entities. Moreover, Google released pre-trained vectors only for English. In this paper, we explore an automatic expansion of Google's pre-trained vectors using Wikipedia, adding millions of concepts and named entities in over 270 languages. Our method places all of these in the same vector space, flexibly facilitating cross-lingual semantic applications.
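
As a rough illustration of the idea sketched in the abstract, the code below induces a vector for a new Wikipedia concept by averaging the pre-trained word2vec vectors of words drawn from its article text, so that the induced vector lies in the same space as Google's original vocabulary. This is only a minimal sketch under stated assumptions, not the paper's actual method: the model file path, the use of gensim, the whitespace tokenization, and the simple averaging strategy are all illustrative choices.

import numpy as np
from gensim.models import KeyedVectors

# Google's pre-trained 300-dimensional English vectors
# (the local file path is an assumption).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

def induce_concept_vector(article_text: str) -> np.ndarray:
    """Average the vectors of in-vocabulary words from a concept's article text."""
    tokens = [t for t in article_text.split() if t in vectors]
    if not tokens:
        raise ValueError("No in-vocabulary tokens found for this concept.")
    return np.mean([vectors[t] for t in tokens], axis=0)

# Hypothetical article snippet for a concept missing from the vocabulary.
concept_vec = induce_concept_vector(
    "Perth is the capital city of Western Australia"
)
print(concept_vec.shape)  # (300,)

Because every induced vector shares the space of the pre-trained vocabulary, concept vectors built from different language editions of Wikipedia could, in the same spirit, be compared directly for cross-lingual applications.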

Original language: English (US)
Title of host publication: 26th International World Wide Web Conference 2017, WWW 2017 Companion
Publisher: International World Wide Web Conferences Steering Committee
Pages: 43-50
Number of pages: 8
ISBN (Electronic): 9781450349147
DOIs
State: Published - 2017
Event: 26th International World Wide Web Conference, WWW 2017 Companion - Perth, Australia
Duration: Apr 3 2017 - Apr 7 2017

Publication series

Name: 26th International World Wide Web Conference 2017, WWW 2017 Companion

Other

Other: 26th International World Wide Web Conference, WWW 2017 Companion
Country/Territory: Australia
City: Perth
Period: 4/3/17 - 4/7/17

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications

Keywords

  • Conceptual knowledge
  • Semantic representations
  • Wikipedia
