MENTA: Inducing multilingual taxonomies from Wikipedia

Gerard De Melo, Gerhard Weikum

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

69 Citations (Scopus)

Abstract

In recent years, a number of projects have turned to Wikipedia to establish large-scale taxonomies that describe orders of magnitude more entities than traditional manually built knowledge bases. So far, however, the multilingual nature of Wikipedia has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is presumably the largest multilingual lexical knowledge base currently available.
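The abstract names a Markov chain-based ranking step but gives no details, so the following is only a toy sketch of the general idea and not the authors' algorithm. It assumes a random walk with restart over weighted candidate child-to-parent links, approximated by power iteration; the function name `rank_parents`, the restart probability, the edge weights, and the example entities are all hypothetical.

```python
from collections import defaultdict


def rank_parents(edges, start, restart=0.15, iterations=100):
    """Score nodes by a random walk with restart from `start`.

    edges: iterable of (child, parent, weight) candidate taxonomic links
           with positive weights (hypothetical input format).
    Returns a dict mapping each node to its approximate visit probability.
    """
    out = defaultdict(dict)
    nodes = {start}
    for child, parent, weight in edges:
        out[child][parent] = out[child].get(parent, 0.0) + weight
        nodes.update((child, parent))

    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in prob.items():
            targets = out.get(node)
            if not targets:
                # Dead end: send all probability mass back to the start node.
                nxt[start] += mass
                continue
            # With probability `restart`, jump back to the start node.
            nxt[start] += restart * mass
            total = sum(targets.values())
            for parent, weight in targets.items():
                # Otherwise follow an outgoing link in proportion to its weight.
                nxt[parent] += (1.0 - restart) * mass * weight / total
        prob = nxt
    return prob


# Hypothetical candidate links: "Toronto" has two competing parent classes.
edges = [
    ("Toronto", "City", 3.0),
    ("Toronto", "Company", 1.0),
    ("City", "Location", 1.0),
    ("Company", "Organization", 1.0),
]
scores = rank_parents(edges, "Toronto")
best = max(["City", "Company"], key=scores.get)
```

In this toy graph the more strongly supported parent receives the larger score, so `best` selects it. How the actual system weights links and builds its chain is not stated in the abstract.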

Original language: English (US)
Title of host publication: CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
Pages: 1099-1108
Number of pages: 10
DOI: https://doi.org/10.1145/1871437.1871577
State: Published - Dec 1 2010
Event: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10 - Toronto, ON, Canada
Duration: Oct 26 2010 to Oct 30 2010

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Country: Canada
City: Toronto, ON
Period: 10/26/10 to 10/30/10

Fingerprint

Taxonomy
Wikipedia
Knowledge base
Heuristics
Equivalence
Markov chain
WordNet
Integrated
Graph
Partitioning
Ranking
Resources

All Science Journal Classification (ASJC) codes

  • Decision Sciences(all)
  • Business, Management and Accounting(all)

Keywords

  • Algorithms

Cite this

De Melo, G., & Weikum, G. (2010). MENTA: Inducing multilingual taxonomies from Wikipedia. In CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops (pp. 1099-1108). (International Conference on Information and Knowledge Management, Proceedings). https://doi.org/10.1145/1871437.1871577
@inproceedings{78a735750a424301aeae679a2bc11fc7,
title = "MENTA: Inducing multilingual taxonomies from Wikipedia",
abstract = "In recent years, a number of projects have turned to Wikipedia to establish large-scale taxonomies that describe orders of magnitude more entities than traditional manually built knowledge bases. So far, however, the multilingual nature of Wikipedia has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is presumably the largest multilingual lexical knowledge base currently available.",
keywords = "Algorithms",
author = "{De Melo}, Gerard and Weikum, Gerhard",
year = "2010",
month = "12",
day = "1",
doi = "10.1145/1871437.1871577",
language = "English (US)",
isbn = "9781450300995",
series = "International Conference on Information and Knowledge Management, Proceedings",
pages = "1099--1108",
booktitle = "CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops",

}

TY - GEN

T1 - MENTA

T2 - Inducing multilingual taxonomies from Wikipedia

AU - De Melo, Gerard

AU - Weikum, Gerhard

PY - 2010/12/1

Y1 - 2010/12/1

AB - In recent years, a number of projects have turned to Wikipedia to establish large-scale taxonomies that describe orders of magnitude more entities than traditional manually built knowledge bases. So far, however, the multilingual nature of Wikipedia has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is presumably the largest multilingual lexical knowledge base currently available.

KW - Algorithms

UR - http://www.scopus.com/inward/record.url?scp=78651269398&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78651269398&partnerID=8YFLogxK

U2 - 10.1145/1871437.1871577

DO - 10.1145/1871437.1871577

M3 - Conference contribution

AN - SCOPUS:78651269398

SN - 9781450300995

T3 - International Conference on Information and Knowledge Management, Proceedings

SP - 1099

EP - 1108

BT - CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops

ER -