Untangling the cross-lingual link structure of Wikipedia

Gerard De Melo, Gerhard Weikum

Research output: Chapter in Book/Report/Conference proceedingConference contribution

30 Scopus citations

Abstract

Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations. We then present an algorithm with provable properties that uses linear programming and a region growing technique to tackle this challenge. This allows us to transform Wikipedia into a much more consistent multilingual register of the world's entities and concepts.

Original languageEnglish (US)
Title of host publicationACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Pages844-853
Number of pages10
StatePublished - 2010
Externally publishedYes
Event48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden
Duration: Jul 11 2010Jul 16 2010

Publication series

NameACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Other

Other48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Country/TerritorySweden
CityUppsala
Period7/11/107/16/10

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Untangling the cross-lingual link structure of Wikipedia'. Together they form a unique fingerprint.

Cite this