TY  - GEN
T1  - Untangling the cross-lingual link structure of Wikipedia
AU  - De Melo, Gerard
AU  - Weikum, Gerhard
PY  - 2010
Y1  - 2010
N2  - Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations. We then present an algorithm with provable properties that uses linear programming and a region growing technique to tackle this challenge. This allows us to transform Wikipedia into a much more consistent multilingual register of the world's entities and concepts.
AB  - Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations. We then present an algorithm with provable properties that uses linear programming and a region growing technique to tackle this challenge. This allows us to transform Wikipedia into a much more consistent multilingual register of the world's entities and concepts.
UR  - http://www.scopus.com/inward/record.url?scp=84859939814&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84859939814&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:84859939814
SN  - 9781617388088
T3  - ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP  - 844
EP  - 853
BT  - ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
T2  - 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Y2  - 11 July 2010 through 16 July 2010
ER  - 