Structure and content scoring for XML

Sihem Amer-Yahia, Nick Koudas, Amelie Marian, Divesh Srivastava, David Toman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

111 Scopus citations

Abstract

XML repositories are usually queried on both structure and content. Because XML is structurally heterogeneous, queries are often interpreted approximately and their answers are returned ranked by score. Computing answer scores in XML is an active area of research that oscillates between pure content scoring, such as the well-known tf*idf, and taking structure into account. However, none of the existing proposals fully accounts for structure and combines it with content to score query answers. We propose novel XML scoring methods that are inspired by tf*idf and that account for both structure and content while considering query relaxations. Twig scoring accounts for the most structure and content and is thus used as our reference method. Path scoring is an approximation that loosens correlations between query nodes, hence reducing the amount of time required to manipulate scores during top-k query processing. We propose efficient data structures to speed up ranked query processing. We run extensive experiments that validate our scoring methods and show that path scoring provides very high precision while improving score computation time.
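For orientation, the classic content-only tf*idf that the abstract names as its starting point can be sketched roughly as follows. This is a minimal illustration of the textbook formula only, not the twig or path scoring proposed in the paper: the function name `tf_idf`, the unsmoothed logarithmic idf, and the toy corpus are all assumptions made for the example, and structure and query relaxation are ignored entirely.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Textbook tf*idf of `term` in `doc` (hypothetical helper, content only).

    doc    -- list of tokens for one document
    corpus -- list of token lists, one per document
    """
    tf = Counter(doc)[term]                      # raw term frequency in doc
    df = sum(1 for d in corpus if term in d)     # documents containing term
    if df == 0:
        return 0.0                               # term never occurs anywhere
    return tf * math.log(len(corpus) / df)       # unsmoothed idf (assumption)


# Toy corpus, invented for illustration.
corpus = [
    ["xml", "query"],
    ["xml", "score", "score"],
    ["twig"],
]

rare = tf_idf("score", corpus[1], corpus)    # frequent in doc, rare in corpus
common = tf_idf("xml", corpus[0], corpus)    # appears in most documents
```

Here `rare` exceeds `common` because "score" occurs twice in its document and in only one of the three documents, which is the intuition the structure-aware methods in the abstract build on.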

Original language: English (US)
Title of host publication: VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
Pages: 361-372
Number of pages: 12
State: Published - 2005
Externally published: Yes
Event: VLDB 2005 - 31st International Conference on Very Large Data Bases - Trondheim, Norway
Duration: Aug 30 2005 → Sep 2 2005

Publication series

Name: VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
Volume: 1

Other

Other: VLDB 2005 - 31st International Conference on Very Large Data Bases
Country/Territory: Norway
City: Trondheim
Period: 8/30/05 → 9/2/05

All Science Journal Classification (ASJC) codes

  • Engineering(all)
