TY - GEN
T1 - Structure and content scoring for XML
AU - Amer-Yahia, Sihem
AU - Koudas, Nick
AU - Marian, Amelie
AU - Srivastava, Divesh
AU - Toman, David
PY - 2005
Y1 - 2005
N2 - XML repositories are usually queried both on structure and content. Due to the structural heterogeneity of XML, queries are often interpreted approximately and their answers are returned ranked by scores. Computing answer scores in XML is an active area of research that oscillates between pure content scoring, such as the well-known tf*idf, and taking structure into account. However, none of the existing proposals fully accounts for structure and combines it with content to score query answers. We propose novel XML scoring methods that are inspired by tf*idf and that account for both structure and content while considering query relaxations. Twig scoring accounts for the most structure and content and is thus used as our reference method. Path scoring is an approximation that loosens correlations between query nodes, hence reducing the amount of time required to manipulate scores during top-k query processing. We propose efficient data structures in order to speed up ranked query processing. We run extensive experiments that validate our scoring methods and that show that path scoring provides very high precision while improving score computation time.
AB - XML repositories are usually queried both on structure and content. Due to the structural heterogeneity of XML, queries are often interpreted approximately and their answers are returned ranked by scores. Computing answer scores in XML is an active area of research that oscillates between pure content scoring, such as the well-known tf*idf, and taking structure into account. However, none of the existing proposals fully accounts for structure and combines it with content to score query answers. We propose novel XML scoring methods that are inspired by tf*idf and that account for both structure and content while considering query relaxations. Twig scoring accounts for the most structure and content and is thus used as our reference method. Path scoring is an approximation that loosens correlations between query nodes, hence reducing the amount of time required to manipulate scores during top-k query processing. We propose efficient data structures in order to speed up ranked query processing. We run extensive experiments that validate our scoring methods and that show that path scoring provides very high precision while improving score computation time.
UR - http://www.scopus.com/inward/record.url?scp=33745541413&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745541413&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33745541413
SN - 1595931546
SN - 9781595931542
T3 - VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
SP - 361
EP - 372
BT - VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
T2 - VLDB 2005 - 31st International Conference on Very Large Data Bases
Y2 - 30 August 2005 through 2 September 2005
ER -