TY - JOUR
T1 - Relevance assessments and retrieval system evaluation
AU - Lesk, M. E.
AU - Salton, G.
N1 - Funding Information:
This study was supported in part by the National Science Foundation under grants GN 495 and GN 750.
PY - 1968/12
Y1 - 1968/12
AB - Two widely used criteria for evaluating the effectiveness of information retrieval systems are, respectively, the recall and the precision. Since the determination of these measures is dependent on a distinction between documents which are relevant to a given query and documents which are not relevant to that query, it has sometimes been claimed that an accurate, generally valid evaluation cannot be based on recall and precision measures. A study was made to determine the effect of variations in relevance assessments on the average recall and precision values used to measure retrieval effectiveness. Using a collection of 1200 documents in information science for test purposes, it is found that large scale differences in the relevance assessments do not produce significant variations in average recall and precision. It thus appears that properly computed recall and precision data may represent effectiveness indicators which are generally valid for many distinct user classes.
UR - http://www.scopus.com/inward/record.url?scp=0009233105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0009233105&partnerID=8YFLogxK
U2 - 10.1016/0020-0271(68)90029-6
DO - 10.1016/0020-0271(68)90029-6
M3 - Article
AN - SCOPUS:0009233105
SN - 0020-0271
VL - 4
SP - 343
EP - 359
JO - Information Storage and Retrieval
JF - Information Storage and Retrieval
IS - 4
ER -