A generalized cost optimal decision model for record matching

Vassilios S. Verykios, George V. Moustakides

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Scopus citations

Abstract

Record (or entity) matching, also called record linkage, is the process of identifying records in one or more data sources that refer to the same real-world entity or object. In record linkage, the ultimate goal of a decision model is to give the decision maker a tool for deciding the actual matching status of a pair of records (e.g., documents, events, persons, cases). Existing models of record linkage rely on decision rules that minimize the probability of subjecting a case to clerical review, conditional on the probabilities of erroneous matches and erroneous non-matches. In practice, though, (a) the value of an erroneous match is, in many applications, quite different from the value of an erroneous non-match, and (b) the cost and the probability of a misclassification associated with clerical review are ignored. In this paper, we present a decision model that is optimal with respect to the cost of the record linkage operation and general enough to accommodate multi-class or multi-decision case studies. We also present an example, along with the results of applying the proposed model to large comparison spaces.
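The idea of choosing a decision by cost rather than by error probability alone can be illustrated with a generic minimum-expected-cost rule. The sketch below is not taken from the paper: the function name `min_cost_decision`, the cost matrix, the priors, and the likelihood values are all hypothetical, chosen only to show how unequal costs for erroneous matches, erroneous non-matches, and clerical review might enter a decision with three possible outcomes and two true classes.

```python
from typing import Sequence


def min_cost_decision(
    likelihoods: Sequence[float],
    priors: Sequence[float],
    costs: Sequence[Sequence[float]],
) -> int:
    """Return the index of the decision with the lowest expected cost.

    likelihoods[j]: P(comparison vector | true class j)   (assumed given)
    priors[j]:      P(true class j)                        (assumed given)
    costs[i][j]:    cost of taking decision i when the true class is j
    """
    # Unnormalized posterior weight of each true class for this record pair.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Expected cost of each decision, up to a common normalizing constant.
    expected = [sum(c * w for c, w in zip(row, joint)) for row in costs]
    return min(range(len(costs)), key=expected.__getitem__)


# Hypothetical setup: decisions 0 = match, 1 = clerical review, 2 = non-match;
# true classes 0 = matched pair, 1 = unmatched pair.
COSTS = [
    [0.0, 10.0],  # declaring a match is costly when the pair is unmatched
    [1.0, 1.0],   # clerical review has a fixed cost either way
    [5.0, 0.0],   # declaring a non-match is costly when the pair is matched
]
PRIORS = [0.1, 0.9]

strong_agreement = min_cost_decision([0.9, 0.01], PRIORS, COSTS)
ambiguous = min_cost_decision([0.3, 0.1], PRIORS, COSTS)
strong_disagreement = min_cost_decision([0.05, 0.8], PRIORS, COSTS)
```

With these illustrative numbers, a strongly agreeing pair is declared a match, a strongly disagreeing pair a non-match, and the ambiguous pair is sent to clerical review because the fixed review cost is lower than the expected cost of either automatic decision. Adding rows or columns to the cost matrix extends the same rule to more decisions or more classes.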

Original language: English (US)
Title of host publication: IQIS 2004 - International Workshop on Information Quality in Information Systems, Held in Conjunction with the 23rd ACM SIGMOD International Conference on Management of Data
Pages: 20-26
Number of pages: 7
State: Published - 2004
Externally published: Yes
Event: International Workshop on Information Quality in Information Systems, IQIS 2004, Held in Conjunction with the 23rd ACM SIGMOD International Conference on Management of Data - Paris, France
Duration: Jun 18 2004 → Jun 18 2004

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • probabilistic decision model
  • record matching
