Privacy-preserving imputation of missing data

Geetha Jagannathan, Rebecca N. Wright

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

Handling missing data is a critical step to ensuring good results in data mining. Like most data mining algorithms, existing privacy-preserving data mining algorithms assume data is complete. In order to maintain privacy in the data mining process while cleaning data, privacy-preserving methods of data cleaning are required. In this paper, we address the problem of privacy-preserving data imputation of missing data. We present a privacy-preserving protocol for filling in missing values using a lazy decision-tree imputation algorithm for data that is horizontally partitioned between two parties. The participants of the protocol learn only the imputed values. The computed decision tree is not learned by either party.
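The abstract describes lazy decision-tree imputation. Instead of building a full tree, only the path followed by the record with the missing value is grown: complete records serve as training data, and the attribute being imputed plays the role of the class. The following is a minimal, non-private Python sketch of that idea. It rests on several assumptions: categorical attributes, plain information gain as the split criterion, and a majority vote at the leaf. The function names (`joint_counts`, `info_gain`, `lazy_impute`) are hypothetical, and this is not the authors' protocol. Each party's rows stay in a separate list to show that, for horizontally partitioned data, the split statistics are sums of per-party counts and the filtering step is local to each party. The paper's protocol reveals only the imputed values, so those counts and the tree itself are never disclosed. This sketch computes everything in the clear.

```python
from collections import Counter
from math import log2

# Hypothetical, NON-PRIVATE sketch of lazy decision-tree imputation over data
# horizontally partitioned between parties. Each party is a list of dict rows
# with categorical values. All counts here are computed in the clear.


def entropy(counts):
    """Shannon entropy (bits) of a Counter of class frequencies."""
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values() if c)


def joint_counts(parties, attr, target):
    """(attr value, target value) counts, summed over all parties.

    Each party could compute its own Counter locally; only the totals matter.
    """
    total = Counter()
    for rows in parties:
        total += Counter((r[attr], r[target]) for r in rows)
    return total


def info_gain(parties, attr, target):
    """Information gain of splitting on `attr`, computed from the summed counts."""
    joint = joint_counts(parties, attr, target)
    n = sum(joint.values())
    class_totals = Counter()
    by_value = {}  # attr value -> Counter of target values
    for (value, label), c in joint.items():
        class_totals[label] += c
        by_value.setdefault(value, Counter())[label] += c
    gain = entropy(class_totals)
    for counts in by_value.values():
        gain -= sum(counts.values()) / n * entropy(counts)
    return gain


def lazy_impute(parties, record, target):
    """Impute record[target] by growing only the tree path this record follows.

    Training rows are assumed complete and non-empty overall; `record` holds
    None for the missing `target` (and possibly for other unknown attributes).
    """
    attrs = [a for a, v in record.items() if a != target and v is not None]
    while True:
        labels = Counter(r[target] for rows in parties for r in rows)
        if len(labels) == 1 or not attrs:
            return labels.most_common(1)[0][0]  # leaf: majority value
        best = max(attrs, key=lambda a: info_gain(parties, a, target))
        # Each party filters its own rows locally to those matching the record.
        narrowed = [[r for r in rows if r[best] == record[best]] for rows in parties]
        if not any(narrowed):  # no training row shares this value: stop here
            return labels.most_common(1)[0][0]
        parties = narrowed
        attrs.remove(best)


if __name__ == "__main__":
    party_a = [
        {"outlook": "sunny", "windy": "no", "play": "yes"},
        {"outlook": "sunny", "windy": "yes", "play": "no"},
        {"outlook": "rain", "windy": "no", "play": "yes"},
    ]
    party_b = [
        {"outlook": "rain", "windy": "yes", "play": "no"},
        {"outlook": "sunny", "windy": "yes", "play": "no"},
    ]
    incomplete = {"outlook": "sunny", "windy": "yes", "play": None}
    print(lazy_impute([party_a, party_b], incomplete, "play"))  # prints "no"
```

In the toy data, `windy` separates `play` perfectly, so the path stops after one split. Keeping the parties separate, rather than concatenating their rows, is a deliberate choice. It makes explicit which quantities a two-party protocol would have to combine (the summed counts) and which steps each party can do on its own (filtering). How the paper actually protects those combined values is described in the article.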

Original language: English (US)
Pages (from-to): 40-56
Number of pages: 17
Journal: Data and Knowledge Engineering
Volume: 65
Issue number: 1
DOI: 10.1016/j.datak.2007.06.013
State: Published, Apr 1 2008

Fingerprint

Data mining
Data privacy
Decision trees
Cleaning
Data handling
Missing data
Imputation
Privacy preserving
Data cleaning
Decision tree

All Science Journal Classification (ASJC) codes

  • Information Systems and Management

Keywords

  • Data cleaning
  • Data imputation
  • Privacy-preserving protocols

Cite this

@article{fae80ac556a84b8c90290b5ec8275135,
title = "Privacy-preserving imputation of missing data",
abstract = "Handling missing data is a critical step to ensuring good results in data mining. Like most data mining algorithms, existing privacy-preserving data mining algorithms assume data is complete. In order to maintain privacy in the data mining process while cleaning data, privacy-preserving methods of data cleaning are required. In this paper, we address the problem of privacy-preserving data imputation of missing data. We present a privacy-preserving protocol for filling in missing values using a lazy decision-tree imputation algorithm for data that is horizontally partitioned between two parties. The participants of the protocol learn only the imputed values. The computed decision tree is not learned by either party.",
keywords = "Data cleaning, Data imputation, Privacy-preserving protocols",
author = "Jagannathan, Geetha and Wright, {Rebecca N.}",
year = "2008",
month = "4",
day = "1",
doi = "10.1016/j.datak.2007.06.013",
language = "English (US)",
volume = "65",
pages = "40--56",
journal = "Data and Knowledge Engineering",
issn = "0169-023X",
publisher = "Elsevier",
number = "1",

}

Privacy-preserving imputation of missing data. / Jagannathan, Geetha; Wright, Rebecca N.

In: Data and Knowledge Engineering, Vol. 65, No. 1, 01.04.2008, p. 40-56.

TY - JOUR

T1 - Privacy-preserving imputation of missing data

AU - Jagannathan, Geetha

AU - Wright, Rebecca N.

PY - 2008/4/1

Y1 - 2008/4/1

AB - Handling missing data is a critical step to ensuring good results in data mining. Like most data mining algorithms, existing privacy-preserving data mining algorithms assume data is complete. In order to maintain privacy in the data mining process while cleaning data, privacy-preserving methods of data cleaning are required. In this paper, we address the problem of privacy-preserving data imputation of missing data. We present a privacy-preserving protocol for filling in missing values using a lazy decision-tree imputation algorithm for data that is horizontally partitioned between two parties. The participants of the protocol learn only the imputed values. The computed decision tree is not learned by either party.

KW - Data cleaning

KW - Data imputation

KW - Privacy-preserving protocols

UR - http://www.scopus.com/inward/record.url?scp=39749149272&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=39749149272&partnerID=8YFLogxK

U2 - 10.1016/j.datak.2007.06.013

DO - 10.1016/j.datak.2007.06.013

M3 - Article

AN - SCOPUS:39749149272

VL - 65

SP - 40

EP - 56

JO - Data and Knowledge Engineering

JF - Data and Knowledge Engineering

SN - 0169-023X

IS - 1

ER -