A distributed approach to enabling privacy-preserving model-based classifier training

Hangzai Luo, Jianping Fan, Xiaodong Lin, Aoying Zhou, Elisa Bertino

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

This paper proposes a novel approach for privacy-preserving distributed model-based classifier training. Our approach is an important step towards supporting customizable privacy modeling and protection. It consists of three major steps. First, each data site independently learns a weak concept model (i.e., local classifier) for a given data pattern or concept by using its own training samples. An adaptive EM algorithm is proposed to select the model structure and estimate the model parameters simultaneously. The second step deals with combined classifier training by integrating the weak concept models that are shared from multiple data sites. To reduce the data transmission costs and the potential privacy breaches, only the weak concept models are sent to the central site, and synthetic samples are generated directly from these shared weak concept models at the central site. Both the shared weak concept models and the synthetic samples are then incorporated to learn a reliable and complete global concept model. A computational approach is developed to automatically achieve a good trade-off between the privacy disclosure risk, the sharing benefit and the data utility. The third step deals with validating the combined classifier by distributing the global concept model to all the data sites in the collaboration network while at the same time limiting the potential privacy breaches. Our approach has been validated through extensive experiments carried out on four UCI machine learning data sets and two image data sets.
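
The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a single 1-D Gaussian per class, fitted in closed form, stands in for the adaptive EM mixture model, the privacy/utility trade-off computation is omitted, and all names (`fit_model`, `synthesize`, `classify`, `accuracy`, `make_site`) and the toy data are hypothetical.

```python
import math
import random
import statistics


def fit_model(samples):
    """Fit one 1-D Gaussian per class; returns {label: (mean, std, count)}.

    Assumption: this closed-form fit replaces the paper's adaptive EM step.
    """
    by_class = {}
    for x, label in samples:
        by_class.setdefault(label, []).append(x)
    return {
        label: (statistics.fmean(xs), statistics.pstdev(xs), len(xs))
        for label, xs in by_class.items()
    }


def synthesize(model, rng):
    """Draw as many synthetic labeled points per class as the model's count."""
    return [
        (rng.gauss(mean, std), label)
        for label, (mean, std, count) in model.items()
        for _ in range(count)
    ]


def classify(model, x):
    """Pick the class with the highest Gaussian log-likelihood (equal priors assumed)."""
    def log_likelihood(label):
        mean, std, _ = model[label]
        return -math.log(std) - 0.5 * ((x - mean) / std) ** 2
    return max(model, key=log_likelihood)


def accuracy(model, samples):
    """Fraction of labeled samples the model classifies correctly."""
    return sum(classify(model, x) == label for x, label in samples) / len(samples)


def make_site(rng, neg_mean, pos_mean, n=200):
    """Generate toy labeled data for one hypothetical data site."""
    neg = [(rng.gauss(neg_mean, 1.0), "neg") for _ in range(n)]
    pos = [(rng.gauss(pos_mean, 1.0), "pos") for _ in range(n)]
    return neg + pos


rng = random.Random(0)
sites = [make_site(rng, 0.0, 4.0), make_site(rng, 0.5, 4.5)]

# Step 1: each site learns a local model from its own samples.
local_models = [fit_model(site) for site in sites]

# Step 2: only the models reach the central site, which samples
# synthetic data from them and learns a global model.
synthetic = [point for model in local_models for point in synthesize(model, rng)]
global_model = fit_model(synthetic)

# Step 3: the global model is sent back and checked at each site locally.
site_accuracies = [accuracy(global_model, site) for site in sites]
print(site_accuracies)
```

In this sketch the raw samples in `sites` are never read by the central step; only `local_models` cross the site boundary, which is the property the abstract relies on to limit transmission cost and disclosure risk.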

Original language: English (US)
Pages (from-to): 157-185
Number of pages: 29
Journal: Knowledge and Information Systems
Volume: 20
Issue number: 2
State: Published - 2009
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence

Keywords

  • Adaptive EM algorithm
  • Privacy-preserving classifier training
  • Synthetic samples
