Privacy-Preserving Knowledge Transfer with Bootstrap Aggregation of Teacher Ensembles

Hong Jun Yoon, Hilda B. Klasky, Eric B. Durbin, Xiao Cheng Wu, Antoinette Stroup, Jennifer Doherty, Linda Coyle, Lynne Penberthy, Christopher Stanley, J. Blair Christian, Georgia D. Tourassi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Institutions and organizations need to transfer knowledge among themselves to reduce annotation and labeling effort or to improve task performance. However, knowledge transfer is difficult because of restrictions that are in place to ensure data security and privacy: institutions are not allowed to exchange data or perform any activity that may expose personal information. Leveraging a differential privacy algorithm in a high-performance computing environment, we propose a new training protocol, Bootstrap Aggregation of Teacher Ensembles (BATE), which is applicable to various types of machine learning models. The BATE algorithm builds on and enhances the PATE (Private Aggregation of Teacher Ensembles) algorithm, maintaining competitive task performance scores on complex datasets with underrepresented class labels. We conducted a proof-of-concept study of information extraction from cancer pathology reports from four cancer registries and compared four scenarios: no collaboration, collaboration without privacy preservation, the PATE algorithm, and the proposed BATE algorithm. The results showed that the BATE algorithm maintained competitive macro-averaged F1 scores, demonstrating that the proposed algorithm is an effective yet privacy-preserving method for machine learning and deep learning solutions.

Original language: English (US)
Title of host publication: Heterogeneous Data Management, Polystores, and Analytics for Healthcare - VLDB Workshops, Poly 2020 and DMAH 2020, Revised Selected Papers
Editors: Vijay Gadepally, Timothy Mattson, Michael Stonebraker, Tim Kraska, Fusheng Wang, Gang Luo, Jun Kong, Alevtina Dubovitskaya
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 87-99
Number of pages: 13
ISBN (Print): 9783030710545
State: Published - 2021
Event: VLDB workshops: International Workshop on Polystore Systems for Heterogeneous Data in Multiple Databases with Privacy and Security Assurances, Poly 2020, and 6th International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2020 - Virtual, Online
Duration: Aug 31 2020 to Sep 4 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12633 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: VLDB workshops: International Workshop on Polystore Systems for Heterogeneous Data in Multiple Databases with Privacy and Security Assurances, Poly 2020, and 6th International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2020
City: Virtual, Online
Period: 8/31/20 to 9/4/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Keywords

  • Bootstrap aggregation
  • Data privacy
  • Differential privacy
  • Information extraction
  • Natural language processing
  • Privacy-preserving machine learning
