A Scalable Privacy-preserving Data Generation Methodology for Exploratory Analysis

Jaideep Vaidya, Basit Shafiq, Muazzam Asani, Nabil Adam, Xiaoqian Jiang, Lucila Ohno-Machado

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Big data coupled with precision medicine has the potential to significantly improve our understanding and treatment of complex disorders such as cancer, diabetes, and depression. However, the essential problem is that data are stuck in silos, and it is difficult to identify precisely which data would be relevant and useful for a particular type of analysis. Acquiring and accessing biomedical data requires significant effort, yet in many cases the data may provide little insight into the problem at hand. There is therefore a need to measure the utility (relevance) of additional datasets for a particular biomedical research task without direct access to the data. Toward this end, we develop a privacy-preserving approach to creating synthetic data that can provide a first-order approximation of utility. We evaluate the proposed approach on several biomedical datasets in the context of regression and classification tasks, and discuss how it can be incorporated into existing data management systems such as REDCap.
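The abstract does not describe the generation mechanism itself. As a purely illustrative sketch, and not the authors' method, the general idea (a data holder releases perturbed summary statistics instead of raw records, and a requester fits a model on synthetic data drawn from them to get a rough sense of utility) can be shown in Python. Everything here is a hypothetical assumption: the function names, the bivariate Gaussian model, the Laplace-perturbed moments, and the noise `scale`, which is a free parameter that is not calibrated to any formal privacy guarantee.

```python
import math
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def summarize(xs, ys):
    # Means, (population) variances and covariance of two numeric columns.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, vx, vy, cxy

def release_noisy_summary(xs, ys, scale, rng):
    # Hypothetical data-holder step: perturb each statistic with Laplace noise.
    # NOTE: 'scale' is NOT derived from a privacy budget or sensitivity analysis.
    mx, my, vx, vy, cxy = (s + laplace(scale, rng) for s in summarize(xs, ys))
    vx, vy = max(vx, 1e-9), max(vy, 1e-9)   # keep variances positive
    bound = math.sqrt(vx * vy)
    cxy = max(-bound, min(bound, cxy))       # keep the covariance matrix valid
    return mx, my, vx, vy, cxy

def synthesize(summary, n, rng):
    # Hypothetical requester step: sample (x, y) pairs from a bivariate normal
    # whose means and covariance match the released (noisy) summary.
    mx, my, vx, vy, cxy = summary
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    rho = cxy / (sx * sy)
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(mx + sx * z1)
        ys.append(my + sy * (rho * z1 + resid * z2))
    return xs, ys

def ols_slope(xs, ys):
    # Ordinary least-squares slope of y regressed on x.
    _, _, vx, _, cxy = summarize(xs, ys)
    return cxy / vx

# Usage: compare a regression fitted on real data with one fitted on synthetic data.
rng = random.Random(0)
real_x = [rng.uniform(0, 1) for _ in range(2000)]
real_y = [2 * x + rng.gauss(0, 0.1) for x in real_x]
summary = release_noisy_summary(real_x, real_y, scale=0.001, rng=rng)
syn_x, syn_y = synthesize(summary, 2000, rng)
print(ols_slope(real_x, real_y), ols_slope(syn_x, syn_y))
```

When the two slopes are close, the synthetic data suggests the real dataset would support a similar regression; a Gaussian model like this one only preserves first and second moments, so it can say nothing about nonlinear structure in the original data.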

Original language: English (US)
Pages (from-to): 1695-1704
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
State: Published - 2017

All Science Journal Classification (ASJC) codes

  • Medicine (all)

