Big data analytics can revolutionize innovation and productivity across diverse domains. However, this requires sharing or joint analysis of data, which is often inhibited by privacy and security concerns. While techniques have been developed to enable the safe use of data for analysis, none of them work for the critical task of outlier detection. Outlier detection is one of the most fundamental data analysis tasks, useful in applications ranging from homeland security to medical informatics to financial fraud. However, when the data is fragmented and cannot be collected together, it is impossible even to properly identify outliers, much less explain them. This project aims to fill this gap and enable the secure identification and explanation of outliers without breaching the privacy of the data owners, the data custodians, or the data subjects. The potential to advance science through the discovery and analysis of exceptions can have unparalleled impact and significantly help in widening cooperation, thus preventing loss through data isolation.

The project develops strong definitions for private outlier detection encompassing both process privacy and result privacy. A suite of privacy-preserving tools and techniques is then developed to enable outlier detection across different data ownership models, over a variety of multi-modal datasets, while supporting differing tradeoffs of privacy, efficiency, and utility. The research improves our scientific understanding of secure computation, data outsourcing, and distributed data analysis. The project also cultivates the integration of research and education by providing opportunities for research by undergraduates at an early stage.
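To make the notion of result privacy concrete, the following is a minimal, hypothetical sketch (not the project's actual method): outliers are flagged with a simple z-score test, and only a Laplace-noised count of outliers is released, in the style of differential privacy. The function names, the z-score threshold, and the choice of epsilon are all illustrative assumptions.

```python
import math
import random

def zscore_outliers(values, threshold=3.0):
    # Flag indices of points more than `threshold` standard
    # deviations from the mean (a standard z-score test).
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [i for i, v in enumerate(values)
            if std > 0 and abs(v - mean) / std > threshold]

def noisy_outlier_count(values, threshold=3.0, epsilon=1.0):
    # Result-privacy sketch: release only a Laplace-noised count of
    # outliers. Adding or removing one record changes the true count
    # by at most 1, so the sensitivity is 1 and the noise scale is
    # 1/epsilon.
    true_count = len(zscore_outliers(values, threshold))
    u = random.random() - 0.5  # inverse-CDF sampling of Laplace noise
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

For example, on a dataset of 100 identical points plus one extreme value, `zscore_outliers` flags only the extreme point, and `noisy_outlier_count` returns a perturbed version of the count 1. This illustrates only the single-owner, result-privacy side; process privacy across multiple data owners would additionally require secure multiparty computation, which this sketch does not attempt.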
Effective start/end date: 10/1/14 → 9/30/17
- National Science Foundation (NSF)