EAGER: Foundations for the Systematic Study of Synthetic Data

Project Details

Description

Advances in data-driven models in fields such as natural language processing, computer vision, and robotics have been made possible by the availability of large amounts of data. However, growing concerns about privacy rights have created a need for strategies that protect individual data while still allowing data sharing. Synthetic data offers a potential solution by preserving the statistical properties of the original data while removing any personally identifiable information. Although many methods for generating synthetic data have been proposed, the area still lacks solid theoretical foundations. This project bridges the gap between theory and practice. Its novelty lies in developing the fundamental principles for the systematic study of synthetic data and in clarifying the technical vocabulary and associated concepts. Its broader significance lies in enabling institutions to articulate, enforce, evaluate, and validate their required constraints for synthetic data generation methodologies, significantly accelerating data sharing. By pursuing heightened privacy and enhanced utility together, the framework aims to help safeguard privacy, promote the sharing of knowledge, and allow AI-based methods to flourish for the betterment of humanity. The project conceptualizes the philosophical considerations of synthetic data, establishes properties of synthetic data, develops formal definitions of what it means to be “synthetic”, and develops a comprehensive evaluation framework for synthetic data that includes curated datasets, metrics, and baselines. The research improves our scientific understanding of privacy, utility, and synthetic data.
The project also cultivates the integration of research and education by providing new security and privacy projects for undergraduate and graduate research and outreach activities, and it serves as a teaching tool and an entry point into the field of privacy and security research. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 10/1/23 to 9/30/25

Funding

  • National Science Foundation: $250,000.00
