Project Details
Description
The maturing of mobile devices and systems provides an unprecedented opportunity to collect large amounts of data about real-world human motion at all scales. The rich knowledge contained in these data sets can have a huge impact in many fields, ranging from transportation to health care, from civil engineering to energy management, and from e-commerce to social networking. While the applications are paradigm-transforming, recent studies show that trajectory data can raise serious privacy concerns by revealing personally sensitive information such as frequently visited locations or social ties. These concerns have become a major hurdle to utilizing these data sets. This project systematically studies the issue of anonymizing trajectory data, from the bottom layer of trajectory sensing and data collection, to the middle layer of trajectory representation and anonymity, to the application layer of how the anonymized trajectory data can be used.
Because trajectories are by nature time-stamped sequences of points, this project develops novel geometric and topological algorithms that work directly on the distributed sensors collecting the trajectories. Queries to such decentralized sensors are made in a way that ensures no sensitive information is released. The intellectual contribution lies in the following aspects. 1) A topological representation of trajectories is adopted, i.e., how trajectories pass around obstacles and landmarks in the domain. The topological representation is compact and descriptive, and it introduces novel discrete and combinatorial problems to study. 2) A novel framework is developed for distributed sensors to directly learn, classify, and compare the topological types of the target trajectories, using harmonic one-forms and Hodge decomposition from algebraic topology. The new framework can substantially reduce the communication cost within the network while maintaining user privacy from the very beginning of sensing and data collection. 3) A family of anonymization algorithms built on different ideas is developed: altering the way time-stamped points are connected into trajectories, adjusting the topological resolution to balance data anonymity against utility, and sensing and recording randomized hash data to answer popular trajectory queries. 4) Trajectory data sets are often huge, so algorithms for handling large-scale trajectory data sets are developed in both centralized and decentralized settings.
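To make the idea of a trajectory's "topological type" concrete, the following is a minimal sketch and not the project's actual algorithm. It assumes point obstacles in the plane and polyline trajectories that never pass through an obstacle, and it integrates the angle one-form dθ around each obstacle, which is a simple example of a closed one-form on the punctured plane. The function names `winding_signature` and `same_topological_type` are hypothetical and introduced only for illustration. For two trajectories with the same endpoints, equal signatures mean they wind around every obstacle the same number of times; this is a homology-level comparison, which is coarser than full homotopy.

```python
import math

def winding_signature(trajectory, obstacles):
    """Return, for each point obstacle, the total angle swept by the
    polyline `trajectory` as seen from that obstacle (integral of d(theta)).
    Assumes no segment passes through an obstacle."""
    signature = []
    for ox, oy in obstacles:
        total = 0.0
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
            a0 = math.atan2(y0 - oy, x0 - ox)
            a1 = math.atan2(y1 - oy, x1 - ox)
            # Wrap the angle change into [-pi, pi) so each straight
            # segment contributes the short way around the obstacle.
            delta = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
            total += delta
        signature.append(total)
    return signature

def same_topological_type(t1, t2, obstacles, tol=1e-6):
    """Compare two trajectories with shared endpoints by their signatures."""
    s1 = winding_signature(t1, obstacles)
    s2 = winding_signature(t2, obstacles)
    return all(abs(a - b) < tol for a, b in zip(s1, s2))

# Illustrative data (hypothetical): one obstacle at the origin and three
# trajectories from (-1, 0) to (1, 0).
obstacles = [(0, 0)]
above = [(-1, 0), (0, 1), (1, 0)]
above_wide = [(-1, 0), (-1, 2), (1, 2), (1, 0)]
below = [(-1, 0), (0, -1), (1, 0)]

print(same_topological_type(above, above_wide, obstacles))
print(same_topological_type(above, below, obstacles))
```

In this toy setting, the two trajectories passing above the obstacle share a signature, while the one passing below differs from them by a full turn. A signature of this kind is one number per obstacle regardless of trajectory length, which hints at why a topological representation can be compact.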
Status | Finished |
---|---|
Effective start/end date | 9/1/16 → 8/31/20 |
Funding
- National Science Foundation: $250,000.00