TY - JOUR
T1 - TransRisk
T2 - Mobility privacy risk prediction based on transferred knowledge
AU - Xie, Xiaoyang
AU - Hong, Zhiqing
AU - Qin, Zhou
AU - Fang, Zhihan
AU - Tian, Yuan
AU - Zhang, Desheng
N1 - Funding Information:
This work is partially supported by NSF 1849238, 1932223, 1951890, 1952096, 2003874, 2047822, 1920462, 1943100, 2002985 and the Facebook Faculty Fellowship. We thank all the reviewers for their insightful feedback to improve this paper.
Publisher Copyright:
© 2022 ACM.
PY - 2022/7
Y1 - 2022/7
N2 - Human mobility data may lead to privacy concerns because a resident can be re-identified from these data by malicious attacks even with anonymized user IDs. For an urban service collecting mobility data, an efficient privacy risk assessment is essential for the privacy protection of its users. The existing methods enable efficient privacy risk assessments for service operators to fast adjust the quality of sensing data to lower privacy risk by using prediction models. However, for these prediction models, most of them require massive training data, which has to be collected and stored first. Such a large-scale long-Term training data collection contradicts the purpose of privacy risk prediction for new urban services, which is to ensure that the quality of high-risk human mobility data is adjusted to low privacy risk within a short time. To solve this problem, we present a privacy risk prediction model based on transfer learning, i.e., TransRisk, to predict the privacy risk for a new target urban service through (1) small-scale short-Term data of its own, and (2) the knowledge learned from data from other existing urban services. We envision the application of TransRisk on the traffic camera surveillance system and evaluate it with real-world mobility datasets already collected in a Chinese city, Shenzhen, including four source datasets, i.e., (i) one call detail record dataset (CDR) with 1.2 million users; (ii) one cellphone connection data dataset (CONN) with 1.2 million users; (iii) a vehicular GPS dataset (Vehicles) with 10 thousand vehicles; (iv) an electronic toll collection transaction dataset (ETC) with 156 thousand users, and a target dataset, i.e., a camera dataset (Camera) with 248 cameras. The results show that our model outperforms the state-of-The-Art methods in terms of RMSE and MAE. Our work also provides valuable insights and implications on mobility data privacy risk assessment for both current and future large-scale services.
AB - Human mobility data may lead to privacy concerns because a resident can be re-identified from these data by malicious attacks even with anonymized user IDs. For an urban service collecting mobility data, an efficient privacy risk assessment is essential for the privacy protection of its users. The existing methods enable efficient privacy risk assessments for service operators to fast adjust the quality of sensing data to lower privacy risk by using prediction models. However, for these prediction models, most of them require massive training data, which has to be collected and stored first. Such a large-scale long-Term training data collection contradicts the purpose of privacy risk prediction for new urban services, which is to ensure that the quality of high-risk human mobility data is adjusted to low privacy risk within a short time. To solve this problem, we present a privacy risk prediction model based on transfer learning, i.e., TransRisk, to predict the privacy risk for a new target urban service through (1) small-scale short-Term data of its own, and (2) the knowledge learned from data from other existing urban services. We envision the application of TransRisk on the traffic camera surveillance system and evaluate it with real-world mobility datasets already collected in a Chinese city, Shenzhen, including four source datasets, i.e., (i) one call detail record dataset (CDR) with 1.2 million users; (ii) one cellphone connection data dataset (CONN) with 1.2 million users; (iii) a vehicular GPS dataset (Vehicles) with 10 thousand vehicles; (iv) an electronic toll collection transaction dataset (ETC) with 156 thousand users, and a target dataset, i.e., a camera dataset (Camera) with 248 cameras. The results show that our model outperforms the state-of-The-Art methods in terms of RMSE and MAE. Our work also provides valuable insights and implications on mobility data privacy risk assessment for both current and future large-scale services.
KW - Heterogeneous Datasets
KW - Mobility Patterns
KW - Privacy
UR - http://www.scopus.com/inward/record.url?scp=85134256871&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134256871&partnerID=8YFLogxK
U2 - 10.1145/3534581
DO - 10.1145/3534581
M3 - Article
AN - SCOPUS:85134256871
SN - 2474-9567
VL - 6
JO - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
JF - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
IS - 2
M1 - 82
ER -