TY - GEN
T1 - Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
AU - Cheng, Siyuan
AU - Liu, Yingqi
AU - Ma, Shiqing
AU - Zhang, Xiangyu
N1 - Funding Information:
This research was supported, in part by NSF 1901242 and 1910300, ONR N000141712045, N000141410468 and N000141712947, and IARPA TrojAI W911NF-19-S-0012. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.
Publisher Copyright:
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved
PY - 2021
Y1 - 2021
N2 - Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called trigger, causing misclassification. Many existing trojan attacks have their triggers being input space patches/objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defense.
AB - Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called trigger, causing misclassification. Many existing trojan attacks have their triggers being input space patches/objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defense.
UR - http://www.scopus.com/inward/record.url?scp=85129938499&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129938499&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85129938499
T3 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
SP - 1148
EP - 1156
BT - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
PB - Association for the Advancement of Artificial Intelligence
T2 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Y2 - 2 February 2021 through 9 February 2021
ER -