TY - GEN
T1 - ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation
T2 - 26th ACM SIGSAC Conference on Computer and Communications Security, CCS 2019
AU - Liu, Yingqi
AU - Ma, Shiqing
AU - Lee, Wen Chuan
AU - Aafer, Yousra
AU - Tao, Guanhong
AU - Zhang, Xiangyu
N1 - Funding Information:
We thank the anonymous reviewers for their constructive comments. This research was supported, in part by DARPA FA8650-15-C-7562, NSF 1748764, 1901242 and 1910300, ONR N000141410468 and N000141712947, and Sandia National Lab under award 1701331. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.
Publisher Copyright:
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2019/11/6
Y1 - 2019/11/6
N2 - This paper presents a technique to scan neural-network-based AI models to determine whether they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when regular inputs are provided, but misclassify to a specific output label when the input is stamped with a special pattern called the trojan trigger. We develop a novel technique that analyzes inner neuron behaviors by determining how output activations change when we introduce different levels of stimulation to a neuron. Neurons that substantially elevate the activation of a particular output label regardless of the provided input are considered potentially compromised. The trojan trigger is then reverse-engineered through an optimization procedure guided by the stimulation analysis results, to confirm that a neuron is truly compromised. We evaluate our system ABS on 177 trojaned models, trojaned using various attack methods that target both the input space and the feature space, with various trojan trigger sizes and shapes, together with 144 benign models trained with different data and initial weight values. These models span 7 model structures and 6 datasets, including complex ones such as ImageNet, VGG-Face and ResNet110. Our results show that ABS is highly effective, achieving a detection rate above 90% in most cases (and 100% in many), when only one input sample is provided for each output label. It substantially outperforms the state-of-the-art technique Neural Cleanse, which requires many input samples and small trojan triggers to achieve good performance.
AB - This paper presents a technique to scan neural-network-based AI models to determine whether they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when regular inputs are provided, but misclassify to a specific output label when the input is stamped with a special pattern called the trojan trigger. We develop a novel technique that analyzes inner neuron behaviors by determining how output activations change when we introduce different levels of stimulation to a neuron. Neurons that substantially elevate the activation of a particular output label regardless of the provided input are considered potentially compromised. The trojan trigger is then reverse-engineered through an optimization procedure guided by the stimulation analysis results, to confirm that a neuron is truly compromised. We evaluate our system ABS on 177 trojaned models, trojaned using various attack methods that target both the input space and the feature space, with various trojan trigger sizes and shapes, together with 144 benign models trained with different data and initial weight values. These models span 7 model structures and 6 datasets, including complex ones such as ImageNet, VGG-Face and ResNet110. Our results show that ABS is highly effective, achieving a detection rate above 90% in most cases (and 100% in many), when only one input sample is provided for each output label. It substantially outperforms the state-of-the-art technique Neural Cleanse, which requires many input samples and small trojan triggers to achieve good performance.
KW - AI trojan attacks
KW - Artificial brain stimulation
KW - Deep learning system
UR - http://www.scopus.com/inward/record.url?scp=85075950247&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075950247&partnerID=8YFLogxK
U2 - 10.1145/3319535.3363216
DO - 10.1145/3319535.3363216
M3 - Conference contribution
AN - SCOPUS:85075950247
T3 - Proceedings of the ACM Conference on Computer and Communications Security
SP - 1265
EP - 1282
BT - CCS 2019 - Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security
PB - Association for Computing Machinery
Y2 - 11 November 2019 through 15 November 2019
ER -