TY - GEN
T1 - BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
T2 - 37th Annual Computer Security Applications Conference, ACSAC 2021
AU - Chen, Xiaoyi
AU - Salem, Ahmed
AU - Chen, Dingfan
AU - Backes, Michael
AU - Ma, Shiqing
AU - Shen, Qingni
AU - Wu, Zhonghai
AU - Zhang, Yang
N1 - Funding Information:
This work is supported by China Scholarship Council (CSC) during a visit of Xiaoyi Chen to CISPA. This work is partially supported by National Natural Science Foundation of China (Grant No. 61672062, 61232005), the Helmholtz Association within the project "Trustworthy Federated Data Analytics" (TFDA) (funding No. ZT-I-OO1 4), IARPA TrojAI W911NF-19-S-0012 and the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC (Grant No. 610150-imPACT). We would like to thank the anonymous reviewers for their comments on previous drafts of this paper. We also thank Baisong Xin and Mingcong Ye for their support in our preliminary experiments.
Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2021/12/6
Y1 - 2021/12/6
N2 - Deep neural networks (DNNs) have progressed rapidly during the past decade and have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set to mislead any input with an added secret trigger to a target class. Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods. Specifically, we propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, including basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using BadChar, our backdoor attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when poisoning only 3% of the original training set. Moreover, we conduct a user study to show that our triggers preserve the semantics well from a human perspective.
AB - Deep neural networks (DNNs) have progressed rapidly during the past decade and have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set to mislead any input with an added secret trigger to a target class. Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods. Specifically, we propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, including basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using BadChar, our backdoor attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when poisoning only 3% of the original training set. Moreover, we conduct a user study to show that our triggers preserve the semantics well from a human perspective.
KW - Backdoor attack
KW - NLP
KW - Semantic-preserving
UR - http://www.scopus.com/inward/record.url?scp=85121584945&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121584945&partnerID=8YFLogxK
U2 - 10.1145/3485832.3485837
DO - 10.1145/3485832.3485837
M3 - Conference contribution
AN - SCOPUS:85121584945
T3 - ACM International Conference Proceeding Series
SP - 554
EP - 569
BT - Proceedings - 37th Annual Computer Security Applications Conference, ACSAC 2021
PB - Association for Computing Machinery
Y2 - 6 December 2021 through 10 December 2021
ER -