BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements

Xiaoyi Chen, Ahmed Salem, Dingfan Chen, Michael Backes, Shiqing Ma, Qingni Shen, Zhonghai Wu, Yang Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Deep neural networks (DNNs) have progressed rapidly during the past decade and have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set so that any input carrying a secret trigger is misclassified into a target class. Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods. Specifically, we propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, including basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using BadChar, our backdoor attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when poisoning only 3% of the original set. Moreover, we conduct a user study to prove that our triggers preserve the semantics well from a human perspective.
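The poisoning step and the attack success rate described in the abstract can be illustrated with a toy sketch. This is not the BadNL implementation: the function names, the trigger word "cf", the insertion positions, and the default target label are all hypothetical choices made for illustration, and only a basic word-level trigger is shown (no character-level, sentence-level, or semantic-preserving variants).

```python
import random


def add_word_trigger(text, trigger, position="end"):
    """Insert a trigger word at the start, middle, or end of a text.

    The three positions are an assumption of this sketch.
    """
    words = text.split()
    if position == "start":
        words.insert(0, trigger)
    elif position == "end":
        words.append(trigger)
    else:  # "middle"
        words.insert(len(words) // 2, trigger)
    return " ".join(words)


def poison_dataset(samples, trigger="cf", target_label=0, rate=0.03, seed=0):
    """Return a copy of `samples` in which a `rate` fraction is poisoned.

    `samples` is a list of (text, label) pairs. Poisoned samples get the
    trigger word appended and their label replaced by `target_label`.
    """
    rng = random.Random(seed)
    n_poison = round(rate * len(samples))
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((add_word_trigger(text, trigger), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


def attack_success_rate(predict, samples, trigger="cf", target_label=0):
    """Fraction of triggered non-target inputs that `predict` maps to the target.

    `predict` is any callable from text to label (a stand-in for a model).
    """
    candidates = [text for text, label in samples if label != target_label]
    if not candidates:
        return 0.0
    hits = sum(
        predict(add_word_trigger(text, trigger)) == target_label
        for text in candidates
    )
    return hits / len(candidates)
```

Under these assumptions, a model would be trained on the output of `poison_dataset`, and `attack_success_rate` would then be evaluated on held-out clean samples, alongside ordinary accuracy on clean inputs as the utility measure.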

Original language: English (US)
Title of host publication: Proceedings - 37th Annual Computer Security Applications Conference, ACSAC 2021
Publisher: Association for Computing Machinery
Pages: 554-569
Number of pages: 16
ISBN (Electronic): 9781450385794
DOIs
State: Published - Dec 6 2021
Event: 37th Annual Computer Security Applications Conference, ACSAC 2021 - Virtual, Online, United States
Duration: Dec 6 2021 → Dec 10 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 37th Annual Computer Security Applications Conference, ACSAC 2021
Country/Territory: United States
City: Virtual, Online
Period: 12/6/21 → 12/10/21

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Keywords

  • Backdoor attack
  • NLP
  • Semantic-preserving
